linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv4 0/5] Fix compatible mmap() return pointer over 4Gb
@ 2017-01-30 12:04 Dmitry Safonov
  2017-01-30 12:04 ` [PATCHv4 1/5] x86/mm: split arch_mmap_rnd() on compat/native versions Dmitry Safonov
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Dmitry Safonov @ 2017-01-30 12:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: 0x7f454c46, Dmitry Safonov, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, x86, linux-mm,
	Shuah Khan, linux-kselftest

Changes since v3:
- fixed usage of 64-bit random mask for 32-bit mm->mmap_compat_base,
  during introducing mmap_compat{_legacy,}_base

Changes since v2:
- don't distinguish native and compat tasks by TIF_ADDR32,
  introduced mmap_compat{_legacy,}_base which allows to treat them
  the same
- fixed kbuild errors

Changes since v1:
- Recalculate mmap_base instead of using max possible virtual address
  for compat/native syscall. That will make policy for allocation the
  same in 32-bit binaries and in 32-bit syscalls in 64-bit binaries.
  I need this because sys_mmap() in restored 32-bit process shouldn't
  hit the stack area.
- Fixed mmap() with MAP_32BIT flag in the same usecases
- used in_compat_syscall() helper rather TS_COMPAT check (Andy noticed)
- introduced find_top() helper as suggested by Andy to simplify code
- fixed test error-handeling: it checked the result of sys_mmap() with
  MMAP_FAILED, which is not correct, as it calls raw syscall - now
  checks return value to be aligned to PAGE_SIZE.

Description from v1 [2]:

A fix for bug in mmap() that I referenced in [1].
Also selftest for it.

I would like to mark the fix as for stable v4.9 kernel if it'll
be accepted, as I try to support compatible 32-bit C/R
after v4.9 and working compatible mmap() is really wanted there.

[1]: https://marc.info/?l=linux-kernel&m=148311451525315
[2]: https://marc.info/?l=linux-kernel&m=148415888707662

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: x86@kernel.org
Cc: linux-mm@kvack.org

Dmitry Safonov (5):
  x86/mm: split arch_mmap_rnd() on compat/native versions
  x86/mm: introduce mmap{,_legacy}_base
  x86/mm: fix 32-bit mmap() for 64-bit ELF
  x86/mm: check in_compat_syscall() instead TIF_ADDR32 for
    mmap(MAP_32BIT)
  selftests/x86: add test to check compat mmap() return addr

 arch/Kconfig                                   |   7 +
 arch/x86/Kconfig                               |   1 +
 arch/x86/include/asm/elf.h                     |   4 +-
 arch/x86/include/asm/processor.h               |   3 +-
 arch/x86/kernel/sys_x86_64.c                   |  32 +++-
 arch/x86/mm/mmap.c                             |  89 +++++++----
 include/linux/mm_types.h                       |   5 +
 tools/testing/selftests/x86/Makefile           |   2 +-
 tools/testing/selftests/x86/test_compat_mmap.c | 208 +++++++++++++++++++++++++
 9 files changed, 311 insertions(+), 40 deletions(-)
 create mode 100644 tools/testing/selftests/x86/test_compat_mmap.c

-- 
2.11.0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCHv4 1/5] x86/mm: split arch_mmap_rnd() on compat/native versions
  2017-01-30 12:04 [PATCHv4 0/5] Fix compatible mmap() return pointer over 4Gb Dmitry Safonov
@ 2017-01-30 12:04 ` Dmitry Safonov
  2017-02-09 13:55   ` Borislav Petkov
  2017-01-30 12:04 ` [PATCHv4 2/5] x86/mm: introduce mmap{,_legacy}_base Dmitry Safonov
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Dmitry Safonov @ 2017-01-30 12:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: 0x7f454c46, Dmitry Safonov, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, x86, linux-mm

I need those arch_{native,compat}_rnd() to compute separately
random factor for mmap() in compat syscalls for 64-bit binaries
and vice-versa for native syscall in 32-bit compat binaries.
They will be used in the following patches.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
---
 arch/x86/mm/mmap.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index d2dc0438d654..42063e787717 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -65,20 +65,27 @@ static int mmap_is_legacy(void)
 	return sysctl_legacy_va_layout;
 }
 
-unsigned long arch_mmap_rnd(void)
+#ifdef CONFIG_COMPAT
+static unsigned long arch_compat_rnd(void)
 {
-	unsigned long rnd;
+	return (get_random_long() & ((1UL << mmap_rnd_compat_bits) - 1))
+		<< PAGE_SHIFT;
+}
+#endif
 
-	if (mmap_is_ia32())
+static unsigned long arch_native_rnd(void)
+{
+	return (get_random_long() & ((1UL << mmap_rnd_bits) - 1)) << PAGE_SHIFT;
+}
+
+unsigned long arch_mmap_rnd(void)
+{
 #ifdef CONFIG_COMPAT
-		rnd = get_random_long() & ((1UL << mmap_rnd_compat_bits) - 1);
-#else
-		rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
+	if (mmap_is_ia32())
+		return arch_compat_rnd();
 #endif
-	else
-		rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
 
-	return rnd << PAGE_SHIFT;
+	return arch_native_rnd();
 }
 
 static unsigned long mmap_base(unsigned long rnd)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv4 2/5] x86/mm: introduce mmap{,_legacy}_base
  2017-01-30 12:04 [PATCHv4 0/5] Fix compatible mmap() return pointer over 4Gb Dmitry Safonov
  2017-01-30 12:04 ` [PATCHv4 1/5] x86/mm: split arch_mmap_rnd() on compat/native versions Dmitry Safonov
@ 2017-01-30 12:04 ` Dmitry Safonov
  2017-02-11 14:13   ` Thomas Gleixner
  2017-01-30 12:04 ` [PATCHv4 3/5] x86/mm: fix 32-bit mmap() for 64-bit ELF Dmitry Safonov
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Dmitry Safonov @ 2017-01-30 12:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: 0x7f454c46, Dmitry Safonov, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, x86, linux-mm

In the following patch they will be used to compute:
- mmap{,_legacy}_base for 64-bit mmap()
- mmap_compat{,_legacy}_base for 32-bit mmap()

This patch makes it possible to calculate mmap bases for any specified
task_size, which is needed to correctly choose the base address for mmap
in 32-bit syscalls and 64-bit syscalls.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
---
 arch/x86/include/asm/elf.h       |  4 +++-
 arch/x86/include/asm/processor.h |  3 ++-
 arch/x86/mm/mmap.c               | 32 ++++++++++++++++++++------------
 3 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index e7f155c3045e..120b4f3d8a6a 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -286,6 +286,7 @@ do {									\
 
 #ifdef CONFIG_X86_32
 
+#define STACK_RND_MASK_MODE(native) (0x7ff)
 #define STACK_RND_MASK (0x7ff)
 
 #define ARCH_DLINFO		ARCH_DLINFO_IA32
@@ -295,7 +296,8 @@ do {									\
 #else /* CONFIG_X86_32 */
 
 /* 1GB for 64bit, 8MB for 32bit */
-#define STACK_RND_MASK (test_thread_flag(TIF_ADDR32) ? 0x7ff : 0x3fffff)
+#define STACK_RND_MASK_MODE(native) ((native) ? 0x3fffff : 0x7ff)
+#define STACK_RND_MASK STACK_RND_MASK_MODE(!test_thread_flag(TIF_ADDR32))
 
 #define ARCH_DLINFO							\
 do {									\
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 1be64da0384e..52086e65b422 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -862,7 +862,8 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
  * This decides where the kernel will search for a free chunk of vm
  * space during mmap's.
  */
-#define TASK_UNMAPPED_BASE	(PAGE_ALIGN(TASK_SIZE / 3))
+#define _TASK_UNMAPPED_BASE(task_size)	(PAGE_ALIGN(task_size / 3))
+#define TASK_UNMAPPED_BASE	_TASK_UNMAPPED_BASE(TASK_SIZE)
 
 #define KSTK_EIP(task)		(task_pt_regs(task)->ip)
 
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 42063e787717..98be520fd270 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -35,12 +35,14 @@ struct va_alignment __read_mostly va_align = {
 	.flags = -1,
 };
 
-static unsigned long stack_maxrandom_size(void)
+static unsigned long stack_maxrandom_size(unsigned long task_size)
 {
 	unsigned long max = 0;
 	if ((current->flags & PF_RANDOMIZE) &&
 		!(current->personality & ADDR_NO_RANDOMIZE)) {
-		max = ((-1UL) & STACK_RND_MASK) << PAGE_SHIFT;
+		max = (-1UL);
+		max &= STACK_RND_MASK_MODE(task_size == TASK_SIZE_MAX);
+		max <<= PAGE_SHIFT;
 	}
 
 	return max;
@@ -51,8 +53,8 @@ static unsigned long stack_maxrandom_size(void)
  *
  * Leave an at least ~128 MB hole with possible stack randomization.
  */
-#define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
-#define MAX_GAP (TASK_SIZE/6*5)
+#define MIN_GAP(task_size) (128*1024*1024UL + stack_maxrandom_size(task_size))
+#define MAX_GAP(task_size) (task_size/6*5)
 
 static int mmap_is_legacy(void)
 {
@@ -88,16 +90,22 @@ unsigned long arch_mmap_rnd(void)
 	return arch_native_rnd();
 }
 
-static unsigned long mmap_base(unsigned long rnd)
+static unsigned long mmap_base(unsigned long rnd, unsigned long task_size)
 {
 	unsigned long gap = rlimit(RLIMIT_STACK);
 
-	if (gap < MIN_GAP)
-		gap = MIN_GAP;
-	else if (gap > MAX_GAP)
-		gap = MAX_GAP;
+	if (gap < MIN_GAP(task_size))
+		gap = MIN_GAP(task_size);
+	else if (gap > MAX_GAP(task_size))
+		gap = MAX_GAP(task_size);
 
-	return PAGE_ALIGN(TASK_SIZE - gap - rnd);
+	return PAGE_ALIGN(task_size - gap - rnd);
+}
+
+static unsigned long mmap_legacy_base(unsigned long rnd,
+		unsigned long task_size)
+{
+	return _TASK_UNMAPPED_BASE(task_size) + rnd;
 }
 
 /*
@@ -111,13 +119,13 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 	if (current->flags & PF_RANDOMIZE)
 		random_factor = arch_mmap_rnd();
 
-	mm->mmap_legacy_base = TASK_UNMAPPED_BASE + random_factor;
+	mm->mmap_legacy_base = mmap_legacy_base(random_factor, TASK_SIZE);
 
 	if (mmap_is_legacy()) {
 		mm->mmap_base = mm->mmap_legacy_base;
 		mm->get_unmapped_area = arch_get_unmapped_area;
 	} else {
-		mm->mmap_base = mmap_base(random_factor);
+		mm->mmap_base = mmap_base(random_factor, TASK_SIZE);
 		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
 	}
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv4 3/5] x86/mm: fix 32-bit mmap() for 64-bit ELF
  2017-01-30 12:04 [PATCHv4 0/5] Fix compatible mmap() return pointer over 4Gb Dmitry Safonov
  2017-01-30 12:04 ` [PATCHv4 1/5] x86/mm: split arch_mmap_rnd() on compat/native versions Dmitry Safonov
  2017-01-30 12:04 ` [PATCHv4 2/5] x86/mm: introduce mmap{,_legacy}_base Dmitry Safonov
@ 2017-01-30 12:04 ` Dmitry Safonov
  2017-02-11 19:49   ` Thomas Gleixner
  2017-01-30 12:04 ` [PATCHv4 4/5] x86/mm: check in_compat_syscall() instead TIF_ADDR32 for mmap(MAP_32BIT) Dmitry Safonov
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Dmitry Safonov @ 2017-01-30 12:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: 0x7f454c46, Dmitry Safonov, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, x86, linux-mm

Fix 32-bit compat_sys_mmap() mapping VMA over 4Gb in 64-bit binaries
and 64-bit sys_mmap() mapping VMA only under 4Gb in 32-bit binaries.
Introduced new bases for compat syscalls in mm_struct:
mmap_compat_base and mmap_compat_legacy_base for top-down and
bottom-up allocations accordingly.
Taught arch_get_unmapped_area{,_topdown}() to use the new mmap_bases
in compat syscalls for high/low limits in vm_unmapped_area().

I discovered that bug on ZDTM tests for compat 32-bit C/R.
Working compat sys_mmap() in 64-bit binaries is really needed for that
purpose, as 32-bit applications are restored from 64-bit CRIU binary.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
---
 arch/Kconfig                 |  7 +++++++
 arch/x86/Kconfig             |  1 +
 arch/x86/kernel/sys_x86_64.c | 28 ++++++++++++++++++++++++----
 arch/x86/mm/mmap.c           | 36 ++++++++++++++++++++++++------------
 include/linux/mm_types.h     |  5 +++++
 5 files changed, 61 insertions(+), 16 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 99839c23d453..6bdca6d86855 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -671,6 +671,13 @@ config ARCH_MMAP_RND_COMPAT_BITS
 	  This value can be changed after boot using the
 	  /proc/sys/vm/mmap_rnd_compat_bits tunable
 
+config HAVE_ARCH_COMPAT_MMAP_BASES
+	bool
+	help
+	  If this is set, one program can do native and compatible syscall
+	  mmap() on architecture. Thus kernel has different bases to
+	  compute high and low virtual address limits for allocation.
+
 config HAVE_COPY_THREAD_TLS
 	bool
 	help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e487493bbd47..b3acb836567a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -102,6 +102,7 @@ config X86
 	select HAVE_ARCH_KMEMCHECK
 	select HAVE_ARCH_MMAP_RND_BITS		if MMU
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if MMU && COMPAT
+	select HAVE_ARCH_COMPAT_MMAP_BASES	if MMU && COMPAT
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index a55ed63b9f91..90be0839441d 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -16,6 +16,7 @@
 #include <linux/uaccess.h>
 #include <linux/elf.h>
 
+#include <asm/compat.h>
 #include <asm/ia32.h>
 #include <asm/syscalls.h>
 
@@ -113,10 +114,19 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
 		if (current->flags & PF_RANDOMIZE) {
 			*begin = randomize_page(*begin, 0x02000000);
 		}
-	} else {
-		*begin = current->mm->mmap_legacy_base;
-		*end = TASK_SIZE;
+		return;
 	}
+
+#ifdef CONFIG_COMPAT
+	if (in_compat_syscall()) {
+		*begin = current->mm->mmap_compat_legacy_base;
+		*end = IA32_PAGE_OFFSET;
+		return;
+	}
+#endif
+
+	*begin = current->mm->mmap_legacy_base;
+	*end = TASK_SIZE_MAX;
 }
 
 unsigned long
@@ -157,6 +167,16 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
 	return vm_unmapped_area(&info);
 }
 
+static unsigned long find_top(void)
+{
+#ifdef CONFIG_COMPAT
+	if (in_compat_syscall())
+		return current->mm->mmap_compat_base;
+	else
+#endif
+		return current->mm->mmap_base;
+}
+
 unsigned long
 arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 			  const unsigned long len, const unsigned long pgoff,
@@ -190,7 +210,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
 	info.length = len;
 	info.low_limit = PAGE_SIZE;
-	info.high_limit = mm->mmap_base;
+	info.high_limit = find_top();
 	info.align_mask = 0;
 	info.align_offset = pgoff << PAGE_SHIFT;
 	if (filp) {
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 98be520fd270..99e6a81d9c87 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -70,6 +70,8 @@ static int mmap_is_legacy(void)
 #ifdef CONFIG_COMPAT
 static unsigned long arch_compat_rnd(void)
 {
+	if (!(current->flags & PF_RANDOMIZE))
+		return 0;
 	return (get_random_long() & ((1UL << mmap_rnd_compat_bits) - 1))
 		<< PAGE_SHIFT;
 }
@@ -77,6 +79,8 @@ static unsigned long arch_compat_rnd(void)
 
 static unsigned long arch_native_rnd(void)
 {
+	if (!(current->flags & PF_RANDOMIZE))
+		return 0;
 	return (get_random_long() & ((1UL << mmap_rnd_bits) - 1)) << PAGE_SHIFT;
 }
 
@@ -112,22 +116,30 @@ static unsigned long mmap_legacy_base(unsigned long rnd,
  * This function, called very early during the creation of a new
  * process VM image, sets up which VM layout function to use:
  */
-void arch_pick_mmap_layout(struct mm_struct *mm)
+static void arch_pick_mmap_base(unsigned long *base, unsigned long *legacy_base,
+		unsigned long random_factor, unsigned long task_size)
 {
-	unsigned long random_factor = 0UL;
-
-	if (current->flags & PF_RANDOMIZE)
-		random_factor = arch_mmap_rnd();
-
-	mm->mmap_legacy_base = mmap_legacy_base(random_factor, TASK_SIZE);
+	*legacy_base = mmap_legacy_base(random_factor, task_size);
+	if (mmap_is_legacy())
+		*base = *legacy_base;
+	else
+		*base = mmap_base(random_factor, task_size);
+}
 
-	if (mmap_is_legacy()) {
-		mm->mmap_base = mm->mmap_legacy_base;
+void arch_pick_mmap_layout(struct mm_struct *mm)
+{
+	if (mmap_is_legacy())
 		mm->get_unmapped_area = arch_get_unmapped_area;
-	} else {
-		mm->mmap_base = mmap_base(random_factor, TASK_SIZE);
+	else
 		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
-	}
+
+	arch_pick_mmap_base(&mm->mmap_base, &mm->mmap_legacy_base,
+			arch_native_rnd(), TASK_SIZE_MAX);
+
+#ifdef CONFIG_COMPAT
+	arch_pick_mmap_base(&mm->mmap_compat_base, &mm->mmap_compat_legacy_base,
+			arch_compat_rnd(), IA32_PAGE_OFFSET);
+#endif
 }
 
 const char *arch_vma_name(struct vm_area_struct *vma)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 808751d7b737..48274a84cebe 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -404,6 +404,11 @@ struct mm_struct {
 #endif
 	unsigned long mmap_base;		/* base of mmap area */
 	unsigned long mmap_legacy_base;         /* base of mmap area in bottom-up allocations */
+#ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
+	/* Base adresses for compatible mmap() */
+	unsigned long mmap_compat_base;
+	unsigned long mmap_compat_legacy_base;
+#endif
 	unsigned long task_size;		/* size of task vm space */
 	unsigned long highest_vm_end;		/* highest vma end address */
 	pgd_t * pgd;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv4 4/5] x86/mm: check in_compat_syscall() instead TIF_ADDR32 for mmap(MAP_32BIT)
  2017-01-30 12:04 [PATCHv4 0/5] Fix compatible mmap() return pointer over 4Gb Dmitry Safonov
                   ` (2 preceding siblings ...)
  2017-01-30 12:04 ` [PATCHv4 3/5] x86/mm: fix 32-bit mmap() for 64-bit ELF Dmitry Safonov
@ 2017-01-30 12:04 ` Dmitry Safonov
  2017-02-11 20:13   ` Thomas Gleixner
  2017-01-30 12:04 ` [PATCHv4 5/5] selftests/x86: add test to check compat mmap() return addr Dmitry Safonov
  2017-02-06 16:46 ` [PATCHv4 0/5] Fix compatible mmap() return pointer over 4Gb Dmitry Safonov
  5 siblings, 1 reply; 25+ messages in thread
From: Dmitry Safonov @ 2017-01-30 12:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: 0x7f454c46, Dmitry Safonov, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, x86, linux-mm

At this momet, logic in arch_get_unmapped_area{,_topdown} for mmaps with
MAP_32BIT flag checks TIF_ADDR32 which means:
o if 32-bit ELF changes mode to 64-bit on x86_64 and then tries to
  mmap() with MAP_32BIT it'll result in addr over 4Gb (as default is
  top-down allocation)
o if 64-bit ELF changes mode to 32-bit and tries mmap() with MAP_32BIT,
  it'll allocate only memory in 1GB space: [0x40000000, 0x80000000).

Fix it by handeling MAP_32BIT in 64-bit syscalls only.
As a little bonus it'll make thread flag a little less used.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
---
 arch/x86/kernel/sys_x86_64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 90be0839441d..533e879e9cdf 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -101,7 +101,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
 static void find_start_end(unsigned long flags, unsigned long *begin,
 			   unsigned long *end)
 {
-	if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT)) {
+	if (!in_compat_syscall() && (flags & MAP_32BIT)) {
 		/* This is usually used needed to map code in small
 		   model, so it needs to be in the first 31bit. Limit
 		   it to that.  This means we need to move the
@@ -195,7 +195,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 		return addr;
 
 	/* for MAP_32BIT mappings we force the legacy mmap base */
-	if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT))
+	if (!in_compat_syscall() && (flags & MAP_32BIT))
 		goto bottomup;
 
 	/* requesting a specific address */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv4 5/5] selftests/x86: add test to check compat mmap() return addr
  2017-01-30 12:04 [PATCHv4 0/5] Fix compatible mmap() return pointer over 4Gb Dmitry Safonov
                   ` (3 preceding siblings ...)
  2017-01-30 12:04 ` [PATCHv4 4/5] x86/mm: check in_compat_syscall() instead TIF_ADDR32 for mmap(MAP_32BIT) Dmitry Safonov
@ 2017-01-30 12:04 ` Dmitry Safonov
  2017-02-06 16:46 ` [PATCHv4 0/5] Fix compatible mmap() return pointer over 4Gb Dmitry Safonov
  5 siblings, 0 replies; 25+ messages in thread
From: Dmitry Safonov @ 2017-01-30 12:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: 0x7f454c46, Dmitry Safonov, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, x86, linux-mm,
	Shuah Khan, linux-kselftest

We can't just add segfault handler and use addr, returned by compat
mmap() syscall, because the lower 4 bytes can be the same as already
existed VMA. So, the test parses /proc/self/maps file, founds new
VMAs those appeared after compatible sys_mmap() and checks if mmaped
VMA is in that list.

On failure it prints:
[NOTE]	Allocated mmap 0x6f36a000, sized 0x400000
[NOTE]	New mapping appeared: 0x7f936f36a000
[FAIL]	Found VMA [0x7f936f36a000, 0x7f936f76a000] in maps file, that was allocated with compat syscall

Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
---
 tools/testing/selftests/x86/Makefile           |   2 +-
 tools/testing/selftests/x86/test_compat_mmap.c | 208 +++++++++++++++++++++++++
 2 files changed, 209 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/test_compat_mmap.c

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 8c1cb423cfe6..9c3e746a6064 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -10,7 +10,7 @@ TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_sysc
 TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
-TARGETS_C_64BIT_ONLY := fsgsbase
+TARGETS_C_64BIT_ONLY := fsgsbase test_compat_mmap
 
 TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
 TARGETS_C_64BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_64BIT_ONLY)
diff --git a/tools/testing/selftests/x86/test_compat_mmap.c b/tools/testing/selftests/x86/test_compat_mmap.c
new file mode 100644
index 000000000000..245d9407653e
--- /dev/null
+++ b/tools/testing/selftests/x86/test_compat_mmap.c
@@ -0,0 +1,208 @@
+/*
+ * Check that compat 32-bit mmap() returns address < 4Gb on 64-bit.
+ *
+ * Copyright (c) 2017 Dmitry Safonov (Virtuozzo)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#include <sys/mman.h>
+#include <sys/types.h>
+
+#include <stdio.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <signal.h>
+#include <stdlib.h>
+
+#define PAGE_SIZE 4096
+#define MMAP_SIZE (PAGE_SIZE*1024)
+#define MAX_VMAS 50
+#define BUF_SIZE 1024
+
+#ifndef __NR32_mmap2
+#define __NR32_mmap2 192
+#endif
+
+struct syscall_args32 {
+	uint32_t nr, arg0, arg1, arg2, arg3, arg4, arg5;
+};
+
+static void do_full_int80(struct syscall_args32 *args)
+{
+	asm volatile ("int $0x80"
+		      : "+a" (args->nr),
+			"+b" (args->arg0), "+c" (args->arg1), "+d" (args->arg2),
+			"+S" (args->arg3), "+D" (args->arg4),
+			"+rbp" (args->arg5)
+			: : "r8", "r9", "r10", "r11");
+}
+
+void *mmap2(void *addr, size_t len, int prot, int flags,
+	int fildes, off_t off)
+{
+	struct syscall_args32 s;
+
+	s.nr	= __NR32_mmap2;
+	s.arg0	= (uint32_t)(uintptr_t)addr;
+	s.arg1	= (uint32_t)len;
+	s.arg2	= prot;
+	s.arg3	= flags;
+	s.arg4	= fildes;
+	s.arg5	= (uint32_t)off;
+
+	do_full_int80(&s);
+
+	return (void *)(uintptr_t)s.nr;
+}
+
+struct vm_area {
+	unsigned long start;
+	unsigned long end;
+};
+
+static struct vm_area vmas_before_mmap[MAX_VMAS];
+static struct vm_area vmas_after_mmap[MAX_VMAS];
+
+static char buf[BUF_SIZE];
+
+int parse_maps(struct vm_area *vmas)
+{
+	FILE *maps;
+	int i;
+
+	maps = fopen("/proc/self/maps", "r");
+	if (maps == NULL) {
+		printf("[ERROR]\tFailed to open maps file: %m\n");
+		return -1;
+	}
+
+	for (i = 0; i < MAX_VMAS; i++) {
+		struct vm_area *v = &vmas[i];
+		char *end;
+
+		if (fgets(buf, BUF_SIZE, maps) == NULL)
+			break;
+
+		v->start = strtoul(buf, &end, 16);
+		v->end = strtoul(end + 1, NULL, 16);
+		//printf("[NOTE]\tVMA: [%#lx, %#lx]\n", v->start, v->end);
+	}
+
+	if (i == MAX_VMAS) {
+		printf("[ERROR]\tNumber of VMAs is bigger than reserved array's size\n");
+		return -1;
+	}
+
+	if (fclose(maps)) {
+		printf("[ERROR]\tFailed to close maps file: %m\n");
+		return -1;
+	}
+	return 0;
+}
+
+int compare_vmas(struct vm_area *vmax, struct vm_area *vmay)
+{
+	if (vmax->start > vmay->start)
+		return 1;
+	if (vmax->start < vmay->start)
+		return -1;
+	if (vmax->end > vmay->end)
+		return 1;
+	if (vmax->end < vmay->end)
+		return -1;
+	return 0;
+}
+
+unsigned long vma_size(struct vm_area *v)
+{
+	return v->end - v->start;
+}
+
+int find_new_vma_like(struct vm_area *vma)
+{
+	int i, j = 0, found_alike = -1;
+
+	for (i = 0; i < MAX_VMAS && j < MAX_VMAS; i++, j++) {
+		int cmp = compare_vmas(&vmas_before_mmap[i],
+				&vmas_after_mmap[j]);
+
+		if (cmp == 0)
+			continue;
+		if (cmp < 0) {/* Lost mapping */
+			printf("[NOTE]\tLost mapping: %#lx\n",
+				vmas_before_mmap[i].start);
+			j--;
+			continue;
+		}
+
+		printf("[NOTE]\tNew mapping appeared: %#lx\n",
+				vmas_after_mmap[j].start);
+		i--;
+		if (!compare_vmas(&vmas_after_mmap[j], vma))
+			return 0;
+
+		if (((vmas_after_mmap[j].start & 0xffffffff) == vma->start) &&
+				(vma_size(&vmas_after_mmap[j]) == vma_size(vma)))
+			found_alike = j;
+	}
+
+	/* Left new vmas in tail */
+	for (; i < MAX_VMAS; i++)
+		if (!compare_vmas(&vmas_after_mmap[j], vma))
+			return 0;
+
+	if (found_alike != -1) {
+		printf("[FAIL]\tFound VMA [%#lx, %#lx] in maps file, that was allocated with compat syscall\n",
+			vmas_after_mmap[found_alike].start,
+			vmas_after_mmap[found_alike].end);
+		return -1;
+	}
+
+	printf("[ERROR]\tCan't find [%#lx, %#lx] in maps file\n",
+		vma->start, vma->end);
+	return -1;
+}
+
+int main(int argc, char **argv)
+{
+	void *map;
+	struct vm_area vma;
+
+	if (parse_maps(vmas_before_mmap)) {
+		printf("[ERROR]\tFailed to parse maps file\n");
+		return 1;
+	}
+
+	map = mmap2(0, MMAP_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC,
+			MAP_PRIVATE | MAP_ANON, -1, 0);
+	if (((uintptr_t)map) % PAGE_SIZE) {
+		printf("[ERROR]\tmmap2 failed: %d\n",
+				(~(uint32_t)(uintptr_t)map) + 1);
+		return 1;
+	} else {
+		printf("[NOTE]\tAllocated mmap %p, sized %#x\n", map, MMAP_SIZE);
+	}
+
+	if (parse_maps(vmas_after_mmap)) {
+		printf("[ERROR]\tFailed to parse maps file\n");
+		return 1;
+	}
+
+	munmap(map, MMAP_SIZE);
+
+	vma.start = (unsigned long)(uintptr_t)map;
+	vma.end = vma.start + MMAP_SIZE;
+	if (find_new_vma_like(&vma))
+		return 1;
+
+	printf("[OK]\n");
+
+	return 0;
+}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 0/5] Fix compatible mmap() return pointer over 4Gb
  2017-01-30 12:04 [PATCHv4 0/5] Fix compatible mmap() return pointer over 4Gb Dmitry Safonov
                   ` (4 preceding siblings ...)
  2017-01-30 12:04 ` [PATCHv4 5/5] selftests/x86: add test to check compat mmap() return addr Dmitry Safonov
@ 2017-02-06 16:46 ` Dmitry Safonov
  5 siblings, 0 replies; 25+ messages in thread
From: Dmitry Safonov @ 2017-02-06 16:46 UTC (permalink / raw)
  To: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov
  Cc: 0x7f454c46, x86, linux-mm, Shuah Khan, linux-kselftest

On 01/30/2017 03:04 PM, Dmitry Safonov wrote:
> Changes since v3:
> - fixed usage of 64-bit random mask for 32-bit mm->mmap_compat_base,
>   during introducing mmap_compat{_legacy,}_base
>
> Changes since v2:
> - don't distinguish native and compat tasks by TIF_ADDR32,
>   introduced mmap_compat{_legacy,}_base which allows to treat them
>   the same
> - fixed kbuild errors
>
> Changes since v1:
> - Recalculate mmap_base instead of using max possible virtual address
>   for compat/native syscall. That will make policy for allocation the
>   same in 32-bit binaries and in 32-bit syscalls in 64-bit binaries.
>   I need this because sys_mmap() in restored 32-bit process shouldn't
>   hit the stack area.
> - Fixed mmap() with MAP_32BIT flag in the same usecases
> - used in_compat_syscall() helper rather TS_COMPAT check (Andy noticed)
> - introduced find_top() helper as suggested by Andy to simplify code
> - fixed test error-handeling: it checked the result of sys_mmap() with
>   MMAP_FAILED, which is not correct, as it calls raw syscall - now
>   checks return value to be aligned to PAGE_SIZE.
>
> Description from v1 [2]:
>
> A fix for bug in mmap() that I referenced in [1].
> Also selftest for it.

Gentle ping. Any thought on this?

>
> [1]: https://marc.info/?l=linux-kernel&m=148311451525315
> [2]: https://marc.info/?l=linux-kernel&m=148415888707662
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: x86@kernel.org
> Cc: linux-mm@kvack.org
>
> Dmitry Safonov (5):
>   x86/mm: split arch_mmap_rnd() on compat/native versions
>   x86/mm: introduce mmap{,_legacy}_base
>   x86/mm: fix 32-bit mmap() for 64-bit ELF
>   x86/mm: check in_compat_syscall() instead TIF_ADDR32 for
>     mmap(MAP_32BIT)
>   selftests/x86: add test to check compat mmap() return addr
>
>  arch/Kconfig                                   |   7 +
>  arch/x86/Kconfig                               |   1 +
>  arch/x86/include/asm/elf.h                     |   4 +-
>  arch/x86/include/asm/processor.h               |   3 +-
>  arch/x86/kernel/sys_x86_64.c                   |  32 +++-
>  arch/x86/mm/mmap.c                             |  89 +++++++----
>  include/linux/mm_types.h                       |   5 +
>  tools/testing/selftests/x86/Makefile           |   2 +-
>  tools/testing/selftests/x86/test_compat_mmap.c | 208 +++++++++++++++++++++++++
>  9 files changed, 311 insertions(+), 40 deletions(-)
>  create mode 100644 tools/testing/selftests/x86/test_compat_mmap.c
>


-- 
              Dmitry

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 1/5] x86/mm: split arch_mmap_rnd() on compat/native versions
  2017-01-30 12:04 ` [PATCHv4 1/5] x86/mm: split arch_mmap_rnd() on compat/native versions Dmitry Safonov
@ 2017-02-09 13:55   ` Borislav Petkov
  2017-02-09 23:06     ` Andy Lutomirski
  2017-02-10 20:10     ` Thomas Gleixner
  0 siblings, 2 replies; 25+ messages in thread
From: Borislav Petkov @ 2017-02-09 13:55 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, 0x7f454c46, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, x86, linux-mm

On Mon, Jan 30, 2017 at 03:04:28PM +0300, Dmitry Safonov wrote:
> I need those arch_{native,compat}_rnd() to compute separately
> random factor for mmap() in compat syscalls for 64-bit binaries
> and vice-versa for native syscall in 32-bit compat binaries.
> They will be used in the following patches.
> 
> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
> ---
>  arch/x86/mm/mmap.c | 25 ++++++++++++++++---------
>  1 file changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
> index d2dc0438d654..42063e787717 100644
> --- a/arch/x86/mm/mmap.c
> +++ b/arch/x86/mm/mmap.c
> @@ -65,20 +65,27 @@ static int mmap_is_legacy(void)
>  	return sysctl_legacy_va_layout;
>  }
>  
> -unsigned long arch_mmap_rnd(void)
> +#ifdef CONFIG_COMPAT
> +static unsigned long arch_compat_rnd(void)
>  {
> -	unsigned long rnd;
> +	return (get_random_long() & ((1UL << mmap_rnd_compat_bits) - 1))
> +		<< PAGE_SHIFT;
> +}
> +#endif
>  
> -	if (mmap_is_ia32())
> +static unsigned long arch_native_rnd(void)
> +{
> +	return (get_random_long() & ((1UL << mmap_rnd_bits) - 1)) << PAGE_SHIFT;
> +}
> +
> +unsigned long arch_mmap_rnd(void)
> +{
>  #ifdef CONFIG_COMPAT
> -		rnd = get_random_long() & ((1UL << mmap_rnd_compat_bits) - 1);
> -#else
> -		rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
> +	if (mmap_is_ia32())
> +		return arch_compat_rnd();
>  #endif

I can't say that I'm thrilled about the ifdeffery this is adding.

But I can't think of a cleaner approach at a quick glance, though -
that's generic and arch-specific code intertwined muck. Sad face.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 1/5] x86/mm: split arch_mmap_rnd() on compat/native versions
  2017-02-09 13:55   ` Borislav Petkov
@ 2017-02-09 23:06     ` Andy Lutomirski
  2017-02-10 20:10     ` Thomas Gleixner
  1 sibling, 0 replies; 25+ messages in thread
From: Andy Lutomirski @ 2017-02-09 23:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dmitry Safonov, linux-kernel, Dmitry Safonov, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Borislav Petkov,
	X86 ML, linux-mm

On Thu, Feb 9, 2017 at 5:55 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Jan 30, 2017 at 03:04:28PM +0300, Dmitry Safonov wrote:
>> I need those arch_{native,compat}_rnd() to compute separately
>> random factor for mmap() in compat syscalls for 64-bit binaries
>> and vice-versa for native syscall in 32-bit compat binaries.
>> They will be used in the following patches.
>>
>> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
>> ---
>>  arch/x86/mm/mmap.c | 25 ++++++++++++++++---------
>>  1 file changed, 16 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
>> index d2dc0438d654..42063e787717 100644
>> --- a/arch/x86/mm/mmap.c
>> +++ b/arch/x86/mm/mmap.c
>> @@ -65,20 +65,27 @@ static int mmap_is_legacy(void)
>>       return sysctl_legacy_va_layout;
>>  }
>>
>> -unsigned long arch_mmap_rnd(void)
>> +#ifdef CONFIG_COMPAT
>> +static unsigned long arch_compat_rnd(void)
>>  {
>> -     unsigned long rnd;
>> +     return (get_random_long() & ((1UL << mmap_rnd_compat_bits) - 1))
>> +             << PAGE_SHIFT;
>> +}
>> +#endif
>>
>> -     if (mmap_is_ia32())
>> +static unsigned long arch_native_rnd(void)
>> +{
>> +     return (get_random_long() & ((1UL << mmap_rnd_bits) - 1)) << PAGE_SHIFT;
>> +}
>> +
>> +unsigned long arch_mmap_rnd(void)
>> +{
>>  #ifdef CONFIG_COMPAT
>> -             rnd = get_random_long() & ((1UL << mmap_rnd_compat_bits) - 1);
>> -#else
>> -             rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
>> +     if (mmap_is_ia32())
>> +             return arch_compat_rnd();
>>  #endif
>
> I can't say that I'm thrilled about the ifdeffery this is adding.
>
> But I can't think of a cleaner approach at a quick glance, though -
> that's generic and arch-specific code intertwined muck. Sad face.

I can, but it could be considerably more churn: get rid of the
compat/native split and do a 32-bit/64-bit split instead.

>
> --
> Regards/Gruss,
>     Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 1/5] x86/mm: split arch_mmap_rnd() on compat/native versions
  2017-02-09 13:55   ` Borislav Petkov
  2017-02-09 23:06     ` Andy Lutomirski
@ 2017-02-10 20:10     ` Thomas Gleixner
  2017-02-10 20:25       ` Borislav Petkov
  2017-02-10 21:28       ` Dmitry Safonov
  1 sibling, 2 replies; 25+ messages in thread
From: Thomas Gleixner @ 2017-02-10 20:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dmitry Safonov, linux-kernel, 0x7f454c46, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, x86, linux-mm

On Thu, 9 Feb 2017, Borislav Petkov wrote:
> I can't say that I'm thrilled about the ifdeffery this is adding.
> 
> But I can't think of a cleaner approach at a quick glance, though -
> that's generic and arch-specific code intertwined muck. Sad face.

It's trivial enough to do ....

Thanks,

	tglx

---
 arch/x86/mm/mmap.c |   22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -55,6 +55,10 @@ static unsigned long stack_maxrandom_siz
 #define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
 #define MAX_GAP (TASK_SIZE/6*5)
 
+#ifndef CONFIG_COMPAT
+# define mmap_rnd_compat_bits	mmap_rnd_bits
+#endif
+
 static int mmap_is_legacy(void)
 {
 	if (current->personality & ADDR_COMPAT_LAYOUT)
@@ -66,20 +70,14 @@ static int mmap_is_legacy(void)
 	return sysctl_legacy_va_layout;
 }
 
-unsigned long arch_mmap_rnd(void)
+static unsigned long arch_rnd(unsigned int rndbits)
 {
-	unsigned long rnd;
-
-	if (mmap_is_ia32())
-#ifdef CONFIG_COMPAT
-		rnd = get_random_long() & ((1UL << mmap_rnd_compat_bits) - 1);
-#else
-		rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
-#endif
-	else
-		rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
+	return (get_random_long() & ((1UL << rndbits) - 1)) << PAGE_SHIFT;
+}
 
-	return rnd << PAGE_SHIFT;
+unsigned long arch_mmap_rnd(void)
+{
+	return arch_rnd(mmap_is_ia32() ? mmap_rnd_compat_bits : mmap_rnd_bits);
 }
 
 static unsigned long mmap_base(unsigned long rnd)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 1/5] x86/mm: split arch_mmap_rnd() on compat/native versions
  2017-02-10 20:10     ` Thomas Gleixner
@ 2017-02-10 20:25       ` Borislav Petkov
  2017-02-10 21:28       ` Dmitry Safonov
  1 sibling, 0 replies; 25+ messages in thread
From: Borislav Petkov @ 2017-02-10 20:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dmitry Safonov, linux-kernel, 0x7f454c46, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, x86, linux-mm

On Fri, Feb 10, 2017 at 09:10:30PM +0100, Thomas Gleixner wrote:
> On Thu, 9 Feb 2017, Borislav Petkov wrote:
> > I can't say that I'm thrilled about the ifdeffery this is adding.
> > 
> > But I can't think of a cleaner approach at a quick glance, though -
> > that's generic and arch-specific code intertwined muck. Sad face.
> 
> It's trivial enough to do ....
> 
> Thanks,
> 
> 	tglx
> 
> ---
>  arch/x86/mm/mmap.c |   22 ++++++++++------------
>  1 file changed, 10 insertions(+), 12 deletions(-)
> 
> --- a/arch/x86/mm/mmap.c
> +++ b/arch/x86/mm/mmap.c
> @@ -55,6 +55,10 @@ static unsigned long stack_maxrandom_siz
>  #define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
>  #define MAX_GAP (TASK_SIZE/6*5)
>  
> +#ifndef CONFIG_COMPAT
> +# define mmap_rnd_compat_bits	mmap_rnd_bits
> +#endif
> +
>  static int mmap_is_legacy(void)
>  {
>  	if (current->personality & ADDR_COMPAT_LAYOUT)
> @@ -66,20 +70,14 @@ static int mmap_is_legacy(void)
>  	return sysctl_legacy_va_layout;
>  }
>  
> -unsigned long arch_mmap_rnd(void)
> +static unsigned long arch_rnd(unsigned int rndbits)
>  {
> -	unsigned long rnd;
> -
> -	if (mmap_is_ia32())
> -#ifdef CONFIG_COMPAT
> -		rnd = get_random_long() & ((1UL << mmap_rnd_compat_bits) - 1);
> -#else
> -		rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
> -#endif
> -	else
> -		rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
> +	return (get_random_long() & ((1UL << rndbits) - 1)) << PAGE_SHIFT;
> +}
>  
> -	return rnd << PAGE_SHIFT;
> +unsigned long arch_mmap_rnd(void)
> +{
> +	return arch_rnd(mmap_is_ia32() ? mmap_rnd_compat_bits : mmap_rnd_bits);
>  }

Ha! Nice. :-)

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 1/5] x86/mm: split arch_mmap_rnd() on compat/native versions
  2017-02-10 20:10     ` Thomas Gleixner
  2017-02-10 20:25       ` Borislav Petkov
@ 2017-02-10 21:28       ` Dmitry Safonov
  2017-02-11  8:23         ` Thomas Gleixner
  1 sibling, 1 reply; 25+ messages in thread
From: Dmitry Safonov @ 2017-02-10 21:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Borislav Petkov, Dmitry Safonov, open list, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, X86 ML,
	linux-mm

2017-02-10 23:10 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
> On Thu, 9 Feb 2017, Borislav Petkov wrote:
>> I can't say that I'm thrilled about the ifdeffery this is adding.
>>
>> But I can't think of a cleaner approach at a quick glance, though -
>> that's generic and arch-specific code intertwined muck. Sad face.
>
> It's trivial enough to do ....
>
> Thanks,
>
>         tglx
>
> ---
>  arch/x86/mm/mmap.c |   22 ++++++++++------------
>  1 file changed, 10 insertions(+), 12 deletions(-)
>
> --- a/arch/x86/mm/mmap.c
> +++ b/arch/x86/mm/mmap.c
> @@ -55,6 +55,10 @@ static unsigned long stack_maxrandom_siz
>  #define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
>  #define MAX_GAP (TASK_SIZE/6*5)
>
> +#ifndef CONFIG_COMPAT
> +# define mmap_rnd_compat_bits  mmap_rnd_bits
> +#endif
> +

>From my POV, I can't say that it's clearer to shadow mmap_compat_bits
like that then to have two functions with native/compat names.
But if you insist, I'll resend patches set with your version.

>  static int mmap_is_legacy(void)
>  {
>         if (current->personality & ADDR_COMPAT_LAYOUT)
> @@ -66,20 +70,14 @@ static int mmap_is_legacy(void)
>         return sysctl_legacy_va_layout;
>  }
>
> -unsigned long arch_mmap_rnd(void)
> +static unsigned long arch_rnd(unsigned int rndbits)
>  {
> -       unsigned long rnd;
> -
> -       if (mmap_is_ia32())
> -#ifdef CONFIG_COMPAT
> -               rnd = get_random_long() & ((1UL << mmap_rnd_compat_bits) - 1);
> -#else
> -               rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
> -#endif
> -       else
> -               rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
> +       return (get_random_long() & ((1UL << rndbits) - 1)) << PAGE_SHIFT;
> +}
>
> -       return rnd << PAGE_SHIFT;
> +unsigned long arch_mmap_rnd(void)
> +{
> +       return arch_rnd(mmap_is_ia32() ? mmap_rnd_compat_bits : mmap_rnd_bits);
>  }
>
>  static unsigned long mmap_base(unsigned long rnd)

-- 
             Dmitry

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 1/5] x86/mm: split arch_mmap_rnd() on compat/native versions
  2017-02-10 21:28       ` Dmitry Safonov
@ 2017-02-11  8:23         ` Thomas Gleixner
  2017-02-13 11:12           ` Dmitry Safonov
  0 siblings, 1 reply; 25+ messages in thread
From: Thomas Gleixner @ 2017-02-11  8:23 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: Borislav Petkov, Dmitry Safonov, open list, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, X86 ML,
	linux-mm

On Sat, 11 Feb 2017, Dmitry Safonov wrote:

> 2017-02-10 23:10 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
> > On Thu, 9 Feb 2017, Borislav Petkov wrote:
> >> I can't say that I'm thrilled about the ifdeffery this is adding.
> >>
> >> But I can't think of a cleaner approach at a quick glance, though -
> >> that's generic and arch-specific code intertwined muck. Sad face.
> >
> > It's trivial enough to do ....
> >
> > Thanks,
> >
> >         tglx
> >
> > ---
> >  arch/x86/mm/mmap.c |   22 ++++++++++------------
> >  1 file changed, 10 insertions(+), 12 deletions(-)
> >
> > --- a/arch/x86/mm/mmap.c
> > +++ b/arch/x86/mm/mmap.c
> > @@ -55,6 +55,10 @@ static unsigned long stack_maxrandom_siz
> >  #define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
> >  #define MAX_GAP (TASK_SIZE/6*5)
> >
> > +#ifndef CONFIG_COMPAT
> > +# define mmap_rnd_compat_bits  mmap_rnd_bits
> > +#endif
> > +
> 
> >From my POV, I can't say that it's clearer to shadow mmap_compat_bits
> like that then to have two functions with native/compat names.
> But if you insist, I'll resend patches set with your version.

You can make that

#ifdef CONFIG_64BIT
# define mmap32_rnd_bits  mmap_compat_rnd_bits
# define mmap64_rnd_bits  mmap_rnd_bits
#else
# define mmap32_rnd_bits  mmap_rnd_bits
# define mmap64_rnd_bits  mmap_rnd_bits
#endif

and use that. That's still way more readable than the unholy ifdef mess.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 2/5] x86/mm: introduce mmap{,_legacy}_base
  2017-01-30 12:04 ` [PATCHv4 2/5] x86/mm: introduce mmap{,_legacy}_base Dmitry Safonov
@ 2017-02-11 14:13   ` Thomas Gleixner
  2017-02-13 13:02     ` Dmitry Safonov
  2017-02-13 14:37     ` Dmitry Safonov
  0 siblings, 2 replies; 25+ messages in thread
From: Thomas Gleixner @ 2017-02-11 14:13 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, 0x7f454c46, Ingo Molnar, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, x86, linux-mm

On Mon, 30 Jan 2017, Dmitry Safonov wrote:

> In the following patch they will be used to compute:
> - mmap{,_legacy}_base for 64-bit mmap()
> - mmap_compat{,_legacy}_base for 32-bit mmap()
> 
> This patch makes it possible to calculate mmap bases for any specified
> task_size, which is needed to correctly choose the base address for mmap
> in 32-bit syscalls and 64-bit syscalls.

Please rework your changelogs so they follow the requirements in
Documentation....  Look for 'this patch' there.

Also order it in a way which makes it clear what the problem is and how
it's solved.

> +#define STACK_RND_MASK_MODE(native) (0x7ff)
>  #define STACK_RND_MASK (0x7ff)
>  
>  #define ARCH_DLINFO		ARCH_DLINFO_IA32
> @@ -295,7 +296,8 @@ do {									\
>  #else /* CONFIG_X86_32 */
>  
>  /* 1GB for 64bit, 8MB for 32bit */
> -#define STACK_RND_MASK (test_thread_flag(TIF_ADDR32) ? 0x7ff : 0x3fffff)
> +#define STACK_RND_MASK_MODE(native) ((native) ? 0x3fffff : 0x7ff)
> +#define STACK_RND_MASK STACK_RND_MASK_MODE(!test_thread_flag(TIF_ADDR32))

I had to look twice what MODE means. The macro name suggests that it
returns the mode, while it actually returns the mask. That's confusing at
best. And 'native' is not intuitive either.

#define __STACK_RND_MASK(is64bit)	((is64bit) ? 0x3fffff : 0x7ff)
#define STACK_RND_MASK			__STACK_RND_MASK(!mmap_is_ia32())

That is pretty much clear and it uses mmap_is_ia32() as we do in other
places. As a consequemce you can use the same macros for 64 and 32 bit.

> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 1be64da0384e..52086e65b422 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -862,7 +862,8 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
>   * This decides where the kernel will search for a free chunk of vm
>   * space during mmap's.
>   */
> -#define TASK_UNMAPPED_BASE	(PAGE_ALIGN(TASK_SIZE / 3))
> +#define _TASK_UNMAPPED_BASE(task_size)	(PAGE_ALIGN(task_size / 3))
> +#define TASK_UNMAPPED_BASE	_TASK_UNMAPPED_BASE(TASK_SIZE)

Please use two underscores and align the macros in tabular fashion. 

#define __TASK_UNMAPPED_BASE(task_size)	(PAGE_ALIGN(task_size / 3))
#define TASK_UNMAPPED_BASE		__TASK_UNMAPPED_BASE(TASK_SIZE)

That way stuff becomes easy to read.

>  #define KSTK_EIP(task)		(task_pt_regs(task)->ip)
>  
> diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
> index 42063e787717..98be520fd270 100644
> --- a/arch/x86/mm/mmap.c
> +++ b/arch/x86/mm/mmap.c
> @@ -35,12 +35,14 @@ struct va_alignment __read_mostly va_align = {
>  	.flags = -1,
>  };
>  
> -static unsigned long stack_maxrandom_size(void)
> +static unsigned long stack_maxrandom_size(unsigned long task_size)
>  {
>  	unsigned long max = 0;
>  	if ((current->flags & PF_RANDOMIZE) &&
>  		!(current->personality & ADDR_NO_RANDOMIZE)) {
> -		max = ((-1UL) & STACK_RND_MASK) << PAGE_SHIFT;
> +		max = (-1UL);
> +		max &= STACK_RND_MASK_MODE(task_size == TASK_SIZE_MAX);

That just makes me barf, really. I have to go and lookup how TASK_SIZE_MAX
is defined in order to read that code. TASK_SIZE_MAX as is does not give a
hint at all that it means TASK_SIZE_MAX of 64 bit tasks.

You just explained me that you want stuff proper for clarity reasons. So
what's so wrong with adding a helper inline tasksize_64bit() or such?

> +		max <<= PAGE_SHIFT;
>  	}
>  
>  	return max;
> @@ -51,8 +53,8 @@ static unsigned long stack_maxrandom_size(void)
>   *
>   * Leave an at least ~128 MB hole with possible stack randomization.
>   */
> -#define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
> -#define MAX_GAP (TASK_SIZE/6*5)
> +#define MIN_GAP(task_size) (128*1024*1024UL + stack_maxrandom_size(task_size))
> +#define MAX_GAP(task_size) (task_size/6*5)

Eew. Yes it's existing macro mess, but there is not point in proliferating
that. That macro crap is only used in mmap_base() and there is no
justification for these unreadable macros at all. It just makes stuff
obfuscated for no reason. Just blindly making it more obfuscated does not
make it any better.

>  static int mmap_is_legacy(void)
>  {
> @@ -88,16 +90,22 @@ unsigned long arch_mmap_rnd(void)
>  	return arch_native_rnd();
>  }

#define SIZE_128M	(128 * 1024 * 1024UL)

> -static unsigned long mmap_base(unsigned long rnd)
> +static unsigned long mmap_base(unsigned long rnd, unsigned long task_size)
>  {
> 	unsigned long gap = rlimit(RLIMIT_STACK);
	unsigned long gap_min, gap_max;

	/* Add comment what this means */
	gap_min = SIZE_128M + stack_maxrandom_size(task_size);
	/* Explain that ' /6 * 5' magic */
	gap_max = (task_size / 6) * 5;

and use gap_min/gap_max below. That would be too intuitive and readable,
right?

> -	if (gap < MIN_GAP)
> -		gap = MIN_GAP;
> -	else if (gap > MAX_GAP)
> -		gap = MAX_GAP;
> +	if (gap < MIN_GAP(task_size))
> +		gap = MIN_GAP(task_size);
> +	else if (gap > MAX_GAP(task_size))
> +		gap = MAX_GAP(task_size);
>  
> -	return PAGE_ALIGN(TASK_SIZE - gap - rnd);
> +	return PAGE_ALIGN(task_size - gap - rnd);
> +}
> +
> +static unsigned long mmap_legacy_base(unsigned long rnd,
> +		unsigned long task_size)

Please align the argument in the second line with the argument in the first
one

static unsigned long mmap_legacy_base(unsigned long rnd,
		     		      unsigned long task_size)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 3/5] x86/mm: fix 32-bit mmap() for 64-bit ELF
  2017-01-30 12:04 ` [PATCHv4 3/5] x86/mm: fix 32-bit mmap() for 64-bit ELF Dmitry Safonov
@ 2017-02-11 19:49   ` Thomas Gleixner
  2017-02-14 15:24     ` Dmitry Safonov
  0 siblings, 1 reply; 25+ messages in thread
From: Thomas Gleixner @ 2017-02-11 19:49 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, 0x7f454c46, Ingo Molnar, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, x86, linux-mm

On Mon, 30 Jan 2017, Dmitry Safonov wrote:

> Fix 32-bit compat_sys_mmap() mapping VMA over 4Gb in 64-bit binaries
> and 64-bit sys_mmap() mapping VMA only under 4Gb in 32-bit binaries.
> Introduced new bases for compat syscalls in mm_struct:
> mmap_compat_base and mmap_compat_legacy_base for top-down and
> bottom-up allocations accordingly.
> Taught arch_get_unmapped_area{,_topdown}() to use the new mmap_bases
> in compat syscalls for high/low limits in vm_unmapped_area().
> 
> I discovered that bug on ZDTM tests for compat 32-bit C/R.
> Working compat sys_mmap() in 64-bit binaries is really needed for that
> purpose, as 32-bit applications are restored from 64-bit CRIU binary.

Again that changelog sucks.

Explain the problem/bug first. Then explain the way to fix it and do not
tell fairy tales about what you did without explaing the bug in the first
place.

Documentation....SubittingPatches explains that very well.


> +config HAVE_ARCH_COMPAT_MMAP_BASES
> +	bool
> +	help
> +	  If this is set, one program can do native and compatible syscall
> +	  mmap() on architecture. Thus kernel has different bases to
> +	  compute high and low virtual address limits for allocation.

Sigh. How is a user supposed to decode this?

	  This allows 64bit applications to invoke syscalls in 64bit and
	  32bit mode. Required for ....

>  
> @@ -113,10 +114,19 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
>  		if (current->flags & PF_RANDOMIZE) {
>  			*begin = randomize_page(*begin, 0x02000000);
>  		}
> -	} else {
> -		*begin = current->mm->mmap_legacy_base;
> -		*end = TASK_SIZE;
> +		return;
>  	}
> +
> +#ifdef CONFIG_COMPAT

Can you please find a solution which does not create that ifdef horror in
the code? Just a few accessors to those compat fields are required to do
that.

> +
> +#ifdef CONFIG_COMPAT
> +	arch_pick_mmap_base(&mm->mmap_compat_base, &mm->mmap_compat_legacy_base,
> +			arch_compat_rnd(), IA32_PAGE_OFFSET);
> +#endif

Ditto

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 4/5] x86/mm: check in_compat_syscall() instead TIF_ADDR32 for mmap(MAP_32BIT)
  2017-01-30 12:04 ` [PATCHv4 4/5] x86/mm: check in_compat_syscall() instead TIF_ADDR32 for mmap(MAP_32BIT) Dmitry Safonov
@ 2017-02-11 20:13   ` Thomas Gleixner
  2017-02-14 16:11     ` Dmitry Safonov
  0 siblings, 1 reply; 25+ messages in thread
From: Thomas Gleixner @ 2017-02-11 20:13 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, 0x7f454c46, Ingo Molnar, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, x86, linux-mm

On Mon, 30 Jan 2017, Dmitry Safonov wrote:

> At this momet, logic in arch_get_unmapped_area{,_topdown} for mmaps with
> MAP_32BIT flag checks TIF_ADDR32 which means:
> o if 32-bit ELF changes mode to 64-bit on x86_64 and then tries to
>   mmap() with MAP_32BIT it'll result in addr over 4Gb (as default is
>   top-down allocation)
> o if 64-bit ELF changes mode to 32-bit and tries mmap() with MAP_32BIT,
>   it'll allocate only memory in 1GB space: [0x40000000, 0x80000000).
> 
> Fix it by handeling MAP_32BIT in 64-bit syscalls only.

I really have a hard time to understand what is fixed and how that is
related to the $subject.

Again. Please explain the problem first properly so one can understand the
issue immediately.

> As a little bonus it'll make thread flag a little less used.

I really do not understand the bonus part here. You replace the thread flag
check with a different one and AFAICT this looks like oart of the 'fix'.

Thanks,

	tglx

> @@ -101,7 +101,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
>  static void find_start_end(unsigned long flags, unsigned long *begin,
>  			   unsigned long *end)
>  {
> -	if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT)) {
> +	if (!in_compat_syscall() && (flags & MAP_32BIT)) {
>  		/* This is usually used needed to map code in small
>  		   model, so it needs to be in the first 31bit. Limit
>  		   it to that.  This means we need to move the
> @@ -195,7 +195,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
>  		return addr;
>  
>  	/* for MAP_32BIT mappings we force the legacy mmap base */
> -	if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT))
> +	if (!in_compat_syscall() && (flags & MAP_32BIT))
>  		goto bottomup;
>  
>  	/* requesting a specific address */
> -- 
> 2.11.0
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 1/5] x86/mm: split arch_mmap_rnd() on compat/native versions
  2017-02-11  8:23         ` Thomas Gleixner
@ 2017-02-13 11:12           ` Dmitry Safonov
  2017-02-13 11:22             ` Thomas Gleixner
  0 siblings, 1 reply; 25+ messages in thread
From: Dmitry Safonov @ 2017-02-13 11:12 UTC (permalink / raw)
  To: Thomas Gleixner, Dmitry Safonov
  Cc: Borislav Petkov, open list, Ingo Molnar, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, X86 ML, linux-mm

On 02/11/2017 11:23 AM, Thomas Gleixner wrote:
> On Sat, 11 Feb 2017, Dmitry Safonov wrote:
>
>> 2017-02-10 23:10 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
>>> On Thu, 9 Feb 2017, Borislav Petkov wrote:
>>>> I can't say that I'm thrilled about the ifdeffery this is adding.
>>>>
>>>> But I can't think of a cleaner approach at a quick glance, though -
>>>> that's generic and arch-specific code intertwined muck. Sad face.
>>>
>>> It's trivial enough to do ....
>>>
>>> Thanks,
>>>
>>>         tglx
>>>
>>> ---
>>>  arch/x86/mm/mmap.c |   22 ++++++++++------------
>>>  1 file changed, 10 insertions(+), 12 deletions(-)
>>>
>>> --- a/arch/x86/mm/mmap.c
>>> +++ b/arch/x86/mm/mmap.c
>>> @@ -55,6 +55,10 @@ static unsigned long stack_maxrandom_siz
>>>  #define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
>>>  #define MAX_GAP (TASK_SIZE/6*5)
>>>
>>> +#ifndef CONFIG_COMPAT
>>> +# define mmap_rnd_compat_bits  mmap_rnd_bits
>>> +#endif
>>> +
>>
>> >From my POV, I can't say that it's clearer to shadow mmap_compat_bits
>> like that then to have two functions with native/compat names.
>> But if you insist, I'll resend patches set with your version.
>
> You can make that
>
> #ifdef CONFIG_64BIT
> # define mmap32_rnd_bits  mmap_compat_rnd_bits
> # define mmap64_rnd_bits  mmap_rnd_bits
> #else
> # define mmap32_rnd_bits  mmap_rnd_bits
> # define mmap64_rnd_bits  mmap_rnd_bits
> #endif
>
> and use that. That's still way more readable than the unholy ifdef mess.

Ok, will send this version in v5.
Ping me if you mind me using your SOB for this patch.

>
> Thanks,
>
> 	tglx
>


-- 
              Dmitry

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 1/5] x86/mm: split arch_mmap_rnd() on compat/native versions
  2017-02-13 11:12           ` Dmitry Safonov
@ 2017-02-13 11:22             ` Thomas Gleixner
  0 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2017-02-13 11:22 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: Dmitry Safonov, Borislav Petkov, open list, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, X86 ML,
	linux-mm

On Mon, 13 Feb 2017, Dmitry Safonov wrote:
> On 02/11/2017 11:23 AM, Thomas Gleixner wrote:
> > On Sat, 11 Feb 2017, Dmitry Safonov wrote:
> > 
> > > 2017-02-10 23:10 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
> > > > On Thu, 9 Feb 2017, Borislav Petkov wrote:
> > > > > I can't say that I'm thrilled about the ifdeffery this is adding.
> > > > > 
> > > > > But I can't think of a cleaner approach at a quick glance, though -
> > > > > that's generic and arch-specific code intertwined muck. Sad face.
> > > > 
> > > > It's trivial enough to do ....
> > > > 
> > > > Thanks,
> > > > 
> > > >         tglx
> > > > 
> > > > ---
> > > >  arch/x86/mm/mmap.c |   22 ++++++++++------------
> > > >  1 file changed, 10 insertions(+), 12 deletions(-)
> > > > 
> > > > --- a/arch/x86/mm/mmap.c
> > > > +++ b/arch/x86/mm/mmap.c
> > > > @@ -55,6 +55,10 @@ static unsigned long stack_maxrandom_siz
> > > >  #define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
> > > >  #define MAX_GAP (TASK_SIZE/6*5)
> > > > 
> > > > +#ifndef CONFIG_COMPAT
> > > > +# define mmap_rnd_compat_bits  mmap_rnd_bits
> > > > +#endif
> > > > +
> > > 
> > > >From my POV, I can't say that it's clearer to shadow mmap_compat_bits
> > > like that then to have two functions with native/compat names.
> > > But if you insist, I'll resend patches set with your version.
> > 
> > You can make that
> > 
> > #ifdef CONFIG_64BIT
> > # define mmap32_rnd_bits  mmap_compat_rnd_bits
> > # define mmap64_rnd_bits  mmap_rnd_bits
> > #else
> > # define mmap32_rnd_bits  mmap_rnd_bits
> > # define mmap64_rnd_bits  mmap_rnd_bits
> > #endif
> > 
> > and use that. That's still way more readable than the unholy ifdef mess.
> 
> Ok, will send this version in v5.
> Ping me if you mind me using your SOB for this patch.

Suggested-by is enough

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 2/5] x86/mm: introduce mmap{,_legacy}_base
  2017-02-11 14:13   ` Thomas Gleixner
@ 2017-02-13 13:02     ` Dmitry Safonov
  2017-02-13 13:13       ` Thomas Gleixner
  2017-02-13 14:37     ` Dmitry Safonov
  1 sibling, 1 reply; 25+ messages in thread
From: Dmitry Safonov @ 2017-02-13 13:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, 0x7f454c46, Ingo Molnar, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, x86, linux-mm

On 02/11/2017 05:13 PM, Thomas Gleixner wrote:
> On Mon, 30 Jan 2017, Dmitry Safonov wrote:
>
>> In the following patch they will be used to compute:
>> - mmap{,_legacy}_base for 64-bit mmap()
>> - mmap_compat{,_legacy}_base for 32-bit mmap()
>>
>> This patch makes it possible to calculate mmap bases for any specified
>> task_size, which is needed to correctly choose the base address for mmap
>> in 32-bit syscalls and 64-bit syscalls.
>
> Please rework your changelogs so they follow the requirements in
> Documentation....  Look for 'this patch' there.

Sure, I'll rewrite the changelogs.

>
> Also order it in a way which makes it clear what the problem is and how
> it's solved.
>
>> +#define STACK_RND_MASK_MODE(native) (0x7ff)
>>  #define STACK_RND_MASK (0x7ff)
>>
>>  #define ARCH_DLINFO		ARCH_DLINFO_IA32
>> @@ -295,7 +296,8 @@ do {									\
>>  #else /* CONFIG_X86_32 */
>>
>>  /* 1GB for 64bit, 8MB for 32bit */
>> -#define STACK_RND_MASK (test_thread_flag(TIF_ADDR32) ? 0x7ff : 0x3fffff)
>> +#define STACK_RND_MASK_MODE(native) ((native) ? 0x3fffff : 0x7ff)
>> +#define STACK_RND_MASK STACK_RND_MASK_MODE(!test_thread_flag(TIF_ADDR32))
>
> I had to look twice what MODE means. The macro name suggests that it
> returns the mode, while it actually returns the mask. That's confusing at
> best. And 'native' is not intuitive either.
>
> #define __STACK_RND_MASK(is64bit)	((is64bit) ? 0x3fffff : 0x7ff)
> #define STACK_RND_MASK			__STACK_RND_MASK(!mmap_is_ia32())
>
> That is pretty much clear and it uses mmap_is_ia32() as we do in other
> places. As a consequemce you can use the same macros for 64 and 32 bit.

Thanks, that's more readable. I was tempting to name it more
self-descriptive, but two underscores looks enough.
I'll resend with `is32bit' parameter - this way I'll also get rid of
negaion in a call.
I'll also need to move it down in header to omit forward-declaration of
mmap_is_ia32().

>
>> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
>> index 1be64da0384e..52086e65b422 100644
>> --- a/arch/x86/include/asm/processor.h
>> +++ b/arch/x86/include/asm/processor.h
>> @@ -862,7 +862,8 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
>>   * This decides where the kernel will search for a free chunk of vm
>>   * space during mmap's.
>>   */
>> -#define TASK_UNMAPPED_BASE	(PAGE_ALIGN(TASK_SIZE / 3))
>> +#define _TASK_UNMAPPED_BASE(task_size)	(PAGE_ALIGN(task_size / 3))
>> +#define TASK_UNMAPPED_BASE	_TASK_UNMAPPED_BASE(TASK_SIZE)
>
> Please use two underscores and align the macros in tabular fashion.
>
> #define __TASK_UNMAPPED_BASE(task_size)	(PAGE_ALIGN(task_size / 3))
> #define TASK_UNMAPPED_BASE		__TASK_UNMAPPED_BASE(TASK_SIZE)
>
> That way stuff becomes easy to read.

Ok

>
>>  #define KSTK_EIP(task)		(task_pt_regs(task)->ip)
>>
>> diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
>> index 42063e787717..98be520fd270 100644
>> --- a/arch/x86/mm/mmap.c
>> +++ b/arch/x86/mm/mmap.c
>> @@ -35,12 +35,14 @@ struct va_alignment __read_mostly va_align = {
>>  	.flags = -1,
>>  };
>>
>> -static unsigned long stack_maxrandom_size(void)
>> +static unsigned long stack_maxrandom_size(unsigned long task_size)
>>  {
>>  	unsigned long max = 0;
>>  	if ((current->flags & PF_RANDOMIZE) &&
>>  		!(current->personality & ADDR_NO_RANDOMIZE)) {
>> -		max = ((-1UL) & STACK_RND_MASK) << PAGE_SHIFT;
>> +		max = (-1UL);
>> +		max &= STACK_RND_MASK_MODE(task_size == TASK_SIZE_MAX);
>
> That just makes me barf, really. I have to go and lookup how TASK_SIZE_MAX
> is defined in order to read that code. TASK_SIZE_MAX as is does not give a
> hint at all that it means TASK_SIZE_MAX of 64 bit tasks.
>
> You just explained me that you want stuff proper for clarity reasons. So
> what's so wrong with adding a helper inline tasksize_64bit() or such?

I thought about that, but I'll need to redefine it under ifdefs :-/
I mean, for 32-bit native code.
Hmm, I think, if I use is32bit parameter for __STACK_RND_MASK(),
will it be more readable if I just compare to IA32_PAGE_OFFSET here?
Or does it makes sence to introduce tasksize_32bit()?

>
>> +		max <<= PAGE_SHIFT;
>>  	}
>>
>>  	return max;
>> @@ -51,8 +53,8 @@ static unsigned long stack_maxrandom_size(void)
>>   *
>>   * Leave an at least ~128 MB hole with possible stack randomization.
>>   */
>> -#define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
>> -#define MAX_GAP (TASK_SIZE/6*5)
>> +#define MIN_GAP(task_size) (128*1024*1024UL + stack_maxrandom_size(task_size))
>> +#define MAX_GAP(task_size) (task_size/6*5)
>
> Eew. Yes it's existing macro mess, but there is not point in proliferating
> that. That macro crap is only used in mmap_base() and there is no
> justification for these unreadable macros at all. It just makes stuff
> obfuscated for no reason. Just blindly making it more obfuscated does not
> make it any better.
>
>>  static int mmap_is_legacy(void)
>>  {
>> @@ -88,16 +90,22 @@ unsigned long arch_mmap_rnd(void)
>>  	return arch_native_rnd();
>>  }
>
> #define SIZE_128M	(128 * 1024 * 1024UL)
>
>> -static unsigned long mmap_base(unsigned long rnd)
>> +static unsigned long mmap_base(unsigned long rnd, unsigned long task_size)
>>  {
>> 	unsigned long gap = rlimit(RLIMIT_STACK);
> 	unsigned long gap_min, gap_max;
>
> 	/* Add comment what this means */
> 	gap_min = SIZE_128M + stack_maxrandom_size(task_size);
> 	/* Explain that ' /6 * 5' magic */
> 	gap_max = (task_size / 6) * 5;
>
> and use gap_min/gap_max below. That would be too intuitive and readable,
> right?

Haha, will do, thanks again.

>
>> -	if (gap < MIN_GAP)
>> -		gap = MIN_GAP;
>> -	else if (gap > MAX_GAP)
>> -		gap = MAX_GAP;
>> +	if (gap < MIN_GAP(task_size))
>> +		gap = MIN_GAP(task_size);
>> +	else if (gap > MAX_GAP(task_size))
>> +		gap = MAX_GAP(task_size);
>>
>> -	return PAGE_ALIGN(TASK_SIZE - gap - rnd);
>> +	return PAGE_ALIGN(task_size - gap - rnd);
>> +}
>> +
>> +static unsigned long mmap_legacy_base(unsigned long rnd,
>> +		unsigned long task_size)
>
> Please align the argument in the second line with the argument in the first
> one
>
> static unsigned long mmap_legacy_base(unsigned long rnd,
> 		     		      unsigned long task_size)

Ok

>
> Thanks,
>
> 	tglx
>


-- 
              Dmitry

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 2/5] x86/mm: introduce mmap{,_legacy}_base
  2017-02-13 13:02     ` Dmitry Safonov
@ 2017-02-13 13:13       ` Thomas Gleixner
  0 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2017-02-13 13:13 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, 0x7f454c46, Ingo Molnar, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, x86, linux-mm

On Mon, 13 Feb 2017, Dmitry Safonov wrote:
> > That just makes me barf, really. I have to go and lookup how TASK_SIZE_MAX
> > is defined in order to read that code. TASK_SIZE_MAX as is does not give a
> > hint at all that it means TASK_SIZE_MAX of 64 bit tasks.
> > 
> > You just explained me that you want stuff proper for clarity reasons. So
> > what's so wrong with adding a helper inline tasksize_64bit() or such?
> 
> I thought about that, but I'll need to redefine it under ifdefs :-/
> I mean, for 32-bit native code.
> Hmm, I think, if I use is32bit parameter for __STACK_RND_MASK(),
> will it be more readable if I just compare to IA32_PAGE_OFFSET here?
> Or does it makes sence to introduce tasksize_32bit()?

Yes, having such a helper makes it immediately clear what this is
about. IA32_PAGE_OFFSET is not really helpful either.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 2/5] x86/mm: introduce mmap{,_legacy}_base
  2017-02-11 14:13   ` Thomas Gleixner
  2017-02-13 13:02     ` Dmitry Safonov
@ 2017-02-13 14:37     ` Dmitry Safonov
  2017-02-13 15:35       ` Thomas Gleixner
  1 sibling, 1 reply; 25+ messages in thread
From: Dmitry Safonov @ 2017-02-13 14:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, 0x7f454c46, Ingo Molnar, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, x86, linux-mm

On 02/11/2017 05:13 PM, Thomas Gleixner wrote:
>> -static unsigned long mmap_base(unsigned long rnd)
>> +static unsigned long mmap_base(unsigned long rnd, unsigned long task_size)
>>  {
>> 	unsigned long gap = rlimit(RLIMIT_STACK);
> 	unsigned long gap_min, gap_max;
>
> 	/* Add comment what this means */
> 	gap_min = SIZE_128M + stack_maxrandom_size(task_size);
> 	/* Explain that ' /6 * 5' magic */
> 	gap_max = (task_size / 6) * 5;

So, I can't find about those limits on a gap size:
They were introduced by commit 8913d55b6c58 ("i386 virtual memory
layout rework").
All I could find is that 128Mb limit was more limit on virtual adress
space than on a memory available those days.
And 5/6 of task_size looks like heuristic value.
So I'm not sure, what to write in comments:
that rlimit on stack can't be bigger than 5/6 of task_size?
That looks obvious from the code.


-- 
              Dmitry

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 2/5] x86/mm: introduce mmap{,_legacy}_base
  2017-02-13 14:37     ` Dmitry Safonov
@ 2017-02-13 15:35       ` Thomas Gleixner
  0 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2017-02-13 15:35 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, 0x7f454c46, Ingo Molnar, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, x86, linux-mm

On Mon, 13 Feb 2017, Dmitry Safonov wrote:
> On 02/11/2017 05:13 PM, Thomas Gleixner wrote:
> > > -static unsigned long mmap_base(unsigned long rnd)
> > > +static unsigned long mmap_base(unsigned long rnd, unsigned long
> > > task_size)
> > >  {
> > > 	unsigned long gap = rlimit(RLIMIT_STACK);
> > 	unsigned long gap_min, gap_max;
> > 
> > 	/* Add comment what this means */
> > 	gap_min = SIZE_128M + stack_maxrandom_size(task_size);
> > 	/* Explain that ' /6 * 5' magic */
> > 	gap_max = (task_size / 6) * 5;
> 
> So, I can't find about those limits on a gap size:
> They were introduced by commit 8913d55b6c58 ("i386 virtual memory
> layout rework").
> All I could find is that 128Mb limit was more limit on virtual adress
> space than on a memory available those days.
> And 5/6 of task_size looks like heuristic value.
> So I'm not sure, what to write in comments:
> that rlimit on stack can't be bigger than 5/6 of task_size?
> That looks obvious from the code.

So just leave it alone. 5/6 is pulled from thin air and 128M probably as
well. I hoped there would be some reasonable explanation ....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 3/5] x86/mm: fix 32-bit mmap() for 64-bit ELF
  2017-02-11 19:49   ` Thomas Gleixner
@ 2017-02-14 15:24     ` Dmitry Safonov
  0 siblings, 0 replies; 25+ messages in thread
From: Dmitry Safonov @ 2017-02-14 15:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, 0x7f454c46, Ingo Molnar, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, x86, linux-mm

On 02/11/2017 10:49 PM, Thomas Gleixner wrote:
> On Mon, 30 Jan 2017, Dmitry Safonov wrote:
>
>> Fix 32-bit compat_sys_mmap() mapping VMA over 4Gb in 64-bit binaries
>> and 64-bit sys_mmap() mapping VMA only under 4Gb in 32-bit binaries.
>> Introduced new bases for compat syscalls in mm_struct:
>> mmap_compat_base and mmap_compat_legacy_base for top-down and
>> bottom-up allocations accordingly.
>> Taught arch_get_unmapped_area{,_topdown}() to use the new mmap_bases
>> in compat syscalls for high/low limits in vm_unmapped_area().
>>
>> I discovered that bug on ZDTM tests for compat 32-bit C/R.
>> Working compat sys_mmap() in 64-bit binaries is really needed for that
>> purpose, as 32-bit applications are restored from 64-bit CRIU binary.
>
> Again that changelog sucks.
>
> Explain the problem/bug first. Then explain the way to fix it and do not
> tell fairy tales about what you did without explaing the bug in the first
> place.
>
> Documentation....SubittingPatches explains that very well.

Rewrote changelog.

>> +config HAVE_ARCH_COMPAT_MMAP_BASES
>> +	bool
>> +	help
>> +	  If this is set, one program can do native and compatible syscall
>> +	  mmap() on architecture. Thus kernel has different bases to
>> +	  compute high and low virtual address limits for allocation.
>
> Sigh. How is a user supposed to decode this?
>
> 	  This allows 64bit applications to invoke syscalls in 64bit and
> 	  32bit mode. Required for ....

Ok

>>
>> @@ -113,10 +114,19 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
>>  		if (current->flags & PF_RANDOMIZE) {
>>  			*begin = randomize_page(*begin, 0x02000000);
>>  		}
>> -	} else {
>> -		*begin = current->mm->mmap_legacy_base;
>> -		*end = TASK_SIZE;
>> +		return;
>>  	}
>> +
>> +#ifdef CONFIG_COMPAT
>
> Can you please find a solution which does not create that ifdef horror in
> the code? Just a few accessors to those compat fields are required to do
> that.

I'll try

>> +
>> +#ifdef CONFIG_COMPAT
>> +	arch_pick_mmap_base(&mm->mmap_compat_base, &mm->mmap_compat_legacy_base,
>> +			arch_compat_rnd(), IA32_PAGE_OFFSET);
>> +#endif
>
> Ditto
>
> Thanks,
>
> 	tglx
>


-- 
              Dmitry

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 4/5] x86/mm: check in_compat_syscall() instead TIF_ADDR32 for mmap(MAP_32BIT)
  2017-02-11 20:13   ` Thomas Gleixner
@ 2017-02-14 16:11     ` Dmitry Safonov
  2017-02-14 16:14       ` Dmitry Safonov
  0 siblings, 1 reply; 25+ messages in thread
From: Dmitry Safonov @ 2017-02-14 16:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, 0x7f454c46, Ingo Molnar, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, x86, linux-mm

On 02/11/2017 11:13 PM, Thomas Gleixner wrote:
> On Mon, 30 Jan 2017, Dmitry Safonov wrote:
>
>> At this momet, logic in arch_get_unmapped_area{,_topdown} for mmaps with
>> MAP_32BIT flag checks TIF_ADDR32 which means:
>> o if 32-bit ELF changes mode to 64-bit on x86_64 and then tries to
>>   mmap() with MAP_32BIT it'll result in addr over 4Gb (as default is
>>   top-down allocation)
>> o if 64-bit ELF changes mode to 32-bit and tries mmap() with MAP_32BIT,
>>   it'll allocate only memory in 1GB space: [0x40000000, 0x80000000).
>>
>> Fix it by handeling MAP_32BIT in 64-bit syscalls only.
>
> I really have a hard time to understand what is fixed and how that is
> related to the $subject.
>
> Again. Please explain the problem first properly so one can understand the
> issue immediately.

Ok, rewrote the changes log.

>
>> As a little bonus it'll make thread flag a little less used.
>
> I really do not understand the bonus part here. You replace the thread flag
> check with a different one and AFAICT this looks like oart of the 'fix'.

It's a part of the fix, right.
What I meant here is that after those patches TIF_ADDR32 is no more
used after exec() time. That's bonus as if we manage to change exec()
code in some way (i.e., pass address restriction as a parameter), we'll
have additional free thread info flag.

> Thanks,
>
> 	tglx
>
>> @@ -101,7 +101,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
>>  static void find_start_end(unsigned long flags, unsigned long *begin,
>>  			   unsigned long *end)
>>  {
>> -	if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT)) {
>> +	if (!in_compat_syscall() && (flags & MAP_32BIT)) {
>>  		/* This is usually used needed to map code in small
>>  		   model, so it needs to be in the first 31bit. Limit
>>  		   it to that.  This means we need to move the
>> @@ -195,7 +195,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
>>  		return addr;
>>
>>  	/* for MAP_32BIT mappings we force the legacy mmap base */
>> -	if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT))
>> +	if (!in_compat_syscall() && (flags & MAP_32BIT))
>>  		goto bottomup;
>>
>>  	/* requesting a specific address */
>> --
>> 2.11.0
>>
>>


-- 
              Dmitry

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4 4/5] x86/mm: check in_compat_syscall() instead TIF_ADDR32 for mmap(MAP_32BIT)
  2017-02-14 16:11     ` Dmitry Safonov
@ 2017-02-14 16:14       ` Dmitry Safonov
  0 siblings, 0 replies; 25+ messages in thread
From: Dmitry Safonov @ 2017-02-14 16:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, 0x7f454c46, Ingo Molnar, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, x86, linux-mm

On 02/14/2017 07:11 PM, Dmitry Safonov wrote:
> On 02/11/2017 11:13 PM, Thomas Gleixner wrote:
>> On Mon, 30 Jan 2017, Dmitry Safonov wrote:
>>
>>> At this momet, logic in arch_get_unmapped_area{,_topdown} for mmaps with
>>> MAP_32BIT flag checks TIF_ADDR32 which means:
>>> o if 32-bit ELF changes mode to 64-bit on x86_64 and then tries to
>>>   mmap() with MAP_32BIT it'll result in addr over 4Gb (as default is
>>>   top-down allocation)
>>> o if 64-bit ELF changes mode to 32-bit and tries mmap() with MAP_32BIT,
>>>   it'll allocate only memory in 1GB space: [0x40000000, 0x80000000).
>>>
>>> Fix it by handeling MAP_32BIT in 64-bit syscalls only.
>>
>> I really have a hard time to understand what is fixed and how that is
>> related to the $subject.
>>
>> Again. Please explain the problem first properly so one can understand
>> the
>> issue immediately.
>
> Ok, rewrote the changes log.
>
>>
>>> As a little bonus it'll make thread flag a little less used.
>>
>> I really do not understand the bonus part here. You replace the thread
>> flag
>> check with a different one and AFAICT this looks like oart of the 'fix'.
>
> It's a part of the fix, right.
> What I meant here is that after those patches TIF_ADDR32 is no more
> used after exec() time. That's bonus as if we manage to change exec()
> code in some way (i.e., pass address restriction as a parameter), we'll
> have additional free thread info flag.

Oh, I lied here: it'll be still used for TASK_SIZE macro and etc.
But, I don't see why in generic code we can't use TASK_SIZE_MAX,
at least at this moment.

>
>> Thanks,
>>
>>     tglx
>>
>>> @@ -101,7 +101,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr,
>>> unsigned long, len,
>>>  static void find_start_end(unsigned long flags, unsigned long *begin,
>>>                 unsigned long *end)
>>>  {
>>> -    if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT)) {
>>> +    if (!in_compat_syscall() && (flags & MAP_32BIT)) {
>>>          /* This is usually used needed to map code in small
>>>             model, so it needs to be in the first 31bit. Limit
>>>             it to that.  This means we need to move the
>>> @@ -195,7 +195,7 @@ arch_get_unmapped_area_topdown(struct file *filp,
>>> const unsigned long addr0,
>>>          return addr;
>>>
>>>      /* for MAP_32BIT mappings we force the legacy mmap base */
>>> -    if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT))
>>> +    if (!in_compat_syscall() && (flags & MAP_32BIT))
>>>          goto bottomup;
>>>
>>>      /* requesting a specific address */
>>> --
>>> 2.11.0
>>>
>>>
>
>


-- 
              Dmitry

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-02-14 16:18 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-30 12:04 [PATCHv4 0/5] Fix compatible mmap() return pointer over 4Gb Dmitry Safonov
2017-01-30 12:04 ` [PATCHv4 1/5] x86/mm: split arch_mmap_rnd() on compat/native versions Dmitry Safonov
2017-02-09 13:55   ` Borislav Petkov
2017-02-09 23:06     ` Andy Lutomirski
2017-02-10 20:10     ` Thomas Gleixner
2017-02-10 20:25       ` Borislav Petkov
2017-02-10 21:28       ` Dmitry Safonov
2017-02-11  8:23         ` Thomas Gleixner
2017-02-13 11:12           ` Dmitry Safonov
2017-02-13 11:22             ` Thomas Gleixner
2017-01-30 12:04 ` [PATCHv4 2/5] x86/mm: introduce mmap{,_legacy}_base Dmitry Safonov
2017-02-11 14:13   ` Thomas Gleixner
2017-02-13 13:02     ` Dmitry Safonov
2017-02-13 13:13       ` Thomas Gleixner
2017-02-13 14:37     ` Dmitry Safonov
2017-02-13 15:35       ` Thomas Gleixner
2017-01-30 12:04 ` [PATCHv4 3/5] x86/mm: fix 32-bit mmap() for 64-bit ELF Dmitry Safonov
2017-02-11 19:49   ` Thomas Gleixner
2017-02-14 15:24     ` Dmitry Safonov
2017-01-30 12:04 ` [PATCHv4 4/5] x86/mm: check in_compat_syscall() instead TIF_ADDR32 for mmap(MAP_32BIT) Dmitry Safonov
2017-02-11 20:13   ` Thomas Gleixner
2017-02-14 16:11     ` Dmitry Safonov
2017-02-14 16:14       ` Dmitry Safonov
2017-01-30 12:04 ` [PATCHv4 5/5] selftests/x86: add test to check compat mmap() return addr Dmitry Safonov
2017-02-06 16:46 ` [PATCHv4 0/5] Fix compatible mmap() return pointer over 4Gb Dmitry Safonov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).