linux-mm.kvack.org archive mirror
* [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
@ 2020-11-29 21:15 Topi Miettinen
  2020-11-30  6:08 ` kernel test robot
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Topi Miettinen @ 2020-11-29 21:15 UTC (permalink / raw)
  To: linux-hardening, akpm, linux-mm, linux-kernel
  Cc: Topi Miettinen, Jann Horn, Kees Cook, Matthew Wilcox,
	Mike Rapoport, Linux API

Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
enables full randomization of memory mappings created with mmap(NULL,
...). With 2, the base of the VMA used for such mappings is random,
but the mappings are created in predictable places within the VMA and
in sequential order. With 3, new VMAs are created to fully randomize
the mappings.

Also, mremap(..., MREMAP_MAYMOVE) will move the mappings even if not
necessary, and the locations of the stack and vdso are also randomized.

The method is to randomize the new address without considering
VMAs. If the address fails checks because of overlap with the stack
area (or, in the case of mremap(), overlap with the old mapping), the
operation is retried a few times before falling back to the old
method.

On 32 bit systems this may cause problems due to increased VM
fragmentation if the address space gets crowded.

On all systems, it will reduce performance and increase memory usage
due to less efficient use of page tables and inability to merge
adjacent VMAs with compatible attributes. In the worst case,
additional page table entries of up to 4 pages are created for each
mapping, so with small mappings there's a considerable penalty.
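
As a rough back-of-the-envelope figure (assuming x86-64 with 4 KB
pages and 4-level paging): up to 4 extra page table pages means up to
16 KB of page table memory per mapping, so a single 4 KB anonymous
mapping can cost roughly four times its own size in page tables.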

In this example with sysctl.kernel.randomize_va_space = 2, dynamic
loader, libc, anonymous memory reserved with mmap() and locale-archive
are located close to each other:

$ cat /proc/self/maps (only first line for each object shown for brevity)
5acea452d000-5acea452f000 r--p 00000000 fe:0c 1868624                    /usr/bin/cat
74f438f90000-74f4394f2000 r--p 00000000 fe:0c 2473999                    /usr/lib/locale/locale-archive
74f4394f2000-74f4395f2000 rw-p 00000000 00:00 0
74f4395f2000-74f439617000 r--p 00000000 fe:0c 2402332                    /usr/lib/x86_64-linux-gnu/libc-2.31.so
74f4397b3000-74f4397b9000 rw-p 00000000 00:00 0
74f4397e5000-74f4397e6000 r--p 00000000 fe:0c 2400754                    /usr/lib/x86_64-linux-gnu/ld-2.31.so
74f439811000-74f439812000 rw-p 00000000 00:00 0
7fffdca0d000-7fffdca2e000 rw-p 00000000 00:00 0                          [stack]
7fffdcb49000-7fffdcb4d000 r--p 00000000 00:00 0                          [vvar]
7fffdcb4d000-7fffdcb4f000 r-xp 00000000 00:00 0                          [vdso]

With sysctl.kernel.randomize_va_space = 3, they are located at
unrelated addresses and the order is random:

$ echo 3 > /proc/sys/kernel/randomize_va_space
$ cat /proc/self/maps (only first line for each object shown for brevity)
3850520000-3850620000 rw-p 00000000 00:00 0
28cfb4c8000-28cfb4cc000 r--p 00000000 00:00 0                            [vvar]
28cfb4cc000-28cfb4ce000 r-xp 00000000 00:00 0                            [vdso]
9e74c385000-9e74c387000 rw-p 00000000 00:00 0
a42e0233000-a42e0234000 r--p 00000000 fe:0c 2400754                      /usr/lib/x86_64-linux-gnu/ld-2.31.so
a42e025f000-a42e0260000 rw-p 00000000 00:00 0
bea40427000-bea4044c000 r--p 00000000 fe:0c 2402332                      /usr/lib/x86_64-linux-gnu/libc-2.31.so
bea405e8000-bea405ec000 rw-p 00000000 00:00 0
f6d446fa000-f6d44c5c000 r--p 00000000 fe:0c 2473999                      /usr/lib/locale/locale-archive
fcfbf684000-fcfbf6a5000 rw-p 00000000 00:00 0                            [stack]
619aba62d000-619aba62f000 r--p 00000000 fe:0c 1868624                    /usr/bin/cat
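
A minimal test program along these lines (just a sketch for
observing the effect, not part of the patch) prints where the kernel
places a handful of anonymous mappings:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	int i;

	for (i = 0; i < 8; i++) {
		/* Let the kernel pick the address: mmap(NULL, ...) */
		void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		printf("%p\n", p);
	}
	return 0;
}

With randomize_va_space = 2 the printed addresses come out adjacent
and sequential; with 3 they should be scattered across the address
space.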

CC: Andrew Morton <akpm@linux-foundation.org>
CC: Jann Horn <jannh@google.com>
CC: Kees Cook <keescook@chromium.org>
CC: Matthew Wilcox <willy@infradead.org>
CC: Mike Rapoport <rppt@kernel.org>
CC: Linux API <linux-api@vger.kernel.org>
Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
---
v2: also randomize mremap(..., MREMAP_MAYMOVE)
v3: avoid stack area and retry in case of bad random address (Jann
Horn), improve description in kernel.rst (Matthew Wilcox)
v4:
- use /proc/$pid/maps in the example (Mike Rapoport)
- CCs (Andrew Morton)
- only check randomize_va_space == 3
v5: randomize also vdso and stack
---
 Documentation/admin-guide/hw-vuln/spectre.rst |  6 ++--
 Documentation/admin-guide/sysctl/kernel.rst   | 20 +++++++++++++
 arch/x86/entry/vdso/vma.c                     | 26 +++++++++++++++-
 include/linux/mm.h                            |  8 +++++
 init/Kconfig                                  |  2 +-
 mm/mmap.c                                     | 30 +++++++++++++------
 mm/mremap.c                                   | 27 +++++++++++++++++
 mm/util.c                                     |  6 ++++
 8 files changed, 111 insertions(+), 14 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index e05e581af5cf..9ea250522077 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -254,7 +254,7 @@ Spectre variant 2
    left by the previous process will also be cleared.
 
    User programs should use address space randomization to make attacks
-   more difficult (Set /proc/sys/kernel/randomize_va_space = 1 or 2).
+   more difficult (Set /proc/sys/kernel/randomize_va_space = 1, 2 or 3).
 
 3. A virtualized guest attacking the host
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -499,8 +499,8 @@ Spectre variant 2
    more overhead and run slower.
 
    User programs should use address space randomization
-   (/proc/sys/kernel/randomize_va_space = 1 or 2) to make attacks more
-   difficult.
+   (/proc/sys/kernel/randomize_va_space = 1, 2 or 3) to make attacks
+   more difficult.
 
 3. VM mitigation
 ^^^^^^^^^^^^^^^^
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index d4b32cc32bb7..806e3b29d2b5 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1060,6 +1060,26 @@ that support this feature.
     Systems with ancient and/or broken binaries should be configured
     with ``CONFIG_COMPAT_BRK`` enabled, which excludes the heap from process
     address space randomization.
+
+3   Additionally enable full randomization of memory mappings created
+    with mmap(NULL, ...). With 2, the base of the VMA used for such
+    mappings is random, but the mappings are created in predictable
+    places within the VMA and in sequential order. With 3, new VMAs
+    are created to fully randomize the mappings.
+
+    Also mremap(..., MREMAP_MAYMOVE) will move the mappings even if
+    not necessary and the location of stack and vdso are also
+    randomized.
+
+    On 32 bit systems this may cause problems due to increased VM
+    fragmentation if the address space gets crowded.
+
+    On all systems, it will reduce performance and increase memory
+    usage due to less efficient use of page tables and inability to
+    merge adjacent VMAs with compatible attributes. In the worst case,
+    additional page table entries of up to 4 pages are created for
+    each mapping, so with small mappings there's considerable penalty.
+
 ==  ===========================================================================
 
 
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 9185cb1d13b9..03ea884822e3 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -12,6 +12,7 @@
 #include <linux/init.h>
 #include <linux/random.h>
 #include <linux/elf.h>
+#include <linux/elf-randomize.h>
 #include <linux/cpu.h>
 #include <linux/ptrace.h>
 #include <linux/time_namespace.h>
@@ -32,6 +33,8 @@
 	const size_t name ## _offset = offset;
 #include <asm/vvar.h>
 
+#define MAX_RANDOM_VDSO_RETRIES			5
+
 struct vdso_data *arch_get_vdso_data(void *vvar_page)
 {
 	return (struct vdso_data *)(vvar_page + _vdso_data_offset);
@@ -361,7 +364,28 @@ static unsigned long vdso_addr(unsigned long start, unsigned len)
 
 static int map_vdso_randomized(const struct vdso_image *image)
 {
-	unsigned long addr = vdso_addr(current->mm->start_stack, image->size-image->sym_vvar_start);
+	unsigned long addr;
+
+	if (randomize_va_space == 3) {
+		/*
+		 * Randomize vdso address.
+		 */
+		int i = MAX_RANDOM_VDSO_RETRIES;
+
+		do {
+			int ret;
+
+			/* Try a few times to find a free area */
+			addr = arch_mmap_rnd();
+
+			ret = map_vdso(image, addr);
+			if (!IS_ERR_VALUE(ret))
+				return ret;
+		} while (--i >= 0);
+
+		/* Give up and try the less random way */
+	}
+	addr = vdso_addr(current->mm->start_stack, image->size-image->sym_vvar_start);
 
 	return map_vdso(image, addr);
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index db6ae4d3fb4e..ac5464f66066 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -97,6 +97,14 @@ extern const int mmap_rnd_compat_bits_max;
 extern int mmap_rnd_compat_bits __read_mostly;
 #endif
 
+#ifndef arch_get_mmap_end
+#define arch_get_mmap_end(addr)	(TASK_SIZE)
+#endif
+
+#ifndef arch_get_mmap_base
+#define arch_get_mmap_base(addr, base) (base)
+#endif
+
 #include <asm/page.h>
 #include <asm/processor.h>
 
diff --git a/init/Kconfig b/init/Kconfig
index 02d13ae27abb..9a0db5bed032 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1863,7 +1863,7 @@ config COMPAT_BRK
 	  also breaks ancient binaries (including anything libc5 based).
 	  This option changes the bootup default to heap randomization
 	  disabled, and can be overridden at runtime by setting
-	  /proc/sys/kernel/randomize_va_space to 2.
+	  /proc/sys/kernel/randomize_va_space to 2 or 3.
 
 	  On non-ancient distros (post-2000 ones) N is usually a safe choice.
 
diff --git a/mm/mmap.c b/mm/mmap.c
index d91ecb00d38c..3677491e999b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -47,6 +47,7 @@
 #include <linux/pkeys.h>
 #include <linux/oom.h>
 #include <linux/sched/mm.h>
+#include <linux/elf-randomize.h>
 
 #include <linux/uaccess.h>
 #include <asm/cacheflush.h>
@@ -73,6 +74,8 @@ const int mmap_rnd_compat_bits_max = CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX;
 int mmap_rnd_compat_bits __read_mostly = CONFIG_ARCH_MMAP_RND_COMPAT_BITS;
 #endif
 
+#define MAX_RANDOM_MMAP_RETRIES			5
+
 static bool ignore_rlimit_data;
 core_param(ignore_rlimit_data, ignore_rlimit_data, bool, 0644);
 
@@ -206,7 +209,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 #ifdef CONFIG_COMPAT_BRK
 	/*
 	 * CONFIG_COMPAT_BRK can still be overridden by setting
-	 * randomize_va_space to 2, which will still cause mm->start_brk
+	 * randomize_va_space to >= 2, which will still cause mm->start_brk
 	 * to be arbitrarily shifted
 	 */
 	if (current->brk_randomized)
@@ -1445,6 +1448,23 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	if (mm->map_count > sysctl_max_map_count)
 		return -ENOMEM;
 
+	/* Pick a random address even outside current VMAs? */
+	if (!addr && randomize_va_space == 3) {
+		int i = MAX_RANDOM_MMAP_RETRIES;
+		unsigned long max_addr = arch_get_mmap_base(addr, mm->mmap_base);
+
+		do {
+			/* Try a few times to find a free area */
+			addr = arch_mmap_rnd();
+			if (addr >= max_addr)
+				continue;
+			addr = get_unmapped_area(file, addr, len, pgoff, flags);
+		} while (--i >= 0 && !IS_ERR_VALUE(addr));
+
+		if (IS_ERR_VALUE(addr))
+			addr = 0;
+	}
+
 	/* Obtain the address to map to. we verify (or select) it and ensure
 	 * that it represents a valid section of the address space.
 	 */
@@ -2142,14 +2162,6 @@ unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info)
 	return addr;
 }
 
-#ifndef arch_get_mmap_end
-#define arch_get_mmap_end(addr)	(TASK_SIZE)
-#endif
-
-#ifndef arch_get_mmap_base
-#define arch_get_mmap_base(addr, base) (base)
-#endif
-
 /* Get an address range which is currently unmapped.
  * For shmat() with addr=0.
  *
diff --git a/mm/mremap.c b/mm/mremap.c
index 138abbae4f75..c5b2ed2bfd2d 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -24,12 +24,15 @@
 #include <linux/uaccess.h>
 #include <linux/mm-arch-hooks.h>
 #include <linux/userfaultfd_k.h>
+#include <linux/elf-randomize.h>
 
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
 
 #include "internal.h"
 
+#define MAX_RANDOM_MREMAP_RETRIES		5
+
 static pmd_t *get_old_pmd(struct mm_struct *mm, unsigned long addr)
 {
 	pgd_t *pgd;
@@ -720,6 +723,30 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 		goto out;
 	}
 
+	if ((flags & MREMAP_MAYMOVE) && randomize_va_space == 3) {
+		/*
+		 * Caller is happy with a different address, so let's
+		 * move even if not necessary!
+		 */
+		int i = MAX_RANDOM_MREMAP_RETRIES;
+		unsigned long max_addr = arch_get_mmap_base(addr, mm->mmap_base);
+
+		do {
+			/* Try a few times to find a free area */
+			new_addr = arch_mmap_rnd();
+			if (new_addr >= max_addr)
+				continue;
+			ret = mremap_to(addr, old_len, new_addr, new_len,
+					&locked, flags, &uf, &uf_unmap_early,
+					&uf_unmap);
+			if (!IS_ERR_VALUE(ret))
+				goto out;
+		} while (--i >= 0);
+
+		/* Give up and try the old address */
+		new_addr = addr;
+	}
+
 	/*
 	 * Always allow a shrinking remap: that just unmaps
 	 * the unnecessary pages..
diff --git a/mm/util.c b/mm/util.c
index 4ddb6e186dd5..f81a6de4f82a 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -319,6 +319,12 @@ unsigned long randomize_stack_top(unsigned long stack_top)
 {
 	unsigned long random_variable = 0;
 
+	/*
+	 * Randomize stack address.
+	 */
+	if (randomize_va_space == 3)
+		return arch_mmap_rnd();
+
 	if (current->flags & PF_RANDOMIZE) {
 		random_variable = get_random_long();
 		random_variable &= STACK_RND_MASK;

base-commit: aae5ab854e38151e69f261dbf0e3b7e396403178
-- 
2.29.2




* Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
  2020-11-29 21:15 [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack Topi Miettinen
@ 2020-11-30  6:08 ` kernel test robot
  2020-11-30 17:57 ` Andy Lutomirski
  2020-12-03  9:47 ` Florian Weimer
  2 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2020-11-30  6:08 UTC (permalink / raw)
  To: Topi Miettinen, linux-hardening, akpm, linux-mm, linux-kernel
  Cc: kbuild-all, clang-built-linux, Topi Miettinen, Jann Horn,
	Kees Cook, Matthew Wilcox, Mike Rapoport, Linux API

[-- Attachment #1: Type: text/plain, Size: 4621 bytes --]

Hi Topi,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on aae5ab854e38151e69f261dbf0e3b7e396403178]

url:    https://github.com/0day-ci/linux/commits/Topi-Miettinen/mm-Optional-full-ASLR-for-mmap-mremap-vdso-and-stack/20201130-051703
base:    aae5ab854e38151e69f261dbf0e3b7e396403178
config: x86_64-randconfig-a002-20201130 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project f502b14d40e751fe00afc493ef0d08f196524886)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/c06384c5cecf700db214c69a4565c41a4c4fad82
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Topi-Miettinen/mm-Optional-full-ASLR-for-mmap-mremap-vdso-and-stack/20201130-051703
        git checkout c06384c5cecf700db214c69a4565c41a4c4fad82
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   arch/x86/entry/vdso/vma.c:38:19: warning: no previous prototype for function 'arch_get_vdso_data' [-Wmissing-prototypes]
   struct vdso_data *arch_get_vdso_data(void *vvar_page)
                     ^
   arch/x86/entry/vdso/vma.c:38:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   struct vdso_data *arch_get_vdso_data(void *vvar_page)
   ^
   static 
>> arch/x86/entry/vdso/vma.c:382:9: warning: cast to 'void *' from smaller integer type 'int' [-Wint-to-void-pointer-cast]
                           if (!IS_ERR_VALUE(ret))
                                ^~~~~~~~~~~~~~~~~
   include/linux/err.h:22:49: note: expanded from macro 'IS_ERR_VALUE'
   #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)
                           ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:48:41: note: expanded from macro 'unlikely'
   #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                            ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:33:34: note: expanded from macro '__branch_check__'
                           ______r = __builtin_expect(!!(x), expect);      \
                                                         ^
>> arch/x86/entry/vdso/vma.c:382:9: warning: cast to 'void *' from smaller integer type 'int' [-Wint-to-void-pointer-cast]
                           if (!IS_ERR_VALUE(ret))
                                ^~~~~~~~~~~~~~~~~
   include/linux/err.h:22:49: note: expanded from macro 'IS_ERR_VALUE'
   #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)
                           ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:48:68: note: expanded from macro 'unlikely'
   #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~
   include/linux/compiler.h:35:19: note: expanded from macro '__branch_check__'
                                                expect, is_constant);      \
                                                        ^~~~~~~~~~~
   3 warnings generated.

vim +382 arch/x86/entry/vdso/vma.c

   364	
   365	static int map_vdso_randomized(const struct vdso_image *image)
   366	{
   367		unsigned long addr;
   368	
   369		if (randomize_va_space == 3) {
   370			/*
   371			 * Randomize vdso address.
   372			 */
   373			int i = MAX_RANDOM_VDSO_RETRIES;
   374	
   375			do {
   376				int ret;
   377	
   378				/* Try a few times to find a free area */
   379				addr = arch_mmap_rnd();
   380	
   381				ret = map_vdso(image, addr);
 > 382				if (!IS_ERR_VALUE(ret))
   383					return ret;
   384			} while (--i >= 0);
   385	
   386			/* Give up and try the less random way */
   387		}
   388		addr = vdso_addr(current->mm->start_stack, image->size-image->sym_vvar_start);
   389	
   390		return map_vdso(image, addr);
   391	}
   392	#endif
   393	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34002 bytes --]


* Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
  2020-11-29 21:15 [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack Topi Miettinen
  2020-11-30  6:08 ` kernel test robot
@ 2020-11-30 17:57 ` Andy Lutomirski
  2020-11-30 20:27   ` Topi Miettinen
  2020-12-03  9:47 ` Florian Weimer
  2 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2020-11-30 17:57 UTC (permalink / raw)
  To: Topi Miettinen
  Cc: linux-hardening, Andrew Morton, Linux-MM, LKML, Jann Horn,
	Kees Cook, Matthew Wilcox, Mike Rapoport, Linux API

On Sun, Nov 29, 2020 at 1:20 PM Topi Miettinen <toiwoton@gmail.com> wrote:
>
> Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
> enables full randomization of memory mappings created with mmap(NULL,
> ...). With 2, the base of the VMA used for such mappings is random,
> but the mappings are created in predictable places within the VMA and
> in sequential order. With 3, new VMAs are created to fully randomize
> the mappings.
>
> [...]
>
> @@ -361,7 +364,28 @@ static unsigned long vdso_addr(unsigned long start, unsigned len)
>
>  static int map_vdso_randomized(const struct vdso_image *image)
>  {
> -       unsigned long addr = vdso_addr(current->mm->start_stack, image->size-image->sym_vvar_start);
> +       unsigned long addr;
> +
> +       if (randomize_va_space == 3) {
> +               /*
> +                * Randomize vdso address.
> +                */
> +               int i = MAX_RANDOM_VDSO_RETRIES;
> +
> +               do {
> +                       int ret;
> +
> +                       /* Try a few times to find a free area */
> +                       addr = arch_mmap_rnd();
> +
> +                       ret = map_vdso(image, addr);
> +                       if (!IS_ERR_VALUE(ret))
> +                               return ret;
> +               } while (--i >= 0);
> +
> +               /* Give up and try the less random way */
> +       }
> +       addr = vdso_addr(current->mm->start_stack, image->size-image->sym_vvar_start);

This is IMO rather ugly.  You're picking random numbers and throwing
them at map_vdso(), which throws them at get_unmapped_area(), which
will validate them.  And you duplicate the same ugly loop later on.

How about instead pushing this logic into get_unmapped_area()?

--Andy



* Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
  2020-11-30 17:57 ` Andy Lutomirski
@ 2020-11-30 20:27   ` Topi Miettinen
  0 siblings, 0 replies; 11+ messages in thread
From: Topi Miettinen @ 2020-11-30 20:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-hardening, Andrew Morton, Linux-MM, LKML, Jann Horn,
	Kees Cook, Matthew Wilcox, Mike Rapoport, Linux API

On 30.11.2020 19.57, Andy Lutomirski wrote:
> On Sun, Nov 29, 2020 at 1:20 PM Topi Miettinen <toiwoton@gmail.com> wrote:
>>
>> Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
>> enables full randomization of memory mappings created with mmap(NULL,
>> ...). With 2, the base of the VMA used for such mappings is random,
>> but the mappings are created in predictable places within the VMA and
>> in sequential order. With 3, new VMAs are created to fully randomize
>> the mappings.
>>
>> [...]
>>
>> @@ -361,7 +364,28 @@ static unsigned long vdso_addr(unsigned long start, unsigned len)
>>
>>   static int map_vdso_randomized(const struct vdso_image *image)
>>   {
>> -       unsigned long addr = vdso_addr(current->mm->start_stack, image->size-image->sym_vvar_start);
>> +       unsigned long addr;
>> +
>> +       if (randomize_va_space == 3) {
>> +               /*
>> +                * Randomize vdso address.
>> +                */
>> +               int i = MAX_RANDOM_VDSO_RETRIES;
>> +
>> +               do {
>> +                       int ret;
>> +
>> +                       /* Try a few times to find a free area */
>> +                       addr = arch_mmap_rnd();
>> +
>> +                       ret = map_vdso(image, addr);
>> +                       if (!IS_ERR_VALUE(ret))
>> +                               return ret;
>> +               } while (--i >= 0);
>> +
>> +               /* Give up and try the less random way */
>> +       }
>> +       addr = vdso_addr(current->mm->start_stack, image->size-image->sym_vvar_start);
> 
> This is IMO rather ugly.  You're picking random numbers and throwing
> them at map_vdso(), which throws them at get_unmapped_area(), which
> will validate them.  And you duplicate the same ugly loop later on.

I agree it's not very pretty, but I'd expect that the first number would 
already have a high probability of getting accepted and the probability of 
all five attempts failing should be very low. For example, on a system 
with 16GB of RAM, maximum VM of 32GB (35 bits) and 47 bits of available 
VM space (since the kernel takes one bit), the chance of failure should be 
1 / 2^(47 - 35), that is, only one out of 4096 first attempts should be 
expected to fail. The chance of all five failing should be 1 / 2^60.

> How about instead pushing this logic into get_unmapped_area()?

The real work seems to be done in unmapped_area() and similar 
unmapped_area_topdown(), which traverse an RB tree when checking the 
address. Maybe a cleverer algorithm could walk the tree using a 
random starting address, and when the address is found to be invalid at 
a branch (again, not very likely), instead of restarting from the top, the 
search could mutate some bits of the address and continue randomly 
either sideways or backing up. I'm not sure how randomness properties 
would be affected by this and how to guarantee that the random walk will 
always stop eventually. This is not a problem with the proposed simple 
approach.

-Topi

> --Andy
> 



* Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
  2020-11-29 21:15 [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack Topi Miettinen
  2020-11-30  6:08 ` kernel test robot
  2020-11-30 17:57 ` Andy Lutomirski
@ 2020-12-03  9:47 ` Florian Weimer
  2020-12-03 12:02   ` Topi Miettinen
  2 siblings, 1 reply; 11+ messages in thread
From: Florian Weimer @ 2020-12-03  9:47 UTC (permalink / raw)
  To: Topi Miettinen
  Cc: linux-hardening, akpm, linux-mm, linux-kernel, Jann Horn,
	Kees Cook, Matthew Wilcox, Mike Rapoport, Linux API

* Topi Miettinen:

> +3   Additionally enable full randomization of memory mappings created
> +    with mmap(NULL, ...). With 2, the base of the VMA used for such
> +    mappings is random, but the mappings are created in predictable
> +    places within the VMA and in sequential order. With 3, new VMAs
> +    are created to fully randomize the mappings.
> +
> +    Also mremap(..., MREMAP_MAYMOVE) will move the mappings even if
> +    not necessary and the location of stack and vdso are also
> +    randomized.
> +
> +    On 32 bit systems this may cause problems due to increased VM
> +    fragmentation if the address space gets crowded.

Isn't this a bit of an understatement?  I think you'll have to restrict
this randomization to a subregion of the entire address space, otherwise
the reduction in maximum mapping size due to fragmentation will be a
problem on 64-bit architectures as well (which generally do not support
the full 64 bits for user-space addresses).

> +    On all systems, it will reduce performance and increase memory
> +    usage due to less efficient use of page tables and inability to
> +    merge adjacent VMAs with compatible attributes. In the worst case,
> +    additional page table entries of up to 4 pages are created for
> +    each mapping, so with small mappings there's considerable penalty.

The number 4 is architecture-specific, right?

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill




* Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
  2020-12-03  9:47 ` Florian Weimer
@ 2020-12-03 12:02   ` Topi Miettinen
  2020-12-03 17:10     ` Andy Lutomirski
  0 siblings, 1 reply; 11+ messages in thread
From: Topi Miettinen @ 2020-12-03 12:02 UTC (permalink / raw)
  To: Florian Weimer
  Cc: linux-hardening, akpm, linux-mm, linux-kernel, Jann Horn,
	Kees Cook, Matthew Wilcox, Mike Rapoport, Linux API

On 3.12.2020 11.47, Florian Weimer wrote:
> * Topi Miettinen:
> 
>> +3   Additionally enable full randomization of memory mappings created
>> +    with mmap(NULL, ...). With 2, the base of the VMA used for such
>> +    mappings is random, but the mappings are created in predictable
>> +    places within the VMA and in sequential order. With 3, new VMAs
>> +    are created to fully randomize the mappings.
>> +
>> +    Also mremap(..., MREMAP_MAYMOVE) will move the mappings even if
>> +    not necessary and the location of stack and vdso are also
>> +    randomized.
>> +
>> +    On 32 bit systems this may cause problems due to increased VM
>> +    fragmentation if the address space gets crowded.
> 
> Isn't this a bit of an understatement?  I think you'll have to restrict
> this randomization to a subregion of the entire address space, otherwise
> the reduction in maximum mapping size due to fragmentation will be a
> problem on 64-bit architectures as well (which generally do not support
> the full 64 bits for user-space addresses).

Restricting randomization would reduce the address space layout 
randomization and make this less useful. There's 48 or 56 bits, which 
translate to 128TB and 64PB of VM for user applications. Is it really 
possible to build today (or in near future) a system, which would 
contain so much RAM that such fragmentation could realistically happen? 
Perhaps also in a special case where lots of 1GB huge pages are 
necessary? Maybe in those cases you shouldn't use randomize_va_space=3. 
Or perhaps there could be a randomize_va_space=3 which does something more 
limited, and randomize_va_space=4 for those who want maximum randomization.

>> +    On all systems, it will reduce performance and increase memory
>> +    usage due to less efficient use of page tables and inability to
>> +    merge adjacent VMAs with compatible attributes. In the worst case,
>> +    additional page table entries of up to 4 pages are created for
>> +    each mapping, so with small mappings there's considerable penalty.
> 
> The number 4 is architecture-specific, right?

Yes, I only know x86_64. Actually it could have 5-level page tables. 
I'll fix this in the next version.

-Topi



* Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
  2020-12-03 12:02   ` Topi Miettinen
@ 2020-12-03 17:10     ` Andy Lutomirski
  2020-12-03 17:28       ` Florian Weimer
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2020-12-03 17:10 UTC (permalink / raw)
  To: Topi Miettinen
  Cc: Florian Weimer, linux-hardening, akpm, linux-mm, linux-kernel,
	Jann Horn, Kees Cook, Matthew Wilcox, Mike Rapoport, Linux API


> On Dec 3, 2020, at 4:06 AM, Topi Miettinen <toiwoton@gmail.com> wrote:
> 
> On 3.12.2020 11.47, Florian Weimer wrote:
>> * Topi Miettinen:
>>> +3   Additionally enable full randomization of memory mappings created
>>> +    with mmap(NULL, ...). With 2, the base of the VMA used for such
>>> +    mappings is random, but the mappings are created in predictable
>>> +    places within the VMA and in sequential order. With 3, new VMAs
>>> +    are created to fully randomize the mappings.
>>> +
>>> +    Also mremap(..., MREMAP_MAYMOVE) will move the mappings even if
>>> +    not necessary and the location of stack and vdso are also
>>> +    randomized.
>>> +
>>> +    On 32 bit systems this may cause problems due to increased VM
>>> +    fragmentation if the address space gets crowded.
>> Isn't this a bit of an understatement?  I think you'll have to restrict
>> this randomization to a subregion of the entire address space, otherwise
>> the reduction in maximum mapping size due to fragmentation will be a
>> problem on 64-bit architectures as well (which generally do not support
>> the full 64 bits for user-space addresses).
> 
> Restricting randomization would reduce the address space layout randomization and make this less useful. There's 48 or 56 bits, which translate to 128TB and 64PB of VM for user applications. Is it really possible to build today (or in near future) a system, which would contain so much RAM that such fragmentation could realistically happen? Perhaps also in a special case where lots of 1GB huge pages are necessary? Maybe in those cases you shouldn't use randomize_va_space=3. Or perhaps there could be randomize_va_space=3 which does something, and randomize_va_space=4 for those who want maximum randomization.

If you want a 4GB allocation to succeed, you can only divide the address space into 32k fragments.  Or, a little more precisely, if you want a randomly selected 4GB region to be empty, any other allocation has a 1/32k chance of being in the way.  (Rough numbers — I’m ignoring effects of the beginning and end of the address space, and I’m ignoring the size of a potential conflicting allocation.). This sounds good, except that a program could easily make a whole bunch of tiny allocations that get merged in current kernels but wouldn’t with your scheme.
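
(For concreteness, assuming the usual 47-bit x86-64 user address
space: 2^47 bytes is 128 TiB, and 128 TiB / 4 GiB = 32768, hence the
32k figure; each existing mapping then has roughly a 1/32k chance of
landing inside any particular 4 GiB window.)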

So maybe this is okay, but it’s not likely to be a good default.

> 
>>> +    On all systems, it will reduce performance and increase memory
>>> +    usage due to less efficient use of page tables and inability to
>>> +    merge adjacent VMAs with compatible attributes. In the worst case,
>>> +    additional page table entries of up to 4 pages are created for
>>> +    each mapping, so with small mappings there's considerable penalty.
>> The number 4 is architecture-specific, right?
> 
> Yes, I only know x86_64. Actually it could have 5 level page tables. I'll fix this in next version.
> 
> -Topi



* Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
  2020-12-03 17:10     ` Andy Lutomirski
@ 2020-12-03 17:28       ` Florian Weimer
  2020-12-03 17:42         ` Andy Lutomirski
  2020-12-03 19:26         ` Joseph Myers
  0 siblings, 2 replies; 11+ messages in thread
From: Florian Weimer @ 2020-12-03 17:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Topi Miettinen, linux-hardening, akpm, linux-mm, linux-kernel,
	Jann Horn, Kees Cook, Matthew Wilcox, Mike Rapoport, Linux API

* Andy Lutomirski:

> If you want a 4GB allocation to succeed, you can only divide the
> address space into 32k fragments.  Or, a little more precisely, if you
> want a randomly selected 4GB region to be empty, any other allocation
> has a 1/32k chance of being in the way.  (Rough numbers — I’m ignoring
> effects of the beginning and end of the address space, and I’m
> ignoring the size of a potential conflicting allocation.).

I think the probability distribution is way more advantageous than that
because it is unlikely that 32K allocations are all exactly spaced 4 GB
apart.  (And with 32K allocations, you are close to the VMA limit anyway.)

My knowledge of probability theory is quite limited, so I have to rely
on simulations.  But I think you would see a 40 GiB gap somewhere for a
47-bit address space with 32K allocations, most of the time.  Which is
not too bad.

But even with a 47 bit address space and just 500 threads (each with at
least a stack and local heap, randomized independently), simulations
suggest that the largest gap is often just 850 GB.  At that point,
you can't expect to map your NVDIMM (or whatever) in a single mapping
anymore, and you have to code around that.
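
A rough simulation along these lines (treating allocations as points
and ignoring their sizes) is enough for these estimates:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

static int cmp(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return x < y ? -1 : x > y;
}

int main(void)
{
	const uint64_t space = 1ULL << 47;	/* 47-bit user address space */
	enum { N = 32768 };			/* use ~1000 for the 500-thread case */
	static uint64_t addr[N];
	uint64_t max_gap = 0, prev = 0;
	int i;

	srand48(1);
	for (i = 0; i < N; i++)
		addr[i] = (uint64_t)(drand48() * space);
	qsort(addr, N, sizeof(addr[0]), cmp);

	/* Largest gap between consecutive allocations (and the ends). */
	for (i = 0; i < N; i++) {
		if (addr[i] - prev > max_gap)
			max_gap = addr[i] - prev;
		prev = addr[i];
	}
	if (space - prev > max_gap)
		max_gap = space - prev;

	printf("largest gap: %.1f GiB\n",
	       max_gap / (1024.0 * 1024 * 1024));
	return 0;
}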

Not randomizing large allocations and sacrificing one bit of randomness
for small allocations would avoid this issue, though.

(I still expect page walking performance to suffer drastically, with or
without this tweak.  I assume page walking uses the CPU cache hierarchy
today, and with full randomization, accessing page entry at each level
after a TLB miss would result in a data cache miss.  But then, I'm
firmly a software person.)

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill




* Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
  2020-12-03 17:28       ` Florian Weimer
@ 2020-12-03 17:42         ` Andy Lutomirski
  2020-12-03 18:00           ` Matthew Wilcox
  2020-12-03 19:26         ` Joseph Myers
  1 sibling, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2020-12-03 17:42 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Topi Miettinen, linux-hardening, akpm, linux-mm, linux-kernel,
	Jann Horn, Kees Cook, Matthew Wilcox, Mike Rapoport, Linux API


> On Dec 3, 2020, at 9:29 AM, Florian Weimer <fweimer@redhat.com> wrote:
> 
> * Andy Lutomirski:
> 
>> If you want a 4GB allocation to succeed, you can only divide the
>> address space into 32k fragments.  Or, a little more precisely, if you
>> want a randomly selected 4GB region to be empty, any other allocation
>> has a 1/32k chance of being in the way.  (Rough numbers — I’m ignoring
>> effects of the beginning and end of the address space, and I’m
>> ignoring the size of a potential conflicting allocation.).
> 
> I think the probability distribution is way more advantageous than that
> because it is unlikely that 32K allocations are all exactly spaced 4 GB
> apart.  (And with 32K allocations, you are close to the VMA limit anyway.)

I’m assuming the naive algorithm of choosing an address and trying it.  Actually looking for a big enough gap would be more reliable.

I suspect that something much more clever could be done: a scheme in which the heap is divided up into a few independently randomized sections, and heap pages are randomized within those sections, might do much better. There should certainly be a lot of room for something between what we have now and a fully randomized scheme.

It might also be worth looking at what other OSes do.


* Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
  2020-12-03 17:42         ` Andy Lutomirski
@ 2020-12-03 18:00           ` Matthew Wilcox
  0 siblings, 0 replies; 11+ messages in thread
From: Matthew Wilcox @ 2020-12-03 18:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Florian Weimer, Topi Miettinen, linux-hardening, akpm, linux-mm,
	linux-kernel, Jann Horn, Kees Cook, Mike Rapoport, Linux API

On Thu, Dec 03, 2020 at 09:42:54AM -0800, Andy Lutomirski wrote:
> I suspect that something much more clever could be done in which the heap is divided up into a few independently randomized sections and heap pages are randomized within the sections might do much better. There should certainly be a lot of room for something between what we have now and a fully randomized scheme.
> 
> It might also be worth looking at what other OSes do.

How about dividing the address space up into 1GB sections (or, rather,
PUD_SIZE sections), allocating from each one until it's 50% full, then
choosing another one?  Sufficiently large allocations would ignore this
division and just look for any space.  I'm thinking something like the
slab allocator (so the 1GB chunk would go back into the allocatable list
when >50% of it was empty).

That might strike a happy medium between full randomisation and efficient
use of page tables / leaving large chunks of address space free for
large mmaps.
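
A very rough userspace sketch of the bookkeeping I mean (the
pick_random_section_base() helper, the bump allocation inside a
section and the exact thresholds are all just placeholders, not a
real mm interface):

#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>

#define SECTION_SIZE	(1ULL << 30)	/* 1 GiB, i.e. PUD_SIZE on x86-64 */

struct section {
	uint64_t base;		/* start of this 1 GiB chunk */
	uint64_t used;		/* bytes currently mapped in it */
	uint64_t next_off;	/* naive bump pointer within the chunk */
	bool on_list;
	struct section *next;	/* allocatable list */
};

static struct section *allocatable;	/* sections less than 50% full */

/* Placeholder: pick a random, unused, 1 GiB-aligned region. */
extern uint64_t pick_random_section_base(void);

static uint64_t alloc_small(uint64_t len)
{
	struct section *s = allocatable;

	if (!s) {
		/* No partially filled section: open a new one at random. */
		s = calloc(1, sizeof(*s));
		s->base = pick_random_section_base();
		s->on_list = true;
		allocatable = s;
	}

	uint64_t addr = s->base + s->next_off;

	s->next_off += len;
	s->used += len;

	/* Once 50% full, stop handing this section out. */
	if (s->used >= SECTION_SIZE / 2) {
		allocatable = s->next;
		s->on_list = false;
	}
	return addr;
}

/* On unmap, put the section back once more than 50% of it is empty. */
static void free_small(struct section *s, uint64_t len)
{
	s->used -= len;
	if (!s->on_list && s->used < SECTION_SIZE / 2) {
		s->next = allocatable;
		allocatable = s;
		s->on_list = true;
	}
}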



* Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
  2020-12-03 17:28       ` Florian Weimer
  2020-12-03 17:42         ` Andy Lutomirski
@ 2020-12-03 19:26         ` Joseph Myers
  1 sibling, 0 replies; 11+ messages in thread
From: Joseph Myers @ 2020-12-03 19:26 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andy Lutomirski, Topi Miettinen, linux-hardening, akpm, linux-mm,
	linux-kernel, Jann Horn, Kees Cook, Matthew Wilcox,
	Mike Rapoport, Linux API

On Thu, 3 Dec 2020, Florian Weimer wrote:

> My knowledge of probability theory is quite limited, so I have to rely
> on simulations.  But I think you would see a 40 GiB gap somewhere for a
> 47-bit address space with 32K allocations, most of the time.  Which is
> not too bad.

This is very close to a Poisson process (if the number of small 
allocations being distributed independently in the address space is 
large), so the probability that any given gap is at least x times the mean 
gap is about exp(-x).
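
As a rough check against the numbers above: with 32K allocations in a
47-bit space the mean gap is 2^47 / 2^15 = 4 GiB, so any given gap is
at least 40 GiB (x = 10) with probability about exp(-10) ~ 4.5e-5, and
with ~32K gaps one or two such gaps are expected, consistent with
seeing a 40 GiB gap somewhere most of the time.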

-- 
Joseph S. Myers
joseph@codesourcery.com


