linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm/vmalloc: randomize vmalloc() allocations
@ 2020-12-01 21:45 Topi Miettinen
  2020-12-02 18:49 ` Topi Miettinen
  2020-12-02 18:53 ` Matthew Wilcox
  0 siblings, 2 replies; 11+ messages in thread
From: Topi Miettinen @ 2020-12-01 21:45 UTC (permalink / raw)
  To: linux-hardening, akpm, linux-mm, linux-kernel
  Cc: Topi Miettinen, Andy Lutomirski, Jann Horn, Kees Cook, Linux API,
	Matthew Wilcox, Mike Rapoport

Memory mappings allocated inside the kernel with vmalloc() are placed in
a predictable order and packed tightly toward the low addresses. With
the new kernel boot parameter 'randomize_vmalloc=1', the entire area is
used randomly, making the allocations less predictable and harder for
attackers to guess.

Without randomize_vmalloc=1:
$ cat /proc/vmallocinfo
0xffffc90000000000-0xffffc90000002000    8192 acpi_os_map_iomem+0x29e/0x2c0 phys=0x000000003ffe1000 ioremap
0xffffc90000002000-0xffffc90000005000   12288 acpi_os_map_iomem+0x29e/0x2c0 phys=0x000000003ffe0000 ioremap
0xffffc90000005000-0xffffc90000007000    8192 hpet_enable+0x36/0x4a9 phys=0x00000000fed00000 ioremap
0xffffc90000007000-0xffffc90000009000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffc90000009000-0xffffc9000000b000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffc9000000b000-0xffffc9000000d000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffc9000000d000-0xffffc9000000f000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffc90000011000-0xffffc90000015000   16384 n_tty_open+0x16/0xe0 pages=3 vmalloc
0xffffc900003de000-0xffffc900003e0000    8192 acpi_os_map_iomem+0x29e/0x2c0 phys=0x00000000fed00000 ioremap
0xffffc900003e0000-0xffffc900003e2000    8192 memremap+0x1a1/0x280 phys=0x00000000000f5000 ioremap
0xffffc900003e2000-0xffffc900003f3000   69632 pcpu_create_chunk+0x80/0x2c0 pages=16 vmalloc
0xffffc900003f3000-0xffffc90000405000   73728 pcpu_create_chunk+0xb7/0x2c0 pages=17 vmalloc
0xffffc90000405000-0xffffc9000040a000   20480 pcpu_create_chunk+0xed/0x2c0 pages=4 vmalloc
0xffffe8ffffc00000-0xffffe8ffffe00000 2097152 pcpu_get_vm_areas+0x0/0x1a40 vmalloc

With randomize_vmalloc=1, the allocations are randomized:
$ cat /proc/vmallocinfo
0xffffca3a36442000-0xffffca3a36447000   20480 pcpu_create_chunk+0xed/0x2c0 pages=4 vmalloc
0xffffca63034d6000-0xffffca63034d9000   12288 acpi_os_map_iomem+0x29e/0x2c0 phys=0x000000003ffe0000 ioremap
0xffffcce23d32e000-0xffffcce23d330000    8192 memremap+0x1a1/0x280 phys=0x00000000000f5000 ioremap
0xffffcfb9f0e22000-0xffffcfb9f0e24000    8192 hpet_enable+0x36/0x4a9 phys=0x00000000fed00000 ioremap
0xffffd1df23e9e000-0xffffd1df23eb0000   73728 pcpu_create_chunk+0xb7/0x2c0 pages=17 vmalloc
0xffffd690c2990000-0xffffd690c2992000    8192 acpi_os_map_iomem+0x29e/0x2c0 phys=0x000000003ffe1000 ioremap
0xffffd8460c718000-0xffffd8460c71c000   16384 n_tty_open+0x16/0xe0 pages=3 vmalloc
0xffffd89aba709000-0xffffd89aba70b000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffe0ca3f2ed000-0xffffe0ca3f2ef000    8192 acpi_os_map_iomem+0x29e/0x2c0 phys=0x00000000fed00000 ioremap
0xffffe3ba44802000-0xffffe3ba44804000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffe4524b2a2000-0xffffe4524b2a4000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffe61372b2e000-0xffffe61372b30000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffe704d2f7c000-0xffffe704d2f8d000   69632 pcpu_create_chunk+0x80/0x2c0 pages=16 vmalloc
0xffffe8ffffc00000-0xffffe8ffffe00000 2097152 pcpu_get_vm_areas+0x0/0x1a40 vmalloc

CC: Andrew Morton <akpm@linux-foundation.org>
CC: Andy Lutomirski <luto@kernel.org>
CC: Jann Horn <jannh@google.com>
CC: Kees Cook <keescook@chromium.org>
CC: Linux API <linux-api@vger.kernel.org>
CC: Matthew Wilcox <willy@infradead.org>
CC: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
---
 .../admin-guide/kernel-parameters.txt         |  2 ++
 mm/vmalloc.c                                  | 25 +++++++++++++++++--
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 44fde25bb221..a0242e31d2d8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4017,6 +4017,8 @@
 
 	ramdisk_start=	[RAM] RAM disk image start address
 
+	randomize_vmalloc= [KNL] Randomize vmalloc() allocations.
+
 	random.trust_cpu={on,off}
 			[KNL] Enable or disable trusting the use of the
 			CPU's random number generator (if available) to
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6ae491a8b210..a5f7bb46ddf2 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -34,6 +34,7 @@
 #include <linux/bitops.h>
 #include <linux/rbtree_augmented.h>
 #include <linux/overflow.h>
+#include <linux/random.h>
 
 #include <linux/uaccess.h>
 #include <asm/tlbflush.h>
@@ -1079,6 +1080,17 @@ adjust_va_to_fit_type(struct vmap_area *va,
 	return 0;
 }
 
+static int randomize_vmalloc = 0;
+
+static int __init set_randomize_vmalloc(char *str)
+{
+	if (!str)
+		return 0;
+	randomize_vmalloc = simple_strtoul(str, &str, 0);
+	return 1;
+}
+__setup("randomize_vmalloc=", set_randomize_vmalloc);
+
 /*
  * Returns a start address of the newly allocated area, if success.
  * Otherwise a vend is returned that indicates failure.
@@ -1152,7 +1164,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 				int node, gfp_t gfp_mask)
 {
 	struct vmap_area *va, *pva;
-	unsigned long addr;
+	unsigned long addr, voffset;
 	int purged = 0;
 	int ret;
 
@@ -1207,11 +1219,20 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
 		kmem_cache_free(vmap_area_cachep, pva);
 
+	/* Randomize allocation */
+	if (randomize_vmalloc) {
+		voffset = get_random_long() & (roundup_pow_of_two(vend - vstart) - 1);
+		voffset = PAGE_ALIGN(voffset);
+		if (voffset + size > vend - vstart)
+			voffset = vend - vstart - size;
+	} else
+		voffset = 0;
+
 	/*
 	 * If an allocation fails, the "vend" address is
 	 * returned. Therefore trigger the overflow path.
 	 */
-	addr = __alloc_vmap_area(size, align, vstart, vend);
+	addr = __alloc_vmap_area(size, align, vstart + voffset, vend);
 	spin_unlock(&free_vmap_area_lock);
 
 	if (unlikely(addr == vend))
-- 
2.29.2
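
The offset computation in the mm/vmalloc.c hunk can be tried out in
userspace. The following is only a minimal sketch, not kernel code: the
sketch_* helpers are hypothetical stand-ins for PAGE_ALIGN() and
roundup_pow_of_two(), a 4 KiB page size is assumed, and the random value
is passed in as a parameter instead of coming from get_random_long().
It shows that when vend - vstart is not a power of two, every masked
value that lands beyond the range is clamped to the same last slot,
vend - vstart - size.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096)	/* assumption: 4 KiB pages */

/* Round x up to a multiple of the page size (stand-in for PAGE_ALIGN). */
static uint64_t sketch_page_align(uint64_t x)
{
	return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Smallest power of two >= x (stand-in for roundup_pow_of_two). */
static uint64_t sketch_roundup_pow_of_two(uint64_t x)
{
	uint64_t p = 1;

	assert(x >= 1 && x <= (UINT64_C(1) << 63));
	while (p < x)
		p <<= 1;
	return p;
}

/*
 * Same steps as the patch: mask the random value, page-align it, and
 * clamp it so that [voffset, voffset + size) stays inside the range.
 */
static uint64_t sketch_voffset(uint64_t rnd, uint64_t vstart, uint64_t vend,
			       uint64_t size)
{
	uint64_t range = vend - vstart;
	uint64_t voffset;

	assert(size <= range);
	voffset = rnd & (sketch_roundup_pow_of_two(range) - 1);
	voffset = sketch_page_align(voffset);
	if (voffset + size > range)
		voffset = range - size;
	return voffset;
}
```

With a 0x180000 byte range and size 0x2000, for example, the mask is
0x1fffff, so a random value of 0x1f0000 falls outside the range and is
clamped to 0x17e000.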


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm/vmalloc: randomize vmalloc() allocations
  2020-12-01 21:45 [PATCH] mm/vmalloc: randomize vmalloc() allocations Topi Miettinen
@ 2020-12-02 18:49 ` Topi Miettinen
  2020-12-03  6:58   ` Mike Rapoport
  2020-12-02 18:53 ` Matthew Wilcox
  1 sibling, 1 reply; 11+ messages in thread
From: Topi Miettinen @ 2020-12-02 18:49 UTC (permalink / raw)
  To: linux-hardening, akpm, linux-mm, linux-kernel
  Cc: Andy Lutomirski, Jann Horn, Kees Cook, Linux API, Matthew Wilcox,
	Mike Rapoport

On 1.12.2020 23.45, Topi Miettinen wrote:
> Memory mappings inside kernel allocated with vmalloc() are in
> predictable order and packed tightly toward the low addresses. With
> new kernel boot parameter 'randomize_vmalloc=1', the entire area is
> used randomly to make the allocations less predictable and harder to
> guess for attackers.
> 
> Without randomize_vmalloc=1:
> $ cat /proc/vmallocinfo
> 0xffffc90000000000-0xffffc90000002000    8192 acpi_os_map_iomem+0x29e/0x2c0 phys=0x000000003ffe1000 ioremap
> 0xffffc90000002000-0xffffc90000005000   12288 acpi_os_map_iomem+0x29e/0x2c0 phys=0x000000003ffe0000 ioremap
> 0xffffc90000005000-0xffffc90000007000    8192 hpet_enable+0x36/0x4a9 phys=0x00000000fed00000 ioremap
> 0xffffc90000007000-0xffffc90000009000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffc90000009000-0xffffc9000000b000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffc9000000b000-0xffffc9000000d000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffc9000000d000-0xffffc9000000f000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffc90000011000-0xffffc90000015000   16384 n_tty_open+0x16/0xe0 pages=3 vmalloc
> 0xffffc900003de000-0xffffc900003e0000    8192 acpi_os_map_iomem+0x29e/0x2c0 phys=0x00000000fed00000 ioremap
> 0xffffc900003e0000-0xffffc900003e2000    8192 memremap+0x1a1/0x280 phys=0x00000000000f5000 ioremap
> 0xffffc900003e2000-0xffffc900003f3000   69632 pcpu_create_chunk+0x80/0x2c0 pages=16 vmalloc
> 0xffffc900003f3000-0xffffc90000405000   73728 pcpu_create_chunk+0xb7/0x2c0 pages=17 vmalloc
> 0xffffc90000405000-0xffffc9000040a000   20480 pcpu_create_chunk+0xed/0x2c0 pages=4 vmalloc
> 0xffffe8ffffc00000-0xffffe8ffffe00000 2097152 pcpu_get_vm_areas+0x0/0x1a40 vmalloc
> 
> With randomize_vmalloc=1, the allocations are randomized:
> $ cat /proc/vmallocinfo
> 0xffffca3a36442000-0xffffca3a36447000   20480 pcpu_create_chunk+0xed/0x2c0 pages=4 vmalloc
> 0xffffca63034d6000-0xffffca63034d9000   12288 acpi_os_map_iomem+0x29e/0x2c0 phys=0x000000003ffe0000 ioremap
> 0xffffcce23d32e000-0xffffcce23d330000    8192 memremap+0x1a1/0x280 phys=0x00000000000f5000 ioremap
> 0xffffcfb9f0e22000-0xffffcfb9f0e24000    8192 hpet_enable+0x36/0x4a9 phys=0x00000000fed00000 ioremap
> 0xffffd1df23e9e000-0xffffd1df23eb0000   73728 pcpu_create_chunk+0xb7/0x2c0 pages=17 vmalloc
> 0xffffd690c2990000-0xffffd690c2992000    8192 acpi_os_map_iomem+0x29e/0x2c0 phys=0x000000003ffe1000 ioremap
> 0xffffd8460c718000-0xffffd8460c71c000   16384 n_tty_open+0x16/0xe0 pages=3 vmalloc
> 0xffffd89aba709000-0xffffd89aba70b000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffe0ca3f2ed000-0xffffe0ca3f2ef000    8192 acpi_os_map_iomem+0x29e/0x2c0 phys=0x00000000fed00000 ioremap
> 0xffffe3ba44802000-0xffffe3ba44804000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffe4524b2a2000-0xffffe4524b2a4000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffe61372b2e000-0xffffe61372b30000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffe704d2f7c000-0xffffe704d2f8d000   69632 pcpu_create_chunk+0x80/0x2c0 pages=16 vmalloc
> 0xffffe8ffffc00000-0xffffe8ffffe00000 2097152 pcpu_get_vm_areas+0x0/0x1a40 vmalloc
> 
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Andy Lutomirski <luto@kernel.org>
> CC: Jann Horn <jannh@google.com>
> CC: Kees Cook <keescook@chromium.org>
> CC: Linux API <linux-api@vger.kernel.org>
> CC: Matthew Wilcox <willy@infradead.org>
> CC: Mike Rapoport <rppt@kernel.org>
> Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
> ---
>   .../admin-guide/kernel-parameters.txt         |  2 ++
>   mm/vmalloc.c                                  | 25 +++++++++++++++++--
>   2 files changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 44fde25bb221..a0242e31d2d8 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4017,6 +4017,8 @@
>   
>   	ramdisk_start=	[RAM] RAM disk image start address
>   
> +	randomize_vmalloc= [KNL] Randomize vmalloc() allocations.
> +
>   	random.trust_cpu={on,off}
>   			[KNL] Enable or disable trusting the use of the
>   			CPU's random number generator (if available) to
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 6ae491a8b210..a5f7bb46ddf2 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -34,6 +34,7 @@
>   #include <linux/bitops.h>
>   #include <linux/rbtree_augmented.h>
>   #include <linux/overflow.h>
> +#include <linux/random.h>
>   
>   #include <linux/uaccess.h>
>   #include <asm/tlbflush.h>
> @@ -1079,6 +1080,17 @@ adjust_va_to_fit_type(struct vmap_area *va,
>   	return 0;
>   }
>   
> +static int randomize_vmalloc = 0;
> +
> +static int __init set_randomize_vmalloc(char *str)
> +{
> +	if (!str)
> +		return 0;
> +	randomize_vmalloc = simple_strtoul(str, &str, 0);
> +	return 1;
> +}
> +__setup("randomize_vmalloc=", set_randomize_vmalloc);
> +
>   /*
>    * Returns a start address of the newly allocated area, if success.
>    * Otherwise a vend is returned that indicates failure.
> @@ -1152,7 +1164,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>   				int node, gfp_t gfp_mask)
>   {
>   	struct vmap_area *va, *pva;
> -	unsigned long addr;
> +	unsigned long addr, voffset;
>   	int purged = 0;
>   	int ret;
>   
> @@ -1207,11 +1219,20 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>   	if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
>   		kmem_cache_free(vmap_area_cachep, pva);
>   
> +	/* Randomize allocation */
> +	if (randomize_vmalloc) {
> +		voffset = get_random_long() & (roundup_pow_of_two(vend - vstart) - 1);
> +		voffset = PAGE_ALIGN(voffset);
> +		if (voffset + size > vend - vstart)
> +			voffset = vend - vstart - size;
> +	} else
> +		voffset = 0;
> +
>   	/*
>   	 * If an allocation fails, the "vend" address is
>   	 * returned. Therefore trigger the overflow path.
>   	 */
> -	addr = __alloc_vmap_area(size, align, vstart, vend);
> +	addr = __alloc_vmap_area(size, align, vstart + voffset, vend);

Does not work so well after all:

Dec 02 18:25:01 kernel: systemd-udevd: vmalloc: allocation failure: 10526720 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
Dec 02 18:25:01 kernel: CPU: 12 PID: 716 Comm: systemd-udevd Tainted: G            E     5.10.0-rc5+ #25
Dec 02 18:25:01 kernel: Hardware name: <redacted>
Dec 02 18:25:01 kernel: Call Trace:
Dec 02 18:25:01 kernel:  dump_stack+0x7d/0xa3
Dec 02 18:25:01 kernel:  warn_alloc.cold+0x83/0x126
Dec 02 18:25:01 kernel:  ? zone_watermark_ok_safe+0x140/0x140
Dec 02 18:25:01 kernel:  ? __kasan_slab_free+0x122/0x150
Dec 02 18:25:01 kernel:  ? slab_free_freelist_hook+0x66/0x110
Dec 02 18:25:01 kernel:  ? kfree+0xba/0x3e0
Dec 02 18:25:01 kernel:  __vmalloc_node_range+0xd7/0xf0
Dec 02 18:25:01 kernel:  ? load_module+0x29e0/0x3f40
Dec 02 18:25:01 kernel:  module_alloc+0x9f/0x110
Dec 02 18:25:01 kernel:  ? load_module+0x29e0/0x3f40
Dec 02 18:25:01 kernel:  load_module+0x29e0/0x3f40
Dec 02 18:25:01 kernel:  ? ima_post_read_file+0x140/0x150
Dec 02 18:25:01 kernel:  ? module_frob_arch_sections+0x20/0x20
Dec 02 18:25:01 kernel:  ? kernel_read_file+0x1d2/0x3e0
Dec 02 18:25:01 kernel:  ? __x64_sys_fsopen+0x1f0/0x1f0
Dec 02 18:25:01 kernel:  ? up_write+0x92/0x140
Dec 02 18:25:01 kernel:  ? downgrade_write+0x160/0x160
Dec 02 18:25:01 kernel:  ? kernel_read_file_from_fd+0x4b/0x90
Dec 02 18:25:01 kernel:  __do_sys_finit_module+0x110/0x1a0
Dec 02 18:25:01 kernel:  ? __x64_sys_init_module+0x50/0x50
Dec 02 18:25:01 kernel:  ? get_nth_filter.part.0+0x160/0x160
Dec 02 18:25:01 kernel:  ? randomize_stack_top+0x70/0x70
Dec 02 18:25:01 kernel:  ? __x64_sys_fstat+0x30/0x30
Dec 02 18:25:01 kernel:  ? __audit_syscall_entry+0x16a/0x1d0
Dec 02 18:25:01 kernel:  ? ktime_get_coarse_real_ts64+0x4a/0x70
Dec 02 18:25:01 kernel:  do_syscall_64+0x33/0x40
Dec 02 18:25:01 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 02 18:25:01 kernel: RIP: 0033:0xdd0fd2fb989
Dec 02 18:25:01 kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d7 54 0c 00 f7 d8 64 89 01 48
Dec 02 18:25:01 kernel: RSP: 002b:00000ceb4f03f028 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Dec 02 18:25:01 kernel: RAX: ffffffffffffffda RBX: 00000ef04a12fa90 RCX: 00000dd0fd2fb989
Dec 02 18:25:01 kernel: RDX: 0000000000000000 RSI: 000003b119220e4d RDI: 0000000000000017
Dec 02 18:25:01 kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 00000ef04a11b018
Dec 02 18:25:01 kernel: R10: 0000000000000017 R11: 0000000000000246 R12: 000003b119220e4d
Dec 02 18:25:01 kernel: R13: 0000000000000000 R14: 00000ef04a124a10 R15: 00000ef04a12fa90
Dec 02 18:25:01 kernel: Mem-Info:
Dec 02 18:25:01 kernel: active_anon:96 inactive_anon:17667 isolated_anon:0
                          active_file:15598 inactive_file:35563 isolated_file:0
                          unevictable:0 dirty:0 writeback:0
                          slab_reclaimable:8064 slab_unreclaimable:159447
                          mapped:10434 shmem:229 pagetables:5844 bounce:0
                          free:3176890 free_pcp:2892 free_cma:0
Dec 02 18:25:01 kernel: Node 0 active_anon:384kB inactive_anon:70668kB active_file:62392kB inactive_file:142252kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:41736kB dirty:0kB writeback:0kB shmem:916kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB wri>
Dec 02 18:25:01 kernel: DMA free:13860kB min:76kB low:92kB high:108kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15908kB mlocked:0kB pagetables:0kB bounce:0kB free_pc>
Dec 02 18:25:01 kernel: lowmem_reserve[]: 0 2650 13377 13377 13377
Dec 02 18:25:01 kernel: DMA32 free:2790432kB min:13372kB low:16712kB high:20052kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:2796348kB managed:2796008kB mlocked:0kB pagetables:0kB bo>
Dec 02 18:25:01 kernel: lowmem_reserve[]: 0 0 10726 10726 10726
Dec 02 18:25:01 kernel: Normal free:9903268kB min:54128kB low:67660kB high:81192kB reserved_highatomic:0KB active_anon:384kB inactive_anon:70668kB active_file:62392kB inactive_file:142252kB unevictable:0kB writepending:0kB present:13356288kB managed:10991672kB mlocked:0kB>
Dec 02 18:25:01 kernel: lowmem_reserve[]: 0 0 0 0 0
Dec 02 18:25:01 kernel: DMA: 3*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 2*2048kB (UM) 2*4096kB (M) = 13860kB
Dec 02 18:25:01 kernel: DMA32: 12*4kB (UM) 10*8kB (UM) 8*16kB (M) 9*32kB (M) 8*64kB (UM) 6*128kB (UM) 7*256kB (UM) 9*512kB (UM) 5*1024kB (UM) 6*2048kB (M) 675*4096kB (M) = 2790432kB
Dec 02 18:25:01 kernel: Normal: 82*4kB (UE) 1*8kB (E) 3*16kB (UME) 16*32kB (UM) 1*64kB (U) 1*128kB (U) 1*256kB (M) 6*512kB (UM) 8*1024kB (UME) 1*2048kB (E) 2414*4096kB (M) = 9902400kB

I suppose the random address happened to be too near 'vend' and no 
suitable block was found. Perhaps the search in __alloc_vmap_area() 
should then continue at 'vstart' instead (so __alloc_vmap_area() would 
be passed all three of vstart, voffset, vend instead of just 
vstart+voffset, vend).

This also seems to randomize module addresses. I was going to check that 
next, so nice surprise!

-Topi

>   	spin_unlock(&free_vmap_area_lock);
>   
>   	if (unlikely(addr == vend))
> 



* Re: [PATCH] mm/vmalloc: randomize vmalloc() allocations
  2020-12-01 21:45 [PATCH] mm/vmalloc: randomize vmalloc() allocations Topi Miettinen
  2020-12-02 18:49 ` Topi Miettinen
@ 2020-12-02 18:53 ` Matthew Wilcox
  2020-12-02 21:28   ` Topi Miettinen
  1 sibling, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2020-12-02 18:53 UTC (permalink / raw)
  To: Topi Miettinen
  Cc: linux-hardening, akpm, linux-mm, linux-kernel, Andy Lutomirski,
	Jann Horn, Kees Cook, Linux API, Mike Rapoport

On Tue, Dec 01, 2020 at 11:45:47PM +0200, Topi Miettinen wrote:
> +	/* Randomize allocation */
> +	if (randomize_vmalloc) {
> +		voffset = get_random_long() & (roundup_pow_of_two(vend - vstart) - 1);
> +		voffset = PAGE_ALIGN(voffset);
> +		if (voffset + size > vend - vstart)
> +			voffset = vend - vstart - size;
> +	} else
> +		voffset = 0;
> +
>  	/*
>  	 * If an allocation fails, the "vend" address is
>  	 * returned. Therefore trigger the overflow path.
>  	 */
> -	addr = __alloc_vmap_area(size, align, vstart, vend);
> +	addr = __alloc_vmap_area(size, align, vstart + voffset, vend);
>  	spin_unlock(&free_vmap_area_lock);

What if there isn't any free address space between vstart+voffset and
vend, but there is free address space between vstart and vstart+voffset?
Seems like we should add:

	addr = __alloc_vmap_area(size, align, vstart + voffset, vend);
+	if (addr == vend)
+		addr = __alloc_vmap_area(size, align, vstart, vend);
	spin_unlock(&free_vmap_area_lock);


* Re: [PATCH] mm/vmalloc: randomize vmalloc() allocations
  2020-12-02 18:53 ` Matthew Wilcox
@ 2020-12-02 21:28   ` Topi Miettinen
  0 siblings, 0 replies; 11+ messages in thread
From: Topi Miettinen @ 2020-12-02 21:28 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-hardening, akpm, linux-mm, linux-kernel, Andy Lutomirski,
	Jann Horn, Kees Cook, Linux API, Mike Rapoport

On 2.12.2020 20.53, Matthew Wilcox wrote:
> On Tue, Dec 01, 2020 at 11:45:47PM +0200, Topi Miettinen wrote:
>> +	/* Randomize allocation */
>> +	if (randomize_vmalloc) {
>> +		voffset = get_random_long() & (roundup_pow_of_two(vend - vstart) - 1);
>> +		voffset = PAGE_ALIGN(voffset);
>> +		if (voffset + size > vend - vstart)
>> +			voffset = vend - vstart - size;
>> +	} else
>> +		voffset = 0;
>> +
>>   	/*
>>   	 * If an allocation fails, the "vend" address is
>>   	 * returned. Therefore trigger the overflow path.
>>   	 */
>> -	addr = __alloc_vmap_area(size, align, vstart, vend);
>> +	addr = __alloc_vmap_area(size, align, vstart + voffset, vend);
>>   	spin_unlock(&free_vmap_area_lock);
> 
> What if there isn't any free address space between vstart+voffset and
> vend, but there is free address space between vstart and vstart+voffset?
> Seems like we should add:
> 
> 	addr = __alloc_vmap_area(size, align, vstart + voffset, vend);
> +	if (addr == vend)
> +		addr = __alloc_vmap_area(size, align, vstart, vend);
> 	spin_unlock(&free_vmap_area_lock);
> 

How about:

	addr = __alloc_vmap_area(size, align, vstart + voffset, vend);
+	if (addr == vend)
+		addr = __alloc_vmap_area(size, align, vstart, vstart + voffset + size);
	spin_unlock(&free_vmap_area_lock);

That way the search would not be redone for the area that was already 
checked and rejected.
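
The two-pass search discussed here can be modelled with a toy first-fit
lookup. This is a hedged sketch under stated assumptions, not the kernel
allocator: struct toy_free_range, toy_first_fit() and
toy_alloc_wrapped() are hypothetical names, free space is a plain
sorted array, alignment is ignored, and nothing is actually reserved.
Like the real function is documented to do in the patch context, the
lookup returns its upper bound on failure. One consequence is that a
failed second pass returns vstart + voffset + size rather than vend,
so the sketch maps that back to vend to keep the caller's
"addr == vend" check meaningful.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One free stretch of address space, [start, end). Hypothetical type. */
struct toy_free_range {
	uint64_t start;
	uint64_t end;
};

/*
 * First fit inside [vstart, vend) over a sorted array of free ranges.
 * Returns vend when nothing fits, mirroring the failure convention
 * described for __alloc_vmap_area().
 */
static uint64_t toy_first_fit(const struct toy_free_range *fr, size_t n,
			      uint64_t size, uint64_t vstart, uint64_t vend)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t lo = fr[i].start > vstart ? fr[i].start : vstart;
		uint64_t hi = fr[i].end < vend ? fr[i].end : vend;

		if (lo < hi && hi - lo >= size)
			return lo;
	}
	return vend;
}

/*
 * Pass 1 searches [vstart + voffset, vend). Pass 2 searches only
 * [vstart, vstart + voffset + size), the part not yet examined.
 */
static uint64_t toy_alloc_wrapped(const struct toy_free_range *fr, size_t n,
				  uint64_t size, uint64_t vstart,
				  uint64_t voffset, uint64_t vend)
{
	uint64_t split = vstart + voffset;
	uint64_t limit = split + size;
	uint64_t addr;

	assert(voffset + size <= vend - vstart);	/* ensured by the clamp */

	addr = toy_first_fit(fr, n, size, split, vend);
	if (addr != vend)
		return addr;

	addr = toy_first_fit(fr, n, size, vstart, limit);
	return addr == limit ? vend : addr;	/* normalise failure to vend */
}
```

The "+ size" in the second upper bound matters for a free range that
straddles vstart + voffset: the first pass only sees its upper part,
which may be too small, while the second pass can still place the
block at the start of that range.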

Perhaps my previous patch for mmap() etc. randomization could also 
search towards higher addresses instead of trying random addresses five 
times in case of clashes.

-Topi


* Re: [PATCH] mm/vmalloc: randomize vmalloc() allocations
  2020-12-02 18:49 ` Topi Miettinen
@ 2020-12-03  6:58   ` Mike Rapoport
  2020-12-03 23:15     ` David Laight
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Mike Rapoport @ 2020-12-03  6:58 UTC (permalink / raw)
  To: Topi Miettinen
  Cc: linux-hardening, akpm, linux-mm, linux-kernel, Andy Lutomirski,
	Jann Horn, Kees Cook, Linux API, Matthew Wilcox

On Wed, Dec 02, 2020 at 08:49:06PM +0200, Topi Miettinen wrote:
> On 1.12.2020 23.45, Topi Miettinen wrote:
> > Memory mappings inside kernel allocated with vmalloc() are in
> > predictable order and packed tightly toward the low addresses. With
> > new kernel boot parameter 'randomize_vmalloc=1', the entire area is
> > used randomly to make the allocations less predictable and harder to
> > guess for attackers.
> > 
> 
> This also seems to randomize module addresses. I was going to check that
> next, so nice surprise!

Heh, that's because module_alloc() uses vmalloc() one way or another :)

> -Topi
> 
> >   	spin_unlock(&free_vmap_area_lock);
> >   	if (unlikely(addr == vend))
> > 
> 

-- 
Sincerely yours,
Mike.


* RE: [PATCH] mm/vmalloc: randomize vmalloc() allocations
  2020-12-03  6:58   ` Mike Rapoport
@ 2020-12-03 23:15     ` David Laight
  2020-12-04 10:58       ` Topi Miettinen
  2020-12-09 19:08     ` Topi Miettinen
  2020-12-10 19:58     ` Topi Miettinen
  2 siblings, 1 reply; 11+ messages in thread
From: David Laight @ 2020-12-03 23:15 UTC (permalink / raw)
  To: 'Mike Rapoport', Topi Miettinen
  Cc: linux-hardening, akpm, linux-mm, linux-kernel, Andy Lutomirski,
	Jann Horn, Kees Cook, Linux API, Matthew Wilcox

From: Mike Rapoport
> Sent: 03 December 2020 06:58
> 
> On Wed, Dec 02, 2020 at 08:49:06PM +0200, Topi Miettinen wrote:
> > On 1.12.2020 23.45, Topi Miettinen wrote:
> > > Memory mappings inside kernel allocated with vmalloc() are in
> > > predictable order and packed tightly toward the low addresses. With
> > > new kernel boot parameter 'randomize_vmalloc=1', the entire area is
> > > used randomly to make the allocations less predictable and harder to
> > > guess for attackers.

Isn't that going to horribly fragment the available address space
and make even moderately sized allocation requests fail (or sleep)?

I'm not even sure that you need to use 'best fit' rather than
'first fit'.
'best fit' is certainly a lot better for a simple linked list
user space malloc.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



* Re: [PATCH] mm/vmalloc: randomize vmalloc() allocations
  2020-12-03 23:15     ` David Laight
@ 2020-12-04 10:58       ` Topi Miettinen
  2020-12-04 13:33         ` David Laight
  0 siblings, 1 reply; 11+ messages in thread
From: Topi Miettinen @ 2020-12-04 10:58 UTC (permalink / raw)
  To: David Laight, 'Mike Rapoport'
  Cc: linux-hardening, akpm, linux-mm, linux-kernel, Andy Lutomirski,
	Jann Horn, Kees Cook, Linux API, Matthew Wilcox

On 4.12.2020 1.15, David Laight wrote:
> From: Mike Rapoport
>> Sent: 03 December 2020 06:58
>>
>> On Wed, Dec 02, 2020 at 08:49:06PM +0200, Topi Miettinen wrote:
>>> On 1.12.2020 23.45, Topi Miettinen wrote:
>>>> Memory mappings inside kernel allocated with vmalloc() are in
>>>> predictable order and packed tightly toward the low addresses. With
>>>> new kernel boot parameter 'randomize_vmalloc=1', the entire area is
>>>> used randomly to make the allocations less predictable and harder to
>>>> guess for attackers.
> 
> Isn't that going to horribly fragment the available address space
> and make even moderate sized allocation requests fail (or sleep).

On 32-bit architectures this is a real issue, but I don't think it will 
be a problem on 64-bit ones. Small allocations can't fragment the 
virtual address space, because the page tables needed to do so would not 
fit in RAM on existing or near-future systems.

For large allocations (directly mapping the entire contents of TB-sized 
NVMe drives, or a special application which needs 1GB huge pages) this 
could be a risk. Maybe this could be solved by reserving some space for 
them, or perhaps in those cases you shouldn't use randomize_vmalloc=1.

The method for reserving the large areas could be something like below.

First, consider a simple arrangement that reserves high addresses for 
large allocations and low addresses for smaller ones. The allocator 
would search downwards from high addresses for a free large block and 
upwards from low addresses for small blocks. The address space would 
also be semi-rigidly divided into priority areas: area 0 with priority 
for small allocations, area 1 with equal priority for both small and 
large, and area 2 where small allocations would be placed only as a 
last resort (which would probably never happen).
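
As a rough illustration of the priority areas, the search order could
look like the sketch below. Everything in it is an assumption made for
the example: the toy_* names are hypothetical, the three areas are
taken to be equal thirds of [vstart, vend), and how an allocation is
classified as "large" is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { TOY_NUM_AREAS = 3 };

/*
 * Area to try on the given attempt (0 = preferred, 2 = last resort).
 * Small allocations walk the areas upwards, large ones downwards.
 */
static int toy_area_for_attempt(bool is_large, int attempt)
{
	assert(attempt >= 0 && attempt < TOY_NUM_AREAS);
	return is_large ? TOY_NUM_AREAS - 1 - attempt : attempt;
}

/* Start address of an area, assuming three equal parts. */
static uint64_t toy_area_start(uint64_t vstart, uint64_t vend, int area)
{
	assert(area >= 0 && area < TOY_NUM_AREAS);
	return vstart + (vend - vstart) / TOY_NUM_AREAS * (uint64_t)area;
}

/* End address of an area; the last area absorbs any remainder. */
static uint64_t toy_area_end(uint64_t vstart, uint64_t vend, int area)
{
	assert(area >= 0 && area < TOY_NUM_AREAS);
	if (area == TOY_NUM_AREAS - 1)
		return vend;
	return toy_area_start(vstart, vend, area + 1);
}
```

A caller would loop over attempts 0..2 and run its normal search inside
[toy_area_start(), toy_area_end()) for each area until one succeeds.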

Dividing the allocations linearly like this would of course be very 
much non-random, so it could be improved with a pseudo-random scrambling 
function to distribute the addresses in memory. A simple example would 
be to randomly choose a value for one address bit for large allocations 
(not necessarily the most significant available bit, but one high enough 
to align 1GB/TB sized allocations if needed), or a bit pattern across 
several address bits for an uneven distribution.
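
The one-bit variant could be sketched as follows. This is a
hypothetical illustration only: the function names are made up, the bit
position and its value for large allocations are assumed to be chosen
randomly once at boot, and bit 30 in the usage note is just an example
that keeps 1GB alignment possible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Force the class bit of a candidate address: large allocations get
 * large_bit_value in that bit, small allocations get the opposite.
 */
static uint64_t toy_apply_class_bit(uint64_t addr, unsigned int bit,
				    bool large_bit_value, bool is_large)
{
	uint64_t mask;
	bool set;

	assert(bit < 64);
	mask = UINT64_C(1) << bit;
	set = is_large ? large_bit_value : !large_bit_value;
	return set ? (addr | mask) : (addr & ~mask);
}

/* Tell which class an address belongs to under the same convention. */
static bool toy_is_large_addr(uint64_t addr, unsigned int bit,
			      bool large_bit_value)
{
	assert(bit < 64);
	return (((addr >> bit) & 1) != 0) == large_bit_value;
}
```

With bit 30 and large_bit_value set, a large allocation at candidate
offset 0x1000 would be moved to 0x40001000, while a small one keeps the
bit clear. Since the class fixes that bit, each class has one bit less
of randomness to work with.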

The addresses would also be fully randomized inside each priority area.

The division would mean some loss of randomization. A simple rigid 
50%/50% division between small and large allocations would mean a loss 
of one bit, but the above methods could help with this. Dividing the 
address space less evenly would improve one side at the expense of the 
other. Cracking the scrambling function would reveal the bit(s) used for 
the division.

It would be nice to remove the current rigid division of the kernel 
address space (Documentation/x86/x86_64/mm.rst) and let the allocations 
be placed more randomly in the entire 47-bit address space. Would the 
above priority scheme (perhaps with a rigid priority for certain users) 
be good enough to allow this?

Even better would be to remove the use of the highest bit for selecting 
kernel/user addresses, but I suppose it would be a lot of work for 
gaining just one extra bit of randomness. There could be other effects 
though (good or bad).

-Topi

> I'm not even sure that you need to use 'best fit' rather than
> 'first fit'.
> 'best fit' is certainly a lot better for a simple linked list
> user space malloc.
> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> 



* RE: [PATCH] mm/vmalloc: randomize vmalloc() allocations
  2020-12-04 10:58       ` Topi Miettinen
@ 2020-12-04 13:33         ` David Laight
  2020-12-04 16:53           ` Topi Miettinen
  0 siblings, 1 reply; 11+ messages in thread
From: David Laight @ 2020-12-04 13:33 UTC (permalink / raw)
  To: 'Topi Miettinen', 'Mike Rapoport'
  Cc: linux-hardening, akpm, linux-mm, linux-kernel, Andy Lutomirski,
	Jann Horn, Kees Cook, Linux API, Matthew Wilcox

From: Topi Miettinen
> Sent: 04 December 2020 10:58
> 
> On 4.12.2020 1.15, David Laight wrote:
> > From: Mike Rapoport
> >> Sent: 03 December 2020 06:58
> >>
> >> On Wed, Dec 02, 2020 at 08:49:06PM +0200, Topi Miettinen wrote:
> >>> On 1.12.2020 23.45, Topi Miettinen wrote:
> >>>> Memory mappings inside kernel allocated with vmalloc() are in
> >>>> predictable order and packed tightly toward the low addresses. With
> >>>> new kernel boot parameter 'randomize_vmalloc=1', the entire area is
> >>>> used randomly to make the allocations less predictable and harder to
> >>>> guess for attackers.
> >
> > Isn't that going to horribly fragment the available address space
> > and make even moderate sized allocation requests fail (or sleep).
> 
> For 32 bit architecture this is a real issue, but I don't think for 64
> bits it will be a problem. You can't fragment the virtual memory space
> for small allocations because the resulting page tables will not fit in
> RAM for existing or near future systems.

Hmmm, truly random allocations are going to need 3 or 4 extra page tables
on 64-bit systems. A bit of overhead for 4k allocations.
While you won't run out of address space, you will run out of memory.
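
To make the page table cost concrete, here is a small sketch that
counts how many new lower-level tables a mapping would need next to an
existing one. It assumes x86-64 with 4-level paging and 4 KiB pages
(9 index bits per level above a 12-bit page offset), which gives at
most 3 new tables; with 5-level paging the maximum would be 4. The
toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Index into the table at the given level for a virtual address.
 * Level 0 is the PTE table, level 3 the top-level (PGD) table.
 */
static unsigned int toy_pt_index(uint64_t addr, int level)
{
	assert(level >= 0 && level <= 3);
	return (unsigned int)((addr >> (12 + 9 * level)) & 0x1ff);
}

/*
 * Number of page tables that must be newly allocated to map 'addr',
 * given that all tables for 'mapped' already exist. If the top-level
 * indices differ, three lower tables are needed; if only the lowest
 * table is shared, none are.
 */
static int toy_new_tables_needed(uint64_t mapped, uint64_t addr)
{
	for (int level = 3; level >= 1; level--)
		if (toy_pt_index(mapped, level) != toy_pt_index(addr, level))
			return level;
	return 0;
}
```

Taking addresses from the /proc/vmallocinfo listings in this thread:
the packed neighbours 0xffffc90000007000 and 0xffffc90000009000 share
every table, while the randomized 0xffffca3a36442000 and
0xffffca63034d6000 share only the top two levels and so need two new
tables.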

Randomising the allocated address within the area that already
has page tables allocated might make a bit of sense.
Then allocate similar(ish) sized items from the same 'large' pages.

I was wondering if a flag indicating whether an allocation was 'long term'
or 'short term' might help the placement.
Short term small items could be used to fill the space in 'large pages' left
by large items of non-aligned length.

Trouble is you need a CBU (Crystal Ball Unit) to get it right.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* Re: [PATCH] mm/vmalloc: randomize vmalloc() allocations
  2020-12-04 13:33         ` David Laight
@ 2020-12-04 16:53           ` Topi Miettinen
  0 siblings, 0 replies; 11+ messages in thread
From: Topi Miettinen @ 2020-12-04 16:53 UTC (permalink / raw)
  To: David Laight, 'Mike Rapoport'
  Cc: linux-hardening, akpm, linux-mm, linux-kernel, Andy Lutomirski,
	Jann Horn, Kees Cook, Linux API, Matthew Wilcox

On 4.12.2020 15.33, David Laight wrote:
> From: Topi Miettinen
>> Sent: 04 December 2020 10:58
>>
>> On 4.12.2020 1.15, David Laight wrote:
>>> From: Mike Rapoport
>>>> Sent: 03 December 2020 06:58
>>>>
>>>> On Wed, Dec 02, 2020 at 08:49:06PM +0200, Topi Miettinen wrote:
>>>>> On 1.12.2020 23.45, Topi Miettinen wrote:
>>>>>> Memory mappings inside kernel allocated with vmalloc() are in
>>>>>> predictable order and packed tightly toward the low addresses. With
>>>>>> new kernel boot parameter 'randomize_vmalloc=1', the entire area is
>>>>>> used randomly to make the allocations less predictable and harder to
>>>>>> guess for attackers.
>>>
>>> Isn't that going to horribly fragment the available address space
>>> and make even moderate sized allocation requests fail (or sleep).
>>
>> For 32 bit architecture this is a real issue, but I don't think for 64
>> bits it will be a problem. You can't fragment the virtual memory space
>> for small allocations because the resulting page tables will not fit in
>> RAM for existing or near future systems.
> 
> Hmmm, truly random allocations are going to need 3 or 4 extra page tables
> on 64-bit systems. That is a bit of overhead for 4k allocations.
> While you won't run out of address space, you will run out of memory.

There are 3500 entries in /proc/vmallocinfo on my system with lots of
BPF filters (which allocate 8kB blocks). The total memory used is 740MB.
Assuming that every entry needed 4 additional pages, that would mean 55MB,
or 7.4% extra. I don't think that's a problem, and even if it were in
some cases, there's still the option of not using randomize_vmalloc.
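
As a rough check of that estimate, assuming 4 KiB pages and reading MB as MiB (neither is stated above), the arithmetic can be sketched as follows. The variable names are illustrative.

```python
PAGE_SIZE = 4096            # assumed 4 KiB pages
entries = 3500              # /proc/vmallocinfo entries, from the message
extra_pages_per_entry = 4   # worst case assumed in the message
total_mb = 740              # memory in use, from the message (read as MiB)

extra_bytes = entries * extra_pages_per_entry * PAGE_SIZE
extra_mb = extra_bytes / 2**20

print(f"{extra_mb:.1f} MiB extra, {100 * extra_mb / total_mb:.1f}% of {total_mb} MiB")
# prints "54.7 MiB extra, 7.4% of 740 MiB"
```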

-Topi

* Re: [PATCH] mm/vmalloc: randomize vmalloc() allocations
  2020-12-03  6:58   ` Mike Rapoport
  2020-12-03 23:15     ` David Laight
@ 2020-12-09 19:08     ` Topi Miettinen
  2020-12-10 19:58     ` Topi Miettinen
  2 siblings, 0 replies; 11+ messages in thread
From: Topi Miettinen @ 2020-12-09 19:08 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-hardening, akpm, linux-mm, linux-kernel, Andy Lutomirski,
	Jann Horn, Kees Cook, Linux API, Matthew Wilcox

On 3.12.2020 8.58, Mike Rapoport wrote:
> On Wed, Dec 02, 2020 at 08:49:06PM +0200, Topi Miettinen wrote:
>> On 1.12.2020 23.45, Topi Miettinen wrote:
>>> Memory mappings inside kernel allocated with vmalloc() are in
>>> predictable order and packed tightly toward the low addresses. With
>>> new kernel boot parameter 'randomize_vmalloc=1', the entire area is
>>> used randomly to make the allocations less predictable and harder to
>>> guess for attackers.
>>>
>>
>> This also seems to randomize module addresses. I was going to check that
>> next, so nice surprise!
> 
> Heh, that's because module_alloc() uses vmalloc() in that way or another :)

The modules are still allocated from their small (1.5GB) separate area
instead of the much larger (32TB/12.5PB) vmalloc area; using the latter
would greatly improve ASLR for the modules. To fix that, I tried to #define
MODULES_VADDR to VMALLOC_START etc. like x86_32 does, but then the kernel
dies very early without any output.

-Topi

* Re: [PATCH] mm/vmalloc: randomize vmalloc() allocations
  2020-12-03  6:58   ` Mike Rapoport
  2020-12-03 23:15     ` David Laight
  2020-12-09 19:08     ` Topi Miettinen
@ 2020-12-10 19:58     ` Topi Miettinen
  2 siblings, 0 replies; 11+ messages in thread
From: Topi Miettinen @ 2020-12-10 19:58 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-hardening, akpm, linux-mm, linux-kernel, Andy Lutomirski,
	Jann Horn, Kees Cook, Linux API, Matthew Wilcox

On 3.12.2020 8.58, Mike Rapoport wrote:
> On Wed, Dec 02, 2020 at 08:49:06PM +0200, Topi Miettinen wrote:
>> On 1.12.2020 23.45, Topi Miettinen wrote:
>>> Memory mappings inside kernel allocated with vmalloc() are in
>>> predictable order and packed tightly toward the low addresses. With
>>> new kernel boot parameter 'randomize_vmalloc=1', the entire area is
>>> used randomly to make the allocations less predictable and harder to
>>> guess for attackers.
>>>
>>
>> This also seems to randomize module addresses. I was going to check that
>> next, so nice surprise!
> 
> Heh, that's because module_alloc() uses vmalloc() in that way or another :)

I got a bit further with really using vmalloc with 
[VMALLOC_START..VMALLOC_END] for modules, but then inserting a module 
fails because of the relocations:
[    9.202856] module: overflow in relocation type 11 val ffffe1950e27f080

Type 11 is R_X86_64_32S, which expects a 32-bit signed value, so the loader
obviously can't fit a relocation from the highest 2GB to somewhere 32
TB lower.
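
A minimal sketch of why that value overflows, assuming the check amounts to "the 64-bit value must survive truncation to 32 bits followed by sign extension". The helper names `to_s64` and `fits_r_x86_64_32s` are hypothetical and this is not the kernel's code. The first sample address is simply a value inside the top 2 GiB.

```python
def to_s64(val):
    """Interpret a 64-bit unsigned value as a signed 64-bit integer."""
    val &= (1 << 64) - 1
    return val - (1 << 64) if val & (1 << 63) else val

def fits_r_x86_64_32s(val):
    """True if val is representable as a sign-extended 32-bit quantity."""
    return -(1 << 31) <= to_s64(val) < (1 << 31)

print(fits_r_x86_64_32s(0xffffffffa0000000))  # True: within the top 2 GiB
print(fits_r_x86_64_32s(0xffffe1950e27f080))  # False: the value from the log
```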

The problem seems to be that the modules aren't really built as
position-independent shared objects with -fPIE/-fPIC; instead
there's an explicit -fno-PIE. I guess the modules also shouldn't use
-mcmodel=kernel. Tweaking the flags shows, though, that some combinations
aren't well supported (for example, '-mindirect-branch=thunk-extern' and
'-mcmodel=large' are not compatible), and the handwritten assembly code
also assumes 32-bit offsets.

A different approach could be to make the entire kernel relocatable to
lower addresses, and then the modules could stay nearby. I guess
the asm files aren't written with position independence in mind either.

But it seems that I'm finding and breaking lots of assumptions built in
to the system. What's the experts' opinion: is full module/kernel
randomization ever going to fly?

-Topi

Thread overview: 11+ messages
2020-12-01 21:45 [PATCH] mm/vmalloc: randomize vmalloc() allocations Topi Miettinen
2020-12-02 18:49 ` Topi Miettinen
2020-12-03  6:58   ` Mike Rapoport
2020-12-03 23:15     ` David Laight
2020-12-04 10:58       ` Topi Miettinen
2020-12-04 13:33         ` David Laight
2020-12-04 16:53           ` Topi Miettinen
2020-12-09 19:08     ` Topi Miettinen
2020-12-10 19:58     ` Topi Miettinen
2020-12-02 18:53 ` Matthew Wilcox
2020-12-02 21:28   ` Topi Miettinen