* [PATCH v2] mm/vmalloc: randomize vmalloc() allocations
@ 2020-12-12 17:56 Topi Miettinen
  2021-02-13 11:30 ` Topi Miettinen
  0 siblings, 1 reply; 6+ messages in thread
From: Topi Miettinen @ 2020-12-12 17:56 UTC (permalink / raw)
  To: linux-hardening, akpm, linux-mm, linux-kernel
  Cc: Topi Miettinen, Andy Lutomirski, Jann Horn, Kees Cook, Linux API,
	Matthew Wilcox, Mike Rapoport

Memory mappings allocated inside the kernel with vmalloc() are placed in
a predictable order and packed tightly toward the low addresses. With the
new kernel boot parameter 'randomize_vmalloc=1', the entire vmalloc area
is used randomly to make the allocations less predictable and harder for
attackers to guess. Module and BPF code locations are also randomized
(though within their dedicated and rather small area), and if
CONFIG_VMAP_STACK is enabled, so are kernel thread stack locations.
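
For example, the parameter would typically be appended to the kernel
command line from the bootloader; everything on the example line below
other than randomize_vmalloc=1 is only illustrative:

	linux /vmlinuz root=/dev/sda1 ro randomize_vmalloc=1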

On 32-bit systems this may cause problems due to increased VM
fragmentation if the address space gets crowded.

On all systems, it will reduce performance and increase memory and
cache usage due to less efficient use of page tables and the inability
to merge adjacent VMAs with compatible attributes. On x86_64 with
5-level page tables, in the worst case up to four additional pages of
page tables (one each for the P4D, PUD, PMD and PTE levels) are created
for each mapping, so with small mappings there is a considerable
penalty: for example, a thousand scattered small mappings could add
roughly 16 MB of page tables.

Without randomize_vmalloc=1:
$ cat /proc/vmallocinfo
0xffffc90000000000-0xffffc90000002000    8192 acpi_os_map_iomem+0x29e/0x2c0 phys=0x000000003ffe1000 ioremap
0xffffc90000002000-0xffffc90000005000   12288 acpi_os_map_iomem+0x29e/0x2c0 phys=0x000000003ffe0000 ioremap
0xffffc90000005000-0xffffc90000007000    8192 hpet_enable+0x36/0x4a9 phys=0x00000000fed00000 ioremap
0xffffc90000007000-0xffffc90000009000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffc90000009000-0xffffc9000000b000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffc9000000b000-0xffffc9000000d000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffc9000000d000-0xffffc9000000f000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffc90000011000-0xffffc90000015000   16384 n_tty_open+0x16/0xe0 pages=3 vmalloc
0xffffc900003de000-0xffffc900003e0000    8192 acpi_os_map_iomem+0x29e/0x2c0 phys=0x00000000fed00000 ioremap
0xffffc900003e0000-0xffffc900003e2000    8192 memremap+0x1a1/0x280 phys=0x00000000000f5000 ioremap
0xffffc900003e2000-0xffffc900003f3000   69632 pcpu_create_chunk+0x80/0x2c0 pages=16 vmalloc
0xffffc900003f3000-0xffffc90000405000   73728 pcpu_create_chunk+0xb7/0x2c0 pages=17 vmalloc
0xffffc90000405000-0xffffc9000040a000   20480 pcpu_create_chunk+0xed/0x2c0 pages=4 vmalloc
0xffffe8ffffc00000-0xffffe8ffffe00000 2097152 pcpu_get_vm_areas+0x0/0x1a40 vmalloc

With randomize_vmalloc=1, the allocations are randomized:
$ cat /proc/vmallocinfo
0xffffca3a36442000-0xffffca3a36447000   20480 pcpu_create_chunk+0xed/0x2c0 pages=4 vmalloc
0xffffca63034d6000-0xffffca63034d9000   12288 acpi_os_map_iomem+0x29e/0x2c0 phys=0x000000003ffe0000 ioremap
0xffffcce23d32e000-0xffffcce23d330000    8192 memremap+0x1a1/0x280 phys=0x00000000000f5000 ioremap
0xffffcfb9f0e22000-0xffffcfb9f0e24000    8192 hpet_enable+0x36/0x4a9 phys=0x00000000fed00000 ioremap
0xffffd1df23e9e000-0xffffd1df23eb0000   73728 pcpu_create_chunk+0xb7/0x2c0 pages=17 vmalloc
0xffffd690c2990000-0xffffd690c2992000    8192 acpi_os_map_iomem+0x29e/0x2c0 phys=0x000000003ffe1000 ioremap
0xffffd8460c718000-0xffffd8460c71c000   16384 n_tty_open+0x16/0xe0 pages=3 vmalloc
0xffffd89aba709000-0xffffd89aba70b000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffe0ca3f2ed000-0xffffe0ca3f2ef000    8192 acpi_os_map_iomem+0x29e/0x2c0 phys=0x00000000fed00000 ioremap
0xffffe3ba44802000-0xffffe3ba44804000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffe4524b2a2000-0xffffe4524b2a4000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffe61372b2e000-0xffffe61372b30000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffe704d2f7c000-0xffffe704d2f8d000   69632 pcpu_create_chunk+0x80/0x2c0 pages=16 vmalloc
0xffffe8ffffc00000-0xffffe8ffffe00000 2097152 pcpu_get_vm_areas+0x0/0x1a40 vmalloc

With CONFIG_VMAP_STACK, kernel thread stacks are also placed in the
vmalloc area and therefore get randomized as well (only one example
line from /proc/vmallocinfo is shown for brevity):

unrandomized:
0xffffc90000018000-0xffffc90000021000   36864 kernel_clone+0xf9/0x560 pages=8 vmalloc

randomized:
0xffffcb57611a8000-0xffffcb57611b1000   36864 kernel_clone+0xf9/0x560 pages=8 vmalloc

CC: Andrew Morton <akpm@linux-foundation.org>
CC: Andy Lutomirski <luto@kernel.org>
CC: Jann Horn <jannh@google.com>
CC: Kees Cook <keescook@chromium.org>
CC: Linux API <linux-api@vger.kernel.org>
CC: Matthew Wilcox <willy@infradead.org>
CC: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
---
v2: retry allocation from other end of vmalloc space in case of
failure (Matthew Wilcox), improve commit message and documentation
---
 .../admin-guide/kernel-parameters.txt         | 23 +++++++++++++++
 mm/vmalloc.c                                  | 29 +++++++++++++++++--
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 44fde25bb221..9386b1b40a27 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4017,6 +4017,29 @@
 
 	ramdisk_start=	[RAM] RAM disk image start address
 
+	randomize_vmalloc= [KNL] Randomize vmalloc() allocations. With 1,
+			the entire vmalloc() area is used randomly to
+			make the allocations less predictable and
+			harder to guess for attackers. Also module and
+			BPF code locations get randomized (within
+			their dedicated and rather small area though)
+			and if CONFIG_VMAP_STACK is enabled, also
+			kernel thread stack locations.
+
+			On 32 bit systems this may cause problems due
+			to increased VM fragmentation if the address
+			space gets crowded.
+
+			On all systems, it will reduce performance and
+			increase memory and cache usage due to less
+			efficient use of page tables and inability to
+			merge adjacent VMAs with compatible
+			attributes. On x86_64 with 5 level page
+			tables, in the worst case, additional page
+			table entries of up to 4 pages are created for
+			each mapping, so with small mappings there's
+			considerable penalty.
+
 	random.trust_cpu={on,off}
 			[KNL] Enable or disable trusting the use of the
 			CPU's random number generator (if available) to
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6ae491a8b210..d78528af6316 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -34,6 +34,7 @@
 #include <linux/bitops.h>
 #include <linux/rbtree_augmented.h>
 #include <linux/overflow.h>
+#include <linux/random.h>
 
 #include <linux/uaccess.h>
 #include <asm/tlbflush.h>
@@ -1079,6 +1080,17 @@ adjust_va_to_fit_type(struct vmap_area *va,
 	return 0;
 }
 
+static int randomize_vmalloc = 0;
+
+static int __init set_randomize_vmalloc(char *str)
+{
+	if (!str)
+		return 0;
+	randomize_vmalloc = simple_strtoul(str, &str, 0);
+	return 1;
+}
+__setup("randomize_vmalloc=", set_randomize_vmalloc);
+
 /*
  * Returns a start address of the newly allocated area, if success.
  * Otherwise a vend is returned that indicates failure.
@@ -1152,7 +1164,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 				int node, gfp_t gfp_mask)
 {
 	struct vmap_area *va, *pva;
-	unsigned long addr;
+	unsigned long addr, voffset;
 	int purged = 0;
 	int ret;
 
@@ -1207,11 +1219,24 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
 		kmem_cache_free(vmap_area_cachep, pva);
 
+	/* Randomize allocation */
+	if (randomize_vmalloc) {
+		voffset = get_random_long() & (roundup_pow_of_two(vend - vstart) - 1);
+		voffset = PAGE_ALIGN(voffset);
+		if (voffset + size > vend - vstart)
+			voffset = vend - vstart - size;
+	} else
+		voffset = 0;
+
 	/*
 	 * If an allocation fails, the "vend" address is
 	 * returned. Therefore trigger the overflow path.
 	 */
-	addr = __alloc_vmap_area(size, align, vstart, vend);
+	addr = __alloc_vmap_area(size, align, vstart + voffset, vend);
+
+	if (unlikely(addr == vend) && voffset)
+		/* Retry randomization from other end */
+		addr = __alloc_vmap_area(size, align, vstart, vstart + voffset + size);
 	spin_unlock(&free_vmap_area_lock);
 
 	if (unlikely(addr == vend))

base-commit: 7f376f1917d7461e05b648983e8d2aea9d0712b2
-- 
2.29.2



* Re: [PATCH v2] mm/vmalloc: randomize vmalloc() allocations
  2020-12-12 17:56 [PATCH v2] mm/vmalloc: randomize vmalloc() allocations Topi Miettinen
@ 2021-02-13 11:30 ` Topi Miettinen
  2021-02-13 11:55   ` Uladzislau Rezki
  0 siblings, 1 reply; 6+ messages in thread
From: Topi Miettinen @ 2021-02-13 11:30 UTC (permalink / raw)
  To: linux-hardening, akpm, linux-mm, linux-kernel, Andy Lutomirski,
	Jann Horn, Kees Cook, Linux API, Matthew Wilcox, Mike Rapoport

Hello,

Is there a chance of getting this reviewed and maybe even merged, please?

-Topi


* Re: [PATCH v2] mm/vmalloc: randomize vmalloc() allocations
  2021-02-13 11:30 ` Topi Miettinen
@ 2021-02-13 11:55   ` Uladzislau Rezki
  2021-02-13 13:43     ` Topi Miettinen
  0 siblings, 1 reply; 6+ messages in thread
From: Uladzislau Rezki @ 2021-02-13 11:55 UTC (permalink / raw)
  To: Topi Miettinen
  Cc: linux-hardening, akpm, linux-mm, linux-kernel, Andy Lutomirski,
	Jann Horn, Kees Cook, Linux API, Matthew Wilcox, Mike Rapoport

> Hello,
> 
> Is there a chance of getting this reviewed and maybe even merged, please?
> 
> -Topi
> 
I can review it and help with it. But before that i would like to
clarify if such "randomization" is something that you can not leave?

For example on 32bit system vmalloc space is limited, such randomization
can slow down it, also it will lead to failing of allocations much more,
thus it will require repeating with different offset.

Second. There is a space or region for modules. Using various offsets
can waste of that memory, thus can lead to failing of module loading.

On the other side there is a per-cpu allocator. Interfering with it
also will increase a rate of failing.

--
Vlad Rezki


* Re: [PATCH v2] mm/vmalloc: randomize vmalloc() allocations
  2021-02-13 11:55   ` Uladzislau Rezki
@ 2021-02-13 13:43     ` Topi Miettinen
  2021-02-15 12:51       ` Uladzislau Rezki
  0 siblings, 1 reply; 6+ messages in thread
From: Topi Miettinen @ 2021-02-13 13:43 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: linux-hardening, akpm, linux-mm, linux-kernel, Andy Lutomirski,
	Jann Horn, Kees Cook, Linux API, Matthew Wilcox, Mike Rapoport

On 13.2.2021 13.55, Uladzislau Rezki wrote:
>> Hello,
>>
>> Is there a chance of getting this reviewed and maybe even merged, please?
>>
>> -Topi
>>
> I can review it and help with it. But before that i would like to
> clarify if such "randomization" is something that you can not leave?

This happens to interest me and I don't mind the performance loss since 
I think there's also an improvement in security. I suppose (perhaps 
wrongly) that others may also be interested in such features. For 
example, also `nosmt` can take away a big part of CPU processing 
capability. Does this answer your question, I'm not sure what you mean 
with leaving? I hope you would not want me to go away and leave?

> For example on 32bit system vmalloc space is limited, such randomization
> can slow down it, also it will lead to failing of allocations much more,
> thus it will require repeating with different offset.

I would not use `randomize_vmalloc=1` on 32-bit systems, because in 
addition to the slowdown, the address space could become so fragmented that 
large allocations may not fit anymore. Perhaps the documentation should 
warn about this more clearly. I haven't tried this on a 32 bit system 
though and there the VM layout is very different.

__alloc_vmap_area() scans the vmalloc space starting from a random address 
up to the end of the area. If this fails, the scan is restarted from the 
bottom of the area up to this random address. Thus the entire area is 
scanned.
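
Just to illustrate that search order, here is a stand-alone user-space
sketch of it. The VSTART/VEND values, PAGE_SIZE and the use of random()
below are illustrative stand-ins; the real code uses get_random_long()
and lets __alloc_vmap_area() do the actual scanning:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define PAGE_SIZE 4096UL
#define VSTART 0xffffc90000000000UL	/* example x86_64 vmalloc start */
#define VEND   0xffffe90000000000UL	/* example x86_64 vmalloc end */

int main(void)
{
	unsigned long range = VEND - VSTART;
	unsigned long size = 2 * PAGE_SIZE;	/* a small 8 KiB allocation */
	unsigned long voffset;

	srandom(time(NULL));
	/*
	 * Random, page-aligned offset into the area.  The patch masks with
	 * roundup_pow_of_two(range) - 1 and clamps; a plain modulo is close
	 * enough for this sketch.
	 */
	voffset = (unsigned long)random() << 31 | (unsigned long)random();
	voffset = (voffset % range) & ~(PAGE_SIZE - 1);
	if (voffset + size > range)
		voffset = range - size;

	/* The second range is only searched if the first pass finds no room. */
	printf("first pass:  0x%lx - 0x%lx\n", VSTART + voffset, VEND);
	printf("second pass: 0x%lx - 0x%lx\n", VSTART, VSTART + voffset + size);
	return 0;
}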

> Second. There is a space or region for modules. Using various offsets
> can waste of that memory, thus can lead to failing of module loading.

The allocations for modules (or BPF code) are also randomized within 
their dedicated space. I don't think other allocations should affect 
module space. Within this module space, fragmentation may also be 
possible because there's only 1,5GB available. The largest allocation on 
my system seems to be 11M at the moment, others are 1M or below and most 
are 8k. The possibility of an allocation failing probably depends on the 
fill ratio. In practice haven't seen problems with this.

It would be possible to have finer control, for example 
`randomize_vmalloc=3` (1 = general vmalloc, 2 = modules, bitwise ORed) 
or `randomize_vmalloc=general,modules`.

I experimented by trying to change how the modules are compiled 
(-mcmodel=medium or -mcmodel=large) so that they could be located in the 
normal vmalloc space, but instead I found a bug in the compiler 
(-mfentry produces incorrect code for -mcmodel=large, now fixed).

> On the other side there is a per-cpu allocator. Interfering with it
> also will increase a rate of failing.

I didn't notice the per-cpu allocator before. I'm probably missing 
something, but it seems to be used for a different purpose (for 
allocating the vmap_area structure objects instead of the address space 
range), so where do you see interference?

Thanks for the review!

-Topi


* Re: [PATCH v2] mm/vmalloc: randomize vmalloc() allocations
  2021-02-13 13:43     ` Topi Miettinen
@ 2021-02-15 12:51       ` Uladzislau Rezki
  2021-02-15 18:12         ` Topi Miettinen
  0 siblings, 1 reply; 6+ messages in thread
From: Uladzislau Rezki @ 2021-02-15 12:51 UTC (permalink / raw)
  To: Topi Miettinen
  Cc: Uladzislau Rezki, linux-hardening, akpm, linux-mm, linux-kernel,
	Andy Lutomirski, Jann Horn, Kees Cook, Linux API, Matthew Wilcox,
	Mike Rapoport

On Sat, Feb 13, 2021 at 03:43:39PM +0200, Topi Miettinen wrote:
> On 13.2.2021 13.55, Uladzislau Rezki wrote:
> > > Hello,
> > > 
> > > Is there a chance of getting this reviewed and maybe even merged, please?
> > > 
> > > -Topi
> > > 
> > I can review it and help with it. But before that i would like to
> > clarify if such "randomization" is something that you can not leave?
> 
> This happens to interest me and I don't mind the performance loss since I
> think there's also an improvement in security. I suppose (perhaps wrongly)
> that others may also be interested in such features. For example, also
> `nosmt` can take away a big part of CPU processing capability.
>
OK. I was thinking about if it is done for some production systems or
some specific projects where this is highly demanded.

>
> Does this
> answer your question, I'm not sure what you mean with leaving? I hope you
> would not want me to go away and leave?
>
No-no, that was a typo :) Sorry for that. I just wanted to figure out
who really needs it.

> > For example on 32bit system vmalloc space is limited, such randomization
> > can slow down it, also it will lead to failing of allocations much more,
> > thus it will require repeating with different offset.
> 
> I would not use `randomize_vmalloc=1` on 32-bit systems, because in
> addition to the slowdown, the address space could become so fragmented that
> large allocations may not fit anymore. Perhaps the documentation should warn
> about this more clearly. I haven't tried this on a 32 bit system though and
> there the VM layout is very different.
> 
For 32-bit systems that would introduce many issues, not limited to fragmentation.

> __alloc_vmap_area() scans the vmalloc space starting from a random address
> up to the end of the area. If this fails, the scan is restarted from the
> bottom of the area up to this random address. Thus the entire area is scanned.
> 
> > Second. There is a space or region for modules. Using various offsets
> > can waste of that memory, thus can lead to failing of module loading.
> 
> The allocations for modules (or BPF code) are also randomized within their
> dedicated space. I don't think other allocations should affect module space.
> Within this module space, fragmentation may also be possible because there's
> only 1,5GB available. The largest allocation on my system seems to be 11M at
> the moment, others are 1M or below and most are 8k. The possibility of an
> allocation failing probably depends on the fill ratio. In practice haven't
> seen problems with this.
> 
I think it depends on how many modules your system loads. If it is a big
system, such fragmentation and wasted module space might lead to module
loading failures.

> It would be possible to have finer control, for example
> `randomize_vmalloc=3` (1 = general vmalloc, 2 = modules, bitwise ORed) or
> `randomize_vmalloc=general,modules`.
> 
> I experimented by trying to change how the modules are compiled
> (-mcmodel=medium or -mcmodel=large) so that they could be located in the
> normal vmalloc space, but instead I found a bug in the compiler (-mfentry
> produces incorrect code for -mcmodel=large, now fixed).
> 
> > On the other side there is a per-cpu allocator. Interfering with it
> > also will increase a rate of failing.
> 
> I didn't notice the per-cpu allocator before. I'm probably missing
> something, but it seems to be used for a different purpose (for allocating
> the vmap_area structure objects instead of the address space range), so
> where do you see interference?
> 


   A                       B
 ---->                   <----
<---------------------------><--------->
|   vmalloc address space    |
|<--------------------------->


A - is a vmalloc allocations;
B - is a percpu-allocator.

--
Vlad Rezki


* Re: [PATCH v2] mm/vmalloc: randomize vmalloc() allocations
  2021-02-15 12:51       ` Uladzislau Rezki
@ 2021-02-15 18:12         ` Topi Miettinen
  0 siblings, 0 replies; 6+ messages in thread
From: Topi Miettinen @ 2021-02-15 18:12 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: linux-hardening, akpm, linux-mm, linux-kernel, Andy Lutomirski,
	Jann Horn, Kees Cook, Linux API, Matthew Wilcox, Mike Rapoport

On 15.2.2021 14.51, Uladzislau Rezki wrote:
> On Sat, Feb 13, 2021 at 03:43:39PM +0200, Topi Miettinen wrote:
>> On 13.2.2021 13.55, Uladzislau Rezki wrote:
>>>> Hello,
>>>>
>>>> Is there a chance of getting this reviewed and maybe even merged, please?
>>>>
>>>> -Topi
>>>>
>>> I can review it and help with it. But before that i would like to
>>> clarify if such "randomization" is something that you can not leave?
>>
>> This happens to interest me and I don't mind the performance loss since I
>> think there's also an improvement in security. I suppose (perhaps wrongly)
>> that others may also be interested in such features. For example, also
>> `nosmt` can take away a big part of CPU processing capability.
>>
> OK. I was thinking about if it is done for some production systems or
> some specific projects where this is highly demanded.
> 
>>
>> Does this
>> answer your question, I'm not sure what you mean with leaving? I hope you
>> would not want me to go away and leave?
>>
> No-no, that was a typo :) Sorry for that. I just wanted to figure out
> who really needs it.

It's not needed. The goal is just to increase address space layout 
randomization, to harden the system against attacks which depend on 
predictable kernel memory layout. This should not be used when 
performance is more important than hardening.

>>> For example on 32bit system vmalloc space is limited, such randomization
>>> can slow down it, also it will lead to failing of allocations much more,
>>> thus it will require repeating with different offset.
>>
>> I would not use `randomize_vmalloc=1` on 32-bit systems, because in
>> addition to the slowdown, the address space could become so fragmented that
>> large allocations may not fit anymore. Perhaps the documentation should warn
>> about this more clearly. I haven't tried this on a 32 bit system though and
>> there the VM layout is very different.
>>
> For 32-bit systems that would introduce many issues, not limited to fragmentation.
> 
>> __alloc_vmap_area() scans the vmalloc space starting from a random address
>> up to the end of the area. If this fails, the scan is restarted from the
>> bottom of the area up to this random address. Thus the entire area is scanned.
>>
>>> Second. There is a space or region for modules. Using various offsets
>>> can waste of that memory, thus can lead to failing of module loading.
>>
>> The allocations for modules (or BPF code) are also randomized within their
>> dedicated space. I don't think other allocations should affect module space.
>> Within this module space, fragmentation may also be possible because there's
>> only 1,5GB available. The largest allocation on my system seems to be 11M at
>> the moment, others are 1M or below and most are 8k. The possibility of an
>> allocation failing probably depends on the fill ratio. In practice haven't
>> seen problems with this.
>>
> I think it depends on how many modules your system loads. If it is a big
> system, such fragmentation and wasted module space might lead to module
> loading failures.

# echo 1 > /proc/sys/kernel/kptr_restrict
# grep 0xffffffff /proc/vmallocinfo | awk '{s=s+$2;c++} END {print "total\tcount\tavg\tof 1536MB";print s,c,s/c,s/1536/1024/1024}'
total   count   avg     of 1536MB
34201600 1022 33465.4 0.0212351

I think that on my system fragmentation shouldn't be a danger since only 
2% (34MB) of the 1536MB available is used for the 1022 module/BPF blocks.

>> It would be possible to have finer control, for example
>> `randomize_vmalloc=3` (1 = general vmalloc, 2 = modules, bitwise ORed) or
>> `randomize_vmalloc=general,modules`.
>>
>> I experimented by trying to change how the modules are compiled
>> (-mcmodel=medium or -mcmodel=large) so that they could be located in the
>> normal vmalloc space, but instead I found a bug in the compiler (-mfentry
>> produces incorrect code for -mcmodel=large, now fixed).
>>
>>> On the other side there is a per-cpu allocator. Interfering with it
>>> also will increase a rate of failing.
>>
>> I didn't notice the per-cpu allocator before. I'm probably missing
>> something, but it seems to be used for a different purpose (for allocating
>> the vmap_area structure objects instead of the address space range), so
>> where do you see interference?
>>
> 
> 
>     A                       B
>   ---->                   <----
> <---------------------------><--------->
> |   vmalloc address space    |
> |<--------------------------->
> 
> 
> A - is a vmalloc allocations;
> B - is a percpu-allocator.

OK, now I get it, thanks. These can be seen in /proc/vmallocinfo as
allocations done by pcpu_get_vm_areas(). Allocating them so predictably,
downwards from a fixed address, is bad for ASLR, so I'll try to randomize
the location of these too. Other allocations by pcpu_populate_chunk() and
pcpu_create_chunk() seem to be randomized already.
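
For example, the fixed placement is visible in the two /proc/vmallocinfo
listings earlier in this thread: the pcpu_get_vm_areas() block sits at the
same address both with and without randomize_vmalloc=1 (the line below is
taken from those listings):

$ grep pcpu_get_vm_areas /proc/vmallocinfo
0xffffe8ffffc00000-0xffffe8ffffe00000 2097152 pcpu_get_vm_areas+0x0/0x1a40 vmalloc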

-Topi
