* [PATCH v4 0/3] KASLR feature to randomize each loadable module
@ 2018-08-29 22:59 Rick Edgecombe
  2018-08-29 22:59 ` [PATCH v4 1/3] vmalloc: Add __vmalloc_node_try_addr function Rick Edgecombe
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Rick Edgecombe @ 2018-08-29 22:59 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, linux-kernel, linux-mm, kernel-hardening,
	daniel, jannh, keescook
  Cc: kristen, dave.hansen, arjan, Rick Edgecombe

Hi,

This is v4 of the "KASLR feature to randomize each loadable module" patchset.
The purpose is to increase the randomization and also to make the modules
randomized in relation to each other instead of just the base, so that if the
location of one module leaks, the locations of the others can't be inferred. It
is enabled for x86_64 for now.

V4 contains a few small fixes. I humbly think this is in pretty good shape at this
point, unless anyone has any comments. The only other big change I was
considering was moving the new randomization algorithm into vmalloc so it could
be re-used for other architectures or possibly other vmalloc usages.

A few words on how this was tested: as previously mentioned, the entropy
estimates were done using extracted module text sizes from the in-tree modules.
These sizes were also used to run hundreds of thousands of simulated module
allocations by calling module_alloc from a test module, including testing until
allocation failure. The simulations kept track of every allocation address to
make sure there were no collisions, and verified that the memory was actually
mapped.

In addition, the __vmalloc_node_try_addr function has a suite of unit tests that
verify, for a number of edge cases, that it:
 - Allows allocations when it should
 - Reports the right error code if it collides with a lazy-free area or a real
   allocation
 - Frees a lazy-free area when it should

These synthetic tests were also how the performance metrics were gathered.

Changes for V4:
 - Fix issue caused by KASAN, kmemleak being provided different allocation
   lengths (padding).
 - Avoid kmalloc in __vmalloc_node_try_addr until it is certain to be needed.
 - Fix for debug file hang when the last VA is a lazy purge area.
 - Fix issues reported by the 0-day build system.

Changes for V3:
 - Code cleanup based on internal feedback. (thanks to Dave Hansen and Andriy
   Shevchenko)
 - Slight refactor of the existing algorithm to live more cleanly alongside the
   new one.
 - BPF synthetic benchmark

Changes for V2:
 - New implementation of __vmalloc_node_try_addr based on the
   __vmalloc_node_range implementation, that only flushes TLB when needed.
 - Modified module loading algorithm to try to reduce the TLB flushes further.
 - Increase "random area" tries in order to increase the number of modules that
   can get high randomness.
 - Increase "random area" size to 2/3 of module area in order to increase the
   number of modules that can get high randomness.
 - Fix for 0day failures on other architectures.
 - Fix for wrong debugfs permissions. (thanks to Jann Horn)
 - Spelling fix. (thanks to Jann Horn)
 - Data on module_alloc performance and TLB flushes. (brought up by Kees Cook
   and Jann Horn)
 - Data on memory usage. (suggested by Jann)


Rick Edgecombe (3):
  vmalloc: Add __vmalloc_node_try_addr function
  x86/modules: Increase randomization for modules
  vmalloc: Add debugfs modfraginfo

 arch/x86/include/asm/pgtable_64_types.h |   7 +
 arch/x86/kernel/module.c                | 165 ++++++++++++++++---
 include/linux/vmalloc.h                 |   3 +
 mm/vmalloc.c                            | 279 +++++++++++++++++++++++++++++++-
 4 files changed, 429 insertions(+), 25 deletions(-)

-- 
2.7.4



* [PATCH v4 1/3] vmalloc: Add __vmalloc_node_try_addr function
  2018-08-29 22:59 [PATCH v4 0/3] KASLR feature to randomize each loadable module Rick Edgecombe
@ 2018-08-29 22:59 ` Rick Edgecombe
  2018-08-29 22:59 ` [PATCH v4 2/3] x86/modules: Increase randomization for modules Rick Edgecombe
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Rick Edgecombe @ 2018-08-29 22:59 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, linux-kernel, linux-mm, kernel-hardening,
	daniel, jannh, keescook
  Cc: kristen, dave.hansen, arjan, Rick Edgecombe

Create a __vmalloc_node_try_addr function that tries to allocate at a specific
address and lets the caller specify whether any lazy purging happens if there is
a collision.

This new function draws from the __vmalloc_node_range implementation. Attempts
to merge the two into a single allocator resulted in logic that was difficult
to follow, so they are left separate.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 include/linux/vmalloc.h |   3 +
 mm/vmalloc.c            | 177 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 179 insertions(+), 1 deletion(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 398e9c9..c7712c8 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -82,6 +82,9 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
 			unsigned long start, unsigned long end, gfp_t gfp_mask,
 			pgprot_t prot, unsigned long vm_flags, int node,
 			const void *caller);
+extern void *__vmalloc_node_try_addr(unsigned long addr, unsigned long size,
+			gfp_t gfp_mask,	pgprot_t prot, unsigned long vm_flags,
+			int node, int try_purge, const void *caller);
 #ifndef CONFIG_MMU
 extern void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags);
 static inline void *__vmalloc_node_flags_caller(unsigned long size, int node,
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a728fc4..1954458 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1709,6 +1709,181 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	return NULL;
 }
 
+static bool pvm_find_next_prev(unsigned long end,
+			       struct vmap_area **pnext,
+			       struct vmap_area **pprev);
+
+/* Try to allocate a region of KVA of the specified address and size. */
+static struct vmap_area *try_alloc_vmap_area(unsigned long addr,
+				unsigned long size, int node, gfp_t gfp_mask,
+				int try_purge)
+{
+	struct vmap_area *va;
+	struct vmap_area *cur_va = NULL;
+	struct vmap_area *first_before = NULL;
+	int need_purge = 0;
+	int blocked = 0;
+	int purged = 0;
+	unsigned long addr_end;
+
+	WARN_ON(!size);
+	WARN_ON(offset_in_page(size));
+
+	addr_end = addr + size;
+	if (addr > addr_end)
+		return ERR_PTR(-EOVERFLOW);
+
+	might_sleep();
+
+	va = kmalloc_node(sizeof(struct vmap_area),
+			gfp_mask & GFP_RECLAIM_MASK, node);
+	if (unlikely(!va))
+		return ERR_PTR(-ENOMEM);
+
+	/*
+	 * Only scan the relevant parts containing pointers to other objects
+	 * to avoid false negatives.
+	 */
+	kmemleak_scan_area(&va->rb_node, SIZE_MAX, gfp_mask & GFP_RECLAIM_MASK);
+
+retry:
+	spin_lock(&vmap_area_lock);
+
+	pvm_find_next_prev(addr, &cur_va, &first_before);
+
+	if (!cur_va)
+		goto found;
+
+	/*
+	 * If there is no VA that starts before the target address, start the
+	 * check from the closest VA in order to cover the case where the
+	 * allocation overlaps at the end.
+	 */
+	if (first_before && addr < first_before->va_end)
+		cur_va = first_before;
+
+	/* Linearly search through to make sure there is a hole */
+	while (cur_va->va_start < addr_end) {
+		if (cur_va->va_end > addr) {
+			if (cur_va->flags & VM_LAZY_FREE) {
+				need_purge = 1;
+			} else {
+				blocked = 1;
+				break;
+			}
+		}
+
+		if (list_is_last(&cur_va->list, &vmap_area_list))
+			break;
+
+		cur_va = list_next_entry(cur_va, list);
+	}
+
+	/*
+	 * If a non-lazy free va blocks the allocation, or
+	 * we are not supposed to purge, but we need to, the
+	 * allocation fails.
+	 */
+	if (blocked || (need_purge && !try_purge))
+		goto fail;
+
+	if (try_purge && need_purge) {
+		/* if purged once before, give up */
+		if (purged)
+			goto fail;
+
+		/*
+		 * If the va blocking the allocation is set to
+		 * be purged then purge all vmap_areas that are
+		 * set to be purged, since this will flush the TLBs
+		 * anyway.
+		 */
+		spin_unlock(&vmap_area_lock);
+		purge_vmap_area_lazy();
+		need_purge = 0;
+		purged = 1;
+		goto retry;
+	}
+
+found:
+	va->va_start = addr;
+	va->va_end = addr_end;
+	va->flags = 0;
+	__insert_vmap_area(va);
+	spin_unlock(&vmap_area_lock);
+
+	return va;
+fail:
+	spin_unlock(&vmap_area_lock);
+	kfree(va);
+	if (need_purge && !blocked)
+		return ERR_PTR(-EUCLEAN);
+	return ERR_PTR(-EBUSY);
+}
+
+/**
+ *	__vmalloc_node_try_addr  -  try to alloc at a specific address
+ *	@addr:		address to try
+ *	@size:		size to try
+ *	@gfp_mask:	flags for the page level allocator
+ *	@prot:		protection mask for the allocated pages
+ *	@vm_flags:	additional vm area flags (e.g. %VM_NO_GUARD)
+ *	@node:		node to use for allocation or NUMA_NO_NODE
+ *	@try_purge:	try to purge if needed to fulfill an allocation
+ *	@caller:	caller's return address
+ *
+ *	Try to allocate at the specific address. If it succeeds the address is
+ *	returned. If it fails an EBUSY ERR_PTR is returned. If try_purge is
+ *	zero, it will return an EUCLEAN ERR_PTR if it could have allocated if it
+ *	was allowed to purge. It may trigger TLB flushes if a purge is needed,
+ *	and try_purge is set.
+ */
+void *__vmalloc_node_try_addr(unsigned long addr, unsigned long size,
+			gfp_t gfp_mask,	pgprot_t prot, unsigned long vm_flags,
+			int node, int try_purge, const void *caller)
+{
+	struct vmap_area *va;
+	struct vm_struct *area;
+	void *alloc_addr;
+	unsigned long real_size = size;
+
+	size = PAGE_ALIGN(size);
+	if (!size || (size >> PAGE_SHIFT) > totalram_pages)
+		return NULL;
+
+	WARN_ON(in_interrupt());
+
+	if (!(vm_flags & VM_NO_GUARD))
+		size += PAGE_SIZE;
+
+	va = try_alloc_vmap_area(addr, size, node, gfp_mask, try_purge);
+	if (IS_ERR(va))
+		goto fail;
+
+	area = kzalloc_node(sizeof(*area), gfp_mask & GFP_RECLAIM_MASK, node);
+	if (unlikely(!area)) {
+		warn_alloc(gfp_mask, NULL, "kmalloc: allocation failure");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	setup_vmalloc_vm(area, va, vm_flags, caller);
+
+	alloc_addr = __vmalloc_area_node(area, gfp_mask, prot, node);
+	if (!alloc_addr) {
+		warn_alloc(gfp_mask, NULL,
+			"vmalloc: allocation failure: %lu bytes", real_size);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	clear_vm_uninitialized_flag(area);
+
+	kmemleak_vmalloc(area, real_size, gfp_mask);
+
+	return alloc_addr;
+fail:
+	return va;
+}
+
 /**
  *	__vmalloc_node_range  -  allocate virtually contiguous memory
  *	@size:		allocation size
@@ -2355,7 +2530,6 @@ void free_vm_area(struct vm_struct *area)
 }
 EXPORT_SYMBOL_GPL(free_vm_area);
 
-#ifdef CONFIG_SMP
 static struct vmap_area *node_to_va(struct rb_node *n)
 {
 	return rb_entry_safe(n, struct vmap_area, rb_node);
@@ -2403,6 +2577,7 @@ static bool pvm_find_next_prev(unsigned long end,
 	return true;
 }
 
+#ifdef CONFIG_SMP
 /**
  * pvm_determine_end - find the highest aligned address between two vmap_areas
  * @pnext: in/out arg for the next vmap_area
-- 
2.7.4



* [PATCH v4 2/3] x86/modules: Increase randomization for modules
  2018-08-29 22:59 [PATCH v4 0/3] KASLR feature to randomize each loadable module Rick Edgecombe
  2018-08-29 22:59 ` [PATCH v4 1/3] vmalloc: Add __vmalloc_node_try_addr function Rick Edgecombe
@ 2018-08-29 22:59 ` Rick Edgecombe
  2018-08-29 22:59 ` [PATCH v4 3/3] vmalloc: Add debugfs modfraginfo Rick Edgecombe
  2018-08-30  2:27 ` [PATCH v4 0/3] KASLR feature to randomize each loadable module Alexei Starovoitov
  3 siblings, 0 replies; 6+ messages in thread
From: Rick Edgecombe @ 2018-08-29 22:59 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, linux-kernel, linux-mm, kernel-hardening,
	daniel, jannh, keescook
  Cc: kristen, dave.hansen, arjan, Rick Edgecombe

This changes the behavior of the KASLR logic for allocating memory for the text
sections of loadable modules. It randomizes the location of each module text
section with about 17 bits of entropy in typical use. This is enabled on X86_64
only. For 32 bit, the behavior is unchanged.

It refactors existing code around module randomization somewhat. There are now
three different behaviors for x86 module_alloc depending on config:
RANDOMIZE_BASE=n; RANDOMIZE_BASE=y with ARCH=x86_64; and RANDOMIZE_BASE=y with
ARCH=i386. The refactor of the existing code tries to show clearly what those
behaviors are without having three separate versions or threading the behaviors
through a bunch of little spots. It is not enabled on 32 bit yet because the
module space is much smaller and simulations haven't been run to see how it
performs.

The new algorithm breaks the module space in two: a random area and a backup
area. It first tries to allocate at a number of randomly located starting pages
inside the random area without purging any lazy-free vmap areas, which would
trigger the associated TLB flush. If this fails, it tries again a number of
times, allowing purges if needed. Before that second round, it also tries the
most recent position that could have succeeded if purging had been allowed,
which doubles the chances of finding a spot that would fit. Finally, if both of
those fail to find a position, it allocates in the backup area. The backup area
base is offset in the same way the current algorithm offsets the base area: one
of 1024 possible locations.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/include/asm/pgtable_64_types.h |   7 ++
 arch/x86/kernel/module.c                | 165 +++++++++++++++++++++++++++-----
 2 files changed, 149 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 04edd2d..5e26369 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -143,6 +143,13 @@ extern unsigned int ptrs_per_p4d;
 #define MODULES_END		_AC(0xffffffffff000000, UL)
 #define MODULES_LEN		(MODULES_END - MODULES_VADDR)
 
+/*
+ * Dedicate the first part of the module space to a randomized area when KASLR
+ * is in use.  Leave the remaining part for a fallback if we are unable to
+ * allocate in the random area.
+ */
+#define MODULES_RAND_LEN	PAGE_ALIGN((MODULES_LEN/3)*2)
+
 #define ESPFIX_PGD_ENTRY	_AC(-2, UL)
 #define ESPFIX_BASE_ADDR	(ESPFIX_PGD_ENTRY << P4D_SHIFT)
 
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index f58336a..d50a0a0 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -48,34 +48,151 @@ do {							\
 } while (0)
 #endif
 
-#ifdef CONFIG_RANDOMIZE_BASE
+#if defined(CONFIG_X86_64) && defined(CONFIG_RANDOMIZE_BASE)
+static inline unsigned long get_modules_rand_len(void)
+{
+	return MODULES_RAND_LEN;
+}
+#else
+static inline unsigned long get_modules_rand_len(void)
+{
+	BUILD_BUG();
+	return 0;
+}
+
+inline bool kaslr_enabled(void);
+#endif
+
+static inline int kaslr_randomize_each_module(void)
+{
+	return IS_ENABLED(CONFIG_RANDOMIZE_BASE)
+		&& IS_ENABLED(CONFIG_X86_64)
+		&& kaslr_enabled();
+}
+
+static inline int kaslr_randomize_base(void)
+{
+	return IS_ENABLED(CONFIG_RANDOMIZE_BASE)
+		&& !IS_ENABLED(CONFIG_X86_64)
+		&& kaslr_enabled();
+}
+
 static unsigned long module_load_offset;
+static const unsigned long NR_NO_PURGE = 5000;
+static const unsigned long NR_TRY_PURGE = 5000;
 
 /* Mutex protects the module_load_offset. */
 static DEFINE_MUTEX(module_kaslr_mutex);
 
 static unsigned long int get_module_load_offset(void)
 {
-	if (kaslr_enabled()) {
-		mutex_lock(&module_kaslr_mutex);
-		/*
-		 * Calculate the module_load_offset the first time this
-		 * code is called. Once calculated it stays the same until
-		 * reboot.
-		 */
-		if (module_load_offset == 0)
-			module_load_offset =
-				(get_random_int() % 1024 + 1) * PAGE_SIZE;
-		mutex_unlock(&module_kaslr_mutex);
-	}
+	mutex_lock(&module_kaslr_mutex);
+	/*
+	 * Calculate the module_load_offset the first time this
+	 * code is called. Once calculated it stays the same until
+	 * reboot.
+	 */
+	if (module_load_offset == 0)
+		module_load_offset = (get_random_int() % 1024 + 1) * PAGE_SIZE;
+	mutex_unlock(&module_kaslr_mutex);
+
 	return module_load_offset;
 }
-#else
-static unsigned long int get_module_load_offset(void)
+
+static unsigned long get_module_vmalloc_start(void)
 {
-	return 0;
+	if (kaslr_randomize_each_module())
+		return MODULES_VADDR + get_modules_rand_len()
+					+ get_module_load_offset();
+	else if (kaslr_randomize_base())
+		return MODULES_VADDR + get_module_load_offset();
+
+	return MODULES_VADDR;
+}
+
+static void *try_module_alloc(unsigned long addr, unsigned long size,
+					int try_purge)
+{
+	const unsigned long vm_flags = 0;
+
+	return __vmalloc_node_try_addr(addr, size, GFP_KERNEL, PAGE_KERNEL_EXEC,
+					vm_flags, NUMA_NO_NODE, try_purge,
+					__builtin_return_address(0));
+}
+
+/*
+ * Find a random address to try that won't obviously fail to fit. Random areas
+ * are allowed to overflow into the backup area.
+ */
+static unsigned long get_rand_module_addr(unsigned long size)
+{
+	unsigned long nr_max_pos = (MODULES_LEN - size) / MODULE_ALIGN + 1;
+	unsigned long nr_rnd_pos = get_modules_rand_len() / MODULE_ALIGN;
+	unsigned long nr_pos = min(nr_max_pos, nr_rnd_pos);
+
+	unsigned long module_position_nr = get_random_long() % nr_pos;
+	unsigned long offset = module_position_nr * MODULE_ALIGN;
+
+	return MODULES_VADDR + offset;
+}
+
+/*
+ * Try to allocate in the random area. First 5000 times without purging, then
+ * 5000 times with purging. If these fail, return NULL.
+ */
+static void *try_module_randomize_each(unsigned long size)
+{
+	void *p = NULL;
+	unsigned int i;
+	unsigned long last_lazy_free_blocked = 0;
+
+	/* This will have a guard page */
+	unsigned long va_size = PAGE_ALIGN(size) + PAGE_SIZE;
+
+	if (!kaslr_randomize_each_module())
+		return NULL;
+
+	/* Make sure there is at least one address that might fit. */
+	if (va_size < PAGE_ALIGN(size) || va_size > MODULES_LEN)
+		return NULL;
+
+	/* Try to find a spot that doesn't need a lazy purge */
+	for (i = 0; i < NR_NO_PURGE; i++) {
+		unsigned long addr = get_rand_module_addr(va_size);
+
+		/* First try to avoid having to purge */
+		p = try_module_alloc(addr, size, 0);
+
+		/*
+		 * Save the last value that was blocked by a
+		 * lazy purge area.
+		 */
+		if (IS_ERR(p) && PTR_ERR(p) == -EUCLEAN)
+			last_lazy_free_blocked = addr;
+		else if (!IS_ERR(p))
+			return p;
+	}
+
+	/* Try the most recent spot that could be used after a lazy purge */
+	if (last_lazy_free_blocked) {
+		p = try_module_alloc(last_lazy_free_blocked, size, 1);
+
+		if (!IS_ERR(p))
+			return p;
+	}
+
+	/* Look for more spots and allow lazy purges */
+	for (i = 0; i < NR_TRY_PURGE; i++) {
+		unsigned long addr = get_rand_module_addr(va_size);
+
+		/* Give up and allow for purges */
+		p = try_module_alloc(addr, size, 1);
+
+		if (!IS_ERR(p))
+			return p;
+	}
+	return NULL;
 }
-#endif
 
 void *module_alloc(unsigned long size)
 {
@@ -84,16 +201,18 @@ void *module_alloc(unsigned long size)
 	if (PAGE_ALIGN(size) > MODULES_LEN)
 		return NULL;
 
-	p = __vmalloc_node_range(size, MODULE_ALIGN,
-				    MODULES_VADDR + get_module_load_offset(),
-				    MODULES_END, GFP_KERNEL,
-				    PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-				    __builtin_return_address(0));
+	p = try_module_randomize_each(size);
+
+	if (!p)
+		p = __vmalloc_node_range(size, MODULE_ALIGN,
+				get_module_vmalloc_start(), MODULES_END,
+				GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
+				NUMA_NO_NODE, __builtin_return_address(0));
+
 	if (p && (kasan_module_alloc(p, size) < 0)) {
 		vfree(p);
 		return NULL;
 	}
-
 	return p;
 }
 
-- 
2.7.4



* [PATCH v4 3/3] vmalloc: Add debugfs modfraginfo
  2018-08-29 22:59 [PATCH v4 0/3] KASLR feature to randomize each loadable module Rick Edgecombe
  2018-08-29 22:59 ` [PATCH v4 1/3] vmalloc: Add __vmalloc_node_try_addr function Rick Edgecombe
  2018-08-29 22:59 ` [PATCH v4 2/3] x86/modules: Increase randomization for modules Rick Edgecombe
@ 2018-08-29 22:59 ` Rick Edgecombe
  2018-08-30  2:27 ` [PATCH v4 0/3] KASLR feature to randomize each loadable module Alexei Starovoitov
  3 siblings, 0 replies; 6+ messages in thread
From: Rick Edgecombe @ 2018-08-29 22:59 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, linux-kernel, linux-mm, kernel-hardening,
	daniel, jannh, keescook
  Cc: kristen, dave.hansen, arjan, Rick Edgecombe

Add a debugfs file, "modfraginfo", that reports module space fragmentation.
This can be used to determine whether loadable module randomization is causing
any problems in extreme module loading situations, like huge numbers of modules
or extremely large modules.

Sample output when KASLR is enabled and X86_64 is configured:
	Largest free space:	897912 kB
	  Total free space:	1025424 kB
Allocations in backup area:	0

Sample output when X86_64 is configured but KASLR is not enabled:
	Largest free space:	897912 kB
	  Total free space:	1025424 kB

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 mm/vmalloc.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 101 insertions(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 1954458..a44b902 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -18,6 +18,7 @@
 #include <linux/interrupt.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
+#include <linux/debugfs.h>
 #include <linux/debugobjects.h>
 #include <linux/kallsyms.h>
 #include <linux/list.h>
@@ -33,6 +34,7 @@
 #include <linux/bitops.h>
 
 #include <linux/uaccess.h>
+#include <asm/setup.h>
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
@@ -2919,7 +2921,105 @@ static int __init proc_vmalloc_init(void)
 		proc_create_seq("vmallocinfo", 0400, NULL, &vmalloc_op);
 	return 0;
 }
-module_init(proc_vmalloc_init);
+#else
+static int __init proc_vmalloc_init(void)
+{
+	return 0;
+}
+#endif
+
+#if defined(CONFIG_RANDOMIZE_BASE) && defined(CONFIG_X86_64)
+static inline unsigned long is_in_backup(unsigned long addr)
+{
+	return addr >= MODULES_VADDR + MODULES_RAND_LEN;
+}
+#else
+static inline unsigned long is_in_backup(unsigned long addr)
+{
+	return 0;
+}
 
+inline bool kaslr_enabled(void);
 #endif
 
+
+#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_X86_64)
+static int modulefraginfo_debug_show(struct seq_file *m, void *v)
+{
+	unsigned long last_end = MODULES_VADDR;
+	unsigned long total_free = 0;
+	unsigned long largest_free = 0;
+	unsigned long backup_cnt = 0;
+	unsigned long gap;
+	struct vmap_area *prev, *cur = NULL;
+
+	spin_lock(&vmap_area_lock);
+
+	if (!pvm_find_next_prev(MODULES_VADDR, &cur, &prev) || !cur)
+		goto done;
+
+	for (; cur->va_end <= MODULES_END; cur = list_next_entry(cur, list)) {
+		/* Don't count areas that are marked to be lazily freed */
+		if (!(cur->flags & VM_LAZY_FREE)) {
+			backup_cnt += is_in_backup(cur->va_start);
+			gap = cur->va_start - last_end;
+			if (gap > largest_free)
+				largest_free = gap;
+			total_free += gap;
+			last_end = cur->va_end;
+		}
+
+		if (list_is_last(&cur->list, &vmap_area_list))
+			break;
+	}
+
+done:
+	gap = (MODULES_END - last_end);
+	if (gap > largest_free)
+		largest_free = gap;
+	total_free += gap;
+
+	spin_unlock(&vmap_area_lock);
+
+	seq_printf(m, "\tLargest free space:\t%lu kB\n", largest_free / 1024);
+	seq_printf(m, "\t  Total free space:\t%lu kB\n", total_free / 1024);
+
+	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_enabled())
+		seq_printf(m, "Allocations in backup area:\t%lu\n", backup_cnt);
+
+	return 0;
+}
+
+static int proc_module_frag_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, modulefraginfo_debug_show, NULL);
+}
+
+static const struct file_operations debug_module_frag_operations = {
+	.open       = proc_module_frag_debug_open,
+	.read       = seq_read,
+	.llseek     = seq_lseek,
+	.release    = single_release,
+};
+
+static void __init debug_modfrag_init(void)
+{
+	debugfs_create_file("modfraginfo", 0400, NULL, NULL,
+			&debug_module_frag_operations);
+}
+#else /* defined(CONFIG_DEBUG_FS) && defined(CONFIG_X86_64) */
+static void __init debug_modfrag_init(void)
+{
+}
+#endif
+
+#if defined(CONFIG_DEBUG_FS) || defined(CONFIG_PROC_FS)
+static int __init info_vmalloc_init(void)
+{
+	proc_vmalloc_init();
+	debug_modfrag_init();
+	return 0;
+}
+
+module_init(info_vmalloc_init);
+#endif
-- 
2.7.4



* Re: [PATCH v4 0/3] KASLR feature to randomize each loadable module
  2018-08-29 22:59 [PATCH v4 0/3] KASLR feature to randomize each loadable module Rick Edgecombe
                   ` (2 preceding siblings ...)
  2018-08-29 22:59 ` [PATCH v4 3/3] vmalloc: Add debugfs modfraginfo Rick Edgecombe
@ 2018-08-30  2:27 ` Alexei Starovoitov
  2018-08-30 18:24   ` Edgecombe, Rick P
  3 siblings, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2018-08-30  2:27 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: tglx, mingo, hpa, x86, linux-kernel, linux-mm, kernel-hardening,
	daniel, jannh, keescook, kristen, dave.hansen, arjan, netdev

On Wed, Aug 29, 2018 at 03:59:36PM -0700, Rick Edgecombe wrote:
> Changes for V3:
>  - Code cleanup based on internal feedback. (thanks to Dave Hansen and Andriy
>    Shevchenko)
>  - Slight refactor of existing algorithm to more cleanly live along side new
>    one.
>  - BPF synthetic benchmark

I don't see this benchmark in this patch set.
Could you prepare it as a test in tools/testing/selftests/bpf/,
so we can double check what is being tested and run it regularly
like we do for all the other tests in there?



* Re: [PATCH v4 0/3] KASLR feature to randomize each loadable module
  2018-08-30  2:27 ` [PATCH v4 0/3] KASLR feature to randomize each loadable module Alexei Starovoitov
@ 2018-08-30 18:24   ` Edgecombe, Rick P
  0 siblings, 0 replies; 6+ messages in thread
From: Edgecombe, Rick P @ 2018-08-30 18:24 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: linux-kernel, daniel, jannh, keescook, arjan, tglx, linux-mm,
	x86, kristen, hpa, mingo, kernel-hardening, Hansen, Dave, netdev

On Wed, 2018-08-29 at 19:27 -0700, Alexei Starovoitov wrote:
> On Wed, Aug 29, 2018 at 03:59:36PM -0700, Rick Edgecombe wrote:
> > Changes for V3:
> >  - Code cleanup based on internal feedback. (thanks to Dave Hansen and Andriy
> >    Shevchenko)
> >  - Slight refactor of existing algorithm to more cleanly live along side new
> >    one.
> >  - BPF synthetic benchmark
> I don't see this benchmark in this patch set.
> Could you prepare it as a test in tools/testing/selftests/bpf/ ?
> so we can double check what is being tested and run it regularly
> like we do for all other tests in there.
Sure.

There were two benchmarks I had run with BPF in mind. One timed the
module_alloc function in different scenarios, to make sure there were no
slowdowns for insertions.

The other checked whether the fragmentation had any measurable impact on
runtime performance:
"For runtime performance, a synthetic benchmark was run that does 5000000 BPF
JIT invocations each, from varying numbers of parallel processes, while the
kernel compiles sharing the same CPU to stand in for the cache impact of a real
workload. The seccomp filter invocations were just Jann Horn's seccomp filtering
test from this thread http://openwall.com/lists/kernel-hardening/2018/07/18/2,
except non-real time priority. The kernel was configured with KPTI and
retpoline, and pcid was disabled. There wasn't any significant difference
between the new and the old."

From what I know about the bpf kselftest, the first one would probably be a
better fit. I'm not sure the second one would fit, since it needs a kernel
compile sharing the same CPU, a special config, and a huge number of processes
being spawned. I can try to add a micro-benchmark instead if that sounds good.

Rick
