linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/7] Rlimit for module space
@ 2018-10-11 23:31 Rick Edgecombe
  2018-10-11 23:31 ` [PATCH v2 1/7] modules: Create rlimit " Rick Edgecombe
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Rick Edgecombe @ 2018-10-11 23:31 UTC (permalink / raw)
  To: kernel-hardening, daniel, keescook, catalin.marinas, will.deacon,
	davem, tglx, mingo, bp, x86, arnd, jeyu, linux-arm-kernel,
	linux-kernel, linux-mips, linux-s390, sparclinux, linux-fsdevel,
	linux-arch
  Cc: kristen, dave.hansen, arjan, deneen.t.dock, Rick Edgecombe

Hi,

This is v2 of a patch series that was first sent to security@kernel.org. The
recommendation was to pursue the fix on public lists. First I’ll describe the
issue that this is trying to solve, and then the general solution being
proposed, and lastly summarize the feedback so far. At a high level, this is
coming from a local DOS on eBPF, a KASLR module offset leak and desire for
general hardening of module space usage.

Problem
-------
If BPF JIT is on, there is no effective limit to prevent filling the entire
module space with JITed e/BPF filters. For classic BPF filters attached with
setsockopt SO_ATTACH_FILTER, there is no memlock rlimit check to limit the
number of insertions like this is for the bpf syscall. The cBPF gets converted
to eBPF and then handled by the JIT depending on if JIT is enabled. There is a
low enough default limit for open file descriptors per process, but this can be
worked around easily by forking before inserting. If the memlock rlimit is set
high for some other reason, eBPF programs inserted with the bpf syscall can also
exhaust the space.

This can cause problems not limited to:
 - Filling the entire module space with filters so that kernel modules cannot
   be loaded.
 - If CONFIG_BPF_JIT_ALWAYS_ON is configured, then if the module space is full,
   other BPF insertions will fail. This could cause filters that apps are
   relying on for security to fail to insert.
 - Counting the allocations until failure, since the module space is
   allocated linearly, the number of allocations can be used to de-randomize
   modules, with KASLR module base randomization. (This has been POCed with some
   assumptions)

Thanks to Daniel Borkmann for helping me understand what was happening with the
classic BPF JIT compilation and CONFIG_BPF_JIT_ALWAYS_ON config.

Proposed solution
-----------------
The solution being proposed here is to add a per user rlimit for module space,
similar to memlock rlimit. For the case of fds with attached filters being sent
over domain sockets, there is tracking for the uid of each module allocation.

Hopefully this could also be used for helping prevent BPF JIT spray type attacks
if a lower, more locked down setting is used.

The default memlock rlimit is pretty low, so just adding a check to classic BPF
similar to what happens in the bpf syscall may cause breakages. In addition,
some usages may increase the memlock limit for other reasons, which will remove
the protection for exhausting the module space.

There is unfortunately no cross platform place to perform this accounting
during allocation in the module space, so instead two helpers are created to be
inserted into the various arch’s that implement module_alloc. These helpers
perform the checks and help with tracking. The intention is that they can be
added to the other arch’s as easily as possible.

For decrementing the module space usage when an area is free, there _is_ a
cross-platform place to do this, so its done there. The behavior is that if the
helpers to increment and check are not added into an arch’s module_alloc, then
the decrement should have no effect. This is due to the allocation being missing
from the allocation-uid tracking.

Changes since v1 RFC
--------------------
Some feedback from Kees Cook was to try to plug this in for every architecture
and so this is done in this set for every architecture that has a BPF JIT
implementation. I have only done testing on x86.

There was also a suggestion from Daniel Borkmann to have default value for the
rlimit scale with the module size. This was complicated because the module space
size is not named the same accross architectures. It also is not always a
compile time constant and so the struct initilization would need to be changed.
So instead for this version a default value is added that can be overridden for
each architecture. For this set it is just defined for x86, all others get the
default.

Questions
---------
 - Should there be any special behavior for root or users with superuser
   capabilities?

Rick Edgecombe (7):
  modules: Create rlimit for module space
  x86/modules: Add rlimit checking for x86 modules
  arm/modules: Add rlimit checking for arm modules
  arm64/modules: Add rlimit checking for arm64 modules
  mips/modules: Add rlimit checking for mips modules
  sparc/modules: Add rlimit for sparc modules
  s390/modules: Add rlimit checking for s390 modules

 arch/arm/kernel/module.c                |  12 +-
 arch/arm64/kernel/module.c              |   5 +
 arch/mips/kernel/module.c               |  11 +-
 arch/s390/kernel/module.c               |  12 +-
 arch/sparc/kernel/module.c              |   5 +
 arch/x86/include/asm/pgtable_32_types.h |   3 +
 arch/x86/include/asm/pgtable_64_types.h |   2 +
 arch/x86/kernel/module.c                |   7 +-
 fs/proc/base.c                          |   1 +
 include/asm-generic/resource.h          |   8 ++
 include/linux/moduleloader.h            |   3 +
 include/linux/sched/user.h              |   4 +
 include/uapi/asm-generic/resource.h     |   3 +-
 kernel/module.c                         | 141 +++++++++++++++++++++++-
 14 files changed, 210 insertions(+), 7 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 1/7] modules: Create rlimit for module space
  2018-10-11 23:31 [PATCH v2 0/7] Rlimit for module space Rick Edgecombe
@ 2018-10-11 23:31 ` Rick Edgecombe
  2018-10-12  0:35   ` Jann Horn
  2018-10-11 23:31 ` [PATCH v2 2/7] x86/modules: Add rlimit checking for x86 modules Rick Edgecombe
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Rick Edgecombe @ 2018-10-11 23:31 UTC (permalink / raw)
  To: kernel-hardening, daniel, keescook, catalin.marinas, will.deacon,
	davem, tglx, mingo, bp, x86, arnd, jeyu, linux-arm-kernel,
	linux-kernel, linux-mips, linux-s390, sparclinux, linux-fsdevel,
	linux-arch
  Cc: kristen, dave.hansen, arjan, deneen.t.dock, Rick Edgecombe

This introduces a new rlimit, RLIMIT_MODSPACE, which limits the amount of
module space a user can use. The intention is to be able to limit module space
allocations that may come from un-privlidged users inserting e/BPF filters.

There is unfortunately no cross platform place to perform this accounting
during allocation in the module space, so instead two helpers are created to be
inserted into the various arch’s that implement module_alloc. These
helpers perform the checks and help with tracking. The intention is that they
an be added to the various arch’s as easily as possible.

Since filters attached to sockets can be passed to other processes via domain
sockets and freed there, there is new tracking for the uid of each allocation.
This way if the allocation is freed by a different user, it will not throw off
the accounting.

For decrementing the module space usage when an area is free, there is a
cross-platform place to do this. The behavior is that if the helpers to
increment and check are not added into an arch’s module_alloc, then the
decrement should have no effect. This is due to the allocation being missing
from the allocation-uid tracking.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 fs/proc/base.c                      |   1 +
 include/asm-generic/resource.h      |   8 ++
 include/linux/moduleloader.h        |   3 +
 include/linux/sched/user.h          |   4 +
 include/uapi/asm-generic/resource.h |   3 +-
 kernel/module.c                     | 141 +++++++++++++++++++++++++++-
 6 files changed, 158 insertions(+), 2 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 7e9f07bf260d..84824f50e9f8 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -562,6 +562,7 @@ static const struct limit_names lnames[RLIM_NLIMITS] = {
 	[RLIMIT_NICE] = {"Max nice priority", NULL},
 	[RLIMIT_RTPRIO] = {"Max realtime priority", NULL},
 	[RLIMIT_RTTIME] = {"Max realtime timeout", "us"},
+	[RLIMIT_MODSPACE] = {"Max module space", "bytes"},
 };
 
 /* Display limits for a process */
diff --git a/include/asm-generic/resource.h b/include/asm-generic/resource.h
index 8874f681b056..94c150e3dd12 100644
--- a/include/asm-generic/resource.h
+++ b/include/asm-generic/resource.h
@@ -4,6 +4,13 @@
 
 #include <uapi/asm-generic/resource.h>
 
+/*
+ * If the module space rlimit is not defined in an arch specific way, leave
+ * room for 10000 large eBPF filters.
+ */
+#ifndef MODSPACE_LIMIT
+#define MODSPACE_LIMIT (5*PAGE_SIZE*10000)
+#endif
 
 /*
  * boot-time rlimit defaults for the init task:
@@ -26,6 +33,7 @@
 	[RLIMIT_NICE]		= { 0, 0 },				\
 	[RLIMIT_RTPRIO]		= { 0, 0 },				\
 	[RLIMIT_RTTIME]		= {  RLIM_INFINITY,  RLIM_INFINITY },	\
+	[RLIMIT_MODSPACE]	= {  MODSPACE_LIMIT,  MODSPACE_LIMIT },	\
 }
 
 #endif
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 31013c2effd3..206539e97579 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -86,6 +86,9 @@ void module_arch_cleanup(struct module *mod);
 /* Any cleanup before freeing mod->module_init */
 void module_arch_freeing_init(struct module *mod);
 
+int check_inc_mod_rlimit(unsigned long size);
+void update_mod_rlimit(void *addr, unsigned long size);
+
 #ifdef CONFIG_KASAN
 #include <linux/kasan.h>
 #define MODULE_ALIGN (PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT)
diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h
index 39ad98c09c58..4c6d99d066fe 100644
--- a/include/linux/sched/user.h
+++ b/include/linux/sched/user.h
@@ -44,6 +44,10 @@ struct user_struct {
 	atomic_long_t locked_vm;
 #endif
 
+#ifdef CONFIG_MODULES
+	atomic_long_t module_vm;
+#endif
+
 	/* Miscellaneous per-user rate limit */
 	struct ratelimit_state ratelimit;
 };
diff --git a/include/uapi/asm-generic/resource.h b/include/uapi/asm-generic/resource.h
index f12db7a0da64..3f998340ed30 100644
--- a/include/uapi/asm-generic/resource.h
+++ b/include/uapi/asm-generic/resource.h
@@ -46,7 +46,8 @@
 					   0-39 for nice level 19 .. -20 */
 #define RLIMIT_RTPRIO		14	/* maximum realtime priority */
 #define RLIMIT_RTTIME		15	/* timeout for RT tasks in us */
-#define RLIM_NLIMITS		16
+#define RLIMIT_MODSPACE		16	/* max module space address usage */
+#define RLIM_NLIMITS		17
 
 /*
  * SuS says limits have to be unsigned.
diff --git a/kernel/module.c b/kernel/module.c
index 6746c85511fe..2ef9ed95bf60 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2110,9 +2110,139 @@ static void free_module_elf(struct module *mod)
 }
 #endif /* CONFIG_LIVEPATCH */
 
+struct mod_alloc_user {
+	struct rb_node node;
+	unsigned long addr;
+	unsigned long pages;
+	kuid_t uid;
+};
+
+static struct rb_root alloc_users = RB_ROOT;
+static DEFINE_SPINLOCK(alloc_users_lock);
+
+static unsigned int get_mod_page_cnt(unsigned long size)
+{
+	/* Add one for guard page */
+	return (PAGE_ALIGN(size) >> PAGE_SHIFT) + 1;
+}
+
+void update_mod_rlimit(void *addr, unsigned long size)
+{
+	unsigned long addrl = (unsigned long) addr;
+	struct rb_node **new = &(alloc_users.rb_node), *parent = NULL;
+	struct mod_alloc_user *track = kmalloc(sizeof(struct mod_alloc_user),
+				GFP_KERNEL);
+	unsigned int pages = get_mod_page_cnt(size);
+
+	/*
+	 * If addr is NULL, then we need to reverse the earlier increment that
+	 * would have happened in an check_inc_mod_rlimit call.
+	 */
+	if (!addr) {
+		struct user_struct *user = get_current_user();
+
+		atomic_long_sub(pages, &user->module_vm);
+		free_uid(user);
+		return;
+	}
+
+	/* Now, add tracking for the uid that allocated this */
+	track->uid = current_uid();
+	track->addr = addrl;
+	track->pages = pages;
+
+	spin_lock(&alloc_users_lock);
+
+	while (*new) {
+		struct mod_alloc_user *cur =
+				rb_entry(*new, struct mod_alloc_user, node);
+		parent = *new;
+		if (cur->addr > addrl)
+			new = &(*new)->rb_left;
+		else
+			new = &(*new)->rb_right;
+	}
+
+	rb_link_node(&(track->node), parent, new);
+	rb_insert_color(&(track->node), &alloc_users);
+
+	spin_unlock(&alloc_users_lock);
+}
+
+/* Remove user allocation tracking, return NULL if allocation untracked */
+static struct user_struct *remove_user_alloc(void *addr, unsigned long *pages)
+{
+	struct rb_node *cur_node = alloc_users.rb_node;
+	unsigned long addrl = (unsigned long) addr;
+	struct mod_alloc_user *cur_alloc_user = NULL;
+	struct user_struct *user;
+
+	spin_lock(&alloc_users_lock);
+	while (cur_node) {
+		cur_alloc_user =
+			rb_entry(cur_node, struct mod_alloc_user, node);
+		if (cur_alloc_user->addr > addrl)
+			cur_node = cur_node->rb_left;
+		else if (cur_alloc_user->addr < addrl)
+			cur_node = cur_node->rb_right;
+		else
+			goto found;
+	}
+	spin_unlock(&alloc_users_lock);
+
+	return NULL;
+found:
+	rb_erase(&cur_alloc_user->node, &alloc_users);
+	spin_unlock(&alloc_users_lock);
+
+	user = find_user(cur_alloc_user->uid);
+	*pages = cur_alloc_user->pages;
+	kfree(cur_alloc_user);
+
+	return user;
+}
+
+int check_inc_mod_rlimit(unsigned long size)
+{
+	struct user_struct *user = get_current_user();
+	unsigned long modspace_pages = rlimit(RLIMIT_MODSPACE) >> PAGE_SHIFT;
+	unsigned long cur_pages = atomic_long_read(&user->module_vm);
+	unsigned long new_pages = get_mod_page_cnt(size);
+
+	if (rlimit(RLIMIT_MODSPACE) != RLIM_INFINITY
+			&& cur_pages + new_pages > modspace_pages) {
+		free_uid(user);
+		return 1;
+	}
+
+	atomic_long_add(new_pages, &user->module_vm);
+
+	if (atomic_long_read(&user->module_vm) > modspace_pages) {
+		atomic_long_sub(new_pages, &user->module_vm);
+		free_uid(user);
+		return 1;
+	}
+
+	free_uid(user);
+	return 0;
+}
+
+void dec_mod_rlimit(void *addr)
+{
+	unsigned long pages;
+	struct user_struct *user = remove_user_alloc(addr, &pages);
+
+	if (!user)
+		return;
+
+	atomic_long_sub(pages, &user->module_vm);
+	free_uid(user);
+}
+
 void __weak module_memfree(void *module_region)
 {
 	vfree(module_region);
+	dec_mod_rlimit(module_region);
 }
 
 void __weak module_arch_cleanup(struct module *mod)
@@ -2730,7 +2860,16 @@ static void dynamic_debug_remove(struct module *mod, struct _ddebug *debug)
 
 void * __weak module_alloc(unsigned long size)
 {
-	return vmalloc_exec(size);
+	void *p;
+
+	if (check_inc_mod_rlimit(size))
+		return NULL;
+
+	p = vmalloc_exec(size);
+
+	update_mod_rlimit(p, size);
+
+	return p;
 }
 
 #ifdef CONFIG_DEBUG_KMEMLEAK
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 2/7] x86/modules: Add rlimit checking for x86 modules
  2018-10-11 23:31 [PATCH v2 0/7] Rlimit for module space Rick Edgecombe
  2018-10-11 23:31 ` [PATCH v2 1/7] modules: Create rlimit " Rick Edgecombe
@ 2018-10-11 23:31 ` Rick Edgecombe
  2018-10-11 23:31 ` [PATCH v2 3/7] arm/modules: Add rlimit checking for arm modules Rick Edgecombe
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Rick Edgecombe @ 2018-10-11 23:31 UTC (permalink / raw)
  To: kernel-hardening, daniel, keescook, catalin.marinas, will.deacon,
	davem, tglx, mingo, bp, x86, arnd, jeyu, linux-arm-kernel,
	linux-kernel, linux-mips, linux-s390, sparclinux, linux-fsdevel,
	linux-arch
  Cc: kristen, dave.hansen, arjan, deneen.t.dock, Rick Edgecombe

This adds in the rlimit checking for the x86 module allocator.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/include/asm/pgtable_32_types.h | 3 +++
 arch/x86/include/asm/pgtable_64_types.h | 2 ++
 arch/x86/kernel/module.c                | 7 ++++++-
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable_32_types.h b/arch/x86/include/asm/pgtable_32_types.h
index b0bc0fff5f1f..185e382fa8c3 100644
--- a/arch/x86/include/asm/pgtable_32_types.h
+++ b/arch/x86/include/asm/pgtable_32_types.h
@@ -68,6 +68,9 @@ extern bool __vmalloc_start_set; /* set once high_memory is set */
 #define MODULES_END	VMALLOC_END
 #define MODULES_LEN	(MODULES_VADDR - MODULES_END)
 
+/* Half of 128MB vmalloc space */
+#define MODSPACE_LIMIT (1 << 25)
+
 #define MAXMEM	(VMALLOC_END - PAGE_OFFSET - __VMALLOC_RESERVE)
 
 #endif /* _ASM_X86_PGTABLE_32_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 04edd2d58211..c256931f4667 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -143,6 +143,8 @@ extern unsigned int ptrs_per_p4d;
 #define MODULES_END		_AC(0xffffffffff000000, UL)
 #define MODULES_LEN		(MODULES_END - MODULES_VADDR)
 
+#define MODSPACE_LIMIT 		(MODULES_LEN / 2)
+
 #define ESPFIX_PGD_ENTRY	_AC(-2, UL)
 #define ESPFIX_BASE_ADDR	(ESPFIX_PGD_ENTRY << P4D_SHIFT)
 
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index f58336af095c..5eb3f9c5a976 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -84,16 +84,21 @@ void *module_alloc(unsigned long size)
 	if (PAGE_ALIGN(size) > MODULES_LEN)
 		return NULL;
 
+	if (check_inc_mod_rlimit(size))
+		return NULL;
+
 	p = __vmalloc_node_range(size, MODULE_ALIGN,
 				    MODULES_VADDR + get_module_load_offset(),
 				    MODULES_END, GFP_KERNEL,
 				    PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
 				    __builtin_return_address(0));
 	if (p && (kasan_module_alloc(p, size) < 0)) {
-		vfree(p);
+		module_memfree(p);
 		return NULL;
 	}
 
+	update_mod_rlimit(p, size);
+
 	return p;
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 3/7] arm/modules: Add rlimit checking for arm modules
  2018-10-11 23:31 [PATCH v2 0/7] Rlimit for module space Rick Edgecombe
  2018-10-11 23:31 ` [PATCH v2 1/7] modules: Create rlimit " Rick Edgecombe
  2018-10-11 23:31 ` [PATCH v2 2/7] x86/modules: Add rlimit checking for x86 modules Rick Edgecombe
@ 2018-10-11 23:31 ` Rick Edgecombe
  2018-10-11 23:31 ` [PATCH v2 4/7] arm64/modules: Add rlimit checking for arm64 modules Rick Edgecombe
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Rick Edgecombe @ 2018-10-11 23:31 UTC (permalink / raw)
  To: kernel-hardening, daniel, keescook, catalin.marinas, will.deacon,
	davem, tglx, mingo, bp, x86, arnd, jeyu, linux-arm-kernel,
	linux-kernel, linux-mips, linux-s390, sparclinux, linux-fsdevel,
	linux-arch
  Cc: kristen, dave.hansen, arjan, deneen.t.dock, Rick Edgecombe

This adds in the rlimit checking for the arm module allocator.

This has not been tested.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/arm/kernel/module.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index 3ff571c2c71c..e331863553d2 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -43,6 +43,9 @@ void *module_alloc(unsigned long size)
 	gfp_t gfp_mask = GFP_KERNEL;
 	void *p;
 
+	if (check_inc_mod_rlimit(size))
+		return NULL;
+
 	/* Silence the initial allocation */
 	if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS))
 		gfp_mask |= __GFP_NOWARN;
@@ -51,10 +54,15 @@ void *module_alloc(unsigned long size)
 				gfp_mask, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
 				__builtin_return_address(0));
 	if (!IS_ENABLED(CONFIG_ARM_MODULE_PLTS) || p)
-		return p;
-	return __vmalloc_node_range(size, 1,  VMALLOC_START, VMALLOC_END,
+		goto done;
+	p = __vmalloc_node_range(size, 1,  VMALLOC_START, VMALLOC_END,
 				GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
 				__builtin_return_address(0));
+
+done:
+	update_mod_rlimit(p, size);
+
+	return p;
 }
 #endif
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 4/7] arm64/modules: Add rlimit checking for arm64 modules
  2018-10-11 23:31 [PATCH v2 0/7] Rlimit for module space Rick Edgecombe
                   ` (2 preceding siblings ...)
  2018-10-11 23:31 ` [PATCH v2 3/7] arm/modules: Add rlimit checking for arm modules Rick Edgecombe
@ 2018-10-11 23:31 ` Rick Edgecombe
  2018-10-11 23:47   ` Dave Hansen
  2018-10-11 23:31 ` [PATCH v2 5/7] mips/modules: Add rlimit checking for mips modules Rick Edgecombe
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Rick Edgecombe @ 2018-10-11 23:31 UTC (permalink / raw)
  To: kernel-hardening, daniel, keescook, catalin.marinas, will.deacon,
	davem, tglx, mingo, bp, x86, arnd, jeyu, linux-arm-kernel,
	linux-kernel, linux-mips, linux-s390, sparclinux, linux-fsdevel,
	linux-arch
  Cc: kristen, dave.hansen, arjan, deneen.t.dock, Rick Edgecombe

This adds in the rlimit checking for the arm64 module allocator.

This has not been tested.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/arm64/kernel/module.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index f0f27aeefb73..ea9794f2f571 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -39,6 +39,9 @@ void *module_alloc(unsigned long size)
 	if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
 		gfp_mask |= __GFP_NOWARN;
 
+	if (check_inc_mod_rlimit(size))
+		return NULL;
+
 	p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
 				module_alloc_base + MODULES_VSIZE,
 				gfp_mask, PAGE_KERNEL_EXEC, 0,
@@ -65,6 +68,8 @@ void *module_alloc(unsigned long size)
 		return NULL;
 	}
 
+	update_mod_rlimit(p, size);
+
 	return p;
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 5/7] mips/modules: Add rlimit checking for mips modules
  2018-10-11 23:31 [PATCH v2 0/7] Rlimit for module space Rick Edgecombe
                   ` (3 preceding siblings ...)
  2018-10-11 23:31 ` [PATCH v2 4/7] arm64/modules: Add rlimit checking for arm64 modules Rick Edgecombe
@ 2018-10-11 23:31 ` Rick Edgecombe
  2018-10-11 23:31 ` [PATCH v2 6/7] sparc/modules: Add rlimit for sparc modules Rick Edgecombe
  2018-10-11 23:31 ` [PATCH v2 7/7] s390/modules: Add rlimit checking for s390 modules Rick Edgecombe
  6 siblings, 0 replies; 19+ messages in thread
From: Rick Edgecombe @ 2018-10-11 23:31 UTC (permalink / raw)
  To: kernel-hardening, daniel, keescook, catalin.marinas, will.deacon,
	davem, tglx, mingo, bp, x86, arnd, jeyu, linux-arm-kernel,
	linux-kernel, linux-mips, linux-s390, sparclinux, linux-fsdevel,
	linux-arch
  Cc: kristen, dave.hansen, arjan, deneen.t.dock, Rick Edgecombe

This adds in the rlimit checking for the mips module allocator.

This has not been tested.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/mips/kernel/module.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
index 491605137b03..7a23392512d1 100644
--- a/arch/mips/kernel/module.c
+++ b/arch/mips/kernel/module.c
@@ -47,9 +47,18 @@ static DEFINE_SPINLOCK(dbe_lock);
 #ifdef MODULE_START
 void *module_alloc(unsigned long size)
 {
-	return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
+	void *p;
+
+	if (check_inc_mod_rlimit(size))
+		return NULL;
+
+	p = __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
 				GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
 				__builtin_return_address(0));
+
+	update_mod_rlimit(p, size);
+
+	return p;
 }
 #endif
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 6/7] sparc/modules: Add rlimit for sparc modules
  2018-10-11 23:31 [PATCH v2 0/7] Rlimit for module space Rick Edgecombe
                   ` (4 preceding siblings ...)
  2018-10-11 23:31 ` [PATCH v2 5/7] mips/modules: Add rlimit checking for mips modules Rick Edgecombe
@ 2018-10-11 23:31 ` Rick Edgecombe
  2018-10-11 23:31 ` [PATCH v2 7/7] s390/modules: Add rlimit checking for s390 modules Rick Edgecombe
  6 siblings, 0 replies; 19+ messages in thread
From: Rick Edgecombe @ 2018-10-11 23:31 UTC (permalink / raw)
  To: kernel-hardening, daniel, keescook, catalin.marinas, will.deacon,
	davem, tglx, mingo, bp, x86, arnd, jeyu, linux-arm-kernel,
	linux-kernel, linux-mips, linux-s390, sparclinux, linux-fsdevel,
	linux-arch
  Cc: kristen, dave.hansen, arjan, deneen.t.dock, Rick Edgecombe

This adds in the rlimit checking for the sparc module allocator.

This has not been tested.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/sparc/kernel/module.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
index df39580f398d..24854fdfa7c3 100644
--- a/arch/sparc/kernel/module.c
+++ b/arch/sparc/kernel/module.c
@@ -44,10 +44,15 @@ void *module_alloc(unsigned long size)
 {
 	void *ret;
 
+	if (check_inc_mod_rlimit(size))
+		return NULL;
+
 	ret = module_map(size);
 	if (ret)
 		memset(ret, 0, size);
 
+	update_mod_rlimit(ret, size);
+
 	return ret;
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 7/7] s390/modules: Add rlimit checking for s390 modules
  2018-10-11 23:31 [PATCH v2 0/7] Rlimit for module space Rick Edgecombe
                   ` (5 preceding siblings ...)
  2018-10-11 23:31 ` [PATCH v2 6/7] sparc/modules: Add rlimit for sparc modules Rick Edgecombe
@ 2018-10-11 23:31 ` Rick Edgecombe
  6 siblings, 0 replies; 19+ messages in thread
From: Rick Edgecombe @ 2018-10-11 23:31 UTC (permalink / raw)
  To: kernel-hardening, daniel, keescook, catalin.marinas, will.deacon,
	davem, tglx, mingo, bp, x86, arnd, jeyu, linux-arm-kernel,
	linux-kernel, linux-mips, linux-s390, sparclinux, linux-fsdevel,
	linux-arch
  Cc: kristen, dave.hansen, arjan, deneen.t.dock, Rick Edgecombe

This adds in the rlimit checking for the s390 module allocator.

This has not been tested.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/s390/kernel/module.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index d298d3cb46d0..6c2356a72b63 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -32,12 +32,22 @@
 
 void *module_alloc(unsigned long size)
 {
+	void *p;
+
 	if (PAGE_ALIGN(size) > MODULES_LEN)
 		return NULL;
-	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
+
+	if (check_inc_mod_rlimit(size))
+		return NULL;
+
+	p = __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
 				    GFP_KERNEL, PAGE_KERNEL_EXEC,
 				    0, NUMA_NO_NODE,
 				    __builtin_return_address(0));
+
+	update_mod_rlimit(p, size);
+
+	return p;
 }
 
 void module_arch_freeing_init(struct module *mod)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 4/7] arm64/modules: Add rlimit checking for arm64 modules
  2018-10-11 23:31 ` [PATCH v2 4/7] arm64/modules: Add rlimit checking for arm64 modules Rick Edgecombe
@ 2018-10-11 23:47   ` Dave Hansen
  2018-10-12 14:32     ` Jessica Yu
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2018-10-11 23:47 UTC (permalink / raw)
  To: Rick Edgecombe, kernel-hardening, daniel, keescook,
	catalin.marinas, will.deacon, davem, tglx, mingo, bp, x86, arnd,
	jeyu, linux-arm-kernel, linux-kernel, linux-mips, linux-s390,
	sparclinux, linux-fsdevel, linux-arch
  Cc: kristen, arjan, deneen.t.dock

On 10/11/2018 04:31 PM, Rick Edgecombe wrote:
> +	if (check_inc_mod_rlimit(size))
> +		return NULL;
> +
>  	p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
>  				module_alloc_base + MODULES_VSIZE,
>  				gfp_mask, PAGE_KERNEL_EXEC, 0,
> @@ -65,6 +68,8 @@ void *module_alloc(unsigned long size)
>  		return NULL;
>  	}
>  
> +	update_mod_rlimit(p, size);

Is there a reason we couldn't just rename all of the existing per-arch
module_alloc() calls to be called, say, "arch_module_alloc()", then put
this new rlimit code in a generic helper that does:


	if (check_inc_mod_rlimit(size))
		return NULL;

	p = arch_module_alloc(...);

	...

	update_mod_rlimit(p, size);


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/7] modules: Create rlimit for module space
  2018-10-11 23:31 ` [PATCH v2 1/7] modules: Create rlimit " Rick Edgecombe
@ 2018-10-12  0:35   ` Jann Horn
  2018-10-12 17:04     ` Edgecombe, Rick P
  2018-10-12 18:23     ` Jann Horn
  0 siblings, 2 replies; 19+ messages in thread
From: Jann Horn @ 2018-10-12  0:35 UTC (permalink / raw)
  To: rick.p.edgecombe
  Cc: Kernel Hardening, Daniel Borkmann, Kees Cook, Catalin Marinas,
	Will Deacon, David S. Miller, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, the arch/x86 maintainers, Arnd Bergmann, jeyu,
	linux-arm-kernel, kernel list, linux-mips, linux-s390,
	sparclinux, linux-fsdevel, linux-arch, kristen, Dave Hansen,
	Arjan van de Ven, deneen.t.dock

On Fri, Oct 12, 2018 at 1:40 AM Rick Edgecombe
<rick.p.edgecombe@intel.com> wrote:
> This introduces a new rlimit, RLIMIT_MODSPACE, which limits the amount of
> module space a user can use. The intention is to be able to limit module space
> allocations that may come from un-privlidged users inserting e/BPF filters.

Note that in some configurations (iirc e.g. the default Ubuntu
config), normal users can use the subuid mechanism (the /etc/subuid
config file and the /usr/bin/newuidmap setuid helper) to gain access
to 65536 UIDs, which means that in such a configuration,
RLIMIT_MODSPACE*65537 is the actual limit for one user. (Same thing
applies to RLIMIT_MEMLOCK.)

Also, it is probably possible to waste a few times as much virtual
memory as permitted by the limit by deliberately fragmenting virtual
memory?

> There is unfortunately no cross platform place to perform this accounting
> during allocation in the module space, so instead two helpers are created to be
> inserted into the various arch’s that implement module_alloc. These
> helpers perform the checks and help with tracking. The intention is that they
> an be added to the various arch’s as easily as possible.

nit: s/an/can/

[...]
> diff --git a/kernel/module.c b/kernel/module.c
> index 6746c85511fe..2ef9ed95bf60 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2110,9 +2110,139 @@ static void free_module_elf(struct module *mod)
>  }
>  #endif /* CONFIG_LIVEPATCH */
>
> +struct mod_alloc_user {
> +       struct rb_node node;
> +       unsigned long addr;
> +       unsigned long pages;
> +       kuid_t uid;
> +};
> +
> +static struct rb_root alloc_users = RB_ROOT;
> +static DEFINE_SPINLOCK(alloc_users_lock);

Why all the rbtree stuff instead of stashing a pointer in struct
vmap_area, or something like that?

[...]
> +int check_inc_mod_rlimit(unsigned long size)
> +{
> +       struct user_struct *user = get_current_user();
> +       unsigned long modspace_pages = rlimit(RLIMIT_MODSPACE) >> PAGE_SHIFT;
> +       unsigned long cur_pages = atomic_long_read(&user->module_vm);
> +       unsigned long new_pages = get_mod_page_cnt(size);
> +
> +       if (rlimit(RLIMIT_MODSPACE) != RLIM_INFINITY
> +                       && cur_pages + new_pages > modspace_pages) {
> +               free_uid(user);
> +               return 1;
> +       }
> +
> +       atomic_long_add(new_pages, &user->module_vm);
> +
> +       if (atomic_long_read(&user->module_vm) > modspace_pages) {
> +               atomic_long_sub(new_pages, &user->module_vm);
> +               free_uid(user);
> +               return 1;
> +       }
> +
> +       free_uid(user);

If you drop the reference on the user_struct, an attacker with two
UIDs can charge module allocations to UID A, keep the associated
sockets alive as UID B, and then log out and back in again as UID A.
At that point, nobody is charged for the module space anymore. If you
look at the eBPF implementation, you'll see that
bpf_prog_charge_memlock() actually stores a refcounted pointer to the
user_struct.

> +       return 0;
> +}

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 4/7] arm64/modules: Add rlimit checking for arm64 modules
  2018-10-11 23:47   ` Dave Hansen
@ 2018-10-12 14:32     ` Jessica Yu
  2018-10-12 22:01       ` Edgecombe, Rick P
  0 siblings, 1 reply; 19+ messages in thread
From: Jessica Yu @ 2018-10-12 14:32 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Rick Edgecombe, kernel-hardening, daniel, keescook,
	catalin.marinas, will.deacon, davem, tglx, mingo, bp, x86, arnd,
	linux-arm-kernel, linux-kernel, linux-mips, linux-s390,
	sparclinux, linux-fsdevel, linux-arch, kristen, arjan,
	deneen.t.dock

+++ Dave Hansen [11/10/18 16:47 -0700]:
>On 10/11/2018 04:31 PM, Rick Edgecombe wrote:
>> +	if (check_inc_mod_rlimit(size))
>> +		return NULL;
>> +
>>  	p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
>>  				module_alloc_base + MODULES_VSIZE,
>>  				gfp_mask, PAGE_KERNEL_EXEC, 0,
>> @@ -65,6 +68,8 @@ void *module_alloc(unsigned long size)
>>  		return NULL;
>>  	}
>>
>> +	update_mod_rlimit(p, size);
>
>Is there a reason we couldn't just rename all of the existing per-arch
>module_alloc() calls to be called, say, "arch_module_alloc()", then put
>this new rlimit code in a generic helper that does:
>
>
>	if (check_inc_mod_rlimit(size))
>		return NULL;
>
>	p = arch_module_alloc(...);
>
>	...
>
>	update_mod_rlimit(p, size);
>

I second this suggestion. Just make module_{alloc,memfree} generic,
non-weak functions that call the rlimit helpers in addition to the
arch-specific arch_module_{alloc,memfree} functions.

Jessica

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/7] modules: Create rlimit for module space
  2018-10-12  0:35   ` Jann Horn
@ 2018-10-12 17:04     ` Edgecombe, Rick P
  2018-10-12 17:22       ` Jann Horn
  2018-10-23 11:32       ` Michal Hocko
  2018-10-12 18:23     ` Jann Horn
  1 sibling, 2 replies; 19+ messages in thread
From: Edgecombe, Rick P @ 2018-10-12 17:04 UTC (permalink / raw)
  To: jannh
  Cc: linux-kernel, daniel, keescook, jeyu, linux-arch, arjan, tglx,
	linux-mips, linux-s390, x86, kristen, Dock, Deneen T,
	catalin.marinas, mingo, will.deacon, kernel-hardening, bp,
	Hansen, Dave, linux-arm-kernel, davem, arnd, linux-fsdevel,
	sparclinux

On Fri, 2018-10-12 at 02:35 +0200, Jann Horn wrote:
> On Fri, Oct 12, 2018 at 1:40 AM Rick Edgecombe
> <rick.p.edgecombe@intel.com> wrote:
> > This introduces a new rlimit, RLIMIT_MODSPACE, which limits the amount of
> > module space a user can use. The intention is to be able to limit module
> > space
> > allocations that may come from un-privlidged users inserting e/BPF filters.
> 
> Note that in some configurations (iirc e.g. the default Ubuntu
> config), normal users can use the subuid mechanism (the /etc/subuid
> config file and the /usr/bin/newuidmap setuid helper) to gain access
> to 65536 UIDs, which means that in such a configuration,
> RLIMIT_MODSPACE*65537 is the actual limit for one user. (Same thing
> applies to RLIMIT_MEMLOCK.)
Ah, that is a problem. There is only room for about 130,000 filters on x86 with
KASLR, so it couldn't really be set small enough.

I'll have to look into what this is. Thanks for pointing it out.

> Also, it is probably possible to waste a few times as much virtual
> memory as permitted by the limit by deliberately fragmenting virtual
> memory?
Good point. I guess if the first point can be addressed somehow, this one could
maybe be solved by just picking a lower limit.

Any thoughts on if instead of all this there was just a system wide limit on BPF
JIT module space usage?

> > There is unfortunately no cross platform place to perform this accounting
> > during allocation in the module space, so instead two helpers are created to
> > be
> > inserted into the various arch’s that implement module_alloc. These
> > helpers perform the checks and help with tracking. The intention is that
> > they
> > an be added to the various arch’s as easily as possible.
> 
> nit: s/an/can/
> 
> [...]
> > diff --git a/kernel/module.c b/kernel/module.c
> > index 6746c85511fe..2ef9ed95bf60 100644
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -2110,9 +2110,139 @@ static void free_module_elf(struct module *mod)
> >  }
> >  #endif /* CONFIG_LIVEPATCH */it 
> > 
> > +struct mod_alloc_user {
> > +       struct rb_node node;
> > +       unsigned long addr;
> > +       unsigned long pages;
> > +       kuid_t uid;
> > +};
> > +
> > +static struct rb_root alloc_users = RB_ROOT;
> > +static DEFINE_SPINLOCK(alloc_users_lock);
> 
> Why all the rbtree stuff instead of stashing a pointer in struct
> vmap_area, or something like that?

Since the tracking was not for all vmalloc usage, the intention was to not bloat
the structure for other usages likes stacks. I thought usually there wouldn't be
nearly as much module space allocations as there would be kernel stacks, but I
didn't do any actual measurements on the tradeoffs.

> [...]
> > +int check_inc_mod_rlimit(unsigned long size)
> > +{
> > +       struct user_struct *user = get_current_user();
> > +       unsigned long modspace_pages = rlimit(RLIMIT_MODSPACE) >>
> > PAGE_SHIFT;
> > +       unsigned long cur_pages = atomic_long_read(&user->module_vm);
> > +       unsigned long new_pages = get_mod_page_cnt(size);
> > +
> > +       if (rlimit(RLIMIT_MODSPACE) != RLIM_INFINITY
> > +                       && cur_pages + new_pages > modspace_pages) {
> > +               free_uid(user);
> > +               return 1;
> > +       }
> > +
> > +       atomic_long_add(new_pages, &user->module_vm);
> > +
> > +       if (atomic_long_read(&user->module_vm) > modspace_pages) {
> > +               atomic_long_sub(new_pages, &user->module_vm);
> > +               free_uid(user);
> > +               return 1;
> > +       }
> > +
> > +       free_uid(user);
> 
> If you drop the reference on the user_struct, an attacker with two
> UIDs can charge module allocations to UID A, keep the associated
> sockets alive as UID B, and then log out and back in again as UID A.
> At that point, nobody is charged for the module space anymore. If you
> look at the eBPF implementation, you'll see that
> bpf_prog_charge_memlock() actually stores a refcounted pointer to the
> user_struct.
Ok, I'll take a look. Thanks Jann.
> > +       return 0;
> > +}

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/7] modules: Create rlimit for module space
  2018-10-12 17:04     ` Edgecombe, Rick P
@ 2018-10-12 17:22       ` Jann Horn
  2018-10-13  0:04         ` Edgecombe, Rick P
  2018-10-23 11:32       ` Michal Hocko
  1 sibling, 1 reply; 19+ messages in thread
From: Jann Horn @ 2018-10-12 17:22 UTC (permalink / raw)
  To: rick.p.edgecombe
  Cc: kernel list, Daniel Borkmann, Kees Cook, jeyu, linux-arch,
	Arjan van de Ven, Thomas Gleixner, linux-mips, linux-s390,
	the arch/x86 maintainers, kristen, deneen.t.dock,
	Catalin Marinas, Ingo Molnar, Will Deacon, Kernel Hardening,
	Borislav Petkov, Dave Hansen, linux-arm-kernel, David S. Miller,
	Arnd Bergmann, linux-fsdevel, sparclinux

On Fri, Oct 12, 2018 at 7:04 PM Edgecombe, Rick P
<rick.p.edgecombe@intel.com> wrote:
> On Fri, 2018-10-12 at 02:35 +0200, Jann Horn wrote:
> > On Fri, Oct 12, 2018 at 1:40 AM Rick Edgecombe
> > <rick.p.edgecombe@intel.com> wrote:
> > > This introduces a new rlimit, RLIMIT_MODSPACE, which limits the amount of
> > > module space a user can use. The intention is to be able to limit module
> > > space
> > > allocations that may come from un-privlidged users inserting e/BPF filters.
> >
> > Note that in some configurations (iirc e.g. the default Ubuntu
> > config), normal users can use the subuid mechanism (the /etc/subuid
> > config file and the /usr/bin/newuidmap setuid helper) to gain access
> > to 65536 UIDs, which means that in such a configuration,
> > RLIMIT_MODSPACE*65537 is the actual limit for one user. (Same thing
> > applies to RLIMIT_MEMLOCK.)
> Ah, that is a problem. There is only room for about 130,000 filters on x86 with
> KASLR, so it couldn't really be set small enough.
>
> I'll have to look into what this is. Thanks for pointing it out.
>
> > Also, it is probably possible to waste a few times as much virtual
> > memory as permitted by the limit by deliberately fragmenting virtual
> > memory?
> Good point. I guess if the first point can be addressed somehow, this one could
> maybe be solved by just picking a lower limit.
>
> Any thoughts on if instead of all this there was just a system wide limit on BPF
> JIT module space usage?

That does sound more robust to me. And at least on systems that don't
compile out the BPF interpreter, everything should be more or less
fine then...

> > > There is unfortunately no cross platform place to perform this accounting
> > > during allocation in the module space, so instead two helpers are created to
> > > be
> > > inserted into the various arch’s that implement module_alloc. These
> > > helpers perform the checks and help with tracking. The intention is that
> > > they
> > > an be added to the various arch’s as easily as possible.
> >
> > nit: s/an/can/
> >
> > [...]
> > > diff --git a/kernel/module.c b/kernel/module.c
> > > index 6746c85511fe..2ef9ed95bf60 100644
> > > --- a/kernel/module.c
> > > +++ b/kernel/module.c
> > > @@ -2110,9 +2110,139 @@ static void free_module_elf(struct module *mod)
> > >  }
> > >  #endif /* CONFIG_LIVEPATCH */it
> > >
> > > +struct mod_alloc_user {
> > > +       struct rb_node node;
> > > +       unsigned long addr;
> > > +       unsigned long pages;
> > > +       kuid_t uid;
> > > +};
> > > +
> > > +static struct rb_root alloc_users = RB_ROOT;
> > > +static DEFINE_SPINLOCK(alloc_users_lock);
> >
> > Why all the rbtree stuff instead of stashing a pointer in struct
> > vmap_area, or something like that?
>
> Since the tracking was not for all vmalloc usage, the intention was to not bloat
> the structure for other usages likes stacks. I thought usually there wouldn't be
> nearly as much module space allocations as there would be kernel stacks, but I
> didn't do any actual measurements on the tradeoffs.

I imagine that one extra pointer in there - pointing to your struct
mod_alloc_user - would probably not be terrible. 8 bytes more per
kernel stack shouldn't be so bad?

> > [...]
> > > +int check_inc_mod_rlimit(unsigned long size)
> > > +{
> > > +       struct user_struct *user = get_current_user();
> > > +       unsigned long modspace_pages = rlimit(RLIMIT_MODSPACE) >>
> > > PAGE_SHIFT;
> > > +       unsigned long cur_pages = atomic_long_read(&user->module_vm);
> > > +       unsigned long new_pages = get_mod_page_cnt(size);
> > > +
> > > +       if (rlimit(RLIMIT_MODSPACE) != RLIM_INFINITY
> > > +                       && cur_pages + new_pages > modspace_pages) {
> > > +               free_uid(user);
> > > +               return 1;
> > > +       }
> > > +
> > > +       atomic_long_add(new_pages, &user->module_vm);
> > > +
> > > +       if (atomic_long_read(&user->module_vm) > modspace_pages) {
> > > +               atomic_long_sub(new_pages, &user->module_vm);
> > > +               free_uid(user);
> > > +               return 1;
> > > +       }
> > > +
> > > +       free_uid(user);
> >
> > If you drop the reference on the user_struct, an attacker with two
> > UIDs can charge module allocations to UID A, keep the associated
> > sockets alive as UID B, and then log out and back in again as UID A.
> > At that point, nobody is charged for the module space anymore. If you
> > look at the eBPF implementation, you'll see that
> > bpf_prog_charge_memlock() actually stores a refcounted pointer to the
> > user_struct.
> Ok, I'll take a look. Thanks Jann.
> > > +       return 0;
> > > +}

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/7] modules: Create rlimit for module space
  2018-10-12  0:35   ` Jann Horn
  2018-10-12 17:04     ` Edgecombe, Rick P
@ 2018-10-12 18:23     ` Jann Horn
  1 sibling, 0 replies; 19+ messages in thread
From: Jann Horn @ 2018-10-12 18:23 UTC (permalink / raw)
  To: rick.p.edgecombe
  Cc: Kernel Hardening, Daniel Borkmann, Kees Cook, Catalin Marinas,
	Will Deacon, David S. Miller, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, the arch/x86 maintainers, Arnd Bergmann, jeyu,
	linux-arm-kernel, kernel list, linux-mips, linux-s390,
	sparclinux, linux-fsdevel, linux-arch, kristen, Dave Hansen,
	Arjan van de Ven, deneen.t.dock

On Fri, Oct 12, 2018 at 2:35 AM Jann Horn <jannh@google.com> wrote:
> On Fri, Oct 12, 2018 at 1:40 AM Rick Edgecombe
> <rick.p.edgecombe@intel.com> wrote:
> > This introduces a new rlimit, RLIMIT_MODSPACE, which limits the amount of
> > module space a user can use. The intention is to be able to limit module space
> > allocations that may come from un-privlidged users inserting e/BPF filters.
>
> Note that in some configurations (iirc e.g. the default Ubuntu
> config), normal users can use the subuid mechanism (the /etc/subuid
> config file and the /usr/bin/newuidmap setuid helper) to gain access
> to 65536 UIDs, which means that in such a configuration,
> RLIMIT_MODSPACE*65537 is the actual limit for one user. (Same thing
> applies to RLIMIT_MEMLOCK.)

Actually, I may have misremembered, perhaps it's not installed by
default - I just checked in a Ubuntu VM, and the newuidmap helper from
the uidmap package wasn't installed.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 4/7] arm64/modules: Add rlimit checking for arm64 modules
  2018-10-12 14:32     ` Jessica Yu
@ 2018-10-12 22:01       ` Edgecombe, Rick P
  2018-10-12 22:54         ` Edgecombe, Rick P
  0 siblings, 1 reply; 19+ messages in thread
From: Edgecombe, Rick P @ 2018-10-12 22:01 UTC (permalink / raw)
  To: jeyu, Hansen, Dave
  Cc: linux-kernel, daniel, keescook, linux-s390, linux-arch, arjan,
	tglx, linux-mips, Dock, Deneen T, x86, kristen, catalin.marinas,
	mingo, will.deacon, kernel-hardening, bp, linux-arm-kernel,
	davem, arnd, linux-fsdevel, sparclinux

On Fri, 2018-10-12 at 16:32 +0200, Jessica Yu wrote:
> +++ Dave Hansen [11/10/18 16:47 -0700]:
> > On 10/11/2018 04:31 PM, Rick Edgecombe wrote:
> > > +	if (check_inc_mod_rlimit(size))
> > > +		return NULL;
> > > +
> > >  	p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> > >  				module_alloc_base + MODULES_VSIZE,
> > >  				gfp_mask, PAGE_KERNEL_EXEC, 0,
> > > @@ -65,6 +68,8 @@ void *module_alloc(unsigned long size)
> > >  		return NULL;
> > >  	}
> > > 
> > > +	update_mod_rlimit(p, size);
> > 
> > Is there a reason we couldn't just rename all of the existing per-arch
> > module_alloc() calls to be called, say, "arch_module_alloc()", then put
> > this new rlimit code in a generic helper that does:
> > 
> > 
> > 	if (check_inc_mod_rlimit(size))
> > 		return NULL;
> > 
> > 	p = arch_module_alloc(...);
> > 
> > 	...
> > 
> > 	update_mod_rlimit(p, size);
> > 
> 
> I second this suggestion. Just make module_{alloc,memfree} generic,
> non-weak functions that call the rlimit helpers in addition to the
> arch-specific arch_module_{alloc,memfree} functions.
> 
> Jessica

Ok, thanks. I am going to try another version of this with just a system wide
BPF JIT limit based on the problems Jann brought up. I think it would be nice to
have a module space limit, but as far as I know the only way today un-privlidged 
users could fill the space is from BPF JIT. Unless you see another purpose long
term?

Rick

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 4/7] arm64/modules: Add rlimit checking for arm64 modules
  2018-10-12 22:01       ` Edgecombe, Rick P
@ 2018-10-12 22:54         ` Edgecombe, Rick P
  0 siblings, 0 replies; 19+ messages in thread
From: Edgecombe, Rick P @ 2018-10-12 22:54 UTC (permalink / raw)
  To: jeyu, Hansen, Dave
  Cc: linux-fsdevel, linux-kernel, daniel, keescook, arjan, linux-mips,
	tglx, linux-s390, x86, kristen, Dock, Deneen T, catalin.marinas,
	mingo, will.deacon, kernel-hardening, bp, linux-arm-kernel,
	davem, linux-arch, arnd, sparclinux

On Fri, 2018-10-12 at 22:01 +0000, Edgecombe, Rick P wrote:
> On Fri, 2018-10-12 at 16:32 +0200, Jessica Yu wrote:
> > +++ Dave Hansen [11/10/18 16:47 -0700]:
> > > On 10/11/2018 04:31 PM, Rick Edgecombe wrote:
> > > > +	if (check_inc_mod_rlimit(size))
> > > > +		return NULL;
> > > > +
> > > >  	p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> > > >  				module_alloc_base + MODULES_VSIZE,
> > > >  				gfp_mask, PAGE_KERNEL_EXEC, 0,
> > > > @@ -65,6 +68,8 @@ void *module_alloc(unsigned long size)
> > > >  		return NULL;
> > > >  	}
> > > > 
> > > > +	update_mod_rlimit(p, size);
> > > 
> > > Is there a reason we couldn't just rename all of the existing per-arch
> > > module_alloc() calls to be called, say, "arch_module_alloc()", then put
> > > this new rlimit code in a generic helper that does:
> > > 
> > > 
> > > 	if (check_inc_mod_rlimit(size))
> > > 		return NULL;
> > > 
> > > 	p = arch_module_alloc(...);
> > > 
> > > 	...
> > > 
> > > 	update_mod_rlimit(p, size);
> > > 
> > 
> > I second this suggestion. Just make module_{alloc,memfree} generic,
> > non-weak functions that call the rlimit helpers in addition to the
> > arch-specific arch_module_{alloc,memfree} functions.
> > 
> > Jessica
> 
> Ok, thanks. I am going to try another version of this with just a system wide
> BPF JIT limit based on the problems Jann brought up. I think it would be nice
> to
> have a module space limit, but as far as I know the only way today un-
> privlidged 
> users could fill the space is from BPF JIT. Unless you see another purpose
> long
> term?

Err, nevermind. Going to try to include both limits. I'll incorporate this
refactor into the next version.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/7] modules: Create rlimit for module space
  2018-10-12 17:22       ` Jann Horn
@ 2018-10-13  0:04         ` Edgecombe, Rick P
  2018-10-13  0:09           ` Jann Horn
  0 siblings, 1 reply; 19+ messages in thread
From: Edgecombe, Rick P @ 2018-10-13  0:04 UTC (permalink / raw)
  To: jannh
  Cc: linux-fsdevel, linux-kernel, daniel, keescook, jeyu, arjan,
	linux-mips, tglx, linux-s390, x86, kristen, Dock, Deneen T,
	catalin.marinas, mingo, will.deacon, kernel-hardening, bp,
	Hansen, Dave, linux-arm-kernel, davem, linux-arch, arnd,
	sparclinux

On Fri, 2018-10-12 at 19:22 +0200, Jann Horn wrote:
> On Fri, Oct 12, 2018 at 7:04 PM Edgecombe, Rick P
> <rick.p.edgecombe@intel.com> wrote:
> > On Fri, 2018-10-12 at 02:35 +0200, Jann Horn wrote:
> > > Why all the rbtree stuff instead of stashing a pointer in struct
> > > vmap_area, or something like that?
> > 
> > Since the tracking was not for all vmalloc usage, the intention was to not
> > bloat
> > the structure for other usages likes stacks. I thought usually there
> > wouldn't be
> > nearly as much module space allocations as there would be kernel stacks, but
> > I
> > didn't do any actual measurements on the tradeoffs.
> 
> I imagine that one extra pointer in there - pointing to your struct
> mod_alloc_user - would probably not be terrible. 8 bytes more per
> kernel stack shouldn't be so bad?

I looked into this and it starts to look a little messy. The nommu.c version of
vmalloc doesn't use or expose access to vmap_area or vm_struct. So it starts to
look like a bunch of IFDEFs to remove the rlimit in the nommu case or making a
stand in that maintains pretend vm struct's in nommu.c. I had actually
previously tried to at least pull the allocations size from vmalloc structs, but it broke on nommu.

Thought I would check back and see. How important do you think this is?



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/7] modules: Create rlimit for module space
  2018-10-13  0:04         ` Edgecombe, Rick P
@ 2018-10-13  0:09           ` Jann Horn
  0 siblings, 0 replies; 19+ messages in thread
From: Jann Horn @ 2018-10-13  0:09 UTC (permalink / raw)
  To: rick.p.edgecombe
  Cc: linux-fsdevel, kernel list, Daniel Borkmann, Kees Cook, jeyu,
	Arjan van de Ven, linux-mips, Thomas Gleixner, linux-s390,
	the arch/x86 maintainers, kristen, deneen.t.dock,
	Catalin Marinas, Ingo Molnar, Will Deacon, Kernel Hardening,
	Borislav Petkov, Dave Hansen, linux-arm-kernel, David S. Miller,
	linux-arch, Arnd Bergmann, sparclinux

On Sat, Oct 13, 2018 at 2:04 AM Edgecombe, Rick P
<rick.p.edgecombe@intel.com> wrote:
> On Fri, 2018-10-12 at 19:22 +0200, Jann Horn wrote:
> > On Fri, Oct 12, 2018 at 7:04 PM Edgecombe, Rick P
> > <rick.p.edgecombe@intel.com> wrote:
> > > On Fri, 2018-10-12 at 02:35 +0200, Jann Horn wrote:
> > > > Why all the rbtree stuff instead of stashing a pointer in struct
> > > > vmap_area, or something like that?
> > >
> > > Since the tracking was not for all vmalloc usage, the intention was to not
> > > bloat
> > > the structure for other usages likes stacks. I thought usually there
> > > wouldn't be
> > > nearly as much module space allocations as there would be kernel stacks, but
> > > I
> > > didn't do any actual measurements on the tradeoffs.
> >
> > I imagine that one extra pointer in there - pointing to your struct
> > mod_alloc_user - would probably not be terrible. 8 bytes more per
> > kernel stack shouldn't be so bad?
>
> I looked into this and it starts to look a little messy. The nommu.c version of
> vmalloc doesn't use or expose access to vmap_area or vm_struct. So it starts to
> look like a bunch of IFDEFs to remove the rlimit in the nommu case or making a
> stand in that maintains pretend vm struct's in nommu.c. I had actually
> previously tried to at least pull the allocations size from vmalloc structs, but it broke on nommu.
>
> Thought I would check back and see. How important do you think this is?

I don't think it's important - I just thought that it would be nice to
avoid the extra complexity if it is easily avoidable.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/7] modules: Create rlimit for module space
  2018-10-12 17:04     ` Edgecombe, Rick P
  2018-10-12 17:22       ` Jann Horn
@ 2018-10-23 11:32       ` Michal Hocko
  1 sibling, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2018-10-23 11:32 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: jannh, linux-kernel, daniel, keescook, jeyu, linux-arch, arjan,
	tglx, linux-mips, linux-s390, x86, kristen, Dock, Deneen T,
	catalin.marinas, mingo, will.deacon, kernel-hardening, bp,
	Hansen, Dave, linux-arm-kernel, davem, arnd, linux-fsdevel,
	sparclinux

On Fri 12-10-18 17:04:46, Edgecombe, Rick P wrote:
[...]
> Any thoughts on if instead of all this there was just a system wide limit on BPF
> JIT module space usage?

We do allow to charge vmalloc memory to a memory cgroup. Isn't that a
way forward?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-10-23 11:32 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-11 23:31 [PATCH v2 0/7] Rlimit for module space Rick Edgecombe
2018-10-11 23:31 ` [PATCH v2 1/7] modules: Create rlimit " Rick Edgecombe
2018-10-12  0:35   ` Jann Horn
2018-10-12 17:04     ` Edgecombe, Rick P
2018-10-12 17:22       ` Jann Horn
2018-10-13  0:04         ` Edgecombe, Rick P
2018-10-13  0:09           ` Jann Horn
2018-10-23 11:32       ` Michal Hocko
2018-10-12 18:23     ` Jann Horn
2018-10-11 23:31 ` [PATCH v2 2/7] x86/modules: Add rlimit checking for x86 modules Rick Edgecombe
2018-10-11 23:31 ` [PATCH v2 3/7] arm/modules: Add rlimit checking for arm modules Rick Edgecombe
2018-10-11 23:31 ` [PATCH v2 4/7] arm64/modules: Add rlimit checking for arm64 modules Rick Edgecombe
2018-10-11 23:47   ` Dave Hansen
2018-10-12 14:32     ` Jessica Yu
2018-10-12 22:01       ` Edgecombe, Rick P
2018-10-12 22:54         ` Edgecombe, Rick P
2018-10-11 23:31 ` [PATCH v2 5/7] mips/modules: Add rlimit checking for mips modules Rick Edgecombe
2018-10-11 23:31 ` [PATCH v2 6/7] sparc/modules: Add rlimit for sparc modules Rick Edgecombe
2018-10-11 23:31 ` [PATCH v2 7/7] s390/modules: Add rlimit checking for s390 modules Rick Edgecombe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).