* [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy
@ 2016-06-08 21:11 Kees Cook
  2016-06-08 21:11 ` [kernel-hardening] [PATCH v2 1/4] " Kees Cook
                   ` (7 more replies)
  0 siblings, 8 replies; 26+ messages in thread
From: Kees Cook @ 2016-06-08 21:11 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Brad Spengler, PaX Team, Casey Schaufler,
	Rik van Riel, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton

Hi,

This is v2 of the RFC patches for a mainline port of PAX_USERCOPY. After
I started writing tests for Casey's earlier port[1], I kept fixing things
further and further until I ended up with a whole new patch series. To
that end, I also took Rik's feedback and made a number of other changes
and clean-ups, which are noted in the "v2" history at the end.

Based on my understanding (please correct anything I haven't understood
correctly), PAX_USERCOPY was designed to catch a few classes of flaws
around the use of copy_to_user/copy_from_user. These changes don't touch
get_user and put_user, since those operate on constant sized lengths,
and tend to be much less vulnerable. The goal is protecting against
flawed lengths, and then also against manipulated addresses.

There are effectively three distinct protections in this series, each of
which I've given a separate CONFIG. (Generally speaking, PAX_USERCOPY
covers both CONFIG_HARDENED_USERCOPY and CONFIG_HARDENED_USERCOPY_WHITELIST,
and PAX_USERCOPY_SLABS covers CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC.) The
new CONFIGs are:


CONFIG_HARDENED_USERCOPY checks that objects being copied to/from userspace
meet certain criteria:
- if address is a heap object, the size must not exceed the object's
  allocated size. (This will catch all kinds of heap overflow flaws.)
- if address range is in the current process stack, it must be within the
  current stack frame (if such checking is possible) or at least entirely
  within the current process's stack. (This could catch large lengths that
  would have extended beyond the current process stack.)
- if address is within the kernel text, it is rejected.
- everything else is accepted (which includes vmalloc, other process
  stacks, etc). This isn't a perfect checker, but it entirely closes
  several classes of usercopy flaws.
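
For reviewers who do not want to dig through the diff first, the core of
this check, condensed from mm/usercopy.c in patch 1/4 (the zero-length
early return and the stack-pointer sanity check are trimmed here for
brevity), looks roughly like:

	void __check_object_size(const void *ptr, unsigned long n, bool to_user)
	{
		/* Heap: the allocator verifies the object size (and, with
		 * the later whitelist patch, SLAB_USERCOPY). */
		const char *err = check_heap_object(ptr, n);

		if (!err) {
			switch (check_stack_object(ptr, n)) {
			case 1:	/* on the stack; no frame checking available */
			case 2:	/* entirely within the current stack frame */
				return;
			case 0:	/* not a stack object at all */
				if (!check_kernel_text_object(ptr, n))
					return;
				err = "<kernel text>";
				break;
			default: /* bad or partially overlapping stack object */
				err = "<process stack>";
			}
		}
		report_usercopy(ptr, n, to_user, err);
	}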


CONFIG_HARDENED_USERCOPY_WHITELIST adds the SLAB_USERCOPY flag to kernel
memory caches. This is used to vastly reduce the number of heap objects
that can be copied to userspace by whitelisting only those that are deemed
a "tolerable risk" to be copied. Due to kmalloc's extensive use in the
kernel, all of the kmalloc caches are marked with SLAB_USERCOPY. Beyond
those, the other caches marked for SLAB_USERCOPY are: cifs_request,
cifs_small_rq, names_cache, jfs_ip, thread_info, and kvm_vcpu. (In my
test VM, there are about 160 non-kmalloc caches in /proc/slabinfo, so
this whitelisting provides significant coverage.) With these markings
in place, the CONFIG_HARDENED_USERCOPY heap object check above gains an
additional rule:
- if address is a heap object, it must be flagged with SLAB_USERCOPY
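
Whitelisting a cache simply means adding SLAB_USERCOPY when the cache is
created. For example, the cifs_request change in patch 3/4 is just:

	cifs_req_cachep = kmem_cache_create("cifs_request",
					    CIFSMaxBufSize + max_hdr_size, 0,
					    SLAB_HWCACHE_ALIGN|SLAB_USERCOPY,
					    NULL);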

One thing I was unable to easily solve was that "filldir" wants to read
from the "dcache" cache, which is not whitelisted in grsecurity, yet seems
to work there. I haven't figured out why, and I hope someone can help
point me in the right direction. In the meantime, I have whitelisted it.

Even with this minimal whitelist, a number of places remain in the
kernel where information needs to be moved in and out of userspace
without exposing an entire kernel cache that would not be sensible to
whitelist. In these places, we need to adjust how the kernel handles
the copying (the general pattern is sketched after the list below). In
my local testing, I uncovered several of them, and went to the
grsecurity/PaX patches[2] to see how they were fixed. They were:

- Changed exec to use copy of auxv instead of copying from "mm_struct"
  cache.
- Changed signal handling to use an on-stack copy of signal frames instead
  of direct copying from "task_struct" cache.
- Changed name_to_handle to use put_user (which operates on fixed sizes)
  instead of copy_to_user from "mnt_cache" cache.
- Changed readlink to use stack for in-struct names to avoid copying from
  filesystem cache (e.g. "ext4_inode_cache")
- Changed sg ioctl to bounce commands through stack instead of directly
  copying to/from "blkdev_requests" cache.
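
The common shape of these fixups is to bounce the small region through a
stack copy so that an object from a non-whitelisted cache is never handed
to copy_*_user() directly. Roughly (illustrative only; "obj", "field",
and "ubuf" are placeholders, not code from the patches):

	char tmp[sizeof(obj->field)];

	/* obj lives in a cache without SLAB_USERCOPY */
	memcpy(tmp, &obj->field, sizeof(tmp));
	if (copy_to_user(ubuf, tmp, sizeof(tmp)))
		return -EFAULT;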

Since there are probably plenty more, it seemed prudent to split the
whitelisting into a separate CONFIG so that people could turn it on
or off depending on their desire to help find and fix these.


CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC is a less obvious protection, and
is only partially related to the prior two protections. Several userspace
APIs (e.g. ipc, seq_file) allow precise control over the size of kernel
kmallocs, which provides a trivial way to perform heap overflow attacks
where the attacker must control neighboring allocations of a particular size.
Instead, move these APIs into their own caches so they cannot interfere
with the standard kmallocs. This makes heap attacks more difficult since
easily controlled heap allocations are now isolated.
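
As an example of such an API, the ipc message allocator sizes a kmalloc()
directly from a userspace-supplied length; patch 4/4 redirects it into
the separate caches by adding GFP_USERCOPY:

	alen = min(len, DATALEN_MSG);	/* len is supplied by userspace */
	msg = kmalloc(sizeof(*msg) + alen, GFP_KERNEL | GFP_USERCOPY);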


The patches in the series are:
- [PATCH 1/4] mm: Hardened usercopy
  This contains all of CONFIG_HARDENED_USERCOPY.
- [PATCH 2/4] usercopy: avoid direct copying to userspace
  This contains various kernel memory copying cleanups in support of
  CONFIG_HARDENED_USERCOPY_WHITELIST
- [PATCH 3/4] usercopy: whitelist user-copyable caches
  This contains all of CONFIG_HARDENED_USERCOPY_WHITELIST
- [PATCH 4/4] usercopy: provide split of user-controlled slabs
  This contains all of CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC


Some notes:

- I couldn't detect a measurable performance change with these features
  enabled. Kernel build times were unchanged, hackbench was unchanged,
  etc.

- SLOB seems entirely broken. I have no idea what's going on there; I
  spent my time testing SLAB and SLUB mostly.

- Ultimately, I think CONFIG_HARDENED_USERCOPY should be a no-brainer
  to enable by default. CONFIG_HARDENED_USERCOPY_WHITELIST will take
  a little time to shake out all the needed tweaks, but after that,
  it too should be enabled by default. I'm not sure how to reason
  about ..._SPLIT_KMALLOC since it needs slab merging disabled to
  be fully effective. I would need to understand the benefits of
  slab merging better to give a reasonable suggestion here.


Open questions, which hopefully spender or pageexec can help answer:

- What rationale has been used to select caches to whitelist with
  SLAB_USERCOPY? I'd like to codify something that people can use to
  guide them in their decisions.

- Why does CONFIG_PAX_USERCOPY_SLABS wrap the slub SLAB_ATTR_RO(usercopy)?
  It seems like the SLAB_USERCOPY flag would be generally interesting,
  and should be visible under CONFIG_PAX_USERCOPY instead.

- Why, in most architectures, when adding check_object_size() to the
  copy_*_user implementations, is an explicit "if (!__builtin_constant_p(n))"
  used? Is this just an optimization that assumes only non-constant
  used? Is this just an optimization that assumes only non-constant
  sizes need protecting? It seems to me that it shouldn't be there, since
  the address may have been manipulated too, not just the size. (The
  pattern in question is quoted below, after this list.)

- What is happening in the SLOB allocator? It seems to lack the
  SLAB_USERCOPY flag check entirely, and has a boatload of changes
  only under CONFIG_PAX_USERCOPY_SLABS. What's going on here?

- What allows filldir to work happily on grsecurity when "dcache" isn't
  whitelisted with SLAB_USERCOPY?
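
For reference, the __builtin_constant_p pattern asked about above looks
like this in the sparc64 change from patch 1/4:

	if (!__builtin_constant_p(size))
		check_object_size(to, size, false);

	ret = ___copy_from_user(to, from, size);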


Things to do:

- "dcache" cache shouldn't be whitelisted.

- SLOB support needs to be fixed or removed.

- Any other fixups for non-whitelisted cache copying need to be
  rediscovered and extracted from grsecurity. (Seriously, if you hit one,
  read the stack dump, download grsecurity[2], and see if it changes how
  the function in question handles the copying.)

- Determine how to reason about the need for slab merging to be fully
  disabled vs partially disabled. Can someone explain to me what
  slab merging is actually doing? Is it merging across different
  caches? If so, it needs to stay fully disabled under ..._SPLIT_KMALLOC,
  otherwise the benefit of forcing separated caches disappears.

- Needs porting to arm64.

- Needs per-architecture support for stack frame checking (only x86 now).

- Need to extract the drivers/char/mem.c changes for ..._SPLIT_KMALLOC.

- Testing testing


Thanks!

-Kees

v2:
- moved common checking logic from fs/exec.c to mm/usercopy.c
- fixed missing x86_64 copy_*_user checks.
- fixed typos in sparc and ia64 checks.
- changed report text from "leak" to "exposure".
- dropped #ifdefs in favor of empty static inline functions.
- consolidated common heap checks into a parent function.
- added many more comments.
- dropped curr_ip (IPv4 address) since it's unused by mainline.

v1:
- Casey's initial extraction[1].

[1] http://www.openwall.com/lists/kernel-hardening/2016/05/19/5
[2] https://grsecurity.net/download.php "grsecurity - test kernel patch"


* [kernel-hardening] [PATCH v2 1/4] mm: Hardened usercopy
  2016-06-08 21:11 [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy Kees Cook
@ 2016-06-08 21:11 ` Kees Cook
  2016-06-09  0:47   ` [kernel-hardening] " Brad Spengler
  2016-07-12 23:04   ` Kees Cook
  2016-06-08 21:11 ` [kernel-hardening] [PATCH v2 2/4] usercopy: avoid direct copying to userspace Kees Cook
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 26+ messages in thread
From: Kees Cook @ 2016-06-08 21:11 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Brad Spengler, PaX Team, Casey Schaufler,
	Rik van Riel, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton

This is an attempt at porting PAX_USERCOPY into the mainline kernel,
calling it CONFIG_HARDENED_USERCOPY. The work is based on code by Brad
Spengler and PaX Team, and an earlier port from Casey Schaufler.

This patch contains the logic for validating several conditions when
performing copy_to_user() and copy_from_user() on the kernel object
being copied to/from:
- if on the heap:
  - the size of copy must be less than or equal to the size of the object
- if on the stack (and we have architecture/build support for frames):
  - object must be contained by the current stack frame
- object must not be contained in the kernel text

Additional restrictions are in following patches.
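
As a concrete illustration, the classic bug shape the heap check catches
looks like this (hypothetical code, not taken from any real driver;
"ubuf" and "len" are userspace-supplied):

	struct foo *f = kmalloc(sizeof(*f), GFP_KERNEL);

	/* 'len' may exceed sizeof(*f): */
	if (copy_to_user(ubuf, f, len))
		return -EFAULT;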

This implements the checks on many architectures, but I have only tested
x86_64 so far. I would love to see an arm64 port added as well.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/arm/include/asm/uaccess.h      |   5 +
 arch/ia64/include/asm/uaccess.h     |  18 +++-
 arch/powerpc/include/asm/uaccess.h  |  21 ++++-
 arch/sparc/include/asm/uaccess_32.h |  14 ++-
 arch/sparc/include/asm/uaccess_64.h |  11 ++-
 arch/x86/include/asm/uaccess.h      |  10 +-
 arch/x86/include/asm/uaccess_32.h   |   2 +
 arch/x86/include/asm/uaccess_64.h   |   2 +
 include/linux/slab.h                |   5 +
 include/linux/thread_info.h         |  15 +++
 mm/Makefile                         |   1 +
 mm/slab.c                           |  29 ++++++
 mm/slob.c                           |  51 +++++++++++
 mm/slub.c                           |  17 ++++
 mm/usercopy.c                       | 177 ++++++++++++++++++++++++++++++++++++
 security/Kconfig                    |  11 +++
 16 files changed, 374 insertions(+), 15 deletions(-)
 create mode 100644 mm/usercopy.c

diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index 35c9db857ebe..7bcdb56ce6fb 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -497,6 +497,8 @@ static inline unsigned long __must_check
 __copy_from_user(void *to, const void __user *from, unsigned long n)
 {
 	unsigned int __ua_flags = uaccess_save_and_enable();
+
+	check_object_size(to, n, false);
 	n = arm_copy_from_user(to, from, n);
 	uaccess_restore(__ua_flags);
 	return n;
@@ -512,10 +514,13 @@ __copy_to_user(void __user *to, const void *from, unsigned long n)
 {
 #ifndef CONFIG_UACCESS_WITH_MEMCPY
 	unsigned int __ua_flags = uaccess_save_and_enable();
+
+	check_object_size(from, n, true);
 	n = arm_copy_to_user(to, from, n);
 	uaccess_restore(__ua_flags);
 	return n;
 #else
+	check_object_size(from, n, true);
 	return arm_copy_to_user(to, from, n);
 #endif
 }
diff --git a/arch/ia64/include/asm/uaccess.h b/arch/ia64/include/asm/uaccess.h
index 2189d5ddc1ee..465c70982f40 100644
--- a/arch/ia64/include/asm/uaccess.h
+++ b/arch/ia64/include/asm/uaccess.h
@@ -241,12 +241,18 @@ extern unsigned long __must_check __copy_user (void __user *to, const void __use
 static inline unsigned long
 __copy_to_user (void __user *to, const void *from, unsigned long count)
 {
+	if (!__builtin_constant_p(count))
+		check_object_size(from, count, true);
+
 	return __copy_user(to, (__force void __user *) from, count);
 }
 
 static inline unsigned long
 __copy_from_user (void *to, const void __user *from, unsigned long count)
 {
+	if (!__builtin_constant_p(count))
+		check_object_size(to, count, false);
+
 	return __copy_user((__force void __user *) to, from, count);
 }
 
@@ -258,8 +264,11 @@ __copy_from_user (void *to, const void __user *from, unsigned long count)
 	const void *__cu_from = (from);							\
 	long __cu_len = (n);								\
 											\
-	if (__access_ok(__cu_to, __cu_len, get_fs()))					\
-		__cu_len = __copy_user(__cu_to, (__force void __user *) __cu_from, __cu_len);	\
+	if (__access_ok(__cu_to, __cu_len, get_fs())) {					\
+		if (!__builtin_constant_p(n))						\
+			check_object_size(__cu_from, __cu_len, true);			\
+		__cu_len = __copy_user(__cu_to, (__force void __user *)  __cu_from, __cu_len);	\
+	}										\
 	__cu_len;									\
 })
 
@@ -270,8 +279,11 @@ __copy_from_user (void *to, const void __user *from, unsigned long count)
 	long __cu_len = (n);								\
 											\
 	__chk_user_ptr(__cu_from);							\
-	if (__access_ok(__cu_from, __cu_len, get_fs()))					\
+	if (__access_ok(__cu_from, __cu_len, get_fs())) {				\
+		if (!__builtin_constant_p(n))						\
+			check_object_size(__cu_to, __cu_len, false);			\
 		__cu_len = __copy_user((__force void __user *) __cu_to, __cu_from, __cu_len);	\
+	}										\
 	__cu_len;									\
 })
 
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index b7c20f0b8fbe..c1dc6c14deb8 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -310,10 +310,15 @@ static inline unsigned long copy_from_user(void *to,
 {
 	unsigned long over;
 
-	if (access_ok(VERIFY_READ, from, n))
+	if (access_ok(VERIFY_READ, from, n)) {
+		if (!__builtin_constant_p(n))
+			check_object_size(to, n, false);
 		return __copy_tofrom_user((__force void __user *)to, from, n);
+	}
 	if ((unsigned long)from < TASK_SIZE) {
 		over = (unsigned long)from + n - TASK_SIZE;
+		if (!__builtin_constant_p(n - over))
+			check_object_size(to, n - over, false);
 		return __copy_tofrom_user((__force void __user *)to, from,
 				n - over) + over;
 	}
@@ -325,10 +330,15 @@ static inline unsigned long copy_to_user(void __user *to,
 {
 	unsigned long over;
 
-	if (access_ok(VERIFY_WRITE, to, n))
+	if (access_ok(VERIFY_WRITE, to, n)) {
+		if (!__builtin_constant_p(n))
+			check_object_size(from, n, true);
 		return __copy_tofrom_user(to, (__force void __user *)from, n);
+	}
 	if ((unsigned long)to < TASK_SIZE) {
 		over = (unsigned long)to + n - TASK_SIZE;
+		if (!__builtin_constant_p(n))
+			check_object_size(from, n - over, true);
 		return __copy_tofrom_user(to, (__force void __user *)from,
 				n - over) + over;
 	}
@@ -372,6 +382,10 @@ static inline unsigned long __copy_from_user_inatomic(void *to,
 		if (ret == 0)
 			return 0;
 	}
+
+	if (!__builtin_constant_p(n))
+		check_object_size(to, n, false);
+
 	return __copy_tofrom_user((__force void __user *)to, from, n);
 }
 
@@ -398,6 +412,9 @@ static inline unsigned long __copy_to_user_inatomic(void __user *to,
 		if (ret == 0)
 			return 0;
 	}
+	if (!__builtin_constant_p(n))
+		check_object_size(from, n, true);
+
 	return __copy_tofrom_user(to, (__force const void __user *)from, n);
 }
 
diff --git a/arch/sparc/include/asm/uaccess_32.h b/arch/sparc/include/asm/uaccess_32.h
index 57aca2792d29..341a5a133f48 100644
--- a/arch/sparc/include/asm/uaccess_32.h
+++ b/arch/sparc/include/asm/uaccess_32.h
@@ -248,22 +248,28 @@ unsigned long __copy_user(void __user *to, const void __user *from, unsigned lon
 
 static inline unsigned long copy_to_user(void __user *to, const void *from, unsigned long n)
 {
-	if (n && __access_ok((unsigned long) to, n))
+	if (n && __access_ok((unsigned long) to, n)) {
+		if (!__builtin_constant_p(n))
+			check_object_size(from, n, true);
 		return __copy_user(to, (__force void __user *) from, n);
-	else
+	} else
 		return n;
 }
 
 static inline unsigned long __copy_to_user(void __user *to, const void *from, unsigned long n)
 {
+	if (!__builtin_constant_p(n))
+		check_object_size(from, n, true);
 	return __copy_user(to, (__force void __user *) from, n);
 }
 
 static inline unsigned long copy_from_user(void *to, const void __user *from, unsigned long n)
 {
-	if (n && __access_ok((unsigned long) from, n))
+	if (n && __access_ok((unsigned long) from, n)) {
+		if (!__builtin_constant_p(n))
+			check_object_size(to, n, false);
 		return __copy_user((__force void __user *) to, from, n);
-	else
+	} else
 		return n;
 }
 
diff --git a/arch/sparc/include/asm/uaccess_64.h b/arch/sparc/include/asm/uaccess_64.h
index e9a51d64974d..8bda94fab8e8 100644
--- a/arch/sparc/include/asm/uaccess_64.h
+++ b/arch/sparc/include/asm/uaccess_64.h
@@ -210,8 +210,12 @@ unsigned long copy_from_user_fixup(void *to, const void __user *from,
 static inline unsigned long __must_check
 copy_from_user(void *to, const void __user *from, unsigned long size)
 {
-	unsigned long ret = ___copy_from_user(to, from, size);
+	unsigned long ret;
 
+	if (!__builtin_constant_p(size))
+		check_object_size(to, size, false);
+
+	ret = ___copy_from_user(to, from, size);
 	if (unlikely(ret))
 		ret = copy_from_user_fixup(to, from, size);
 
@@ -227,8 +231,11 @@ unsigned long copy_to_user_fixup(void __user *to, const void *from,
 static inline unsigned long __must_check
 copy_to_user(void __user *to, const void *from, unsigned long size)
 {
-	unsigned long ret = ___copy_to_user(to, from, size);
+	unsigned long ret;
 
+	if (!__builtin_constant_p(size))
+		check_object_size(from, size, true);
+	ret = ___copy_to_user(to, from, size);
 	if (unlikely(ret))
 		ret = copy_to_user_fixup(to, from, size);
 	return ret;
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 2982387ba817..aa9cc58409c6 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -742,9 +742,10 @@ copy_from_user(void *to, const void __user *from, unsigned long n)
 	 * case, and do only runtime checking for non-constant sizes.
 	 */
 
-	if (likely(sz < 0 || sz >= n))
+	if (likely(sz < 0 || sz >= n)) {
+		check_object_size(to, n, false);
 		n = _copy_from_user(to, from, n);
-	else if(__builtin_constant_p(n))
+	} else if(__builtin_constant_p(n))
 		copy_from_user_overflow();
 	else
 		__copy_from_user_overflow(sz, n);
@@ -762,9 +763,10 @@ copy_to_user(void __user *to, const void *from, unsigned long n)
 	might_fault();
 
 	/* See the comment in copy_from_user() above. */
-	if (likely(sz < 0 || sz >= n))
+	if (likely(sz < 0 || sz >= n)) {
+		check_object_size(from, n, true);
 		n = _copy_to_user(to, from, n);
-	else if(__builtin_constant_p(n))
+	} else if(__builtin_constant_p(n))
 		copy_to_user_overflow();
 	else
 		__copy_to_user_overflow(sz, n);
diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 4b32da24faaf..7d3bdd1ed697 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -37,6 +37,7 @@ unsigned long __must_check __copy_from_user_ll_nocache_nozero
 static __always_inline unsigned long __must_check
 __copy_to_user_inatomic(void __user *to, const void *from, unsigned long n)
 {
+	check_object_size(from, n, true);
 	return __copy_to_user_ll(to, from, n);
 }
 
@@ -95,6 +96,7 @@ static __always_inline unsigned long
 __copy_from_user(void *to, const void __user *from, unsigned long n)
 {
 	might_fault();
+	check_object_size(to, n, false);
 	if (__builtin_constant_p(n)) {
 		unsigned long ret;
 
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index 2eac2aa3e37f..673059a109fe 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -54,6 +54,7 @@ int __copy_from_user_nocheck(void *dst, const void __user *src, unsigned size)
 {
 	int ret = 0;
 
+	check_object_size(dst, size, false);
 	if (!__builtin_constant_p(size))
 		return copy_user_generic(dst, (__force void *)src, size);
 	switch (size) {
@@ -119,6 +120,7 @@ int __copy_to_user_nocheck(void __user *dst, const void *src, unsigned size)
 {
 	int ret = 0;
 
+	check_object_size(src, size, true);
 	if (!__builtin_constant_p(size))
 		return copy_user_generic((__force void *)dst, src, size);
 	switch (size) {
diff --git a/include/linux/slab.h b/include/linux/slab.h
index aeb3e6d00a66..5c0cd75b2d07 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -155,6 +155,11 @@ void kfree(const void *);
 void kzfree(const void *);
 size_t ksize(const void *);
 
+#ifdef CONFIG_HARDENED_USERCOPY
+const char *__check_heap_object(const void *ptr, unsigned long n,
+				struct page *page);
+#endif
+
 /*
  * Some archs want to perform DMA into kmalloc caches and need a guaranteed
  * alignment larger than the alignment of a 64-bit integer.
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index b4c2a485b28a..a02200db9c33 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -146,6 +146,21 @@ static inline bool test_and_clear_restore_sigmask(void)
 #error "no set_restore_sigmask() provided and default one won't work"
 #endif
 
+#ifdef CONFIG_HARDENED_USERCOPY
+extern void __check_object_size(const void *ptr, unsigned long n,
+					bool to_user);
+
+static inline void check_object_size(const void *ptr, unsigned long n,
+				     bool to_user)
+{
+	__check_object_size(ptr, n, to_user);
+}
+#else
+static inline void check_object_size(const void *ptr, unsigned long n,
+				     bool to_user)
+{ }
+#endif /* CONFIG_HARDENED_USERCOPY */
+
 #endif	/* __KERNEL__ */
 
 #endif /* _LINUX_THREAD_INFO_H */
diff --git a/mm/Makefile b/mm/Makefile
index 78c6f7dedb83..a359cd9aa759 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -99,3 +99,4 @@ obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
 obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
 obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
+obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
diff --git a/mm/slab.c b/mm/slab.c
index cc8bbc1e6bc9..4cb2e5408625 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4477,6 +4477,35 @@ static int __init slab_proc_init(void)
 module_init(slab_proc_init);
 #endif
 
+#ifdef CONFIG_HARDENED_USERCOPY
+/*
+ * Rejects objects that are:
+ * - NULL or zero-allocated
+ * - incorrectly sized
+ *
+ * Returns NULL if check passes, otherwise const char * to name of cache
+ * to indicate an error.
+ */
+const char *__check_heap_object(const void *ptr, unsigned long n,
+				struct page *page)
+{
+	struct kmem_cache *cachep;
+	unsigned int objnr;
+	unsigned long offset;
+
+	cachep = page->slab_cache;
+
+	objnr = obj_to_index(cachep, page, (void *)ptr);
+	BUG_ON(objnr >= cachep->num);
+	offset = ptr - index_to_obj(cachep, page, objnr) - obj_offset(cachep);
+
+	if (offset <= cachep->object_size && n <= cachep->object_size - offset)
+		return NULL;
+
+	return cachep->name;
+}
+#endif /* CONFIG_HARDENED_USERCOPY */
+
 /**
  * ksize - get the actual amount of memory allocated for a given object
  * @objp: Pointer to the object
diff --git a/mm/slob.c b/mm/slob.c
index 5ec158054ffe..2d54fcd262fa 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -501,6 +501,57 @@ void kfree(const void *block)
 }
 EXPORT_SYMBOL(kfree);
 
+#ifdef CONFIG_HARDENED_USERCOPY
+const char *__check_heap_object(const void *ptr, unsigned long n,
+				struct page *page)
+{
+	const slob_t *free;
+	const void *base;
+	unsigned long flags;
+
+	if (page->private) {
+		base = page;
+		if (base <= ptr && n <= page->private - (ptr - base))
+			return NULL;
+		return "<slob>";
+	}
+
+	/* some tricky double walking to find the chunk */
+	spin_lock_irqsave(&slob_lock, flags);
+	base = (void *)((unsigned long)ptr & PAGE_MASK);
+	free = page->freelist;
+
+	while (!slob_last(free) && (void *)free <= ptr) {
+		base = free + slob_units(free);
+		free = slob_next(free);
+	}
+
+	while (base < (void *)free) {
+		slobidx_t m = ((slob_t *)base)[0].units, align = ((slob_t *)base)[1].units;
+		int size = SLOB_UNIT * SLOB_UNITS(m + align);
+		int offset;
+
+		if (ptr < base + align)
+			break;
+
+		offset = ptr - base - align;
+		if (offset >= m) {
+			base += size;
+			continue;
+		}
+
+		if (n > m - offset)
+			break;
+
+		spin_unlock_irqrestore(&slob_lock, flags);
+		return NULL;
+	}
+
+	spin_unlock_irqrestore(&slob_lock, flags);
+	return "<slob>";
+}
+#endif /* CONFIG_HARDENED_USERCOPY */
+
 /* can't use ksize for kmem_cache_alloc memory, only kmalloc */
 size_t ksize(const void *block)
 {
diff --git a/mm/slub.c b/mm/slub.c
index 825ff4505336..83d3cbc7adf8 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3614,6 +3614,23 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
 EXPORT_SYMBOL(__kmalloc_node);
 #endif
 
+#ifdef CONFIG_HARDENED_USERCOPY
+const char *__check_heap_object(const void *ptr, unsigned long n,
+				struct page *page)
+{
+	struct kmem_cache *s;
+	unsigned long offset;
+
+	s = page->slab_cache;
+
+	offset = (ptr - page_address(page)) % s->size;
+	if (offset <= s->object_size && n <= s->object_size - offset)
+		return NULL;
+
+	return s->name;
+}
+#endif /* CONFIG_HARDENED_USERCOPY */
+
 static size_t __ksize(const void *object)
 {
 	struct page *page;
diff --git a/mm/usercopy.c b/mm/usercopy.c
new file mode 100644
index 000000000000..e09c33070759
--- /dev/null
+++ b/mm/usercopy.c
@@ -0,0 +1,177 @@
+/*
+ * This implements the various checks for CONFIG_HARDENED_USERCOPY*,
+ * which are designed to protect kernel memory from needless exposure
+ * and overwrite under many conditions.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <asm/sections.h>
+
+/*
+ * Checks if a given pointer and length is contained by the current
+ * stack frame (if possible).
+ *
+ *	0: not at all on the stack
+ *	1: fully on the stack (when can't do frame-checking)
+ *	2: fully inside the current stack frame
+ *	-1: error condition (invalid stack position or bad stack frame)
+ */
+static noinline int check_stack_object(const void *obj, unsigned long len)
+{
+	const void * const stack = task_stack_page(current);
+	const void * const stackend = stack + THREAD_SIZE;
+
+#if defined(CONFIG_FRAME_POINTER) && defined(CONFIG_X86)
+	const void *frame = NULL;
+	const void *oldframe;
+#endif
+
+	/* Reject: object wraps past end of memory. */
+	if (obj + len < obj)
+		return -1;
+
+	/* Object is not on the stack at all. */
+	if (obj + len <= stack || stackend <= obj)
+		return 0;
+
+	/*
+	 * Reject: object partially overlaps the stack (passing the
+	 * check above means at least one end is within the stack,
+	 * so if this check fails, the other end is outside the stack).
+	 */
+	if (obj < stack || stackend < obj + len)
+		return -1;
+
+#if defined(CONFIG_FRAME_POINTER) && defined(CONFIG_X86)
+	oldframe = __builtin_frame_address(1);
+	if (oldframe)
+		frame = __builtin_frame_address(2);
+	/*
+	 * low ----------------------------------------------> high
+	 * [saved bp][saved ip][args][local vars][saved bp][saved ip]
+	 *		     ^----------------^
+	 *             allow copies only within here
+	 */
+	while (stack <= frame && frame < stackend) {
+		/*
+		 * If obj + len extends past the last frame, this
+		 * check won't pass and the next frame will be 0,
+		 * causing us to bail out and correctly report
+		 * the copy as invalid.
+		 */
+		if (obj + len <= frame)
+			return obj >= oldframe + 2 * sizeof(void *) ? 2 : -1;
+		oldframe = frame;
+		frame = *(const void * const *)frame;
+	}
+	return -1;
+#else
+	return 1;
+#endif
+}
+
+static void report_usercopy(const void *ptr, unsigned long len,
+			    bool to_user, const char *type)
+{
+	pr_emerg("kernel memory %s attempt detected %s %p (%s) (%lu bytes)\n",
+		to_user ? "exposure" : "overwrite",
+		to_user ? "from" : "to", ptr, type ? : "unknown", len);
+	dump_stack();
+	do_group_exit(SIGKILL);
+}
+
+/* Is this address range (low, high) in the kernel text area? */
+static inline bool check_kernel_text_object(const void *ptr, unsigned long n)
+{
+	unsigned long low = (unsigned long)ptr;
+	unsigned long high = low + n;
+	unsigned long textlow = (unsigned long)_stext;
+	unsigned long texthigh = (unsigned long)_etext;
+
+#ifdef CONFIG_X86_64
+	/* Check against linear mapping as well. */
+	if (high > (unsigned long)__va(__pa(textlow)) &&
+	    low < (unsigned long)__va(__pa(texthigh)))
+		return true;
+#endif
+
+	/*
+	 * Unless we're entirely below or entirely above the kernel text,
+	 * we've overlapped.
+	 */
+	if (high <= textlow || low >= texthigh)
+		return false;
+	else
+		return true;
+}
+
+static inline const char *check_heap_object(const void *ptr, unsigned long n)
+{
+	struct page *page;
+
+	if (ZERO_OR_NULL_PTR(ptr))
+		return "<null>";
+
+	if (!virt_addr_valid(ptr))
+		return NULL;
+
+	page = virt_to_head_page(ptr);
+	if (!PageSlab(page))
+		return NULL;
+
+	/* Check allocator for flags and size. */
+	return __check_heap_object(ptr, n, page);
+}
+
+/*
+ * Validates that the given object is one of:
+ * - known safe heap object
+ * - known safe stack object
+ * - not in kernel text
+ */
+void __check_object_size(const void *ptr, unsigned long n, bool to_user)
+{
+	const char *err;
+
+#if !defined(CONFIG_STACK_GROWSUP) && !defined(CONFIG_X86_64)
+	unsigned long stackstart = (unsigned long)task_stack_page(current);
+	unsigned long currentsp = (unsigned long)&stackstart;
+	if (unlikely((currentsp < stackstart + 512 ||
+		     currentsp >= stackstart + THREAD_SIZE) && !in_interrupt()))
+		BUG();
+#endif
+	if (!n)
+		return;
+
+	/* Check for bad heap object. */
+	err = check_heap_object(ptr, n);
+	if (!err) {
+		/* Check for bad stack object. */
+		int ret = check_stack_object(ptr, n);
+		if (ret == 1 || ret == 2) {
+			/*
+			 * Object is either in the correct frame (when it
+			 * is possible to check) or just generally on the
+			 * process stack (when frame checking is not
+			 * available).
+			 */
+			return;
+		}
+		if (ret == 0) {
+			/*
+			 * Object is not on the heap and not on the stack.
+			 * Double-check that it's not in the kernel text.
+			 */
+			if (check_kernel_text_object(ptr, n))
+				err = "<kernel text>";
+			else
+				return;
+		} else
+			err = "<process stack>";
+	}
+
+	report_usercopy(ptr, n, to_user, err);
+}
+EXPORT_SYMBOL(__check_object_size);
diff --git a/security/Kconfig b/security/Kconfig
index 176758cdfa57..081607a5e078 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -118,6 +118,17 @@ config LSM_MMAP_MIN_ADDR
 	  this low address space will need the permission specific to the
 	  systems running LSM.
 
+config HARDENED_USERCOPY
+	bool "Harden memory copies between kernel and userspace"
+	default n
+	help
+	  This option checks for obviously wrong memory regions when
+	  calling copy_to_user() and copy_from_user() by rejecting
+	  copies that are larger than the specified heap object, are
+	  not on the process stack, or are part of the kernel text.
+	  This kills entire classes of heap overflows and similar
+	  kernel memory exposures.
+
 source security/selinux/Kconfig
 source security/smack/Kconfig
 source security/tomoyo/Kconfig
-- 
2.7.4


* [kernel-hardening] [PATCH v2 2/4] usercopy: avoid direct copying to userspace
  2016-06-08 21:11 [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy Kees Cook
  2016-06-08 21:11 ` [kernel-hardening] [PATCH v2 1/4] " Kees Cook
@ 2016-06-08 21:11 ` Kees Cook
  2016-06-09 23:37   ` [kernel-hardening] " Rik van Riel
  2016-06-10 21:09   ` Kees Cook
  2016-06-08 21:11 ` [kernel-hardening] [PATCH v2 3/4] usercopy: whitelist user-copyable caches Kees Cook
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 26+ messages in thread
From: Kees Cook @ 2016-06-08 21:11 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Brad Spengler, PaX Team, Casey Schaufler,
	Rik van Riel, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton

Some non-whitelisted heap memory has small areas that need to be copied
to userspace. For these cases, explicitly copy the needed contents out
to the stack before sending them to userspace. This lets their respective
caches remain un-whitelisted (i.e. no SLAB_USERCOPY), since the bulk of
their contents should not be exposed to userspace.

These changes, based on code by Brad Spengler and PaX Team, were extracted
from grsecurity/PaX on a case-by-case basis as I ran into errors during
testing of CONFIG_HARDENED_USERCOPY_WHITELIST:

Changed exec to use copy of auxv instead of copying from "mm_struct" cache.

Changed signal handling to use an on-stack copy of signal frames instead
of direct copying from "task_struct" cache.

Changed name_to_handle to use put_user (which operates on fixed sizes)
instead of copy_to_user from "mnt_cache" cache.

Changed readlink to use stack for in-struct names to avoid copying from
filesystem cache (e.g. "ext4_inode_cache")

Changed sg ioctl to bounce commands through stack instead of directly
copying to/from "blkdev_requests" cache.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/kernel/signal.c |  6 +++---
 block/scsi_ioctl.c       | 27 +++++++++++++++++++++++++--
 fs/binfmt_elf.c          |  8 +++++++-
 fs/fhandle.c             |  3 +--
 fs/namei.c               | 11 ++++++++++-
 5 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 22cc2f9f8aec..479fb2b9afcd 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -680,8 +680,8 @@ static int
 setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
 {
 	int usig = ksig->sig;
-	sigset_t *set = sigmask_to_save();
-	compat_sigset_t *cset = (compat_sigset_t *) set;
+	sigset_t sigcopy = *sigmask_to_save();
+	compat_sigset_t *cset = (compat_sigset_t *)&sigcopy;
 
 	/* Set up the stack frame */
 	if (is_ia32_frame()) {
@@ -692,7 +692,7 @@ setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
 	} else if (is_x32_frame()) {
 		return x32_setup_rt_frame(ksig, cset, regs);
 	} else {
-		return __setup_rt_frame(ksig->sig, ksig, set, regs);
+		return __setup_rt_frame(ksig->sig, ksig, &sigcopy, regs);
 	}
 }
 
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 0774799942e0..c14d12a60a4c 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -227,8 +227,20 @@ EXPORT_SYMBOL(blk_verify_command);
 static int blk_fill_sghdr_rq(struct request_queue *q, struct request *rq,
 			     struct sg_io_hdr *hdr, fmode_t mode)
 {
-	if (copy_from_user(rq->cmd, hdr->cmdp, hdr->cmd_len))
+	unsigned char tmpcmd[sizeof(rq->__cmd)];
+	unsigned char *cmdptr;
+
+	if (rq->cmd != rq->__cmd)
+		cmdptr = rq->cmd;
+	else
+		cmdptr = tmpcmd;
+
+	if (copy_from_user(cmdptr, hdr->cmdp, hdr->cmd_len))
 		return -EFAULT;
+
+	if (cmdptr != rq->cmd)
+		memcpy(rq->cmd, cmdptr, hdr->cmd_len);
+
 	if (blk_verify_command(rq->cmd, mode & FMODE_WRITE))
 		return -EPERM;
 
@@ -420,6 +432,8 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,
 	int err;
 	unsigned int in_len, out_len, bytes, opcode, cmdlen;
 	char *buffer = NULL, sense[SCSI_SENSE_BUFFERSIZE];
+	unsigned char tmpcmd[sizeof(rq->__cmd)];
+	unsigned char *cmdptr;
 
 	if (!sic)
 		return -EINVAL;
@@ -458,9 +472,18 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,
 	 */
 	err = -EFAULT;
 	rq->cmd_len = cmdlen;
-	if (copy_from_user(rq->cmd, sic->data, cmdlen))
+
+	if (rq->cmd != rq->__cmd)
+		cmdptr = rq->cmd;
+	else
+		cmdptr = tmpcmd;
+
+	if (copy_from_user(cmdptr, sic->data, cmdlen))
 		goto error;
 
+	if (rq->cmd != cmdptr)
+		memcpy(rq->cmd, cmdptr, cmdlen);
+
 	if (in_len && copy_from_user(buffer, sic->data + cmdlen, in_len))
 		goto error;
 
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index e158b22ef32f..84edd23d7fcc 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -167,6 +167,8 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
 	int ei_index = 0;
 	const struct cred *cred = current_cred();
 	struct vm_area_struct *vma;
+	unsigned long saved_auxv[AT_VECTOR_SIZE];
+	size_t saved_auxv_size;
 
 	/*
 	 * In some cases (e.g. Hyper-Threading), we want to avoid L1
@@ -324,9 +326,13 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
 		return -EFAULT;
 	current->mm->env_end = p;
 
+	saved_auxv_size = ei_index * sizeof(elf_addr_t);
+	BUG_ON(saved_auxv_size > sizeof(saved_auxv));
+	memcpy(saved_auxv, elf_info, saved_auxv_size);
+
 	/* Put the elf_info on the stack in the right place.  */
 	sp = (elf_addr_t __user *)envp + 1;
-	if (copy_to_user(sp, elf_info, ei_index * sizeof(elf_addr_t)))
+	if (copy_to_user(sp, saved_auxv, saved_auxv_size))
 		return -EFAULT;
 	return 0;
 }
diff --git a/fs/fhandle.c b/fs/fhandle.c
index ca3c3dd01789..7ccb0a3fc86c 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -67,8 +67,7 @@ static long do_sys_name_to_handle(struct path *path,
 	} else
 		retval = 0;
 	/* copy the mount id */
-	if (copy_to_user(mnt_id, &real_mount(path->mnt)->mnt_id,
-			 sizeof(*mnt_id)) ||
+	if (put_user(real_mount(path->mnt)->mnt_id, mnt_id) ||
 	    copy_to_user(ufh, handle,
 			 sizeof(struct file_handle) + handle_bytes))
 		retval = -EFAULT;
diff --git a/fs/namei.c b/fs/namei.c
index 4c4f95ac8aa5..3ad684dd5c94 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4616,14 +4616,23 @@ EXPORT_SYMBOL(vfs_whiteout);
 
 int readlink_copy(char __user *buffer, int buflen, const char *link)
 {
+	char tmpbuf[64];
+	const char *newlink;
 	int len = PTR_ERR(link);
+
 	if (IS_ERR(link))
 		goto out;
 
 	len = strlen(link);
 	if (len > (unsigned) buflen)
 		len = buflen;
-	if (copy_to_user(buffer, link, len))
+	if (len < sizeof(tmpbuf)) {
+		memcpy(tmpbuf, link, len);
+		newlink = tmpbuf;
+	} else
+		newlink = link;
+
+	if (copy_to_user(buffer, newlink, len))
 		len = -EFAULT;
 out:
 	return len;
-- 
2.7.4


* [kernel-hardening] [PATCH v2 3/4] usercopy: whitelist user-copyable caches
  2016-06-08 21:11 [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy Kees Cook
  2016-06-08 21:11 ` [kernel-hardening] [PATCH v2 1/4] " Kees Cook
  2016-06-08 21:11 ` [kernel-hardening] [PATCH v2 2/4] usercopy: avoid direct copying to userspace Kees Cook
@ 2016-06-08 21:11 ` Kees Cook
  2016-06-08 21:11 ` [kernel-hardening] [PATCH v2 4/4] usercopy: provide split of user-controlled slabs Kees Cook
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Kees Cook @ 2016-06-08 21:11 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Brad Spengler, PaX Team, Casey Schaufler,
	Rik van Riel, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton

Most kernel memory should not be copied to/from userspace, so instead we
mark with SLAB_USERCOPY only those caches that were arguably designed to
be copied. Some, like the general kmalloc caches, must be marked this way
since kmalloc is, after the stack, one of the most frequently used ways
to allocate data that is copied to/from userspace.

Since this is likely going to be temporarily disruptive for other
architectures or workflows, this can be disabled by turning off
CONFIG_HARDENED_USERCOPY_WHITELIST.

Note: "dcache" cache isn't whitelisted in grsecurity/PaX, but I have
not figured out how to make readdir operate sanely yet.

Based on PAX_USERCOPY by Brad Spengler and PaX Team.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/cifs/cifsfs.c       |  7 ++++---
 fs/dcache.c            |  6 ++++--
 fs/jfs/super.c         |  2 +-
 include/linux/gfp.h    |  5 ++++-
 include/linux/slab.h   |  1 +
 kernel/fork.c          |  2 +-
 mm/slab.c              |  8 +++++++-
 mm/slab.h              |  3 ++-
 mm/slab_common.c       |  6 +++---
 mm/slub.c              |  4 ++++
 net/decnet/af_decnet.c |  1 +
 security/Kconfig       | 10 ++++++++++
 virt/kvm/kvm_main.c    |  2 +-
 13 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 5d8b7edf8a8f..18444cc5e7dc 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -1142,7 +1142,8 @@ cifs_init_request_bufs(void)
 */
 	cifs_req_cachep = kmem_cache_create("cifs_request",
 					    CIFSMaxBufSize + max_hdr_size, 0,
-					    SLAB_HWCACHE_ALIGN, NULL);
+					    SLAB_HWCACHE_ALIGN|SLAB_USERCOPY,
+					    NULL);
 	if (cifs_req_cachep == NULL)
 		return -ENOMEM;
 
@@ -1169,8 +1170,8 @@ cifs_init_request_bufs(void)
 	efficient to alloc 1 per page off the slab compared to 17K (5page)
 	alloc of large cifs buffers even when page debugging is on */
 	cifs_sm_req_cachep = kmem_cache_create("cifs_small_rq",
-			MAX_CIFS_SMALL_BUFFER_SIZE, 0, SLAB_HWCACHE_ALIGN,
-			NULL);
+			MAX_CIFS_SMALL_BUFFER_SIZE, 0,
+			SLAB_USERCOPY | SLAB_HWCACHE_ALIGN, NULL);
 	if (cifs_sm_req_cachep == NULL) {
 		mempool_destroy(cifs_req_poolp);
 		kmem_cache_destroy(cifs_req_cachep);
diff --git a/fs/dcache.c b/fs/dcache.c
index ad4a542e9bab..976896d39283 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3616,8 +3616,10 @@ static void __init dcache_init(void)
 	 * but it is probably not worth it because of the cache nature
 	 * of the dcache. 
 	 */
+	/* FIXME: this shouldn't need SLAB_USERCOPY. */
 	dentry_cache = KMEM_CACHE(dentry,
-		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT);
+		SLAB_USERCOPY | SLAB_RECLAIM_ACCOUNT | SLAB_PANIC |
+		SLAB_MEM_SPREAD | SLAB_ACCOUNT);
 
 	/* Hash may have been set up in dcache_init_early */
 	if (!hashdist)
@@ -3653,7 +3655,7 @@ void __init vfs_caches_init_early(void)
 void __init vfs_caches_init(void)
 {
 	names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
+			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_USERCOPY, NULL);
 
 	dcache_init();
 	inode_init();
diff --git a/fs/jfs/super.c b/fs/jfs/super.c
index cec8814a3b8b..9f1b2a24e779 100644
--- a/fs/jfs/super.c
+++ b/fs/jfs/super.c
@@ -898,7 +898,7 @@ static int __init init_jfs_fs(void)
 
 	jfs_inode_cachep =
 	    kmem_cache_create("jfs_ip", sizeof(struct jfs_inode_info), 0,
-			    SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
+			    SLAB_USERCOPY|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
 			    init_once);
 	if (jfs_inode_cachep == NULL)
 		return -ENOMEM;
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 570383a41853..2c9f70a69561 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -41,6 +41,7 @@ struct vm_area_struct;
 #define ___GFP_OTHER_NODE	0x800000u
 #define ___GFP_WRITE		0x1000000u
 #define ___GFP_KSWAPD_RECLAIM	0x2000000u
+#define ___GFP_USERCOPY		0x4000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -86,6 +87,7 @@ struct vm_area_struct;
 #define __GFP_HARDWALL   ((__force gfp_t)___GFP_HARDWALL)
 #define __GFP_THISNODE	((__force gfp_t)___GFP_THISNODE)
 #define __GFP_ACCOUNT	((__force gfp_t)___GFP_ACCOUNT)
+#define __GFP_USERCOPY	((__force gfp_t)___GFP_USERCOPY)
 
 /*
  * Watermark modifiers -- controls access to emergency reserves
@@ -188,7 +190,7 @@ struct vm_area_struct;
 #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT 26
+#define __GFP_BITS_SHIFT 27
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /*
@@ -258,6 +260,7 @@ struct vm_area_struct;
 #define GFP_TRANSHUGE	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
 			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
 			 ~__GFP_RECLAIM)
+#define GFP_USERCOPY	__GFP_USERCOPY
 
 /* Convert GFP flags to their corresponding migrate type */
 #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 5c0cd75b2d07..59cc29ef4cd1 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -21,6 +21,7 @@
  * The ones marked DEBUG are only valid if CONFIG_DEBUG_SLAB is set.
  */
 #define SLAB_CONSISTENCY_CHECKS	0x00000100UL	/* DEBUG: Perform (expensive) checks on alloc/free */
+#define SLAB_USERCOPY		0x00000200UL	/* USERCOPY: Allow copying objs to/from userspace */
 #define SLAB_RED_ZONE		0x00000400UL	/* DEBUG: Red zone objs in a cache */
 #define SLAB_POISON		0x00000800UL	/* DEBUG: Poison objects */
 #define SLAB_HWCACHE_ALIGN	0x00002000UL	/* Align objs on cache lines */
diff --git a/kernel/fork.c b/kernel/fork.c
index 5c2c355aa97f..2b09fc076dea 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -197,7 +197,7 @@ static void free_thread_info(struct thread_info *ti)
 void thread_info_cache_init(void)
 {
 	thread_info_cache = kmem_cache_create("thread_info", THREAD_SIZE,
-					      THREAD_SIZE, 0, NULL);
+					      THREAD_SIZE, SLAB_USERCOPY, NULL);
 	BUG_ON(thread_info_cache == NULL);
 }
 # endif
diff --git a/mm/slab.c b/mm/slab.c
index 4cb2e5408625..b3fbdd6ac027 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1354,7 +1354,8 @@ void __init kmem_cache_init(void)
 	 * structures first.  Without this, further allocations will bug.
 	 */
 	kmalloc_caches[INDEX_NODE] = create_kmalloc_cache("kmalloc-node",
-				kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS);
+				kmalloc_size(INDEX_NODE),
+				SLAB_USERCOPY | ARCH_KMALLOC_FLAGS);
 	slab_state = PARTIAL_NODE;
 	setup_kmalloc_cache_index_table();
 
@@ -4482,6 +4483,7 @@ module_init(slab_proc_init);
  * Rejects objects that are:
  * - NULL or zero-allocated
  * - incorrectly sized
+ * - not marked with SLAB_USERCOPY
  *
  * Returns NULL if check passes, otherwise const char * to name of cache
  * to indicate an error.
@@ -4494,6 +4496,10 @@ const char *__check_heap_object(const void *ptr, unsigned long n,
 	unsigned long offset;
 
 	cachep = page->slab_cache;
+#ifdef CONFIG_HARDENED_USERCOPY_WHITELIST
+	if (!(cachep->flags & SLAB_USERCOPY))
+		return cachep->name;
+#endif
 
 	objnr = obj_to_index(cachep, page, (void *)ptr);
 	BUG_ON(objnr >= cachep->num);
diff --git a/mm/slab.h b/mm/slab.h
index dedb1a920fb8..db29e111902b 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -119,7 +119,8 @@ static inline unsigned long kmem_cache_flags(unsigned long object_size,
 
 /* Legal flag mask for kmem_cache_create(), for various configurations */
 #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | SLAB_PANIC | \
-			 SLAB_DESTROY_BY_RCU | SLAB_DEBUG_OBJECTS )
+			 SLAB_DESTROY_BY_RCU | SLAB_DEBUG_OBJECTS | \
+			 SLAB_USERCOPY)
 
 #if defined(CONFIG_DEBUG_SLAB)
 #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index a65dad7fdcd1..f3f6ae3f56fc 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -967,7 +967,7 @@ void __init create_kmalloc_caches(unsigned long flags)
 
 	for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
 		if (!kmalloc_caches[i])
-			new_kmalloc_cache(i, flags);
+			new_kmalloc_cache(i, SLAB_USERCOPY | flags);
 
 		/*
 		 * Caches that are not of the two-to-the-power-of size.
@@ -975,9 +975,9 @@ void __init create_kmalloc_caches(unsigned long flags)
 		 * earlier power of two caches
 		 */
 		if (KMALLOC_MIN_SIZE <= 32 && !kmalloc_caches[1] && i == 6)
-			new_kmalloc_cache(1, flags);
+			new_kmalloc_cache(1, SLAB_USERCOPY | flags);
 		if (KMALLOC_MIN_SIZE <= 64 && !kmalloc_caches[2] && i == 7)
-			new_kmalloc_cache(2, flags);
+			new_kmalloc_cache(2, SLAB_USERCOPY | flags);
 	}
 
 	/* Kmalloc array is now usable */
diff --git a/mm/slub.c b/mm/slub.c
index 83d3cbc7adf8..589f0ffe712b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3622,6 +3622,10 @@ const char *__check_heap_object(const void *ptr, unsigned long n,
 	unsigned long offset;
 
 	s = page->slab_cache;
+#ifdef CONFIG_HARDENED_USERCOPY_WHITELIST
+	if (!(s->flags & SLAB_USERCOPY))
+		return s->name;
+#endif
 
 	offset = (ptr - page_address(page)) % s->size;
 	if (offset <= s->object_size && n <= s->object_size - offset)
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index 13d6b1a6e0fc..0ac2a9bd18f3 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -466,6 +466,7 @@ static struct proto dn_proto = {
 	.sysctl_rmem		= sysctl_decnet_rmem,
 	.max_header		= DN_MAX_NSP_DATA_HEADER + 64,
 	.obj_size		= sizeof(struct dn_sock),
+	.slab_flags		= SLAB_USERCOPY,
 };
 
 static struct sock *dn_alloc_sock(struct net *net, struct socket *sock, gfp_t gfp, int kern)
diff --git a/security/Kconfig b/security/Kconfig
index 081607a5e078..0ec17a252e49 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -129,6 +129,16 @@ config HARDENED_USERCOPY
 	  This kills entire classes of heap overflows and similar
 	  kernel memory exposures.
 
+config HARDENED_USERCOPY_WHITELIST
+	bool "Whitelist heap memory that is allowed to be user-copied"
+	default HARDENED_USERCOPY
+	help
+	  This option adds checking for kernel memory being whitelisted
+	  as "expected to be copied to/from userspace", via the
+	  SLAB_USERCOPY flag. This greatly reduces the areas of kernel
+	  memory that an attacker has access to through bugs in interfaces
+	  that use copy_to_user() and copy_from_user().
+
 source security/selinux/Kconfig
 source security/smack/Kconfig
 source security/tomoyo/Kconfig
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 02e98f3131bd..2178fd2b87b9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3752,7 +3752,7 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 	if (!vcpu_align)
 		vcpu_align = __alignof__(struct kvm_vcpu);
 	kvm_vcpu_cache = kmem_cache_create("kvm_vcpu", vcpu_size, vcpu_align,
-					   0, NULL);
+					   SLAB_USERCOPY, NULL);
 	if (!kvm_vcpu_cache) {
 		r = -ENOMEM;
 		goto out_free_3;
-- 
2.7.4


* [kernel-hardening] [PATCH v2 4/4] usercopy: provide split of user-controlled slabs
  2016-06-08 21:11 [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy Kees Cook
                   ` (2 preceding siblings ...)
  2016-06-08 21:11 ` [kernel-hardening] [PATCH v2 3/4] usercopy: whitelist user-copyable caches Kees Cook
@ 2016-06-08 21:11 ` Kees Cook
  2016-06-09  3:02 ` [kernel-hardening] [RFC][PATCH v2 5/4] arm: fixes for usercopy Kees Cook
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Kees Cook @ 2016-06-08 21:11 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Brad Spengler, PaX Team, Casey Schaufler,
	Rik van Riel, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton

Several userspace APIs (ipc, seq_file) provide precise control over
the size of kernel kmallocs, which makes it trivial to set up heap
overflow attacks where the attacker must control neighboring
allocations. Instead, move these APIs into their own caches so they
cannot interfere with the standard kmallocs, and disable slab merging.
This is enabled with CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC.

Based on PAX_USERCOPY_SLABS by Brad Spengler and PaX Team.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/seq_file.c        |  2 +-
 include/linux/slab.h |  4 ++++
 ipc/msgutil.c        |  4 ++--
 mm/slab_common.c     | 35 ++++++++++++++++++++++++++++
 mm/slob.c            | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/slub.c            | 11 +++++++++
 security/Kconfig     | 13 +++++++++++
 7 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/fs/seq_file.c b/fs/seq_file.c
index 19f532e7d35e..1686d05d7914 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -26,7 +26,7 @@ static void seq_set_overflow(struct seq_file *m)
 static void *seq_buf_alloc(unsigned long size)
 {
 	void *buf;
-	gfp_t gfp = GFP_KERNEL;
+	gfp_t gfp = GFP_KERNEL | GFP_USERCOPY;
 
 	/*
 	 * For high order allocations, use __GFP_NORETRY to avoid oom-killing -
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 59cc29ef4cd1..196972482333 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -270,6 +270,10 @@ extern struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];
 extern struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH + 1];
 #endif
 
+#ifdef CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC
+extern struct kmem_cache *kmalloc_usercopy_caches[KMALLOC_SHIFT_HIGH + 1];
+#endif
+
 /*
  * Figure out which kmalloc slab an allocation of a certain size
  * belongs to.
diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index ed81aafd2392..6bd28c3aec8c 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -55,7 +55,7 @@ static struct msg_msg *alloc_msg(size_t len)
 	size_t alen;
 
 	alen = min(len, DATALEN_MSG);
-	msg = kmalloc(sizeof(*msg) + alen, GFP_KERNEL);
+	msg = kmalloc(sizeof(*msg) + alen, GFP_KERNEL | GFP_USERCOPY);
 	if (msg == NULL)
 		return NULL;
 
@@ -67,7 +67,7 @@ static struct msg_msg *alloc_msg(size_t len)
 	while (len > 0) {
 		struct msg_msgseg *seg;
 		alen = min(len, DATALEN_SEG);
-		seg = kmalloc(sizeof(*seg) + alen, GFP_KERNEL);
+		seg = kmalloc(sizeof(*seg) + alen, GFP_KERNEL | GFP_USERCOPY);
 		if (seg == NULL)
 			goto out_err;
 		*pseg = seg;
diff --git a/mm/slab_common.c b/mm/slab_common.c
index f3f6ae3f56fc..fd567a61f8aa 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -44,7 +44,16 @@ struct kmem_cache *kmem_cache;
  * Merge control. If this is set then no merging of slab caches will occur.
  * (Could be removed. This was introduced to pacify the merge skeptics.)
  */
+#ifdef CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC
+/*
+ * If the kmalloc slabs are split between user-controlled sizing and
+ * regular kmalloc, we want to make sure we don't help attackers by
+ * merging slabs.
+ */
+static int slab_nomerge = 1;
+#else
 static int slab_nomerge;
+#endif
 
 static int __init setup_slab_nomerge(char *str)
 {
@@ -811,6 +820,11 @@ struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH + 1];
 EXPORT_SYMBOL(kmalloc_dma_caches);
 #endif
 
+#ifdef CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC
+struct kmem_cache *kmalloc_usercopy_caches[KMALLOC_SHIFT_HIGH + 1];
+EXPORT_SYMBOL(kmalloc_usercopy_caches);
+#endif /* CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC */
+
 /*
  * Conversion table for small slabs sizes / 8 to the index in the
  * kmalloc array. This is necessary for slabs < 192 since we have non power
@@ -875,6 +889,11 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
 		return kmalloc_dma_caches[index];
 
 #endif
+#ifdef CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC
+	if (unlikely((flags & GFP_USERCOPY)))
+		return kmalloc_usercopy_caches[index];
+#endif /* CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC */
+
 	return kmalloc_caches[index];
 }
 
@@ -998,6 +1017,22 @@ void __init create_kmalloc_caches(unsigned long flags)
 		}
 	}
 #endif
+#ifdef CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC
+	for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
+		struct kmem_cache *s = kmalloc_caches[i];
+
+		if (s) {
+			int size = kmalloc_size(i);
+			char *n = kasprintf(GFP_NOWAIT,
+				 "usercopy-kmalloc-%d", size);
+
+			BUG_ON(!n);
+			kmalloc_usercopy_caches[i] = create_kmalloc_cache(n,
+				size, SLAB_USERCOPY | flags);
+		}
+	}
+#endif /* CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC */
+
 }
 #endif /* !CONFIG_SLOB */
 
diff --git a/mm/slob.c b/mm/slob.c
index 2d54fcd262fa..a01379794670 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -583,6 +583,53 @@ int __kmem_cache_create(struct kmem_cache *c, unsigned long flags)
 	return 0;
 }
 
+#ifdef CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC
+static __always_inline void *
+__do_kmalloc_node_align(size_t size, gfp_t gfp, int node, unsigned long caller, int align)
+{
+	slob_t *m;
+	void *ret = NULL;
+
+	gfp &= gfp_allowed_mask;
+
+	lockdep_trace_alloc(gfp);
+
+	if (size < PAGE_SIZE - align) {
+		if (!size)
+			return ZERO_SIZE_PTR;
+
+		m = slob_alloc(size + align, gfp, align, node);
+
+		if (!m)
+			return NULL;
+		BUILD_BUG_ON(ARCH_KMALLOC_MINALIGN < 2 * SLOB_UNIT);
+		BUILD_BUG_ON(ARCH_SLAB_MINALIGN < 2 * SLOB_UNIT);
+		m[0].units = size;
+		m[1].units = align;
+		ret = (void *)m + align;
+
+		trace_kmalloc_node(caller, ret,
+				   size, size + align, gfp, node);
+	} else {
+		unsigned int order = get_order(size);
+		struct page *page;
+
+		if (likely(order))
+			gfp |= __GFP_COMP;
+		page = slob_new_pages(gfp, order, node);
+		if (page) {
+			ret = page_address(page);
+			page->private = size;
+		}
+
+		trace_kmalloc_node(caller, ret,
+				   size, PAGE_SIZE << order, gfp, node);
+	}
+
+	return ret;
+}
+#endif /* CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC */
+
 static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
 {
 	void *b;
@@ -591,6 +638,10 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
 
 	lockdep_trace_alloc(flags);
 
+#ifdef CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC
+	b = __do_kmalloc_node_align(c->size, flags, node, _RET_IP_, c->align);
+#else /* CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC */
+
 	if (c->size < PAGE_SIZE) {
 		b = slob_alloc(c->size, flags, c->align, node);
 		trace_kmem_cache_alloc_node(_RET_IP_, b, c->object_size,
@@ -602,6 +653,7 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
 					    PAGE_SIZE << get_order(c->size),
 					    flags, node);
 	}
+#endif /* CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC */
 
 	if (b && c->ctor)
 		c->ctor(b);
@@ -648,6 +700,15 @@ static void kmem_rcu_free(struct rcu_head *head)
 
 void kmem_cache_free(struct kmem_cache *c, void *b)
 {
+#ifdef CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC
+	int size = c->size;
+
+	if (size + c->align < PAGE_SIZE) {
+		size += c->align;
+		b -= c->align;
+	}
+#endif /* CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC */
+
 	kmemleak_free_recursive(b, c->flags);
 	if (unlikely(c->flags & SLAB_DESTROY_BY_RCU)) {
 		struct slob_rcu *slob_rcu;
@@ -658,7 +719,11 @@ void kmem_cache_free(struct kmem_cache *c, void *b)
 		__kmem_cache_free(b, c->size);
 	}
 
+#ifdef CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC
+	trace_kfree(_RET_IP_, b);
+#else /* CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC */
 	trace_kmem_cache_free(_RET_IP_, b);
+#endif /* CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC */
 }
 EXPORT_SYMBOL(kmem_cache_free);
 
diff --git a/mm/slub.c b/mm/slub.c
index 589f0ffe712b..bab85cf94f3f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4838,6 +4838,14 @@ static ssize_t cache_dma_show(struct kmem_cache *s, char *buf)
 SLAB_ATTR_RO(cache_dma);
 #endif
 
+#ifdef CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC
+static ssize_t usercopy_show(struct kmem_cache *s, char *buf)
+{
+	return sprintf(buf, "%d\n", !!(s->flags & SLAB_USERCOPY));
+}
+SLAB_ATTR_RO(usercopy);
+#endif /* CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC */
+
 static ssize_t destroy_by_rcu_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%d\n", !!(s->flags & SLAB_DESTROY_BY_RCU));
@@ -5178,6 +5186,9 @@ static struct attribute *slab_attrs[] = {
 #ifdef CONFIG_ZONE_DMA
 	&cache_dma_attr.attr,
 #endif
+#ifdef CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC
+	&usercopy_attr.attr,
+#endif /* CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC */
 #ifdef CONFIG_NUMA
 	&remote_node_defrag_ratio_attr.attr,
 #endif
diff --git a/security/Kconfig b/security/Kconfig
index 0ec17a252e49..f5d367213d8c 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -139,6 +139,19 @@ config HARDENED_USERCOPY_WHITELIST
 	  memory that an attack has access to through bugs in interfaces
 	  that use copy_to_user() and copy_from_user().
 
+config HARDENED_USERCOPY_SPLIT_KMALLOC
+	bool "Split kmalloc caches from user-controlled allocations"
+	depends on HARDENED_USERCOPY
+	default HARDENED_USERCOPY
+	help
+	  This option creates a separate set of kmalloc caches used for
+	  userspace APIs that provide fine-grained control over kernel
+	  allocation sizes. Without this, it is much easier for attackers
+	  to precisely size and attack heap overflows. If their allocations
+	  are separated into a different cache, attackers must find other
+	  ways to prepare heap attacks that will be near their desired
+	  targets.
+
 source security/selinux/Kconfig
 source security/smack/Kconfig
 source security/tomoyo/Kconfig
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [kernel-hardening] Re: [PATCH v2 1/4] mm: Hardened usercopy
  2016-06-08 21:11 ` [kernel-hardening] [PATCH v2 1/4] " Kees Cook
@ 2016-06-09  0:47   ` Brad Spengler
  2016-06-09  1:39     ` Rik van Riel
  2016-06-09  2:58     ` Kees Cook
  2016-07-12 23:04   ` Kees Cook
  1 sibling, 2 replies; 26+ messages in thread
From: Brad Spengler @ 2016-06-09  0:47 UTC (permalink / raw)
  To: Kees Cook
  Cc: kernel-hardening, PaX Team, Casey Schaufler, Rik van Riel,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 679 bytes --]

> diff --git a/mm/usercopy.c b/mm/usercopy.c
> new file mode 100644
> index 000000000000..e09c33070759
> --- /dev/null
> +++ b/mm/usercopy.c
> @@ -0,0 +1,177 @@
> +/*
> + * This implements the various checks for CONFIG_HARDENED_USERCOPY*,
> + * which are designed to protect kernel memory from needless exposure
> + * and overwrite under many conditions.
> + */

As this is a new file being introduced which is (modulo some bikeshedding
and addition of a few comments) a direct copy+paste of our code and comments
in fs/exec.c, I would appreciate both a GPL notice (the same as exists for
KASAN, etc) and both the PaX Team and myself being listed as the copyright
owners.

-Brad

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [kernel-hardening] Re: [PATCH v2 1/4] mm: Hardened usercopy
  2016-06-09  0:47   ` [kernel-hardening] " Brad Spengler
@ 2016-06-09  1:39     ` Rik van Riel
  2016-06-09  2:58     ` Kees Cook
  1 sibling, 0 replies; 26+ messages in thread
From: Rik van Riel @ 2016-06-09  1:39 UTC (permalink / raw)
  To: Brad Spengler, Kees Cook
  Cc: kernel-hardening, PaX Team, Casey Schaufler, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 895 bytes --]

On Wed, 2016-06-08 at 20:47 -0400, Brad Spengler wrote:
> > 
> > diff --git a/mm/usercopy.c b/mm/usercopy.c
> > new file mode 100644
> > index 000000000000..e09c33070759
> > --- /dev/null
> > +++ b/mm/usercopy.c
> > @@ -0,0 +1,177 @@
> > +/*
> > + * This implements the various checks for
> > CONFIG_HARDENED_USERCOPY*,
> > + * which are designed to protect kernel memory from needless
> > exposure
> > + * and overwrite under many conditions.
> > + */
> As this is a new file being introduced which is (modulo some
> bikeshedding
> and addition of a few comments) a direct copy+paste of our code and
> comments
> in fs/exec.c, I would appreciate both a GPL notice (the same as
> exists for
> KASAN, etc) and both the PaX Team and myself being listed as the
> copyright
> owners.
> 
I have to agree with this. Credit where credit is due.

-- 
All Rights Reversed.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [kernel-hardening] Re: [PATCH v2 1/4] mm: Hardened usercopy
  2016-06-09  0:47   ` [kernel-hardening] " Brad Spengler
  2016-06-09  1:39     ` Rik van Riel
@ 2016-06-09  2:58     ` Kees Cook
  1 sibling, 0 replies; 26+ messages in thread
From: Kees Cook @ 2016-06-09  2:58 UTC (permalink / raw)
  To: Brad Spengler
  Cc: kernel-hardening, PaX Team, Casey Schaufler, Rik van Riel,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton

On Wed, Jun 8, 2016 at 5:47 PM, Brad Spengler <spender@grsecurity.net> wrote:
>> diff --git a/mm/usercopy.c b/mm/usercopy.c
>> new file mode 100644
>> index 000000000000..e09c33070759
>> --- /dev/null
>> +++ b/mm/usercopy.c
>> @@ -0,0 +1,177 @@
>> +/*
>> + * This implements the various checks for CONFIG_HARDENED_USERCOPY*,
>> + * which are designed to protect kernel memory from needless exposure
>> + * and overwrite under many conditions.
>> + */
>
> As this is a new file being introduced which is (modulo some bikeshedding
> and addition of a few comments) a direct copy+paste of our code and comments
> in fs/exec.c, I would appreciate both a GPL notice (the same as exists for
> KASAN, etc) and both the PaX Team and myself being listed as the copyright
> owners.

Sure thing! I'll include it in the next revision. Do you have specific
text and/or date ranges you'd prefer me to use?

Thanks,

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [kernel-hardening] [RFC][PATCH v2 5/4] arm: fixes for usercopy
  2016-06-08 21:11 [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy Kees Cook
                   ` (3 preceding siblings ...)
  2016-06-08 21:11 ` [kernel-hardening] [PATCH v2 4/4] usercopy: provide split of user-controlled slabs Kees Cook
@ 2016-06-09  3:02 ` Kees Cook
  2016-06-09 15:35 ` [kernel-hardening] RE: [RFC][PATCH v2 0/4] mm: Hardened usercopy Schaufler, Casey
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Kees Cook @ 2016-06-09  3:02 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Brad Spengler, PaX Team, Casey Schaufler, Rik van Riel,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton

These will be in the next version; they are the fixes I needed to boot my
ARM VM.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index 7bcdb56ce6fb..c4887b272527 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -515,12 +515,12 @@ __copy_to_user(void __user *to, const void *from, unsigned long n)
 #ifndef CONFIG_UACCESS_WITH_MEMCPY
 	unsigned int __ua_flags = uaccess_save_and_enable();
 
-	check_object_size(to, n, false);
+	check_object_size(from, n, true);
 	n = arm_copy_to_user(to, from, n);
 	uaccess_restore(__ua_flags);
 	return n;
 #else
-	check_object_size(to, n, false);
+	check_object_size(from, n, true);
 	return arm_copy_to_user(to, from, n);
 #endif
 }
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index 7b8f2141427b..98b497c83aef 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -252,6 +252,7 @@ badframe:
 static int
 setup_sigframe(struct sigframe __user *sf, struct pt_regs *regs, sigset_t *set)
 {
+	sigset_t setcopy = *set;
 	struct aux_sigframe __user *aux;
 	int err = 0;
 
@@ -278,7 +279,7 @@ setup_sigframe(struct sigframe __user *sf, struct pt_regs *regs, sigset_t *set)
 	__put_user_error(current->thread.address, &sf->uc.uc_mcontext.fault_address, err);
 	__put_user_error(set->sig[0], &sf->uc.uc_mcontext.oldmask, err);
 
-	err |= __copy_to_user(&sf->uc.uc_sigmask, set, sizeof(*set));
+	err |= __copy_to_user(&sf->uc.uc_sigmask, &setcopy, sizeof(*set));
 
 	aux = (struct aux_sigframe __user *) sf->uc.uc_regspace;
 #ifdef CONFIG_CRUNCH
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index e2c6da096cef..99420fc1f066 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -125,6 +125,8 @@ SECTIONS
 #ifdef CONFIG_DEBUG_ALIGN_RODATA
 	. = ALIGN(1<<SECTION_SHIFT);
 #endif
+	_etext = .;			/* End of text section */
+
 	RO_DATA(PAGE_SIZE)
 
 	. = ALIGN(4);
@@ -155,8 +157,6 @@ SECTIONS
 
 	NOTES
 
-	_etext = .;			/* End of text and rodata section */
-
 #ifdef CONFIG_DEBUG_RODATA
 	. = ALIGN(1<<SECTION_SHIFT);
 #else

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [kernel-hardening] RE: [RFC][PATCH v2 0/4] mm: Hardened usercopy
  2016-06-08 21:11 [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy Kees Cook
                   ` (4 preceding siblings ...)
  2016-06-09  3:02 ` [kernel-hardening] [RFC][PATCH v2 5/4] arm: fixes for usercopy Kees Cook
@ 2016-06-09 15:35 ` Schaufler, Casey
  2016-06-09 17:48   ` [kernel-hardening] " Kees Cook
  2016-06-09 23:39 ` [kernel-hardening] [RFC][PATCH 6/4] mm: disallow user copy to/from separately allocated pages Rik van Riel
  2016-06-16  1:30 ` [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy Valdis.Kletnieks
  7 siblings, 1 reply; 26+ messages in thread
From: Schaufler, Casey @ 2016-06-09 15:35 UTC (permalink / raw)
  To: Kees Cook, kernel-hardening
  Cc: Brad Spengler, PaX Team, Rik van Riel, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton

> -----Original Message-----
> From: Kees Cook [mailto:keescook@chromium.org]
> Sent: Wednesday, June 08, 2016 2:12 PM
> To: kernel-hardening@lists.openwall.com
> Cc: Kees Cook <keescook@chromium.org>; Brad Spengler
> <spender@grsecurity.net>; PaX Team <pageexec@freemail.hu>; Schaufler,
> Casey <casey.schaufler@intel.com>; Rik van Riel <riel@redhat.com>; Christoph
> Lameter <cl@linux.com>; Pekka Enberg <penberg@kernel.org>; David Rientjes
> <rientjes@google.com>; Joonsoo Kim <iamjoonsoo.kim@lge.com>; Andrew
> Morton <akpm@linux-foundation.org>
> Subject: [RFC][PATCH v2 0/4] mm: Hardened usercopy
> 
> Hi,
> 
> This is v2 of the RFC patches for a mainline port of PAX_USERCOPY. After
> I started writing tests for Casey's earlier port[1], I kept fixing things
> further and further until I ended up with a whole new patch series. To
> that end, I also took Rik's feedback and made a number of other changes
> and clean-ups, which are noted in the "v2" history at the end.

I love it when a plan comes together.

Thank you for v2. Hopefully v1 was useful as a base.

> Based on my understanding (please correct anything I haven't understood
> correctly), PAX_USERCOPY was designed to catch a few classes of flaws
> around the use of copy_to_user/copy_from_user. These changes don't touch
> get_user and put_user, since those operate on constant sized lengths,
> and tend to be much less vulnerable. The goal is protecting against
> flawed lengths, and then also against manipulated addresses.
> 
> There are effectively three distinct protections in this series, each of
> which I've given a separate CONFIG. (Generally speaking, PAX_USERCOPY
> covers both CONFIG_HARDENED_USERCOPY and
> CONFIG_HARDENED_USERCOPY_WHITELIST,
> and PAX_USERCOPY_SLABS covers
> CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC.) The
> new CONFIGs are:
> 
> 
> CONFIG_HARDENED_USERCOPY checks that objects being copied to/from
> userspace
> meet certain criteria:
> - if address is a heap object, the size must not exceed the object's
>   allocated size. (This will catch all kinds of heap overflow flaws.)
> - if address range is in the current process stack, it must be within the
>   current stack frame (if such checking is possible) or at least entirely
>   within the current process's stack. (This could catch large lengths that
>   would have extended beyond the current process stack.)
> - if address is within the kernel text, it is rejected.
> - everything else is accepted (which includes vmalloc, other process
>   stacks, etc). This isn't a perfect checker, but it entirely closes
>   several classes of usercopy flaws.
> 
> 
> CONFIG_HARDENED_USERCOPY_WHITELIST adds the SLUB_USERCOPY flag to
> kernel
> memory caches. This is used to vastly reduce the number of heap objects
> that can be copied to userspace by whitelisting only those that are deemed
> a "tolerable risk" to be copied. Due to kmalloc's extensive use in the
> kernel, all of the kmalloc caches are marked with SLUB_USERCOPY. Beyond
> those, the other caches marked for SLUB_USERCOPY are: cifs_request,
> cifs_small_rq, names_cache, jfs_ip, thread_info, and kvm_vcpu. (In my
> test VM, there are about 160 non-kmalloc caches in /proc/slabinfo, so
> this whitelisting provides significant coverage.) With these markings
> in places, the CONFIG_HARDENED_USERCOPY heap object check above gains
> an
> additional rule:
> - if address is a heap object, it must be flagged with SLUB_USERCOPY
> 
> One thing I was unable to easily solve was that "filldir" wants to read
> from the "dcache" cache, which is not whitelisted in grsecurity, yet seems
> to work there. I haven't figured out why, and I hope someone can help
> point me in the right direction. In the meantime, I have whitelisted it.
> 
> After this minimal whitelist, this leaves a number of places in the
> kernel where information needs to be moved in and out of userspace,
> but without exposing an entire kernel cache that is not sensible to
> whitelist. In these places, we need to adjust how the kernel handles the
> copying. In my local testing, I uncovered several of them, and went to
> the grsecurity/PaX patches[2] to see how they were fixed. They were:
> 
> - Changed exec to use copy of auxv instead of copying from "mm_struct"
>   cache.
> - Changed signal handling to use an on-stack copy of signal frames instead
>   of direct copying from "task_struct" cache.
> - Changed name_to_handle to use put_user (which operates on fixed sizes)
>   instead of copy_to_user from "mnt_cache" cache.
> - Changed readlink to use stack for in-struct names to avoid copying from
>   filesystem cache (e.g. "ext4_inode_cache")
> - Changed sg ioctl to bounce commands through stack instead of directly
>   copying to/from "blkdev_requests" cache.
> 
> Since there are probably plenty more, it seemed prudent to split the
> whitelisting into a separate CONFIG so that people could turn it on
> or off depending on their desire to help find and fix these.
> 
> 
> CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC is a less obvious protection,
> and
> is only partially related to the prior two protections. Several userspace
> APIs (e.g. ipc, seq_file) allow precise control over the size of kernel
> kmallocs, which provides a trivial way to perform heap overflow attacks
> where the attacker must control neighboring allocations of a particular size.
> Instead, move these APIs into their own caches so they cannot interfere
> with the standard kmallocs. This makes heap attacks more difficult since
> easily controlled heap allocations are now isolated.
> 
> 
> The patches in the series are:
> - [PATCH 1/4] mm: Hardened usercopy
>   This contains all of CONFIG_HARDENED_USERCOPY.
> - [PATCH 2/4] usercopy: avoid direct copying to userspace
>   This contains various kernel memory copying cleanups in support of
>   CONFIG_HARDENED_USERCOPY_WHITELIST
> - [PATCH 3/4] usercopy: whitelist user-copyable caches
>   This contains all of CONFIG_HARDENED_USERCOPY_WHITELIST
> - [PATCH 4/4] usercopy: provide split of user-controlled slabs
>   This contains all of CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC
> 
> 
> Some notes:
> 
> - I couldn't detect a measurable performance change with these features
>   enabled. Kernel build times were unchanged, hackbench was unchanged,
>   etc.
> 
> - SLOB seems entirely broken. I have no idea what's going on there; I
>   spent my time testing SLAB and SLUB mostly.
> 
> - Ultimately, I think CONFIG_HARDENED_USERCOPY should be a no-brainer
>   to enable by default. CONFIG_HARDENED_USERCOPY_WHITELIST will take
>   a little time to shake out all the needed tweaks, but after that,
>   it too should be enabled by default. I'm not sure how to reason
>   about ..._SPLIT_KMALLOC since it needs slab merging disabled to
>   be fully effective. I would need to understand the benefits of
>   slab merging better to give a reasonable suggestion here.
> 
> 
> Open questions, which hopefully spender or pageexec can help answer:
> 
> - What rationale has been used to select caches to whitelist with
>   SLAB_USERCOPY? I'd like to codify something that people can use to
>   guide them in their decisions.
> 
> - Why does CONFIG_PAX_USERCOPY_SLABS wrap the slub
> SLAB_ATTR_RO(usercopy)?
>   It seems like the SLUB_USERCOPY flag would be generally interesting,
>   and should be visible under CONFIG_PAX_USERCOPY instead.
> 
> - Why, in most architectures, when adding check_object_size() to the
>   copy_*_user implementations, is an explicit "if (!__builtin_constant_p(n))"
>   used? Is this just an optimization that assumes only non-constant
>   sizes need protecting? It seems to me that it shouldn't be there since
>   the address may have been manipulated too, not just the size.
> 
> - What is happening in the SLOB allocator? It seems to lack the
>   SLAB_USERCOPY flag check entirely, and has a boatload of changes
>   only under CONFIG_PAX_USERCOPY_SLABS. What's going on here?
> 
> - What allows filldir to work happily on grsecurity when "dcache" isn't
>   whitelisted with SLAB_USERCOPY?
> 
> 
> Things to do:
> 
> - "dcache" cache shouldn't be whitelisted.
> 
> - SLOB support needs to be fixed or removed.
> 
> - Any other fixups for non-whitelisted cache copying need to be
>   rediscovered and extracted from grsecurity. (Seriously, if you hit one,
>   read the stack dump, download grsecurity[2], and see if you find changes
>   in the function that handles the copying differently.)
> 
> - Determine how to reason about the need for slab merging to be fully
>   disabled vs partially disabled. Can someone explain to me what
>   slab merging is actually doing? Is it merging across different
>   caches? If so, it needs to stay fully disabled under ..._SPLIT_KMALLOC
>   otherwise the benefit of forcing separated caches disappears.
> 
> - Needs porting to arm64.
> 
> - Needs per-architecture support for stack frame checking (only x86 now).
> 
> - Need to extract the drivers/char/mem.c changes for ..._SPLIT_KMALLOC.
> 
> - Testing testing
> 
> 
> Thanks!
> 
> -Kees
> 
> v2:
> - moved common checking logic from fs/exec.c to mm/usercopy.c
> - fixed missing x86_64 copy_*_user checks.
> - fixed typos in sparc and ia64 checks.
> - changed report text from "leak" to "exposure".
> - dropped #ifdefs in favor of empty static inline functions.
> - consolidated common heap checks into a parent function.
> - added many more comments.
> - dropped curr_ip (IPv4 address) since it's unused by mainline.
> 
> v1:
> - Casey's initial extraction[1].
> 
> [1] http://www.openwall.com/lists/kernel-hardening/2016/05/19/5
> [2] https://grsecurity.net/download.php "grsecurity - test kernel patch"

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH v2 0/4] mm: Hardened usercopy
  2016-06-09 15:35 ` [kernel-hardening] RE: [RFC][PATCH v2 0/4] mm: Hardened usercopy Schaufler, Casey
@ 2016-06-09 17:48   ` Kees Cook
  0 siblings, 0 replies; 26+ messages in thread
From: Kees Cook @ 2016-06-09 17:48 UTC (permalink / raw)
  To: Schaufler, Casey
  Cc: kernel-hardening, Brad Spengler, PaX Team, Rik van Riel,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton

On Thu, Jun 9, 2016 at 8:35 AM, Schaufler, Casey
<casey.schaufler@intel.com> wrote:
>> -----Original Message-----
>> From: Kees Cook [mailto:keescook@chromium.org]
>> Sent: Wednesday, June 08, 2016 2:12 PM
>> To: kernel-hardening@lists.openwall.com
>> Cc: Kees Cook <keescook@chromium.org>; Brad Spengler
>> <spender@grsecurity.net>; PaX Team <pageexec@freemail.hu>; Schaufler,
>> Casey <casey.schaufler@intel.com>; Rik van Riel <riel@redhat.com>; Christoph
>> Lameter <cl@linux.com>; Pekka Enberg <penberg@kernel.org>; David Rientjes
>> <rientjes@google.com>; Joonsoo Kim <iamjoonsoo.kim@lge.com>; Andrew
>> Morton <akpm@linux-foundation.org>
>> Subject: [RFC][PATCH v2 0/4] mm: Hardened usercopy
>>
>> Hi,
>>
>> This is v2 of the RFC patches for a mainline port of PAX_USERCOPY. After
>> I started writing tests for Casey's earlier port[1], I kept fixing things
>> further and further until I ended up with a whole new patch series. To
>> that end, I also took Rik's feedback and made a number of other changes
>> and clean-ups, which are noted in the "v2" history at the end.
>
> I love it when a plan comes together.
>
> Thank you for v2. Hopefully v1 was useful as a base.

Yeah, absolutely! It let me focus on a much smaller set of changes,
and the v2 really just ended up falling out of writing the lkdtm
tests. I spent a bunch of time scratching my head over the fact that
on x86 copy_*_user doesn't call down to __copy_*_user (!!), and
instead uses _copy_*_user (one underscore). After figuring that out,
then I was able to trip over all kinds of things with the whitelisting
pieces, so I separated that out, and it just continued from there.

I'd still really love to get a lot more testing, since I suspect my
bare-bones VM is not sufficiently exercising all the code that needs
whitelist tweaks, etc.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [kernel-hardening] Re: [PATCH v2 2/4] usercopy: avoid direct copying to userspace
  2016-06-08 21:11 ` [kernel-hardening] [PATCH v2 2/4] usercopy: avoid direct copying to userspace Kees Cook
@ 2016-06-09 23:37   ` Rik van Riel
  2016-06-10 21:09   ` Kees Cook
  1 sibling, 0 replies; 26+ messages in thread
From: Rik van Riel @ 2016-06-09 23:37 UTC (permalink / raw)
  To: Kees Cook
  Cc: kernel-hardening, Brad Spengler, PaX Team, Casey Schaufler,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton

On Wed,  8 Jun 2016 14:11:40 -0700
Kees Cook <keescook@chromium.org> wrote:

> Some non-whitelisted heap memory has small areas that need to be copied
> to userspace. For these cases, explicitly copy the needed contents out
> to stack first before sending to userspace. This lets their respective
> caches remain un-whitelisted (i.e. no SLAB_USERCOPY), since the bulk of
> their contents should not be exposed to userspace.
> 
> These changes, based on code by Brad Spengler and PaX Team, were extracted
> from grsecurity/PaX on a case-by-case basis as I ran into errors during
> testing of CONFIG_HARDENED_USERCOPY_WHITELIST:

You will want this bit as well. It is an adaptation of code originally
from grsecurity, with a slight change made after digging through the XFS
code for an hour and a half or so.

With this change, my system boots a usercopy kernel without
any visible issues.

---8<---

Subject: mm,xfs: bounce buffer the file name in xfs_dir2_sf_getdents

"Short form" directories in XFS have the directory content inside the
in-memory inode, or other kernel memory. The directory contents can be
in the same slab object as other, more sensitive, contents.

Instead of marking the xfs_inode slab accessible to copy_from_user and
copy_to_user, bounce buffer the file name when doing getdents on a short
form directory.

This only affects short form directories, which will have a very small
number of entries. Large directories use different code.

Adapted from the grsecurity patch set. Thanks go out to pipacs and spender.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 fs/xfs/xfs_dir2_readdir.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
index f44f79996978..bc6c78cbe4c6 100644
--- a/fs/xfs/xfs_dir2_readdir.c
+++ b/fs/xfs/xfs_dir2_readdir.c
@@ -127,6 +127,7 @@ xfs_dir2_sf_getdents(
 	 */
 	sfep = xfs_dir2_sf_firstentry(sfp);
 	for (i = 0; i < sfp->count; i++) {
+		char name[sfep->namelen];
 		__uint8_t filetype;
 
 		off = xfs_dir2_db_off_to_dataptr(geo, geo->datablk,
@@ -140,7 +141,14 @@ xfs_dir2_sf_getdents(
 		ino = dp->d_ops->sf_get_ino(sfp, sfep);
 		filetype = dp->d_ops->sf_get_ftype(sfep);
 		ctx->pos = off & 0x7fffffff;
-		if (!dir_emit(ctx, (char *)sfep->name, sfep->namelen, ino,
+		/*
+		 * Short form directories have the file name stored in
+		 * memory that is not directly accessible to copy_to_user.
+		 * Bounce buffer the name, instead of potentially making
+		 * the other data accessible.
+		 */
+		memcpy(name, sfep->name, sfep->namelen);
+		if (!dir_emit(ctx, name, sfep->namelen, ino,
 			    xfs_dir3_get_dtype(dp->i_mount, filetype)))
 			return 0;
 		sfep = dp->d_ops->sf_nextentry(sfp, sfep);

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [kernel-hardening] [RFC][PATCH 6/4] mm: disallow user copy to/from separately allocated pages
  2016-06-08 21:11 [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy Kees Cook
                   ` (5 preceding siblings ...)
  2016-06-09 15:35 ` [kernel-hardening] RE: [RFC][PATCH v2 0/4] mm: Hardened usercopy Schaufler, Casey
@ 2016-06-09 23:39 ` Rik van Riel
  2016-06-10 19:44   ` [kernel-hardening] [RFC][PATCH v2 " Rik van Riel
  2016-06-16  1:30 ` [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy Valdis.Kletnieks
  7 siblings, 1 reply; 26+ messages in thread
From: Rik van Riel @ 2016-06-09 23:39 UTC (permalink / raw)
  To: Kees Cook
  Cc: kernel-hardening, Brad Spengler, PaX Team, Casey Schaufler,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton

This patch adds some new functionality to the usercopy hardening
method. I hope it will be useful both to the mainline community
and the grsecurity community.

I have not figured out what to do about CMA yet, but that can
probably be done in a follow-up patch.

---8<---

Subject: mm: disallow user copy to/from separately allocated pages

A single copy_from_user or copy_to_user should go to or from a single
kernel object. Inside the slab, or on the stack, we can track the
individual objects.

For the general kernel heap, we do not know exactly where each object
is, but we can tell whether the whole range from ptr to ptr + n is
inside the same page, or inside the same compound page.

If the start and end of the "object" are in pages that were not allocated
together, we are likely dealing with an overflow from one object into
the next page, and should disallow this copy.

The kernel will have some objects that cross page boundaries in
sections like .rodata, .bss, etc.  Copying from those needs to be
allowed; they can be identified with PageReserved.

TODO: figure out what to do with CMA memory

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 mm/usercopy.c | 23 +++++++++++++++++++----
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/mm/usercopy.c b/mm/usercopy.c
index e09c33070759..be75d97c5d75 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -109,7 +109,7 @@ static inline bool check_kernel_text_object(const void *ptr, unsigned long n)
 
 static inline const char *check_heap_object(const void *ptr, unsigned long n)
 {
-	struct page *page;
+	struct page *page, *endpage;
 
 	if (ZERO_OR_NULL_PTR(ptr))
 		return "<null>";
@@ -118,11 +118,26 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n)
 		return NULL;
 
 	page = virt_to_head_page(ptr);
-	if (!PageSlab(page))
+	if (PageSlab(page))
+		/* Check allocator for flags and size. */
+		return __check_heap_object(ptr, n, page);
+
+	/* Is the object wholly within one base page? Great. */
+	if (likely(((unsigned long)ptr & (unsigned long)PAGE_MASK) ==
+		   ((unsigned long)(ptr + n - 1) & (unsigned long)PAGE_MASK)))
+		return NULL;
+
+	/* Are the start and end inside the same compound page? Great. */
+	endpage = virt_to_head_page(ptr + n - 1);
+	if (likely(endpage == page))
+		return NULL;
+
+	/* Is this a special area, eg. .rodata, .bss, or device memory? */
+	if (PageReserved(page) && PageReserved(endpage))
 		return NULL;
 
-	/* Check allocator for flags and size. */
-	return __check_heap_object(ptr, n, page);
+	/* Uh oh. The "object" spans several independently allocated pages. */
+	return "<spans multiple pages>";
 }
 
 /*

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [kernel-hardening] [RFC][PATCH v2 6/4] mm: disallow user copy to/from separately allocated pages
  2016-06-09 23:39 ` [kernel-hardening] [RFC][PATCH 6/4] mm: disallow user copy to/from separately allocated pages Rik van Riel
@ 2016-06-10 19:44   ` Rik van Riel
  2016-06-10 20:46     ` [kernel-hardening] " Kees Cook
  2016-06-24 20:53     ` Kees Cook
  0 siblings, 2 replies; 26+ messages in thread
From: Rik van Riel @ 2016-06-10 19:44 UTC (permalink / raw)
  To: Kees Cook
  Cc: kernel-hardening, Brad Spengler, PaX Team, Casey Schaufler,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton

v2 of yesterday's patch, this one seems to completely work on my
system, after taking _sdata & _edata into account.

I am still looking into CMA, but am leaning towards doing that as
a follow-up patch.

---8<---

Subject: mm: disallow user copy to/from separately allocated pages

A single copy_from_user or copy_to_user should go to or from a single
kernel object. Inside the slab, or on the stack, we can track the
individual objects.

For the general kernel heap, we do not know exactly where each object
is, but we can tell whether the whole range from ptr to ptr + n is
inside the same page, or inside the same compound page.

If the start and end of the "object" are in pages that were not allocated
together, we are likely dealing with an overflow from one object into
the next page, and should disallow this copy.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
v2:
 - also test against _sdata & _edata, this appears to be necessary for
   some kernfs/sysfs stuff
 - clean up the code a little bit

 mm/usercopy.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/mm/usercopy.c b/mm/usercopy.c
index e09c33070759..78869ea73194 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -109,7 +109,8 @@ static inline bool check_kernel_text_object(const void *ptr, unsigned long n)
 
 static inline const char *check_heap_object(const void *ptr, unsigned long n)
 {
-	struct page *page;
+	struct page *page, *endpage;
+	const void *end = ptr + n - 1;
 
 	if (ZERO_OR_NULL_PTR(ptr))
 		return "<null>";
@@ -118,11 +119,29 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n)
 		return NULL;
 
 	page = virt_to_head_page(ptr);
-	if (!PageSlab(page))
+	if (PageSlab(page))
+		/* Check allocator for flags and size. */
+		return __check_heap_object(ptr, n, page);
+
+	/* Is the object wholly within one base page? */
+	if (likely(((unsigned long)ptr & (unsigned long)PAGE_MASK) ==
+		   ((unsigned long)end & (unsigned long)PAGE_MASK)))
+		return NULL;
+
+	/* Are the start and end inside the same compound page? */
+	endpage = virt_to_head_page(end);
+	if (likely(endpage == page))
+		return NULL;
+
+	/* Is this a special area, eg. .rodata, .bss, or device memory? */
+	if (ptr >= (const void *)_sdata && end <= (const void *)_edata)
+		return NULL;
+
+	if (PageReserved(page) && PageReserved(endpage))
 		return NULL;
 
-	/* Check allocator for flags and size. */
-	return __check_heap_object(ptr, n, page);
+	/* Uh oh. The "object" spans several independently allocated pages. */
+	return "<spans multiple pages>";
 }
 
 /*

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH v2 6/4] mm: disallow user copy to/from separately allocated pages
  2016-06-10 19:44   ` [kernel-hardening] [RFC][PATCH v2 " Rik van Riel
@ 2016-06-10 20:46     ` Kees Cook
  2016-06-24 20:53     ` Kees Cook
  1 sibling, 0 replies; 26+ messages in thread
From: Kees Cook @ 2016-06-10 20:46 UTC (permalink / raw)
  To: Rik van Riel
  Cc: kernel-hardening, Brad Spengler, PaX Team, Casey Schaufler,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton

On Fri, Jun 10, 2016 at 12:44 PM, Rik van Riel <riel@redhat.com> wrote:
> v2 of yesterday's patch, this one seems to completely work on my
> system, after taking _sdata & _edata into account.
>
> I am still looking into CMA, but am leaning towards doing that as
> a follow-up patch.

Cool. It looks like this is slowly changing to more of a whitelist
than a blacklist, which I think would be a welcome improvement if it's
feasible. That way we don't need an explicit check for kernel text, it
would already be outside the allowed areas.

-Kees

>
> ---8<---
>
> Subject: mm: disallow user copy to/from separately allocated pages
>
> A single copy_from_user or copy_to_user should go to or from a single
> kernel object. Inside the slab, or on the stack, we can track the
> individual objects.
>
> For the general kernel heap, we do not know exactly where each object
> is, but we can tell whether the whole range from ptr to ptr + n is
> inside the same page, or inside the same compound page.
>
> If the start and end of the "object" are in pages that were not allocated
> together, we are likely dealing with an overflow from one object into
> the next page, and should disallow this copy.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
> v2:
>  - also test against _sdata & _edata, this appears to be necessary for
>    some kernfs/sysfs stuff
>  - clean up the code a little bit
>
>  mm/usercopy.c | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/mm/usercopy.c b/mm/usercopy.c
> index e09c33070759..78869ea73194 100644
> --- a/mm/usercopy.c
> +++ b/mm/usercopy.c
> @@ -109,7 +109,8 @@ static inline bool check_kernel_text_object(const void *ptr, unsigned long n)
>
>  static inline const char *check_heap_object(const void *ptr, unsigned long n)
>  {
> -       struct page *page;
> +       struct page *page, *endpage;
> +       const void *end = ptr + n - 1;
>
>         if (ZERO_OR_NULL_PTR(ptr))
>                 return "<null>";
> @@ -118,11 +119,29 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n)
>                 return NULL;
>
>         page = virt_to_head_page(ptr);
> -       if (!PageSlab(page))
> +       if (PageSlab(page))
> +               /* Check allocator for flags and size. */
> +               return __check_heap_object(ptr, n, page);
> +
> +       /* Is the object wholly within one base page? */
> +       if (likely(((unsigned long)ptr & (unsigned long)PAGE_MASK) ==
> +                  ((unsigned long)end & (unsigned long)PAGE_MASK)))
> +               return NULL;
> +
> +       /* Are the start and end inside the same compound page? */
> +       endpage = virt_to_head_page(end);
> +       if (likely(endpage == page))
> +               return NULL;
> +
> +       /* Is this a special area, eg. .rodata, .bss, or device memory? */
> +       if (ptr >= (const void *)_sdata && end <= (const void *)_edata)
> +               return NULL;
> +
> +       if (PageReserved(page) && PageReserved(endpage))
>                 return NULL;
>
> -       /* Check allocator for flags and size. */
> -       return __check_heap_object(ptr, n, page);
> +       /* Uh oh. The "object" spans several independently allocated pages. */
> +       return "<spans multiple pages>";
>  }
>
>  /*



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [kernel-hardening] Re: [PATCH v2 2/4] usercopy: avoid direct copying to userspace
  2016-06-08 21:11 ` [kernel-hardening] [PATCH v2 2/4] usercopy: avoid direct copying to userspace Kees Cook
  2016-06-09 23:37   ` [kernel-hardening] " Rik van Riel
@ 2016-06-10 21:09   ` Kees Cook
  2016-06-11  1:08     ` Rik van Riel
  1 sibling, 1 reply; 26+ messages in thread
From: Kees Cook @ 2016-06-10 21:09 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Brad Spengler, PaX Team, Casey Schaufler,
	Rik van Riel, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton

On Wed, Jun 8, 2016 at 2:11 PM, Kees Cook <keescook@chromium.org> wrote:
> Some non-whitelisted heap memory has small areas that need to be copied
> to userspace. For these cases, explicitly copy the needed contents out
> to stack first before sending to userspace. This lets their respective
> caches remain un-whitelisted (i.e. no SLAB_USERCOPY), since the bulk of
> their contents should not be exposed to userspace.

I've spent some time thinking about these kinds of
non-whitelisted-slab-workaround changes, and I would like to see if we
can design a better solution. So, to that end, here's what I see:

- HARDENED_USERCOPY verifies object addresses and sizes
- whitelisted caches (via HARDENED_USERCOPY_WHITELIST's SLAB_USERCOPY)
are intentionally rare
- Some code uses small parts of non-whitelisted cache memory for
userspace work (I think the auxv ("mm_struct") and signal frames
("task_struct") are good examples of this: neither cache should be
entirely exposed to userspace, yet tiny pieces are sent to userspace.)
- non-whitelist-workarounds are open-coded
- non-whitelist-workarounds require a double-copy
- non-whitelist-workarounds have explicit size maximums (e.g.
AT_VECTOR_SIZE, sizeof(sigset_t))
- non-whitelist-workarounds _bypass_ HARDENED_USERCOPY object address checking

So, while the workarounds do have a max-size sanity-check, they
actually lack the object address checking that would normally happen
with the usercopy checks. I think to solve the open-coding and
double-copy problems without compromising on the whitelisting or the
explicit size checking, we could also gain back the address checking
if we created something like:

copy_to_user_n(user, kernel, dynamic-size, const-max-size);

If "const-max-size" isn't detected as a builtin_constant it could fail
to build. When run, it would a) verify dynamic-size wasn't larger than
const-max-size, and b) perform the regular usercopy checks (without
the SLAB_USERCOPY check).

So, for the auxv example, instead of the new stack variable, the
memcpy, etc, it could just be a one-line change replacing the existing
copy_to_user() call:

copy_to_user_n(sp, elf_info, ei_index * sizeof(elf_addr_t), AT_VECTOR_SIZE);

(Bike-shedding: copy_to_user_bounded(), ..._limited(), ..._whitelist_hole(), ?)

What do people think?
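
For illustration, a minimal sketch of what such a helper might look like,
assuming a hypothetical __check_object_size_nowhitelist() that performs the
usual address/size validation but skips the SLAB_USERCOPY test (no such
helper exists in this series), and glossing over how the final copy avoids
re-running the whitelist check added to the arch copy routines:

#define copy_to_user_n(to, from, n, max)				\
({									\
	unsigned long __n = (n);					\
	unsigned long __ret = __n;					\
									\
	/* Refuse to build unless the limit is a compile-time constant. */ \
	BUILD_BUG_ON(!__builtin_constant_p(max));			\
	if (likely(__n <= (max))) {					\
		/* Hypothetical: object checks minus SLAB_USERCOPY. */	\
		__check_object_size_nowhitelist(from, __n, true);	\
		__ret = __copy_to_user(to, from, __n);			\
	}								\
	__ret;								\
})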

>
> These changes, based on code by Brad Spengler and PaX Team, were extracted
> from grsecurity/PaX on a case-by-case basis as I ran into errors during
> testing of CONFIG_HARDENED_USERCOPY_WHITELIST:
>
> Changed exec to use copy of auxv instead of copying from "mm_struct" cache.
>
> Changed signal handling to use an on-stack copy of signal frames instead
> of direct copying from "task_struct" cache.
>
> Changed name_to_handle to use put_user (which operates on fixed sizes)
> instead of copy_to_user from "mnt_cache" cache.
>
> Changed readlink to use stack for in-struct names to avoid copying from
> filesystem cache (e.g. "ext4_inode_cache")
>
> Changed sg ioctl to bounce commands through stack instead of directly
> copying to/from "blkdev_requests" cache.
>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  arch/x86/kernel/signal.c |  6 +++---
>  block/scsi_ioctl.c       | 27 +++++++++++++++++++++++++--
>  fs/binfmt_elf.c          |  8 +++++++-
>  fs/fhandle.c             |  3 +--
>  fs/namei.c               | 11 ++++++++++-
>  5 files changed, 46 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> index 22cc2f9f8aec..479fb2b9afcd 100644
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -680,8 +680,8 @@ static int
>  setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
>  {
>         int usig = ksig->sig;
> -       sigset_t *set = sigmask_to_save();
> -       compat_sigset_t *cset = (compat_sigset_t *) set;
> +       sigset_t sigcopy = *sigmask_to_save();
> +       compat_sigset_t *cset = (compat_sigset_t *)&sigcopy;
>
>         /* Set up the stack frame */
>         if (is_ia32_frame()) {
> @@ -692,7 +692,7 @@ setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
>         } else if (is_x32_frame()) {
>                 return x32_setup_rt_frame(ksig, cset, regs);
>         } else {
> -               return __setup_rt_frame(ksig->sig, ksig, set, regs);
> +               return __setup_rt_frame(ksig->sig, ksig, &sigcopy, regs);
>         }
>  }
>
> [...]
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index e158b22ef32f..84edd23d7fcc 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -167,6 +167,8 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
>         int ei_index = 0;
>         const struct cred *cred = current_cred();
>         struct vm_area_struct *vma;
> +       unsigned long saved_auxv[AT_VECTOR_SIZE];
> +       size_t saved_auxv_size;
>
>         /*
>          * In some cases (e.g. Hyper-Threading), we want to avoid L1
> @@ -324,9 +326,13 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
>                 return -EFAULT;
>         current->mm->env_end = p;
>
> +       saved_auxv_size = ei_index * sizeof(elf_addr_t);
> +       BUG_ON(saved_auxv_size > sizeof(saved_auxv));
> +       memcpy(saved_auxv, elf_info, saved_auxv_size);
> +
>         /* Put the elf_info on the stack in the right place.  */
>         sp = (elf_addr_t __user *)envp + 1;
> -       if (copy_to_user(sp, elf_info, ei_index * sizeof(elf_addr_t)))
> +       if (copy_to_user(sp, saved_auxv, saved_auxv_size))
>                 return -EFAULT;
>         return 0;
>  }
> [...]

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [kernel-hardening] Re: [PATCH v2 2/4] usercopy: avoid direct copying to userspace
  2016-06-10 21:09   ` Kees Cook
@ 2016-06-11  1:08     ` Rik van Riel
  0 siblings, 0 replies; 26+ messages in thread
From: Rik van Riel @ 2016-06-11  1:08 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Brad Spengler, PaX Team, Casey Schaufler,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 2793 bytes --]

On Fri, 2016-06-10 at 14:09 -0700, Kees Cook wrote:
> On Wed, Jun 8, 2016 at 2:11 PM, Kees Cook <keescook@chromium.org>
> wrote:
> > 
> > Some non-whitelisted heap memory has small areas that need to be
> > copied
> > to userspace. For these cases, explicitly copy the needed contents
> > out
> > to stack first before sending to userspace. This lets their
> > respective
> > caches remain un-whitelisted (i.e. no SLAB_USERCOPY), since the
> > bulk of
> > their contents should not be exposed to userspace.
> I've spent some time thinking about these kinds of
> non-whitelisted-slab-workaround changes, and I would like to see if
> we
> can design a better solution. So, to that end, here's what I see:
> 
> - HARDENED_USERCOPY verifies object addresses and sizes
> - whitelisted caches (via HARDENED_USERCOPY_WHITELIST's
> SLAB_USERCOPY)
> are intentionally rare
> - Some code uses small parts of non-whitelisted cache memory for
> userspace work (I think the auxv ("mm_struct") and signal frames
> ("task_struct") are good examples of this: neither cache should be
> entirely exposed to userspace, yet tiny pieces are sent to
> userspace.)
> - non-whitelist-workarounds are open-coded
> - non-whitelist-workarounds require a double-copy
> - non-whitelist-workarounds have explicit size maximums (e.g.
> AT_VECTOR_SIZE, sizeof(sigset_t))
> - non-whitelist-workarounds _bypass_ HARDENED_USERCOPY object address
> checking
> 
> So, while the workarounds do have a max-size sanity-check, they
> actually lack the object address checking that would normally happen
> with the usercopy checks. I think to solve the open-coding and
> double-copy problems without compromising on the whitelisting or the
> explicit size checking, we could also gain back the address checking
> if we created something like:
> 
> copy_to_user_n(user, kernel, dynamic-size, const-max-size);
> 
> If "const-max-size" isn't detected as a builtin_constant it could
> fail
> to build. When run, it would a) verify dynamic-size wasn't larger
> that
> const-max-size, and b) perform the regular usercopy checks (without
> the SLAB_USERCOPY check).
> 
> So, for the auxv example, instead of the new stack variable, the
> memcpy, etc, it could just be a one-line change replacing the
> existing
> copy_to_user() call:
> 
> copy_to_user_n(sp, elf_info, ei_index * sizeof(elf_addr_t),
> AT_VECTOR_SIZE);
> 
> (Bike-shedding: copy_to_user_bounded(), ..._limited(),
> ..._whitelist_hole(), ?)
> 
> What do people think?

I like your idea a lot.

For some kinds of objects, we could go one further.

Eg. for objects we know to be in the slab, we could use
copy_to_user_slab, and fail the copy if the pointer is
not a slab object.
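
A minimal sketch of that idea (copy_to_user_slab() does not exist in this
series; the name and shape here are just illustration, reusing the series'
__check_heap_object(), which returns an error string or NULL):

static inline unsigned long
copy_to_user_slab(void __user *to, const void *from, unsigned long n)
{
	struct page *page;

	/* Reject pointers that cannot be a slab object at all. */
	if (ZERO_OR_NULL_PTR(from) || !virt_addr_valid(from))
		return n;

	page = virt_to_head_page(from);
	if (!PageSlab(page))
		return n;

	/* Let the slab allocator validate the object's bounds and flags. */
	if (__check_heap_object(from, n, page))
		return n;

	return copy_to_user(to, from, n);
}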

-- 
All Rights Reversed.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy
  2016-06-08 21:11 [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy Kees Cook
                   ` (6 preceding siblings ...)
  2016-06-09 23:39 ` [kernel-hardening] [RFC][PATCH 6/4] mm: disallow user copy to/from separately allocated pages Rik van Riel
@ 2016-06-16  1:30 ` Valdis.Kletnieks
  2016-06-16  1:38   ` Kees Cook
  7 siblings, 1 reply; 26+ messages in thread
From: Valdis.Kletnieks @ 2016-06-16  1:30 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Brad Spengler, PaX Team, Casey Schaufler,
	Rik van Riel, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1998 bytes --]

On Wed, 08 Jun 2016 14:11:38 -0700, Kees Cook said:
> Hi,
>
> This is v2 of the RFC patches for a mainline port of PAX_USERCOPY. After
> I started writing tests for Casey's earlier port[1], I kept fixing things
> further and further until I ended up with a whole new patch series. To
> that end, I also took Rik's feedback and made a number of other changes
> and clean-ups, which are noted in the "v2" history at the end.

In the "For what it's worth" category - the 6 patches apply mostly cleanly
to the linux-next tree as of next-20160614 - a bunch of offsets, and one
easily fixed reject against include/linux/slab.h caused by KASAN landing
in linux-next.

And I found a test case - the NVidia driver is, of course, not annotated
for USERCOPY, so this happened:

[   39.184701] usercopy: kernel memory exposure attempt detected from ffff8800bb056fc0 (nvidia_stack_cache) (3 bytes)
[   39.184715] CPU: 2 PID: 1583 Comm: Xorg Tainted: G           O    4.7.0-rc3-next-20160614-dirty #302
[   39.184720] Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A17 08/19/2015
[   39.184725]  0000000000000000 00000000422dbb87 ffff8802233cbb28 ffffffffb769f61a
[   39.184736]  ffff8800bb056fc0 00000000422dbb87 0000000000000003 0000000000000001
[   39.184744]  ffff8802233cbb78 ffffffffb7367b30 0000000000000000 ffffea00028e92f0
[   39.184754] Call Trace:
[   39.184766]  [<ffffffffb769f61a>] dump_stack+0x7b/0xd1
[   39.184772]  [<ffffffffb7367b30>] __check_object_size+0x70/0x3d4
[   39.184947]  [<ffffffffc0287098>] os_memcpy_to_user+0x38/0x60 [nvidia]

So I guess you can stick a:

Tested-By: Valdis Kletnieks <valdis.kletnieks@vt.edu>

on that patch set. :)

(Of course, the system only lived for another 4 seconds after that, because the
blocked copy_to_user() caused the module to not initialize properly, and it
quite reasonably crapped all over itself as a result.  And yes, I realize that
*fixing* the module with proper annotations is a problem for me and the NVidia
engineering team...  :)


[-- Attachment #2: Type: application/pgp-signature, Size: 848 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy
  2016-06-16  1:30 ` [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy Valdis.Kletnieks
@ 2016-06-16  1:38   ` Kees Cook
  2016-06-16 23:36     ` Valdis.Kletnieks
  0 siblings, 1 reply; 26+ messages in thread
From: Kees Cook @ 2016-06-16  1:38 UTC (permalink / raw)
  To: Valdis Kletnieks
  Cc: kernel-hardening, Brad Spengler, PaX Team, Casey Schaufler,
	Rik van Riel, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton

On Wed, Jun 15, 2016 at 6:30 PM,  <Valdis.Kletnieks@vt.edu> wrote:
> On Wed, 08 Jun 2016 14:11:38 -0700, Kees Cook said:
>> Hi,
>>
>> This is v2 of the RFC patches for a mainline port of PAX_USERCOPY. After
>> I started writing tests for Casey's earlier port[1], I kept fixing things
>> further and further until I ended up with a whole new patch series. To
>> that end, I also took Rik's feedback and made a number of other changes
>> and clean-ups, which are noted in the "v2" history at the end.
>
> In the "For what it's worth" category - the 6 patches apply mostly cleanly
> to the linux-next tree as of next-20160614 - a bunch of offsets, and one
> easily fixed reject against include/linux/slab.h caused by KASAN landing
> in linux-next.
>
> And I found a test case - the NVidia driver is, of course, not annotated
> for USERCOPY, so this happened:
>
> [   39.184701] usercopy: kernel memory exposure attempt detected from ffff8800bb056fc0 (nvidia_stack_cache) (3 bytes)
> [   39.184715] CPU: 2 PID: 1583 Comm: Xorg Tainted: G           O    4.7.0-rc3-next-20160614-dirty #302
> [   39.184720] Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A17 08/19/2015
> [   39.184725]  0000000000000000 00000000422dbb87 ffff8802233cbb28 ffffffffb769f61a
> [   39.184736]  ffff8800bb056fc0 00000000422dbb87 0000000000000003 0000000000000001
> [   39.184744]  ffff8802233cbb78 ffffffffb7367b30 0000000000000000 ffffea00028e92f0
> [   39.184754] Call Trace:
> [   39.184766]  [<ffffffffb769f61a>] dump_stack+0x7b/0xd1
> [   39.184772]  [<ffffffffb7367b30>] __check_object_size+0x70/0x3d4
> [   39.184947]  [<ffffffffc0287098>] os_memcpy_to_user+0x38/0x60 [nvidia]
>
> So I guess you can stick a:
>
> Tested-By: Valdis Kletnieks <valdis.kletnieks@vt.edu>
>
> on that patch set. :)

Awesome, thanks! It's good to know the system operated normally up
until that point. I'm glad to have lots of people testing.

> (Of course, the system only lived for another 4 seconds after that, because the
> blocked copy_to_user() caused the module to not initialize properly, and it
> quite reasonably crapped all over itself as a result.  And yes, I realize that
> *fixing* the module with proper annotations is a problem for me and the NVidia
> engineering team...  :)

Yeah, I think the slab whitelisting may cause some pain, which is why
I've moved it to a separate config. I'll get a v3 up at some point
here with a copy_*_user_n() implementation, but I've been poking at a
few other things for the moment.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy
  2016-06-16  1:38   ` Kees Cook
@ 2016-06-16 23:36     ` Valdis.Kletnieks
  2016-06-17  1:38       ` Valdis.Kletnieks
  0 siblings, 1 reply; 26+ messages in thread
From: Valdis.Kletnieks @ 2016-06-16 23:36 UTC (permalink / raw)
  To: Kees Cook
  Cc: kernel-hardening, Brad Spengler, PaX Team, Casey Schaufler,
	Rik van Riel, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton


[-- Attachment #1.1: Type: text/plain, Size: 1333 bytes --]

On Wed, 15 Jun 2016 18:38:31 -0700, Kees Cook said:
> On Wed, Jun 15, 2016 at 6:30 PM,  <Valdis.Kletnieks@vt.edu> wrote:
> > So I guess you can stick a:
> >
> > Tested-By: Valdis Kletnieks <valdis.kletnieks@vt.edu>
> >
> > on that patch set. :)
>
> Awesome, thanks! It's good to know the system operated normally up
> until that point. I'm glad to have lots of people testing.

Following up - I did a BFI patch against the NVidia driver that basically
tagged all its memory allocations as USERCOPY, and the resulting kernel
has gotten up to multiuser and XOrg.  Been up for a half hour doing my usual
stuff on the laptop, and no usercopy whines.

Workload: email, pidgin IM, Google Chrome with some 30 tabs, some streaming
audio. Plenty of room for corner cases still lurking, but all the basic
stuff is working.  I may whomp on it with trinity for a while, see if
anything falls out...

Today's surprise: VirtualBox 5.0.22 was released - and it was able to boot
a Windows 7 image to the desktop without complaint.  Something still wonky
there, as it gets unstable at some point, but given the lack of dmesg entries,
I suspect it's a linux-next regression rather than a usercopy issue.  Will
debug more later tonight.

NVidia patch attached as guidance to what's needed for anybody else who's faced
with patching an out-of-tree module.


[-- Attachment #1.2: patch4 --]
[-- Type: text/plain , Size: 2366 bytes --]

--- nvidia-uvm/uvm_linux.h.dist	2016-06-16 04:54:42.573247324 -0400
+++ nvidia-uvm/uvm_linux.h	2016-06-16 17:23:29.863108182 -0400
@@ -185,7 +185,11 @@
 #define __GFP_NORETRY 0
 #endif

-#define NV_UVM_GFP_FLAGS (GFP_KERNEL | __GFP_NORETRY)
+#if !defined(__GFP_USERCOPY)
+#define __GFP_USERCOPY 0
+#endif
+
+#define NV_UVM_GFP_FLAGS (GFP_KERNEL | __GFP_NORETRY | __GFP_USERCOPY)

 #if defined(NV_VM_INSERT_PAGE_PRESENT)
 #define NV_VM_INSERT_PAGE(vma, addr, page) \
--- nvidia/nv-vm.c.dist	2016-06-09 20:37:13.000000000 -0400
+++ nvidia/nv-vm.c	2016-06-16 17:32:51.357212907 -0400
@@ -265,6 +265,9 @@
     if (at->flags & NV_ALLOC_TYPE_ZEROED)
         gfp_mask |= __GFP_ZERO;
 #endif
+#if defined(__GFP_USERCOPY)
+    gfp_mask |= __GFP_USERCOPY;
+#endif

     return gfp_mask;
 }
--- common/inc/nv-linux.h.dist	2016-06-16 04:49:57.775133204 -0400
+++ common/inc/nv-linux.h	2016-06-16 18:36:13.760153738 -0400
@@ -412,12 +412,16 @@
 #define __GFP_COMP 0
 #endif

+#if !defined(GFP_USERCOPY)
+#define GFP_USERCOPY 0
+#endif
+
 #if !defined(DEBUG) && defined(__GFP_NOWARN)
-#define NV_GFP_KERNEL (GFP_KERNEL | __GFP_NOWARN)
-#define NV_GFP_ATOMIC (GFP_ATOMIC | __GFP_NOWARN)
+#define NV_GFP_KERNEL (GFP_KERNEL | __GFP_NOWARN | GFP_USERCOPY)
+#define NV_GFP_ATOMIC (GFP_ATOMIC | __GFP_NOWARN | GFP_USERCOPY)
 #else
-#define NV_GFP_KERNEL (GFP_KERNEL)
-#define NV_GFP_ATOMIC (GFP_ATOMIC)
+#define NV_GFP_KERNEL (GFP_KERNEL | GFP_USERCOPY)
+#define NV_GFP_ATOMIC (GFP_ATOMIC | GFP_USERCOPY)
 #endif

 #if defined(GFP_DMA32)
@@ -427,9 +431,9 @@
  * such as Linux/x86-64; the alternative is to use an IOMMU such
  * as the one implemented with the K8 GART, if available.
  */
-#define NV_GFP_DMA32 (NV_GFP_KERNEL | GFP_DMA32)
+#define NV_GFP_DMA32 (NV_GFP_KERNEL | GFP_DMA32 | GFP_USERCOPY)
 #else
-#define NV_GFP_DMA32 (NV_GFP_KERNEL)
+#define NV_GFP_DMA32 (NV_GFP_KERNEL | GFP_USERCOPY)
 #endif

 #if defined(NVCPU_X86) || defined(NVCPU_X86_64)
@@ -1307,8 +1311,12 @@
     kmem_cache_create(name, size, align, flags, ctor, NULL)
 #endif

+#if !defined(SLAB_USERCOPY)
+#define SLAB_USERCOPY 0
+#endif
+
 #define NV_KMEM_CACHE_CREATE(name, type)    \
-    NV_KMEM_CACHE_CREATE_FULL(name, sizeof(type), 0, 0, NULL)
+    NV_KMEM_CACHE_CREATE_FULL(name, sizeof(type), 0, SLAB_USERCOPY, NULL)

 #define NV_KMEM_CACHE_DESTROY(kmem_cache)   \
     kmem_cache_destroy(kmem_cache)

[-- Attachment #2: Type: application/pgp-signature, Size: 848 bytes --]
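
For anybody adapting this approach to a different out-of-tree module, the
patch above boils down to a handful of guarded flag definitions. A minimal
sketch of the same pattern follows (it assumes the __GFP_USERCOPY and
SLAB_USERCOPY flag names used elsewhere in this thread; the foo_* names are
made up purely for illustration):

#include <linux/init.h>
#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/slab.h>

/* Build against both patched and unpatched kernels: the flags collapse
 * to no-ops when the hardened usercopy series is not applied. */
#ifndef __GFP_USERCOPY
#define __GFP_USERCOPY 0
#endif
#ifndef SLAB_USERCOPY
#define SLAB_USERCOPY 0
#endif

struct foo {
    int data;
};

static struct kmem_cache *foo_cache;

static int __init foo_init(void)
{
    /* Whitelist every object in this cache for copy_{to,from}_user(). */
    foo_cache = kmem_cache_create("foo_cache", sizeof(struct foo),
                                  0, SLAB_USERCOPY, NULL);
    return foo_cache ? 0 : -ENOMEM;
}

/* Tag kmalloc allocations whose contents will be copied to userspace. */
static void *foo_alloc_user_visible(size_t len)
{
    return kmalloc(len, GFP_KERNEL | __GFP_USERCOPY);
}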

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy
  2016-06-16 23:36     ` Valdis.Kletnieks
@ 2016-06-17  1:38       ` Valdis.Kletnieks
  2016-06-18 19:30         ` Kees Cook
  0 siblings, 1 reply; 26+ messages in thread
From: Valdis.Kletnieks @ 2016-06-17  1:38 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Brad Spengler, PaX Team, Casey Schaufler,
	Rik van Riel, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 2408 bytes --]

On Thu, 16 Jun 2016 19:36:52 -0400, Valdis.Kletnieks@vt.edu said:

> stuff is working.  I may whomp on it with trinity for a while, see if
> anything falls out...

Woo hoo! Bagged one! :)  (Haven't figured out yet if actual bug, or missing
annotation)

[ 4033.178386] NET: Registered protocol family 21
[ 4033.226806] NET: Registered protocol family 38
[ 4033.256276] Guest personality initialized and is inactive
[ 4033.256797] VMCI host device registered (name=vmci, major=10, minor=53)
[ 4033.256801] Initialized host personality
[ 4033.266376] NET: Registered protocol family 40
[ 4033.365982] NET: Registered protocol family 24
[ 4033.413031] irda_setsockopt: not allowed to set MAXSDUSIZE for this socket type!
[ 4033.531569] sock: process `trinity-main' is using obsolete setsockopt SO_BSDCOMPAT
[ 4033.834839] irda_setsockopt: not allowed to set MAXSDUSIZE for this socket type!
[ 4034.444515] irda_setsockopt: not allowed to set MAXSDUSIZE for this socket type!
[ 4034.569913] sctp: [Deprecated]: trinity-main (pid 19154) Use of int in max_burst socket option deprecated.
[ 4034.569913] Use struct sctp_assoc_value instead
[ 4034.728723] usercopy: kernel memory overwrite attempt detected to ffff8801ecef4700 (SCTP) (4 bytes)
[ 4034.728730] CPU: 3 PID: 19154 Comm: trinity-main Tainted: G           OE   4.7.0-rc3-next-20160614-dirty #302
[ 4034.728732] Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A17 08/19/2015
[ 4034.728734]  0000000000000000 0000000063913a95 ffff8801f8b33da8 ffffffffb269f61a
[ 4034.728740]  ffff8801ecef4700 0000000063913a95 0000000000000004 0000000000000000
[ 4034.728744]  ffff8801f8b33df8 ffffffffb2367b30 0000000000000004 ffffea0006bd4580
[ 4034.728748] Call Trace:
[ 4034.728754]  [<ffffffffb269f61a>] dump_stack+0x7b/0xd1
[ 4034.728758]  [<ffffffffb2367b30>] __check_object_size+0x70/0x3d4
[ 4034.728761]  [<ffffffffb2eae5e4>] sctp_setsockopt.part.9+0x684/0x1e70
[ 4034.728764]  [<ffffffffb236f002>] ? __vfs_write+0x22/0x2e0
[ 4034.728767]  [<ffffffffb2eafe3e>] sctp_setsockopt+0x6e/0xe0
[ 4034.728770]  [<ffffffffb2bd1d0a>] sock_common_setsockopt+0x3a/0xc0
[ 4034.728772]  [<ffffffffb2bcfb99>] SyS_setsockopt+0x89/0x120
[ 4034.728775]  [<ffffffffb30896e5>] entry_SYSCALL_64_fastpath+0x18/0xa8
[ 4034.728779]  [<ffffffffb2143e3f>] ? trace_hardirqs_off_caller+0x1f/0xf0

Do we have a good place to collect these, or should I just post them here
as I find stuff?


[-- Attachment #2: Type: application/pgp-signature, Size: 848 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy
  2016-06-17  1:38       ` Valdis.Kletnieks
@ 2016-06-18 19:30         ` Kees Cook
  0 siblings, 0 replies; 26+ messages in thread
From: Kees Cook @ 2016-06-18 19:30 UTC (permalink / raw)
  To: Valdis Kletnieks
  Cc: kernel-hardening, Brad Spengler, PaX Team, Casey Schaufler,
	Rik van Riel, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton

On Thu, Jun 16, 2016 at 6:38 PM,  <Valdis.Kletnieks@vt.edu> wrote:
> On Thu, 16 Jun 2016 19:36:52 -0400, Valdis.Kletnieks@vt.edu said:
>
>> stuff is working.  I may whomp on it with trinity for a while, see if
>> anything falls out...
>
> Woo hoo! Bagged one! :)  (Haven't figured out yet if actual bug, or missing
> annotation)
>
> [ 4033.178386] NET: Registered protocol family 21
> [ 4033.226806] NET: Registered protocol family 38
> [ 4033.256276] Guest personality initialized and is inactive
> [ 4033.256797] VMCI host device registered (name=vmci, major=10, minor=53)
> [ 4033.256801] Initialized host personality
> [ 4033.266376] NET: Registered protocol family 40
> [ 4033.365982] NET: Registered protocol family 24
> [ 4033.413031] irda_setsockopt: not allowed to set MAXSDUSIZE for this socket type!
> [ 4033.531569] sock: process `trinity-main' is using obsolete setsockopt SO_BSDCOMPAT
> [ 4033.834839] irda_setsockopt: not allowed to set MAXSDUSIZE for this socket type!
> [ 4034.444515] irda_setsockopt: not allowed to set MAXSDUSIZE for this socket type!
> [ 4034.569913] sctp: [Deprecated]: trinity-main (pid 19154) Use of int in max_burst socket option deprecated.
> [ 4034.569913] Use struct sctp_assoc_value instead
> [ 4034.728723] usercopy: kernel memory overwrite attempt detected to ffff8801ecef4700 (SCTP) (4 bytes)
> [ 4034.728730] CPU: 3 PID: 19154 Comm: trinity-main Tainted: G           OE   4.7.0-rc3-next-20160614-dirty #302
> [ 4034.728732] Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A17 08/19/2015
> [ 4034.728734]  0000000000000000 0000000063913a95 ffff8801f8b33da8 ffffffffb269f61a
> [ 4034.728740]  ffff8801ecef4700 0000000063913a95 0000000000000004 0000000000000000
> [ 4034.728744]  ffff8801f8b33df8 ffffffffb2367b30 0000000000000004 ffffea0006bd4580
> [ 4034.728748] Call Trace:
> [ 4034.728754]  [<ffffffffb269f61a>] dump_stack+0x7b/0xd1
> [ 4034.728758]  [<ffffffffb2367b30>] __check_object_size+0x70/0x3d4
> [ 4034.728761]  [<ffffffffb2eae5e4>] sctp_setsockopt.part.9+0x684/0x1e70
> [ 4034.728764]  [<ffffffffb236f002>] ? __vfs_write+0x22/0x2e0
> [ 4034.728767]  [<ffffffffb2eafe3e>] sctp_setsockopt+0x6e/0xe0
> [ 4034.728770]  [<ffffffffb2bd1d0a>] sock_common_setsockopt+0x3a/0xc0
> [ 4034.728772]  [<ffffffffb2bcfb99>] SyS_setsockopt+0x89/0x120
> [ 4034.728775]  [<ffffffffb30896e5>] entry_SYSCALL_64_fastpath+0x18/0xa8
> [ 4034.728779]  [<ffffffffb2143e3f>] ? trace_hardirqs_off_caller+0x1f/0xf0

Cool, interesting. I don't see anything obvious in grsecurity's
patches that covers this, so either I'm missing something else, or
this bug exists there too. (Though not a lot of people use SCTP.)

> Do we have a good place to collect these, or should I just post them here
> as I find stuff?

For now, let's just collect them on the list, and any patches that
might solve them. I'm hoping to add the copy_*_user_n() API to help
with these.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security
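
(Since the copy_*_user_n() helpers mentioned above are only an idea at this
point and are not defined anywhere in this thread, the following is a purely
hypothetical sketch of the shape such an API could take: a copy variant that
also receives the known size of the destination object, so a user-controlled
length can be refused before any copying happens. A setsockopt handler like
the SCTP case above could then pass sizeof() of its local structure rather
than trusting optlen.)

#include <linux/uaccess.h>

/* Hypothetical sketch only -- not part of this series. */
static inline unsigned long
copy_from_user_n(void *dst, const void __user *src,
                 unsigned long len, unsigned long dst_size)
{
    /* Refuse lengths larger than the destination object outright,
     * reporting the whole request as uncopied. */
    if (len > dst_size)
        return len;
    return copy_from_user(dst, src, len);
}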

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH v2 6/4] mm: disallow user copy to/from separately allocated pages
  2016-06-10 19:44   ` [kernel-hardening] [RFC][PATCH v2 " Rik van Riel
  2016-06-10 20:46     ` [kernel-hardening] " Kees Cook
@ 2016-06-24 20:53     ` Kees Cook
  2016-06-24 20:57       ` Rik van Riel
  1 sibling, 1 reply; 26+ messages in thread
From: Kees Cook @ 2016-06-24 20:53 UTC (permalink / raw)
  To: Rik van Riel
  Cc: kernel-hardening, Brad Spengler, PaX Team, Casey Schaufler,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton

On Fri, Jun 10, 2016 at 12:44 PM, Rik van Riel <riel@redhat.com> wrote:
> v2 of yesterday's patch, this one seems to completely work on my
> system, after taking _bdata & _edata into account.
>
> I am still looking into CMA, but am leaning towards doing that as
> a follow-up patch.
>
> ---8<---
>
> Subject: mm: disallow user copy to/from separately allocated pages
>
> A single copy_from_user or copy_to_user should go to or from a single
> kernel object. Inside the slab, or on the stack, we can track the
> individual objects.
>
> For the general kernel heap, we do not know exactly where each object
> is, but we can tell whether the whole range from ptr to ptr + n is
> inside the same page, or inside the same compound page.
>
> If the start and end of the "object" are in pages that were not allocated
> together, we are likely dealing with an overflow from one object into
> the next page, and should disallow this copy.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
> v2:
>  - also test against _bdata & _edata, this appears to be necessary for
>    some kernfs/sysfs stuff
>  - clean up the code a little bit
>
>  mm/usercopy.c | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/mm/usercopy.c b/mm/usercopy.c
> index e09c33070759..78869ea73194 100644
> --- a/mm/usercopy.c
> +++ b/mm/usercopy.c
> @@ -109,7 +109,8 @@ static inline bool check_kernel_text_object(const void *ptr, unsigned long n)
>
>  static inline const char *check_heap_object(const void *ptr, unsigned long n)
>  {
> -       struct page *page;
> +       struct page *page, *endpage;
> +       const void *end = ptr + n - 1;
>
>         if (ZERO_OR_NULL_PTR(ptr))
>                 return "<null>";
> @@ -118,11 +119,29 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n)
>                 return NULL;
>
>         page = virt_to_head_page(ptr);
> -       if (!PageSlab(page))
> +       if (PageSlab(page))
> +               /* Check allocator for flags and size. */
> +               return __check_heap_object(ptr, n, page);
> +
> +       /* Is the object wholly within one base page? */
> +       if (likely(((unsigned long)ptr & (unsigned long)PAGE_MASK) ==
> +                  ((unsigned long)end & (unsigned long)PAGE_MASK)))
> +               return NULL;
> +
> +       /* Are the start and end inside the same compound page? */
> +       endpage = virt_to_head_page(end);
> +       if (likely(endpage == page))
> +               return NULL;
> +
> +       /* Is this a special area, eg. .rodata, .bss, or device memory? */
> +       if (ptr >= (const void *)_sdata && end <= (const void *)_edata)
> +               return NULL;
> +
> +       if (PageReserved(page) && PageReserved(endpage))
>                 return NULL;

Shouldn't PageReserved cover the .data, .rodata, and .bss areas
already? Is the concern for the check being added here that a copy
might span multiple pages and that they're not allocated together when
laying out the kernel data regions?

-Kees

>
> -       /* Check allocator for flags and size. */
> -       return __check_heap_object(ptr, n, page);
> +       /* Uh oh. The "object" spans several independently allocated pages. */
> +       return "<spans multiple pages>";
>  }
>
>  /*



-- 
Kees Cook
Chrome OS & Brillo Security
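
(A quick aside on the mechanics of the quoted check: the "wholly within one
base page" test works because masking an address with PAGE_MASK yields the
base address of the page containing it, so the first and last bytes of the
copy sit in the same page exactly when their masked values are equal. A
stand-alone user-space illustration, assuming 4 KiB pages:)

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Mirrors the "wholly within one base page" test from the patch above. */
static bool same_base_page(uintptr_t ptr, unsigned long n)
{
    uintptr_t end = ptr + n - 1;    /* address of the last byte copied */

    return (ptr & PAGE_MASK) == (end & PAGE_MASK);
}

int main(void)
{
    printf("%d\n", same_base_page(0x1ff0, 0x20)); /* 0: crosses into 0x2000 */
    printf("%d\n", same_base_page(0x1000, 0x20)); /* 1: stays within 0x1000 */
    return 0;
}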

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH v2 6/4] mm: disallow user copy to/from separately allocated pages
  2016-06-24 20:53     ` Kees Cook
@ 2016-06-24 20:57       ` Rik van Riel
  2016-06-24 20:59         ` Kees Cook
  0 siblings, 1 reply; 26+ messages in thread
From: Rik van Riel @ 2016-06-24 20:57 UTC (permalink / raw)
  To: Kees Cook
  Cc: kernel-hardening, Brad Spengler, PaX Team, Casey Schaufler,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 4065 bytes --]

On Fri, 2016-06-24 at 13:53 -0700, Kees Cook wrote:
> On Fri, Jun 10, 2016 at 12:44 PM, Rik van Riel <riel@redhat.com>
> wrote:
> > 
> > v2 of yesterday's patch, this one seems to completely work on my
> > system, after taking _bdata & _edata into account.
> > 
> > I am still looking into CMA, but am leaning towards doing that as
> > a follow-up patch.
> > 
> > ---8<---
> > 
> > Subject: mm: disallow user copy to/from separately allocated pages
> > 
> > A single copy_from_user or copy_to_user should go to or from a
> > single
> > kernel object. Inside the slab, or on the stack, we can track the
> > individual objects.
> > 
> > For the general kernel heap, we do not know exactly where each
> > object
> > is, but we can tell whether the whole range from ptr to ptr + n is
> > inside the same page, or inside the same compound page.
> > 
> > If the start and end of the "object" are in pages that were not
> > allocated
> > together, we are likely dealing with an overflow from one object
> > into
> > the next page, and should disallow this copy.
> > 
> > Signed-off-by: Rik van Riel <riel@redhat.com>
> > ---
> > v2:
> >  - also test against _bdata & _edata, this appears to be necessary
> > for
> >    some kernfs/sysfs stuff
> >  - clean up the code a little bit
> > 
> >  mm/usercopy.c | 27 +++++++++++++++++++++++----
> >  1 file changed, 23 insertions(+), 4 deletions(-)
> > 
> > diff --git a/mm/usercopy.c b/mm/usercopy.c
> > index e09c33070759..78869ea73194 100644
> > --- a/mm/usercopy.c
> > +++ b/mm/usercopy.c
> > @@ -109,7 +109,8 @@ static inline bool
> > check_kernel_text_object(const void *ptr, unsigned long n)
> > 
> >  static inline const char *check_heap_object(const void *ptr,
> > unsigned long n)
> >  {
> > -       struct page *page;
> > +       struct page *page, *endpage;
> > +       const void *end = ptr + n - 1;
> > 
> >         if (ZERO_OR_NULL_PTR(ptr))
> >                 return "<null>";
> > @@ -118,11 +119,29 @@ static inline const char
> > *check_heap_object(const void *ptr, unsigned long n)
> >                 return NULL;
> > 
> >         page = virt_to_head_page(ptr);
> > -       if (!PageSlab(page))
> > +       if (PageSlab(page))
> > +               /* Check allocator for flags and size. */
> > +               return __check_heap_object(ptr, n, page);
> > +
> > +       /* Is the object wholly within one base page? */
> > +       if (likely(((unsigned long)ptr & (unsigned long)PAGE_MASK)
> > ==
> > +                  ((unsigned long)end & (unsigned
> > long)PAGE_MASK)))
> > +               return NULL;
> > +
> > +       /* Are the start and end inside the same compound page? */
> > +       endpage = virt_to_head_page(end);
> > +       if (likely(endpage == page))
> > +               return NULL;
> > +
> > +       /* Is this a special area, eg. .rodata, .bss, or device
> > memory? */
> > +       if (ptr >= (const void *)_sdata && end <= (const void
> > *)_edata)
> > +               return NULL;
> > +
> > +       if (PageReserved(page) && PageReserved(endpage))
> >                 return NULL;
> Shouldn't PageReserved cover the .data, .rodata, and .bss areas
> already? Is the concern for the check being added here that a copy
> might span multiple pages and that they're not allocated together
> when
> laying out the kernel data regions?

Having just the PageReserved check was not enough, I had
to specifically add the _sdata & _edata test to get things
to work.

It looks like PageReserved is simply not set in some cases,
perhaps we are mapping those kernel sections with huge pages,
but not setting up the compound page stuff for the backing
pages, since they never pass through the buddy allocator?

-- 
All Rights Reversed.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH v2 6/4] mm: disallow user copy to/from separately allocated pages
  2016-06-24 20:57       ` Rik van Riel
@ 2016-06-24 20:59         ` Kees Cook
  0 siblings, 0 replies; 26+ messages in thread
From: Kees Cook @ 2016-06-24 20:59 UTC (permalink / raw)
  To: Rik van Riel
  Cc: kernel-hardening, Brad Spengler, PaX Team, Casey Schaufler,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton

On Fri, Jun 24, 2016 at 1:57 PM, Rik van Riel <riel@redhat.com> wrote:
> On Fri, 2016-06-24 at 13:53 -0700, Kees Cook wrote:
>> On Fri, Jun 10, 2016 at 12:44 PM, Rik van Riel <riel@redhat.com>
>> wrote:
>> >
>> > v2 of yesterday's patch, this one seems to completely work on my
>> > system, after taking _bdata & _edata into account.
>> >
>> > I am still looking into CMA, but am leaning towards doing that as
>> > a follow-up patch.
>> >
>> > ---8<---
>> >
>> > Subject: mm: disallow user copy to/from separately allocated pages
>> >
>> > A single copy_from_user or copy_to_user should go to or from a
>> > single
>> > kernel object. Inside the slab, or on the stack, we can track the
>> > individual objects.
>> >
>> > For the general kernel heap, we do not know exactly where each
>> > object
>> > is, but we can tell whether the whole range from ptr to ptr + n is
>> > inside the same page, or inside the same compound page.
>> >
>> > If the start and end of the "object" are in pages that were not
>> > allocated
>> > together, we are likely dealing with an overflow from one object
>> > into
>> > the next page, and should disallow this copy.
>> >
>> > Signed-off-by: Rik van Riel <riel@redhat.com>
>> > ---
>> > v2:
>> >  - also test against _bdata & _edata, this appears to be necessary
>> > for
>> >    some kernfs/sysfs stuff
>> >  - clean up the code a little bit
>> >
>> >  mm/usercopy.c | 27 +++++++++++++++++++++++----
>> >  1 file changed, 23 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/mm/usercopy.c b/mm/usercopy.c
>> > index e09c33070759..78869ea73194 100644
>> > --- a/mm/usercopy.c
>> > +++ b/mm/usercopy.c
>> > @@ -109,7 +109,8 @@ static inline bool
>> > check_kernel_text_object(const void *ptr, unsigned long n)
>> >
>> >  static inline const char *check_heap_object(const void *ptr,
>> > unsigned long n)
>> >  {
>> > -       struct page *page;
>> > +       struct page *page, *endpage;
>> > +       const void *end = ptr + n - 1;
>> >
>> >         if (ZERO_OR_NULL_PTR(ptr))
>> >                 return "<null>";
>> > @@ -118,11 +119,29 @@ static inline const char
>> > *check_heap_object(const void *ptr, unsigned long n)
>> >                 return NULL;
>> >
>> >         page = virt_to_head_page(ptr);
>> > -       if (!PageSlab(page))
>> > +       if (PageSlab(page))
>> > +               /* Check allocator for flags and size. */
>> > +               return __check_heap_object(ptr, n, page);
>> > +
>> > +       /* Is the object wholly within one base page? */
>> > +       if (likely(((unsigned long)ptr & (unsigned long)PAGE_MASK)
>> > ==
>> > +                  ((unsigned long)end & (unsigned
>> > long)PAGE_MASK)))
>> > +               return NULL;
>> > +
>> > +       /* Are the start and end inside the same compound page? */
>> > +       endpage = virt_to_head_page(end);
>> > +       if (likely(endpage == page))
>> > +               return NULL;
>> > +
>> > +       /* Is this a special area, eg. .rodata, .bss, or device
>> > memory? */
>> > +       if (ptr >= (const void *)_sdata && end <= (const void
>> > *)_edata)
>> > +               return NULL;
>> > +
>> > +       if (PageReserved(page) && PageReserved(endpage))
>> >                 return NULL;
>> Shouldn't PageReserved cover the .data, .rodata, and .bss areas
>> already? Is the concern for the check being added here that a copy
>> might span multiple pages and that they're not allocated together
>> when
>> laying out the kernel data regions?
>
> Having just the PageReserved check was not enough, I had
> to specifically add the _sdata & _edata test to get things
> to work.
>
> It looks like PageReserved is simply not set in some cases,
> perhaps we are mapping those kernel sections with huge pages,
> but not setting up the compound page stuff for the backing
> pages, since they never pass through the buddy allocator?

Ah-ha, gotcha. Thanks!

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [kernel-hardening] Re: [PATCH v2 1/4] mm: Hardened usercopy
  2016-06-08 21:11 ` [kernel-hardening] [PATCH v2 1/4] " Kees Cook
  2016-06-09  0:47   ` [kernel-hardening] " Brad Spengler
@ 2016-07-12 23:04   ` Kees Cook
  1 sibling, 0 replies; 26+ messages in thread
From: Kees Cook @ 2016-07-12 23:04 UTC (permalink / raw)
  To: PaX Team
  Cc: Brad Spengler, Casey Schaufler, Rik van Riel, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	kernel-hardening

On Wed, Jun 8, 2016 at 5:11 PM, Kees Cook <keescook@chromium.org> wrote:
> This is an attempt at porting PAX_USERCOPY into the mainline kernel,
> calling it CONFIG_HARDENED_USERCOPY. The work is based on code by Brad
> Spengler and PaX Team, and an earlier port from Casey Schaufler.
>
> This patch contains the logic for validating several conditions when
> performing copy_to_user() and copy_from_user() on the kernel object
> being copied to/from:
> - if on the heap:
>   - the size of copy must be less than or equal to the size of the object
> - if on the stack (and we have architecture/build support for frames):
>   - object must be contained by the current stack frame
> - object must not be contained in the kernel text
>
> Additional restrictions are in following patches.
>
> This implements the checks on many architectures, but I have only tested
> x86_64 so far. I would love to see an arm64 port added as well.
>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> [...]
> +/*
> + * Checks if a given pointer and length is contained by the current
> + * stack frame (if possible).
> + *
> + *     0: not at all on the stack
> + *     1: fully on the stack (when can't do frame-checking)
> + *     2: fully inside the current stack frame
> + *     -1: error condition (invalid stack position or bad stack frame)
> + */
> +static noinline int check_stack_object(const void *obj, unsigned long len)
> +{
> +       const void * const stack = task_stack_page(current);
> +       const void * const stackend = stack + THREAD_SIZE;
> +
> +#if defined(CONFIG_FRAME_POINTER) && defined(CONFIG_X86)
> +       const void *frame = NULL;
> +       const void *oldframe;
> +#endif
> +
> +       /* Reject: object wraps past end of memory. */
> +       if (obj + len < obj)
> +               return -1;
> +
> +       /* Object is not on the stack at all. */
> +       if (obj + len <= stack || stackend <= obj)
> +               return 0;
> +
> +       /*
> +        * Reject: object partially overlaps the stack (passing the
> +        * check above means at least one end is within the stack,
> +        * so if this check fails, the other end is outside the stack).
> +        */
> +       if (obj < stack || stackend < obj + len)
> +               return -1;
> +
> +#if defined(CONFIG_FRAME_POINTER) && defined(CONFIG_X86)
> +       oldframe = __builtin_frame_address(1);
> +       if (oldframe)
> +               frame = __builtin_frame_address(2);
> +       /*
> +        * low ----------------------------------------------> high
> +        * [saved bp][saved ip][args][local vars][saved bp][saved ip]
> +        *                   ^----------------^
> +        *             allow copies only within here
> +        */
> +       while (stack <= frame && frame < stackend) {
> +               /*
> +                * If obj + len extends past the last frame, this
> +                * check won't pass and the next frame will be 0,
> +                * causing us to bail out and correctly report
> +                * the copy as invalid.
> +                */
> +               if (obj + len <= frame)
> +                       return obj >= oldframe + 2 * sizeof(void *) ? 2 : -1;
> +               oldframe = frame;
> +               frame = *(const void * const *)frame;
> +       }
> +       return -1;
> +#else
> +       return 1;
> +#endif
> +}

PaX Team,

Doesn't this checking leave (possible) stack canaries exposed to being
copied to userspace? I'm at a loss for a way to reliably determine if
they're present or not, though...

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security
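
(Aside for anyone unfamiliar with the frame walk in the quoted code: each
saved frame pointer slot holds the address of the caller's saved frame
pointer, so dereferencing it repeatedly hops outward one stack frame at a
time. The small user-space toy below shows the same walk; it assumes x86-64
built with -fno-omit-frame-pointer and deliberately stops after its own two
frames so every dereference stays within known-good stack slots.)

#include <stdio.h>

static void __attribute__((noinline)) show_frames(void)
{
    const void *frame = __builtin_frame_address(0);
    int depth;

    /* Hop from this function's frame to main()'s frame, printing each. */
    for (depth = 0; depth < 2 && frame; depth++) {
        printf("frame %d at %p\n", depth, frame);
        frame = *(const void * const *)frame;
    }
}

int main(void)
{
    show_frames();
    return 0;
}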

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-07-12 23:04 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-08 21:11 [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy Kees Cook
2016-06-08 21:11 ` [kernel-hardening] [PATCH v2 1/4] " Kees Cook
2016-06-09  0:47   ` [kernel-hardening] " Brad Spengler
2016-06-09  1:39     ` Rik van Riel
2016-06-09  2:58     ` Kees Cook
2016-07-12 23:04   ` Kees Cook
2016-06-08 21:11 ` [kernel-hardening] [PATCH v2 2/4] usercopy: avoid direct copying to userspace Kees Cook
2016-06-09 23:37   ` [kernel-hardening] " Rik van Riel
2016-06-10 21:09   ` Kees Cook
2016-06-11  1:08     ` Rik van Riel
2016-06-08 21:11 ` [kernel-hardening] [PATCH v2 3/4] usercopy: whitelist user-copyable caches Kees Cook
2016-06-08 21:11 ` [kernel-hardening] [PATCH v2 4/4] usercopy: provide split of user-controlled slabs Kees Cook
2016-06-09  3:02 ` [kernel-hardening] [RFC][PATCH v2 5/4] arm: fixes for usercopy Kees Cook
2016-06-09 15:35 ` [kernel-hardening] RE: [RFC][PATCH v2 0/4] mm: Hardened usercopy Schaufler, Casey
2016-06-09 17:48   ` [kernel-hardening] " Kees Cook
2016-06-09 23:39 ` [kernel-hardening] [RFC][PATCH 6/4] mm: disallow user copy to/from separately allocated pages Rik van Riel
2016-06-10 19:44   ` [kernel-hardening] [RFC][PATCH v2 " Rik van Riel
2016-06-10 20:46     ` [kernel-hardening] " Kees Cook
2016-06-24 20:53     ` Kees Cook
2016-06-24 20:57       ` Rik van Riel
2016-06-24 20:59         ` Kees Cook
2016-06-16  1:30 ` [kernel-hardening] [RFC][PATCH v2 0/4] mm: Hardened usercopy Valdis.Kletnieks
2016-06-16  1:38   ` Kees Cook
2016-06-16 23:36     ` Valdis.Kletnieks
2016-06-17  1:38       ` Valdis.Kletnieks
2016-06-18 19:30         ` Kees Cook
