* [RFC v1] implement SL*B and stack usercopy runtime checks
@ 2011-07-03 11:10 Vasiliy Kulikov
  2011-07-03 18:27 ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: Vasiliy Kulikov @ 2011-07-03 11:10 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Linus Torvalds,
	x86, Arnd Bergmann, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Andrew Morton, linux-kernel, linux-arch, linux-mm

This patch implements 2 additional checks for data copied from
kernelspace to userspace and vice versa.  Currently the copy_* functions
contain only very simple and cheap comparisons of the supplied size
against the size of the copied object known at compile time.  This
patch extends these checks to also cover stack frame boundaries and
SL*B object sizes.

More precisely, it checks:

1) if the data touches the stack, it checks whether the data fully fits in
the stack and whether it fully fits in a single stack frame.  The latter is
arch dependent; currently it is implemented only for x86 with
CONFIG_FRAME_POINTER=y.  It limits infoleaks/overwrites to the arguments and
local variables of a single frame, and prevents overwriting the saved
return instruction pointer.

2) if the data is from the SL*B cache, checks whether it fully fits in a
slab page and whether it overflows a slab object.  E.g. if the memory
was allocated as kmalloc(64, GFP_KERNEL) and one tries to copy 150
bytes, the copy would fail.
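
As an illustration of both checks, consider a hypothetical buggy handler
(a sketch, not taken from this patch or from any real driver; all names are
made up) that trusts a user-supplied length.  With the checks above, both
copies fail and return `len' instead of overflowing the stack frame or the
slab object:

    static ssize_t buggy_write(struct file *file, const char __user *ubuf,
                               size_t len, loff_t *off)
    {
            char stack_buf[16];                       /* stack object */
            char *heap_buf = kmalloc(64, GFP_KERNEL); /* slab object  */
            ssize_t ret = -ENOMEM;

            if (!heap_buf)
                    return ret;
            ret = -EFAULT;
            if (copy_from_user(stack_buf, ubuf, len)) /* check 1: stack frame */
                    goto out;
            if (copy_from_user(heap_buf, ubuf, len))  /* check 2: slab object */
                    goto out;
            ret = len;
    out:
            kfree(heap_buf);
            return ret;
    }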


The checks are implemented for copy_{to,from}_user() and similar, and are
missing for {put,get}_user() and similar, because there the base pointer
might be the result of arbitrary pointer arithmetic, and the correctness
of that arithmetic is almost impossible to check at this stage.

/dev/kmem and /dev/mem are adjusted to bypass this check (e.g. without
STRICT_DEVMEM it should still be possible to read/write across stack frame
and slab object boundaries).


The limitations:

The stack check does not protect against overwriting local variables or
saved registers; it only confines overflows to a single frame.

The SL*B checks don't validate whether the object is actually allocated,
so they don't prevent infoleaks related to freed objects.  Also, if the
cache's granularity is larger than the actually allocated object size,
an infoleak of padding bytes is possible (see the example below).
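
For example (a sketch, not from real code; `ubuf' is an assumed __user
pointer): a kmalloc(100) allocation is served from the kmalloc-128 cache,
so the check sees a 128-byte object and a 128-byte copy passes it while
leaking up to 28 bytes of padding:

    char *p = kmalloc(100, GFP_KERNEL);  /* rounded up to the kmalloc-128 cache */

    /* Passes slab_access_ok(): offset 0, len 128 <= object size 128.
     * Bytes 100..127 were never initialized and may leak old heap data. */
    if (p && copy_to_user(ubuf, p, 128))
            return -EFAULT;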

The slob check is not implemented yet.  Unfortunately, a check for slob
would have to (1) walk through the slob chunks and (2) hold the slob
lock, so it would lead to a significant slowdown.

The patch does nothing with other memory areas like vmalloc'ed areas,
modules' data and code sections, etc.  This could be an area for future
improvement.


The patch is a forwardport of the PAX_USERCOPY feature from the PaX
patch.  Most code was copied from the PaX patch with minor cosmetic
changes.  Also PaX' version of the patch has additional restrictions:

a) some slab caches have the SLAB_USERCOPY flag set, and copies to/from
slab caches without the flag are denied.  Rare cases where a few bytes are
needed from a cache missing from the whitelist are handled by copying the
bytes into a temporary area on the stack/heap (see the sketch after this
list).

b) if a malformed copy request is spotted, the event is logged and a
SIGKILL signal is sent to the current task.
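
For illustration of (a), whitelisting a cache in PaX looks roughly like
this (a sketch based on the description above; SLAB_USERCOPY exists only in
the PaX patch, not in mainline):

    names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0,
                                     SLAB_HWCACHE_ALIGN | SLAB_PANIC |
                                     SLAB_USERCOPY, NULL);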


Questions/thoughts:

1) Is it acceptable to enable these checks unconditionally?  Or should they
be guarded by DEBUG_STRICT_USER_COPY_CHECKS or a new config option?

2) Should this code trigger some monitoring/reaction mechanisms?  It makes
sense to at least log the event, because such an overflow is almost surely
the result of a deliberate kernel bug exploitation attempt.  Whether an
active reaction such as killing the task is needed by default is less
obvious.

3) Maybe the *_access_ok() checks should be moved into something like
kernel_access_ok() for a higher level of abstraction and more convenient
CONFIG_* guarding?  Then (2) and similar changes would be simpler.


If the patch is ACK'ed by the maintainers, there will be separate patches
for the stack check, each of the SL*B allocators, x86, and /dev/*mem.  For
RFC v1 it is left as one big blob.


Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
---

 arch/x86/include/asm/uaccess.h    |   51 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/uaccess_32.h |   21 ++++++++++++++-
 arch/x86/include/asm/uaccess_64.h |   47 +++++++++++++++++++++++++++++++++-
 arch/x86/kernel/crash_dump_32.c   |    2 +-
 arch/x86/kernel/crash_dump_64.c   |    2 +-
 arch/x86/lib/usercopy_64.c        |    4 +++
 drivers/char/mem.c                |    8 +++---
 include/asm-generic/uaccess.h     |   19 +++++++++++++
 include/linux/slab.h              |    9 ++++++
 mm/maccess.c                      |   41 +++++++++++++++++++++++++++++
 mm/slab.c                         |   35 +++++++++++++++++++++++++
 mm/slub.c                         |   29 +++++++++++++++++++++
 12 files changed, 260 insertions(+), 8 deletions(-)
---
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 99ddd14..a1d6b91 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -10,6 +10,9 @@
 #include <asm/asm.h>
 #include <asm/page.h>
 
+extern bool slab_access_ok(const void *ptr, unsigned long len);
+extern bool stack_access_ok(const void *ptr, unsigned long len);
+
 #define VERIFY_READ 0
 #define VERIFY_WRITE 1
 
@@ -78,6 +81,54 @@
  */
 #define access_ok(type, addr, size) (likely(__range_not_ok(addr, size) == 0))
 
+#if defined(CONFIG_FRAME_POINTER)
+/*
+ * MUST be always_inline to correctly count stack frame numbers.
+ *
+ * low ----------------------------------------------> high
+ * [saved bp][saved ip][args][local vars][saved bp][saved ip]
+ *		       ^----------------^
+ *		  allow copies only within here
+*/
+inline static __attribute__((always_inline))
+bool arch_check_object_on_stack_frame(const void *stack,
+	     const void *stackend, const void *obj, unsigned long len)
+{
+	const void *frame = NULL;
+	const void *oldframe;
+
+	/*
+	 * Get the stack_access_ok() caller frame.
+	 * __builtin_frame_address(0) returns stack_access_ok() frame
+	 * as arch_ is inline and stack_ is noinline.
+	 */
+	oldframe = __builtin_frame_address(0);
+	if (oldframe)
+		frame = __builtin_frame_address(1);
+
+	while (stack <= frame && frame < stackend) {
+		/*
+		 * If obj + len extends past the last frame, this
+		 * check won't pass and the next frame will be 0,
+		 * causing us to bail out and correctly report
+		 * the copy as invalid.
+		 */
+		if (obj + len <= frame) {
+			/* EBP + EIP */
+			int protected_regs_size = 2*sizeof(void *);
+
+			if (obj >= oldframe + protected_regs_size)
+				return true;
+			return false;
+		}
+		oldframe = frame;
+		frame = *(const void * const *)frame;
+	}
+	return false;
+}
+#define arch_check_object_on_stack_frame arch_check_object_on_stack_frame
+#endif
+
 /*
  * The exception table consists of pairs of addresses: the first is the
  * address of an instruction that is allowed to fault, and the second is
diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 566e803..9a9df71 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -61,6 +61,10 @@ __copy_to_user_inatomic(void __user *to, const void *from, unsigned long n)
 			return ret;
 		}
 	}
+
+	if (!slab_access_ok(from, n) || !stack_access_ok(from, n))
+		return n;
+
 	return __copy_to_user_ll(to, from, n);
 }
 
@@ -108,6 +112,10 @@ __copy_from_user_inatomic(void *to, const void __user *from, unsigned long n)
 			return ret;
 		}
 	}
+
+	if (!slab_access_ok(to, n) || !stack_access_ok(to, n))
+		return n;
+
 	return __copy_from_user_ll_nozero(to, from, n);
 }
 
@@ -152,6 +160,10 @@ __copy_from_user(void *to, const void __user *from, unsigned long n)
 			return ret;
 		}
 	}
+
+	if (!slab_access_ok(to, n) || !stack_access_ok(to, n))
+		return n;
+
 	return __copy_from_user_ll(to, from, n);
 }
 
@@ -174,6 +186,10 @@ static __always_inline unsigned long __copy_from_user_nocache(void *to,
 			return ret;
 		}
 	}
+
+	if (!slab_access_ok(to, n) || !stack_access_ok(to, n))
+		return n;
+
 	return __copy_from_user_ll_nocache(to, from, n);
 }
 
@@ -181,7 +197,10 @@ static __always_inline unsigned long
 __copy_from_user_inatomic_nocache(void *to, const void __user *from,
 				  unsigned long n)
 {
-       return __copy_from_user_ll_nocache_nozero(to, from, n);
+	if (!slab_access_ok(to, n) || !stack_access_ok(to, n))
+		return n;
+
+	return __copy_from_user_ll_nocache_nozero(to, from, n);
 }
 
 unsigned long __must_check copy_to_user(void __user *to,
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index 1c66d30..c25dcc7 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -50,6 +50,26 @@ static inline unsigned long __must_check copy_from_user(void *to,
 	int sz = __compiletime_object_size(to);
 
 	might_fault();
+	if (likely(sz == -1 || sz >= n)) {
+		if (!slab_access_ok(to, n) || !stack_access_ok(to, n))
+			return n;
+		n = _copy_from_user(to, from, n);
+	}
+#ifdef CONFIG_DEBUG_VM
+	else {
+		WARN(1, "Buffer overflow detected!\n");
+	}
+#endif
+	return n;
+}
+
+static inline unsigned long __must_check copy_from_user_nocheck(void *to,
+					  const void __user *from,
+					  unsigned long n)
+{
+	int sz = __compiletime_object_size(to);
+
+	might_fault();
 	if (likely(sz == -1 || sz >= n))
 		n = _copy_from_user(to, from, n);
 #ifdef CONFIG_DEBUG_VM
@@ -60,7 +80,7 @@ static inline unsigned long __must_check copy_from_user(void *to,
 }
 
 static __always_inline __must_check
-int copy_to_user(void __user *dst, const void *src, unsigned size)
+int copy_to_user_nocheck(void __user *dst, const void *src, unsigned size)
 {
 	might_fault();
 
@@ -68,11 +88,26 @@ int copy_to_user(void __user *dst, const void *src, unsigned size)
 }
 
 static __always_inline __must_check
+int copy_to_user(void __user *dst, const void *src, unsigned size)
+{
+	might_fault();
+
+	if (!slab_access_ok(src, size) || !stack_access_ok(src, size))
+		return size;
+
+	return copy_to_user_nocheck(dst, src, size);
+}
+
+static __always_inline __must_check
 int __copy_from_user(void *dst, const void __user *src, unsigned size)
 {
 	int ret = 0;
 
 	might_fault();
+
+	if (!slab_access_ok(dst, size) || !stack_access_ok(dst, size))
+		return size;
+
 	if (!__builtin_constant_p(size))
 		return copy_user_generic(dst, (__force void *)src, size);
 	switch (size) {
@@ -117,6 +152,10 @@ int __copy_to_user(void __user *dst, const void *src, unsigned size)
 	int ret = 0;
 
 	might_fault();
+
+	if (!slab_access_ok(src, size) || !stack_access_ok(src, size))
+		return size;
+
 	if (!__builtin_constant_p(size))
 		return copy_user_generic((__force void *)dst, src, size);
 	switch (size) {
@@ -221,12 +260,18 @@ __must_check unsigned long __clear_user(void __user *mem, unsigned long len);
 static __must_check __always_inline int
 __copy_from_user_inatomic(void *dst, const void __user *src, unsigned size)
 {
+	if (!slab_access_ok(dst, size) || !stack_access_ok(dst, size))
+		return size;
+
 	return copy_user_generic(dst, (__force const void *)src, size);
 }
 
 static __must_check __always_inline int
 __copy_to_user_inatomic(void __user *dst, const void *src, unsigned size)
 {
+	if (!slab_access_ok(src, size) || !stack_access_ok(src, size))
+		return size;
+
 	return copy_user_generic((__force void *)dst, src, size);
 }
 
diff --git a/arch/x86/kernel/crash_dump_32.c b/arch/x86/kernel/crash_dump_32.c
index 642f75a..6d3f36d 100644
--- a/arch/x86/kernel/crash_dump_32.c
+++ b/arch/x86/kernel/crash_dump_32.c
@@ -72,7 +72,7 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
 		}
 		copy_page(kdump_buf_page, vaddr);
 		kunmap_atomic(vaddr, KM_PTE0);
-		if (copy_to_user(buf, (kdump_buf_page + offset), csize))
+		if (copy_to_user_nocheck(buf, (kdump_buf_page + offset), csize))
 			return -EFAULT;
 	}
 
diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
index afa64ad..c241ab8 100644
--- a/arch/x86/kernel/crash_dump_64.c
+++ b/arch/x86/kernel/crash_dump_64.c
@@ -36,7 +36,7 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
 		return -ENOMEM;
 
 	if (userbuf) {
-		if (copy_to_user(buf, vaddr + offset, csize)) {
+		if (copy_to_user_nocheck(buf, vaddr + offset, csize)) {
 			iounmap(vaddr);
 			return -EFAULT;
 		}
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index b7c2849..699f45b 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -42,6 +42,10 @@ long
 __strncpy_from_user(char *dst, const char __user *src, long count)
 {
 	long res;
+
+	if (!slab_access_ok(dst, count) || !stack_access_ok(dst, count))
+		return count;
+
 	__do_strncpy_from_user(dst, src, count, res);
 	return res;
 }
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 8fc04b4..0c506cb 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -132,7 +132,7 @@ static ssize_t read_mem(struct file *file, char __user *buf,
 		if (!ptr)
 			return -EFAULT;
 
-		remaining = copy_to_user(buf, ptr, sz);
+		remaining = copy_to_user_nocheck(buf, ptr, sz);
 		unxlate_dev_mem_ptr(p, ptr);
 		if (remaining)
 			return -EFAULT;
@@ -190,7 +190,7 @@ static ssize_t write_mem(struct file *file, const char __user *buf,
 			return -EFAULT;
 		}
 
-		copied = copy_from_user(ptr, buf, sz);
+		copied = copy_from_user_nocheck(ptr, buf, sz);
 		unxlate_dev_mem_ptr(p, ptr);
 		if (copied) {
 			written += sz - copied;
@@ -428,7 +428,7 @@ static ssize_t read_kmem(struct file *file, char __user *buf,
 			 */
 			kbuf = xlate_dev_kmem_ptr((char *)p);
 
-			if (copy_to_user(buf, kbuf, sz))
+			if (copy_to_user_nocheck(buf, kbuf, sz))
 				return -EFAULT;
 			buf += sz;
 			p += sz;
@@ -498,7 +498,7 @@ static ssize_t do_write_kmem(unsigned long p, const char __user *buf,
 		 */
 		ptr = xlate_dev_kmem_ptr((char *)p);
 
-		copied = copy_from_user(ptr, buf, sz);
+		copied = copy_from_user_nocheck(ptr, buf, sz);
 		if (copied) {
 			written += sz - copied;
 			if (written)
diff --git a/include/asm-generic/uaccess.h b/include/asm-generic/uaccess.h
index ac68c99..7124db6 100644
--- a/include/asm-generic/uaccess.h
+++ b/include/asm-generic/uaccess.h
@@ -50,6 +50,15 @@ static inline int __access_ok(unsigned long addr, unsigned long size)
 }
 #endif
 
+#ifndef arch_check_object_on_stack_frame
+static inline bool arch_check_object_on_stack_frame(const void *stack,
+	     const void *stackend, const void *obj, unsigned long len)
+{
+	return true;
+}
+#define arch_check_object_on_stack_frame arch_check_object_on_stack_frame
+#endif /* arch_check_object_on_stack_frame */
+
 /*
  * The exception table consists of pairs of addresses: the first is the
  * address of an instruction that is allowed to fault, and the second is
@@ -99,6 +108,9 @@ static inline __must_check long __copy_from_user(void *to,
 		}
 	}
 
+	if (!slab_access_ok(to, n) || !stack_access_ok(to, n))
+		return n;
+
 	memcpy(to, (const void __force *)from, n);
 	return 0;
 }
@@ -129,6 +141,9 @@ static inline __must_check long __copy_to_user(void __user *to,
 		}
 	}
 
+	if (!slab_access_ok(from, n) || !stack_access_ok(from, n))
+		return n;
+
 	memcpy((void __force *)to, from, n);
 	return 0;
 }
@@ -268,6 +283,10 @@ static inline long
 __strncpy_from_user(char *dst, const char __user *src, long count)
 {
 	char *tmp;
+
+	if (!slab_access_ok(dst, count) || !stack_access_ok(dst, count))
+		return count;
+
 	strncpy(dst, (const char __force *)src, count);
 	for (tmp = dst; *tmp && count > 0; tmp++, count--)
 		;
diff --git a/include/linux/slab.h b/include/linux/slab.h
index ad4dd1c..8e564bb 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -333,4 +333,13 @@ static inline void *kzalloc_node(size_t size, gfp_t flags, int node)
 
 void __init kmem_cache_init_late(void);
 
+/*
+ * slab_access_ok() checks whether ptr belongs to the slab cache and whether
+ * it fits in a single allocated area.
+ *
+ * Returns false only if ptr belongs to a slab cache and overflows allocated
+ * slab area.
+ */
+extern bool slab_access_ok(const void *ptr, unsigned long len);
+
 #endif	/* _LINUX_SLAB_H */
diff --git a/mm/maccess.c b/mm/maccess.c
index 4cee182..0b8f3eb 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -3,6 +3,7 @@
  */
 #include <linux/module.h>
 #include <linux/mm.h>
+#include <linux/sched.h>
 #include <linux/uaccess.h>
 
 /**
@@ -60,3 +61,43 @@ long __probe_kernel_write(void *dst, const void *src, size_t size)
 	return ret ? -EFAULT : 0;
 }
 EXPORT_SYMBOL_GPL(probe_kernel_write);
+
+/*
+ * stack_access_ok() checks whether object is on the stack and
+ * whether it fits in a single stack frame (in case arch allows
+ * to learn this information).
+ *
+ * Returns true in cases:
+ * a) object is not a stack object at all
+ * b) object is located on the stack and fits in a single frame
+ *
+ * MUST be noinline not to confuse arch_check_object_on_stack_frame.
+ */
+bool noinline stack_access_ok(const void *obj, unsigned long len)
+{
+	const void * const stack = task_stack_page(current);
+	const void * const stackend = stack + THREAD_SIZE;
+	bool rc = false;
+
+	/* Does obj+len overflow vm space? */
+	if (unlikely(obj + len < obj))
+		goto exit;
+
+	/* Does [obj; obj+len) at least touch our stack? */
+	if (unlikely(obj + len <= stack || stackend <= obj)) {
+		rc = true;
+		goto exit;
+	}
+
+	/* Does [obj; obj+len) overflow/underflow the stack? */
+	if (unlikely(obj < stack || stackend < obj + len))
+		goto exit;
+
+	rc = arch_check_object_on_stack_frame(stack, stackend, obj, len);
+
+exit:
+	if (!rc)
+		pr_err("stack_access_ok failed (ptr = %p, len = %lu)\n", obj, len);
+	return rc;
+}
+EXPORT_SYMBOL(stack_access_ok);
diff --git a/mm/slab.c b/mm/slab.c
index d96e223..4ec5681 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3844,6 +3844,41 @@ unsigned int kmem_cache_size(struct kmem_cache *cachep)
 EXPORT_SYMBOL(kmem_cache_size);
 
 /*
+ * Returns false if and only if [ptr; ptr+len) touches the slab,
+ * but breaks objects boundaries.  It doesn't check whether the
+ * accessed object is actually allocated.
+ */
+bool slab_access_ok(const void *ptr, unsigned long len)
+{
+	struct page *page;
+	struct kmem_cache *cachep = NULL;
+	struct slab *slabp;
+	unsigned int objnr;
+	unsigned long offset;
+
+	if (!len)
+		return true;
+	if (!virt_addr_valid(ptr))
+		return true;
+	page = virt_to_head_page(ptr);
+	if (!PageSlab(page))
+		return true;
+
+	cachep = page_get_cache(page);
+	slabp = page_get_slab(page);
+	objnr = obj_to_index(cachep, slabp, (void *)ptr);
+	BUG_ON(objnr >= cachep->num);
+	offset = (const char *)ptr - obj_offset(cachep) -
+	    (const char *)index_to_obj(cachep, slabp, objnr);
+	if (offset <= obj_size(cachep) && len <= obj_size(cachep) - offset)
+		return true;
+
+	pr_err("slab_access_ok failed (addr %p, len %lu)\n", ptr, len);
+	return false;
+}
+EXPORT_SYMBOL(slab_access_ok);
+
+/*
  * This initializes kmem_list3 or resizes various caches for all nodes.
  */
 static int alloc_kmemlist(struct kmem_cache *cachep, gfp_t gfp)
diff --git a/mm/slub.c b/mm/slub.c
index 35f351f..169349b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2623,6 +2623,35 @@ unsigned int kmem_cache_size(struct kmem_cache *s)
 }
 EXPORT_SYMBOL(kmem_cache_size);
 
+/*
+ * Returns false if and only if [ptr; ptr+len) touches the slab,
+ * but breaks objects boundaries.  It doesn't check whether the
+ * accessed object is actually allocated.
+ */
+bool slab_access_ok(const void *ptr, unsigned long len)
+{
+	struct page *page;
+	struct kmem_cache *s = NULL;
+	unsigned long offset;
+
+	if (len == 0)
+		return true;
+	if (!virt_addr_valid(ptr))
+		return true;
+	page = virt_to_head_page(ptr);
+	if (!PageSlab(page))
+		return true;
+
+	s = page->slab;
+	offset = ((const char *)ptr - (const char *)page_address(page)) % s->size;
+	if (offset <= s->objsize && len <= s->objsize - offset)
+		return true;
+
+	pr_err("slab_access_ok failed (addr %p, len %lu)\n", ptr, len);
+	return false;
+}
+EXPORT_SYMBOL(slab_access_ok);
+
 static void list_slab_objects(struct kmem_cache *s, struct page *page,
 							const char *text)
 {
---


* Re: [RFC v1] implement SL*B and stack usercopy runtime checks
  2011-07-03 11:10 [RFC v1] implement SL*B and stack usercopy runtime checks Vasiliy Kulikov
@ 2011-07-03 18:27 ` Linus Torvalds
  2011-07-03 18:57   ` Vasiliy Kulikov
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Linus Torvalds @ 2011-07-03 18:27 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: kernel-hardening, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, Arnd Bergmann, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Andrew Morton, linux-kernel, linux-arch, linux-mm

That patch is entirely insane. No way in hell will that ever get merged.

copy_to/from_user() is some of the most performance-critical code, and
runs a *lot*, often for fairly small structures (ie 'fstat()' etc).

Adding random ad-hoc tests to it is entirely inappropriate. Doing so
unconditionally is insane.

So NAK, NAK, NAK.

If you seriously clean it up (that at a minimum includes things like
making it configurable using some pretty helper function that just
compiles away for all the normal cases, and not writing out

   if (!slab_access_ok(to, n) || !stack_access_ok(to, n))

multiple times, for chrissake) it _might_ be acceptable.

But in its current form it's just total crap. It's exactly the kind of
"crazy security people who don't care about anything BUT security"
crap that I refuse to see.

Some balance and sanity.

                      Linus


* Re: [RFC v1] implement SL*B and stack usercopy runtime checks
  2011-07-03 18:27 ` Linus Torvalds
@ 2011-07-03 18:57   ` Vasiliy Kulikov
  2011-07-03 19:10     ` Linus Torvalds
  2011-07-06  3:39   ` Jonathan Hawthorne
  2011-07-18 18:39   ` [RFC v2] " Vasiliy Kulikov
  2 siblings, 1 reply; 16+ messages in thread
From: Vasiliy Kulikov @ 2011-07-03 18:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: kernel-hardening, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, Arnd Bergmann, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Andrew Morton, linux-kernel, linux-arch, linux-mm

On Sun, Jul 03, 2011 at 11:27 -0700, Linus Torvalds wrote:
> That patch is entirely insane. No way in hell will that ever get merged.

Sure, this is just an RFC :)  I wasn't proposing it as a patch as is; I
just tried to show what checks it introduces and how.


> copy_to/from_user() is some of the most performance-critical code, and
> runs a *lot*, often for fairly small structures (ie 'fstat()' etc).
> 
> Adding random ad-hoc tests to it is entirely inappropriate. Doing so
> unconditionally is insane.

That's why I asked whether it makes sense to guard it with a CONFIG_XXX
defaulting to =n.  Some distributions might find it worthwhile to enable
it, sacrificing some speed.

Will do.


> If you seriously clean it up (that at a minimum includes things like
> making it configurable using some pretty helper function that just
> compiles away for all the normal cases,

Hm, it is not as simple as it looks at first glance - even if the
object size is known at compile time (__compiletime_object_size), it
might be a field of a structure that crosses the slab object
boundaries because of an overflow.

However, if we treat constant sizes fed to copy_*_user() as equivalent to
{get,put}_user() (i.e. worry about size argument overflows only), then it
might be useful here.


>    if (!slab_access_ok(to, n) || !stack_access_ok(to, n))

OK :)


Thanks!

-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments


* Re: [RFC v1] implement SL*B and stack usercopy runtime checks
  2011-07-03 18:57   ` Vasiliy Kulikov
@ 2011-07-03 19:10     ` Linus Torvalds
  2011-07-03 19:24       ` [kernel-hardening] " Vasiliy Kulikov
  0 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2011-07-03 19:10 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: kernel-hardening, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, Arnd Bergmann, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Andrew Morton, linux-kernel, linux-arch, linux-mm

On Sun, Jul 3, 2011 at 11:57 AM, Vasiliy Kulikov <segoon@openwall.com> wrote:
>> If you seriously clean it up (that at a minimum includes things like
>> making it configurable using some pretty helper function that just
>> compiles away for all the normal cases,
>
> Hm, it is not as simple as it looks at first glance - even if the
> object size is known at compile time (__compiletime_object_size), it
> might be a field of a structure that crosses the slab object
> boundaries because of an overflow.

No, I was more talking about having something like

  #ifdef CONFIG_EXPENSIVE_CHECK_USERCOPY
  extern int check_user_copy(const void *kptr, unsigned long size);
  #else
  static inline int check_user_copy(const void *kptr, unsigned long size)
  { return 0; }
  #endif

so that the actual user-copy routines end up being clean and not have
#ifdefs in them or any implementation details like what you check
(stack, slab, page cache - whatever)
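
A call site would then reduce to something like this (a sketch, assuming a
zero return from check_user_copy() means "ok", as in the stub above):

    if (check_user_copy(to, n))         /* non-zero: reject the copy */
            return n;
    return __copy_from_user_ll(to, from, n);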

If you can also make it automatically not generate any code for cases
that are somehow obviously safe, then that's an added bonus.

But my concern is that performance is a real issue, and the strict
user-copy checking sounds like mostly a "let's enable this for testing
kernels when chasing some particular issue" feature, the way
DEBUG_PAGEALLOC is. And at the same time, code cleanliness and
maintainability is a big deal, so the usercopy code itself should have
minimal impact and look nice regardless (which is why I strongly
object to that kind of "(!slab_access_ok(to, n) ||
!stack_access_ok(to, n))" crud - the internal details of what you
check are *totally* irrelevant to the usercopy code.

                           Linus


* Re: [kernel-hardening] Re: [RFC v1] implement SL*B and stack usercopy runtime checks
  2011-07-03 19:10     ` Linus Torvalds
@ 2011-07-03 19:24       ` Vasiliy Kulikov
  2011-07-03 19:37         ` Joe Perches
  0 siblings, 1 reply; 16+ messages in thread
From: Vasiliy Kulikov @ 2011-07-03 19:24 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Arnd Bergmann,
	Christoph Lameter, Pekka Enberg, Matt Mackall, Andrew Morton,
	linux-kernel, linux-arch, linux-mm

On Sun, Jul 03, 2011 at 12:10 -0700, Linus Torvalds wrote:
> On Sun, Jul 3, 2011 at 11:57 AM, Vasiliy Kulikov <segoon@openwall.com> wrote:
> >> If you seriously clean it up (that at a minimum includes things like
> >> making it configurable using some pretty helper function that just
> >> compiles away for all the normal cases,
> >
> > Hm, it is not as simple as it looks at first glance - even if the
> > object size is known at compile time (__compiletime_object_size), it
> > might be a field of a structure that crosses the slab object
> > boundaries because of an overflow.
> 
> No, I was more talking about having something like
> 
>   #ifdef CONFIG_EXPENSIVE_CHECK_USERCOPY
>   extern int check_user_copy(const void *kptr, unsigned long size);
>   #else
>   static inline int check_user_copy(const void *kptr, unsigned long size)
>   { return 0; }
>   #endif

Sure, will do.  This is what I meant by kernel_access_ok(); since it is a
weak equivalent of access_ok(), check_user_copy() is a somewhat confusing
name IMO.


> so that the actual user-copy routines end up being clean and not have
> #ifdefs in them or any implementation details like what you check
> (stack, slab, page cache - whatever)
> 
> If you can also make it automatically not generate any code for cases
> that are somehow obviously safe, then that's an added bonus.

OK, then let's settle on "checks for overflows" and drop the check when
__compiletime_object_size() reports a size or the length is constant.  That
should remove most of the checks in fast paths.


> But my concern is that performance is a real issue, and the strict
> user-copy checking sounds like mostly a "let's enable this for testing
> kernels when chasing some particular issue" feature, the way
> DEBUG_PAGEALLOC is.

I will measure the performance penalty tomorrow.


Btw, if the performance turns out to be acceptable, what do you think
about logging/reacting to the spotted overflows?


Thanks,

-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments


* Re: [kernel-hardening] Re: [RFC v1] implement SL*B and stack usercopy runtime checks
  2011-07-03 19:24       ` [kernel-hardening] " Vasiliy Kulikov
@ 2011-07-03 19:37         ` Joe Perches
  2011-07-03 19:53           ` Vasiliy Kulikov
  0 siblings, 1 reply; 16+ messages in thread
From: Joe Perches @ 2011-07-03 19:37 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: kernel-hardening, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, Arnd Bergmann, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Andrew Morton, linux-kernel, linux-arch, linux-mm

On Sun, 2011-07-03 at 23:24 +0400, Vasiliy Kulikov wrote:
> Btw, if the performance turns out to be acceptable, what do you think
> about logging/reacting to the spotted overflows?

If you do, it might be useful to track the found location(s)
and only emit the overflow log entry once as found.

Maybe use __builtin_return_address(depth) for tracking.
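
A minimal sketch of such once-per-call-site reporting (hypothetical code,
not part of the posted patch) might be:

    static void *reported_sites[32];

    static bool already_reported(void *site)
    {
            int i;

            for (i = 0; i < ARRAY_SIZE(reported_sites); i++) {
                    void *old = cmpxchg(&reported_sites[i], NULL, site);

                    if (old == NULL || old == site)
                            return old == site;
            }
            return false;   /* table full: report anyway */
    }

    /* in the failure path of the check: */
    if (!already_reported(__builtin_return_address(0)))
            pr_err("usercopy overflow at %pS\n", __builtin_return_address(0));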



* Re: [kernel-hardening] Re: [RFC v1] implement SL*B and stack usercopy runtime checks
  2011-07-03 19:37         ` Joe Perches
@ 2011-07-03 19:53           ` Vasiliy Kulikov
  0 siblings, 0 replies; 16+ messages in thread
From: Vasiliy Kulikov @ 2011-07-03 19:53 UTC (permalink / raw)
  To: Joe Perches
  Cc: kernel-hardening, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, Arnd Bergmann, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Andrew Morton, linux-kernel, linux-arch, linux-mm

On Sun, Jul 03, 2011 at 12:37 -0700, Joe Perches wrote:
> On Sun, 2011-07-03 at 23:24 +0400, Vasiliy Kulikov wrote:
> > Btw, if the performance turns out to be acceptable, what do you think
> > about logging/reacting to the spotted overflows?
> 
> If you do, it might be useful to track the found location(s)

Sure.


> and only emit the overflow log entry once as found.

Hmm, if we consider it a purely debugging feature, then yes.  But if we
consider it an attempt to block exploitation, then no.  I'd prefer the
latter.


> Maybe use __builtin_return_address(depth) for tracking.

PaX/Grsecurity uses dump_stack() and do_group_exit(SIGKILL).  If configured
to do so, it also kills all of the user's processes and locks the user out
for some time.  I don't really propose the latter, but some reaction (to at
least slow down a blind bruteforce) might be useful.
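
Roughly, such a reaction could look like this (a sketch of the PaX-style
behaviour described above, not the actual PaX code):

    static void report_usercopy(const void *ptr, unsigned long len)
    {
            pr_alert("usercopy overflow attempt: ptr %p, len %lu\n", ptr, len);
            dump_stack();
            do_group_exit(SIGKILL);  /* kill the whole offending thread group */
    }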


Thanks,

-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments


* Re: [kernel-hardening] Re: [RFC v1] implement SL*B and stack usercopy runtime checks
  2011-07-03 18:27 ` Linus Torvalds
  2011-07-03 18:57   ` Vasiliy Kulikov
@ 2011-07-06  3:39   ` Jonathan Hawthorne
  2011-07-18 18:39   ` [RFC v2] " Vasiliy Kulikov
  2 siblings, 0 replies; 16+ messages in thread
From: Jonathan Hawthorne @ 2011-07-06  3:39 UTC (permalink / raw)
  To: kernel-hardening, Vasiliy Kulikov
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Arnd Bergmann,
	Christoph Lameter, Pekka Enberg, Matt Mackall, Andrew Morton,
	linux-kernel, linux-arch, linux-mm


Linus is correct to push back on system call auditing in the general case.
 There should be a trust boundary that contains where we optimize speed at
the cost of security.  Small pockets of trust will best compromise between
speed and security.

In the case of FreeBSD or Solaris, these can be smaller than a single
machine; in fact they may scale from a single process, a process group, to
a physical machine.  Scripted environments can go smaller, giving as
little as a 64KB environment.

Between those pockets of trust, isolated (preferably air-gap) auditors are
a must have.  Auditors from independent vendors are ideal; these minimize
herd failure.

Network auditors are a clear first foundation, but after they do the heavy
lifting at the transaction boundary, statistical auditors between
subsystems would be strategically valuable at a minimal incident cost.
These systems can integrate well with self-balancing health monitors.

But in the tight loops, performance should be king.  If the trust fails to
that point, well, trust can always fail somewhere.  The granularity of the
trust bubbles makes a best effort to contain exploitation or failure in
compromise for throughput and latency.

Systems like MAC (Mandatory Access Control) are a good compromise in UNIX
systems.  These audit cross-process communication on an access-list basis
and often benefit from a control plane that merges pattern-plus-remedy
rule sets into a single inspection dictionary and exception handler.

Imagine a coloring system.  Each piece of data is marked with a color.
When two colors of data are used together, the result is tainted by both
colors.  Rules exist that limit the colors that may be mixed on a
case-by-case basis using scenario-oriented configuration scripts.

__
Jonathan Hawthorne | Software Architect
t: +1.206.272.6624 | e: j.hawthorne@f5.com






On 7/3/11 11:27 AM, "Linus Torvalds" <torvalds@linux-foundation.org> wrote:

>That patch is entirely insane. No way in hell will that ever get merged.
>
>copy_to/from_user() is some of the most performance-critical code, and
>runs a *lot*, often for fairly small structures (ie 'fstat()' etc).
>
>Adding random ad-hoc tests to it is entirely inappropriate. Doing so
>unconditionally is insane.
>
>So NAK, NAK, NAK.
>
>If you seriously clean it up (that at a minimum includes things like
>making it configurable using some pretty helper function that just
>compiles away for all the normal cases, and not writing out
>
>   if (!slab_access_ok(to, n) || !stack_access_ok(to, n))
>
>multiple times, for chrissake) it _might_ be acceptable.
>
>But in its current form it's just total crap. It's exactly the kind of
>"crazy security people who don't care about anything BUT security"
>crap that I refuse to see.
>
>Some balance and sanity.
>
>                      Linus



* [RFC v2] implement SL*B and stack usercopy runtime checks
  2011-07-03 18:27 ` Linus Torvalds
  2011-07-03 18:57   ` Vasiliy Kulikov
  2011-07-06  3:39   ` Jonathan Hawthorne
@ 2011-07-18 18:39   ` Vasiliy Kulikov
  2011-07-18 18:52     ` Andrew Morton
                       ` (2 more replies)
  2 siblings, 3 replies; 16+ messages in thread
From: Vasiliy Kulikov @ 2011-07-18 18:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: kernel-hardening, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, Arnd Bergmann, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Andrew Morton, linux-kernel, linux-arch, linux-mm

This patch implements 2 additional checks for data copied from
kernelspace to userspace and vice versa (the original PAX_USERCOPY feature
from the PaX patch).  Currently the copy_* functions contain only very
simple and cheap comparisons of the supplied size against the size of the
copied object known at compile time.  This patch extends these checks to
also cover stack frame boundaries and SL*B object sizes.

More precisely, it checks:

1) if the data touches the stack, it checks whether the data fully fits in
the stack and whether it fully fits in a single stack frame.  The latter is
arch dependent; currently it is implemented only for x86 with
CONFIG_FRAME_POINTER=y.  It limits infoleaks/overwrites to the local
variables of a single frame, and prevents overwriting the saved return
instruction pointer.

2) if the data is from the SL*B cache, checks whether it fully fits in a
slab page and whether it overflows a slab object.  E.g. if the memory
was allocated as kmalloc(64, GFP_KERNEL) and one tries to copy 150
bytes, the copy would fail.

The checks are implemented for copy_{to,from}_user() and similar, and are
missing for {put,get}_user() and similar, because there the base pointer
might be the result of arbitrary pointer arithmetic, and the correctness of
that arithmetic is almost impossible to check at this stage.  If the real
object size is known at compile time, the check is reduced to a comparison
of two integers.  If the supplied length argument is known at compile time,
the check is skipped, because the only thing that can then be under the
attacker's control is the object pointer, and catching errors resulting
from wrong pointer arithmetic is beyond the patch's goals.
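
To illustrate how this works out at typical call sites (a sketch; `ubuf',
`dynlen' and `kptr' are made-up names):

    struct stat st;
    char namebuf[64];

    copy_to_user(ubuf, &st, sizeof(st)); /* constant len: runtime check skipped   */
    copy_to_user(ubuf, namebuf, dynlen); /* object size known: just dynlen <= 64  */
    copy_to_user(ubuf, kptr, dynlen);    /* nothing known: full slab/stack checks */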

/dev/kmem and /dev/mem are adjusted to bypass this check (e.g. without
STRICT_DEVMEM it should still be possible to read/write across stack frame
and slab object boundaries).

The slowdown is negligible for most workloads: in some cases the check is
reduced to an integer comparison (fstat, getrlimit, ipc), and in other
cases the check time is not comparable with the whole syscall time (write,
path traversal functions) or the program does not use syscalls intensively.
One of the most significant slowdowns is gethostname(), where the penalty
is 0.9%.  For 'find /usr', 'git log -Sredirect' in a kernel tree, kernel
compilation, and file copying the slowdown is less than 0.1% (it couldn't
be measured more precisely).

The limitations:

The stack check does not protect against overwriting local variables or
saved registers; it only confines overflows to a single frame.

The SL*B checks don't validate whether the object is actually allocated,
so they don't prevent infoleaks related to freed objects.  Also, if the
cache's granularity is larger than the actually allocated object size, an
infoleak of padding bytes is possible.  The slob check is not implemented
yet.  Unfortunately, a check for slob would have to (1) walk through the
slob chunks and (2) hold the slob lock, so it would lead to a significant
slowdown.

The patch does nothing with other memory areas like vmalloc'ed areas,
modules' data and code sections, etc.  This can be an area for future
improvement.

The patch's goal is similar to StackGuard (the -fstack-protector gcc
option, enabled by CONFIG_CC_STACKPROTECTOR): catch buffer overflows.
However, the design is completely different.  First, SG does nothing about
overreads; it can catch overwrites only.  Second, SG cannot catch SL*B
buffer overflows.  Third, SG checks the canary after a buffer has been
overflowed instead of preventing the actual overflow attempt; when an
attacker overflows a stack buffer, he can uncontrollably overwrite some
data on the stack before the function returns.  If the attacker's actions
generate a kernel oops before the return, SG never gets control and the
overflow is not caught, as if SG were disabled.  However, SG can catch
overflows of memcpy(), strcpy(), sprintf() and other functions working
with kernel data only, which are not caught by RUNTIME_USER_COPY_CHECK.
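
For comparison, the canary scheme works conceptually like this (a
simplified sketch of what -fstack-protector emits, not actual compiler
output):

    void f(void)
    {
            unsigned long canary = __stack_chk_guard; /* placed above the locals */
            char buf[64];

            /* ... buf may be overflowed here, clobbering the canary ... */

            if (canary != __stack_chk_guard)
                    __stack_chk_fail();     /* reached only on a normal return */
    }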

The checks are implemented for x86; they can easily be extended to other
architectures by including <linux/uaccess-check.h> and adding
kernel_access_ok() checks to {,__}copy_{to,from}_user().
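
For a hypothetical architecture the conversion would look roughly like this
(a sketch; __arch_copy_to_user() stands in for whatever low-level primitive
the architecture provides):

    #include <linux/uaccess-check.h>

    static inline unsigned long
    copy_to_user(void __user *to, const void *from, unsigned long n)
    {
            might_fault();
            if (!kernel_access_ok(from, n))
                    return n;       /* report "nothing copied", as on a fault */
            return __arch_copy_to_user(to, from, n);
    }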

The patch is a forwardport of the PAX_USERCOPY feature from the PaX
patch.  Most code was copied from the PaX patch with minor cosmetic
changes.  Also PaX' version of the patch has additional restrictions:

a) some slab caches have the SLAB_USERCOPY flag set, and copies to/from
slab caches without the flag are denied.  Rare cases where a few bytes are
needed from a cache missing from the whitelist are handled by copying the
bytes into a temporary area on the stack/heap.

b) if a malformed copy request is spotted, the event is logged and a
SIGKILL signal is sent to the current task.

Examples of overflows that become non-exploitable with RUNTIME_USER_COPY_CHECK:
DCCP getsockopt copy_to_user() overflow (fixed in 39ebc0276b),
L2TP memcpy_fromiovec() overflow (fixed in 53eacc070b),
64kb iwconfig infoleak (fixed in 42da2f948d, was found by PAX_USERCOPY).


Questions/thoughts:

Should this code put in action some reacting mechanisms?  Probably it is
a job of userspace monitoring daemon (similar to segvguard), but the
kernel's reaction would be race free and much more timely.

v2: - Moved the checks to kernel_access_ok().
    - If the object size is known at compile time, just compare the
      length and the object size.
    - Check only if the length value is not known at compile time.
    - Provided performance results.

---
 arch/x86/include/asm/uaccess.h    |   49 ++++++++++++++++++++++++++++++++
 arch/x86/include/asm/uaccess_32.h |   29 +++++++++++++++++--
 arch/x86/include/asm/uaccess_64.h |   36 ++++++++++++++++++++++-
 arch/x86/lib/usercopy_32.c        |    2 +-
 drivers/char/mem.c                |    8 ++--
 include/linux/uaccess-check.h     |   37 ++++++++++++++++++++++++
 lib/Kconfig.debug                 |   22 ++++++++++++++
 mm/maccess.c                      |   56 +++++++++++++++++++++++++++++++++++++
 mm/slab.c                         |   34 ++++++++++++++++++++++
 mm/slob.c                         |   10 ++++++
 mm/slub.c                         |   28 ++++++++++++++++++
 11 files changed, 301 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 99ddd14..96bad6c 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -9,6 +9,7 @@
 #include <linux/string.h>
 #include <asm/asm.h>
 #include <asm/page.h>
+#include <linux/uaccess-check.h>
 
 #define VERIFY_READ 0
 #define VERIFY_WRITE 1
@@ -78,6 +79,54 @@
  */
 #define access_ok(type, addr, size) (likely(__range_not_ok(addr, size) == 0))
 
+#if defined(CONFIG_FRAME_POINTER)
+/*
+ * MUST be always_inline to correctly count stack frame numbers.
+ *
+ * low ----------------------------------------------> high
+ * [saved bp][saved ip][args][local vars][saved bp][saved ip]
+ *		       ^----------------^
+ *		  allow copies only within here
+*/
+#undef arch_check_object_on_stack_frame
+inline static __attribute__((always_inline))
+bool arch_check_object_on_stack_frame(const void *stack,
+	     const void *stackend, const void *obj, unsigned long len)
+{
+	const void *frame = NULL;
+	const void *oldframe;
+
+	/*
+	 * Get the kernel_access_ok() caller frame.
+	 * __builtin_frame_address(0) returns kernel_access_ok() frame
+	 * as arch_ and stack_ are inline and kernel_ is noinline.
+	 */
+	oldframe = __builtin_frame_address(0);
+	if (oldframe)
+		frame = __builtin_frame_address(1);
+
+	while (stack <= frame && frame < stackend) {
+		/*
+		 * If obj + len extends past the last frame, this
+		 * check won't pass and the next frame will be 0,
+		 * causing us to bail out and correctly report
+		 * the copy as invalid.
+		 */
+		if (obj + len <= frame) {
+			/* EBP + EIP */
+			int protected_regs_size = 2*sizeof(void *);
+
+			if (obj >= oldframe + protected_regs_size)
+				return true;
+			return false;
+		}
+		oldframe = frame;
+		frame = *(const void * const *)frame;
+	}
+	return false;
+}
+#endif
+
 /*
  * The exception table consists of pairs of addresses: the first is the
  * address of an instruction that is allowed to fault, and the second is
diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 566e803..d48fa9c 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -82,6 +82,8 @@ static __always_inline unsigned long __must_check
 __copy_to_user(void __user *to, const void *from, unsigned long n)
 {
 	might_fault();
+	if (!kernel_access_ok(from, n))
+		return n;
 	return __copy_to_user_inatomic(to, from, n);
 }
 
@@ -152,6 +154,8 @@ __copy_from_user(void *to, const void __user *from, unsigned long n)
 			return ret;
 		}
 	}
+	if (!kernel_access_ok(to, n))
+		return n;
 	return __copy_from_user_ll(to, from, n);
 }
 
@@ -205,11 +209,30 @@ static inline unsigned long __must_check copy_from_user(void *to,
 {
 	int sz = __compiletime_object_size(to);
 
-	if (likely(sz == -1 || sz >= n))
-		n = _copy_from_user(to, from, n);
-	else
+	if (likely(sz == -1 || sz >= n)) {
+		if (kernel_access_ok(to, n))
+			n = _copy_from_user(to, from, n);
+	} else {
 		copy_from_user_overflow();
+	}
+
+	return n;
+}
+
+#undef copy_from_user_uncheched
+static inline unsigned long __must_check copy_from_user_uncheched(void *to,
+					  const void __user *from,
+					  unsigned long n)
+{
+	return _copy_from_user(to, from, n);
+}
 
+#undef copy_to_user_uncheched
+static inline unsigned long copy_to_user_unchecked(void __user *to,
+     const void *from, unsigned long n)
+{
+	if (access_ok(VERIFY_WRITE, to, n))
+		n = __copy_to_user(to, from, n);
 	return n;
 }
 
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index 1c66d30..10c5a0a 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -50,8 +50,10 @@ static inline unsigned long __must_check copy_from_user(void *to,
 	int sz = __compiletime_object_size(to);
 
 	might_fault();
-	if (likely(sz == -1 || sz >= n))
-		n = _copy_from_user(to, from, n);
+	if (likely(sz == -1 || sz >= n)) {
+		if (kernel_access_ok(to, n))
+			n = _copy_from_user(to, from, n);
+	}
 #ifdef CONFIG_DEBUG_VM
 	else
 		WARN(1, "Buffer overflow detected!\n");
@@ -59,11 +61,33 @@ static inline unsigned long __must_check copy_from_user(void *to,
 	return n;
 }
 
+#undef copy_from_user_unchecked
+static inline unsigned long __must_check copy_from_user_unchecked(void *to,
+					  const void __user *from,
+					  unsigned long n)
+{
+	might_fault();
+
+	return _copy_from_user(to, from, n);
+}
+
 static __always_inline __must_check
 int copy_to_user(void __user *dst, const void *src, unsigned size)
 {
 	might_fault();
 
+	if (!kernel_access_ok(src, size))
+		return size;
+
+	return _copy_to_user(dst, src, size);
+}
+
+#undef copy_to_user_unchecked
+static __always_inline __must_check
+int copy_to_user_unchecked(void __user *dst, const void *src, unsigned size)
+{
+	might_fault();
+
 	return _copy_to_user(dst, src, size);
 }
 
@@ -73,8 +97,12 @@ int __copy_from_user(void *dst, const void __user *src, unsigned size)
 	int ret = 0;
 
 	might_fault();
+	if (!kernel_access_ok(dst, size))
+		return size;
+
 	if (!__builtin_constant_p(size))
 		return copy_user_generic(dst, (__force void *)src, size);
+
 	switch (size) {
 	case 1:__get_user_asm(*(u8 *)dst, (u8 __user *)src,
 			      ret, "b", "b", "=q", 1);
@@ -117,8 +145,12 @@ int __copy_to_user(void __user *dst, const void *src, unsigned size)
 	int ret = 0;
 
 	might_fault();
+	if (!kernel_access_ok(dst, size))
+		return size;
+
 	if (!__builtin_constant_p(size))
 		return copy_user_generic((__force void *)dst, src, size);
+
 	switch (size) {
 	case 1:__put_user_asm(*(u8 *)src, (u8 __user *)dst,
 			      ret, "b", "b", "iq", 1);
diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index e218d5d..e136309 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -851,7 +851,7 @@ EXPORT_SYMBOL(__copy_from_user_ll_nocache_nozero);
 unsigned long
 copy_to_user(void __user *to, const void *from, unsigned long n)
 {
-	if (access_ok(VERIFY_WRITE, to, n))
+	if (access_ok(VERIFY_WRITE, to, n) && kernel_access_ok(from, n))
 		n = __copy_to_user(to, from, n);
 	return n;
 }
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 8fc04b4..77c93d4 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -132,7 +132,7 @@ static ssize_t read_mem(struct file *file, char __user *buf,
 		if (!ptr)
 			return -EFAULT;
 
-		remaining = copy_to_user(buf, ptr, sz);
+		remaining = copy_to_user_unchecked(buf, ptr, sz);
 		unxlate_dev_mem_ptr(p, ptr);
 		if (remaining)
 			return -EFAULT;
@@ -190,7 +190,7 @@ static ssize_t write_mem(struct file *file, const char __user *buf,
 			return -EFAULT;
 		}
 
-		copied = copy_from_user(ptr, buf, sz);
+		copied = copy_from_user_unchecked(ptr, buf, sz);
 		unxlate_dev_mem_ptr(p, ptr);
 		if (copied) {
 			written += sz - copied;
@@ -428,7 +428,7 @@ static ssize_t read_kmem(struct file *file, char __user *buf,
 			 */
 			kbuf = xlate_dev_kmem_ptr((char *)p);
 
-			if (copy_to_user(buf, kbuf, sz))
+			if (copy_to_user_unchecked(buf, kbuf, sz))
 				return -EFAULT;
 			buf += sz;
 			p += sz;
@@ -498,7 +498,7 @@ static ssize_t do_write_kmem(unsigned long p, const char __user *buf,
 		 */
 		ptr = xlate_dev_kmem_ptr((char *)p);
 
-		copied = copy_from_user(ptr, buf, sz);
+		copied = copy_from_user_unchecked(ptr, buf, sz);
 		if (copied) {
 			written += sz - copied;
 			if (written)
diff --git a/include/linux/uaccess-check.h b/include/linux/uaccess-check.h
new file mode 100644
index 0000000..5592a3e
--- /dev/null
+++ b/include/linux/uaccess-check.h
@@ -0,0 +1,37 @@
+#ifndef __LINUX_UACCESS_CHECK_H__
+#define __LINUX_UACCESS_CHECK_H__
+
+#ifdef CONFIG_DEBUG_RUNTIME_USER_COPY_CHECKS
+extern bool __kernel_access_ok(const void *ptr, unsigned long len);
+static inline bool kernel_access_ok(const void *ptr, unsigned long len)
+{
+	size_t sz = __compiletime_object_size(ptr);
+
+	if (sz != (size_t)-1) {
+		if (sz >= len)
+			return true;
+		pr_alert("kernel_access_ok(static) failed, ptr = %p, length = %lu\n",
+			ptr, len);
+		dump_stack();
+		return false;
+	}
+
+	/* We care about "len" overflows only. */
+	if (__builtin_constant_p(len))
+		return true;
+
+	return __kernel_access_ok(ptr, len);
+}
+#else
+static inline bool kernel_access_ok(const void *ptr, unsigned long len)
+{
+	return true;
+}
+#endif /* CONFIG_DEBUG_RUNTIME_USER_COPY_CHECKS */
+
+#define copy_to_user_unchecked copy_to_user
+#define copy_from_user_unchecked copy_from_user
+
+#define arch_check_object_on_stack_frame(s,se,o,len) true
+
+#endif		/* __LINUX_UACCESS_CHECK_H__ */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index dd373c8..ed266b6 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -679,6 +679,28 @@ config DEBUG_STACK_USAGE
 
 	  This option will slow down process creation somewhat.
 
+config DEBUG_RUNTIME_USER_COPY_CHECKS
+	bool "Runtime usercopy size checks"
+	default n
+	depends on DEBUG_KERNEL && X86
+	---help---
+	  Enabling this option adds additional runtime checks into copy_from_user()
+	  and similar functions.
+
+	  Specifically, if the data touches the stack, it checks whether a copied
+	  memory chunk fully fits in the stack. If CONFIG_FRAME_POINTER=y, also
+	  checks whether it fully fits in a single stack frame. It limits
+	  infoleaks/overwrites to a single frame and local variables
+	  only, and prevents saved return instruction pointer overwriting.
+
+	  If the data is from the SL*B cache, checks whether it fully fits in a
+	  slab page and whether it overflows a slab object.  E.g. if the memory
+	  was allocated as kmalloc(64, GFP_KERNEL) and one tries to copy 150
+	  bytes, the copy would fail.
+
+	  The option has a minimal performance drawback (up to 1% on tiny syscalls
+	  like gethostname).
+
 config DEBUG_KOBJECT
 	bool "kobject debugging"
 	depends on DEBUG_KERNEL
diff --git a/mm/maccess.c b/mm/maccess.c
index 4cee182..af450b8 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -3,8 +3,11 @@
  */
 #include <linux/module.h>
 #include <linux/mm.h>
+#include <linux/sched.h>
 #include <linux/uaccess.h>
 
+extern bool slab_access_ok(const void *ptr, unsigned long len);
+
 /**
  * probe_kernel_read(): safely attempt to read from a location
  * @dst: pointer to the buffer that shall take the data
@@ -60,3 +63,56 @@ long __probe_kernel_write(void *dst, const void *src, size_t size)
 	return ret ? -EFAULT : 0;
 }
 EXPORT_SYMBOL_GPL(probe_kernel_write);
+
+#ifdef CONFIG_DEBUG_RUNTIME_USER_COPY_CHECKS
+/*
+ * stack_access_ok() checks whether object is on the stack and
+ * whether it fits in a single stack frame (in case arch allows
+ * to learn this information).
+ *
+ * Returns true in cases:
+ * a) object is not a stack object at all
+ * b) object is located on the stack and fits in a single frame
+ *
+ * MUST be inline not to confuse arch_check_object_on_stack_frame.
+ */
+inline static bool __attribute__((always_inline))
+stack_access_ok(const void *obj, unsigned long len)
+{
+	const void * const stack = task_stack_page(current);
+	const void * const stackend = stack + THREAD_SIZE;
+
+	/* Does obj+len overflow vm space? */
+	if (unlikely(obj + len < obj))
+		return false;
+
+	/* Does [obj; obj+len) at least touch our stack? */
+	if (unlikely(obj + len <= stack || stackend <= obj))
+		return true;
+
+	/* Does [obj; obj+len) overflow/underflow the stack? */
+	if (unlikely(obj < stack || stackend < obj + len))
+		return false;
+
+	return arch_check_object_on_stack_frame(stack, stackend, obj, len);
+}
+
+noinline bool __kernel_access_ok(const void *ptr, unsigned long len)
+{
+	if (!slab_access_ok(ptr, len)) {
+		pr_alert("slab_access_ok failed, ptr = %p, length = %lu\n",
+			ptr, len);
+		dump_stack();
+		return false;
+	}
+	if (!stack_access_ok(ptr, len)) {
+		pr_alert("stack_access_ok failed, ptr = %p, length = %lu\n",
+			ptr, len);
+		dump_stack();
+		return false;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL(__kernel_access_ok);
+#endif /* CONFIG_DEBUG_RUNTIME_USER_COPY_CHECKS */
diff --git a/mm/slab.c b/mm/slab.c
index d96e223..60e062c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3844,6 +3844,40 @@ unsigned int kmem_cache_size(struct kmem_cache *cachep)
 EXPORT_SYMBOL(kmem_cache_size);
 
 /*
+ * Returns false if and only if [ptr; ptr+len) touches the slab,
+ * but breaks objects boundaries.  It doesn't check whether the
+ * accessed object is actually allocated.
+ */
+bool slab_access_ok(const void *ptr, unsigned long len)
+{
+	struct page *page;
+	struct kmem_cache *cachep = NULL;
+	struct slab *slabp;
+	unsigned int objnr;
+	unsigned long offset;
+
+	if (!len)
+		return true;
+	if (!virt_addr_valid(ptr))
+		return true;
+	page = virt_to_head_page(ptr);
+	if (!PageSlab(page))
+		return true;
+
+	cachep = page_get_cache(page);
+	slabp = page_get_slab(page);
+	objnr = obj_to_index(cachep, slabp, (void *)ptr);
+	BUG_ON(objnr >= cachep->num);
+	offset = (const char *)ptr - obj_offset(cachep) -
+	    (const char *)index_to_obj(cachep, slabp, objnr);
+	if (offset <= obj_size(cachep) && len <= obj_size(cachep) - offset)
+		return true;
+
+	return false;
+}
+EXPORT_SYMBOL(slab_access_ok);
+
+/*
  * This initializes kmem_list3 or resizes various caches for all nodes.
  */
 static int alloc_kmemlist(struct kmem_cache *cachep, gfp_t gfp)
diff --git a/mm/slob.c b/mm/slob.c
index 46e0aee..2d9bb2b 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -666,6 +666,16 @@ unsigned int kmem_cache_size(struct kmem_cache *c)
 }
 EXPORT_SYMBOL(kmem_cache_size);
 
+bool slab_access_ok(const void *ptr, unsigned long len)
+{
+	/*
+	 * TODO: is it worth checking?  We have to gain a lock and
+	 * walk through all chunks.
+	 */
+	return true;
+}
+EXPORT_SYMBOL(slab_access_ok);
+
 int kmem_cache_shrink(struct kmem_cache *d)
 {
 	return 0;
diff --git a/mm/slub.c b/mm/slub.c
index 35f351f..be64e77 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2623,6 +2623,34 @@ unsigned int kmem_cache_size(struct kmem_cache *s)
 }
 EXPORT_SYMBOL(kmem_cache_size);
 
+/*
+ * Returns false if and only if [ptr; ptr+len) touches the slab,
+ * but breaks objects boundaries.  It doesn't check whether the
+ * accessed object is actually allocated.
+ */
+bool slab_access_ok(const void *ptr, unsigned long len)
+{
+	struct page *page;
+	struct kmem_cache *s = NULL;
+	unsigned long offset;
+
+	if (len == 0)
+		return true;
+	if (!virt_addr_valid(ptr))
+		return true;
+	page = virt_to_head_page(ptr);
+	if (!PageSlab(page))
+		return true;
+
+	s = page->slab;
+	offset = ((const char *)ptr - (const char *)page_address(page)) % s->size;
+	if (offset <= s->objsize && len <= s->objsize - offset)
+		return true;
+
+	return false;
+}
+EXPORT_SYMBOL(slab_access_ok);
+
 static void list_slab_objects(struct kmem_cache *s, struct page *page,
 							const char *text)
 {
--


* Re: [RFC v2] implement SL*B and stack usercopy runtime checks
  2011-07-18 18:39   ` [RFC v2] " Vasiliy Kulikov
@ 2011-07-18 18:52     ` Andrew Morton
  2011-07-18 19:33       ` [kernel-hardening] " Vasiliy Kulikov
  2011-07-19  7:40       ` Vasiliy Kulikov
  2011-07-18 19:08     ` Matt Mackall
  2011-07-18 21:18     ` Christoph Lameter
  2 siblings, 2 replies; 16+ messages in thread
From: Andrew Morton @ 2011-07-18 18:52 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: Linus Torvalds, kernel-hardening, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Arnd Bergmann, Christoph Lameter,
	Pekka Enberg, Matt Mackall, linux-kernel, linux-arch, linux-mm

On Mon, 18 Jul 2011 22:39:51 +0400
Vasiliy Kulikov <segoon@openwall.com> wrote:

>   */
>  #define access_ok(type, addr, size) (likely(__range_not_ok(addr, size) == 0))
>  
> +#if defined(CONFIG_FRAME_POINTER)

#ifdef is conventional in this case

> +/*
> + * MUST be always_inline to correctly count stack frame numbers.
> + *
> + * low ----------------------------------------------> high
> + * [saved bp][saved ip][args][local vars][saved bp][saved ip]
> + *		       ^----------------^
> + *		  allow copies only within here
> +*/
> +#undef arch_check_object_on_stack_frame
> +inline static __attribute__((always_inline))

static inline __always_inline

> +bool arch_check_object_on_stack_frame(const void *stack,
> +	     const void *stackend, const void *obj, unsigned long len)
> +{
> +	const void *frame = NULL;
> +	const void *oldframe;
> +
> +	/*
> +	 * Get the kernel_access_ok() caller frame.
> +	 * __builtin_frame_address(0) returns kernel_access_ok() frame
> +	 * as arch_ and stack_ are inline and kernel_ is noinline.
> +	 */
> +	oldframe = __builtin_frame_address(0);
> +	if (oldframe)
> +		frame = __builtin_frame_address(1);
> +
> +	while (stack <= frame && frame < stackend) {
> +		/*
> +		 * If obj + len extends past the last frame, this
> +		 * check won't pass and the next frame will be 0,
> +		 * causing us to bail out and correctly report
> +		 * the copy as invalid.
> +		 */
> +		if (obj + len <= frame) {
> +			/* EBP + EIP */
> +			int protected_regs_size = 2*sizeof(void *);

size_t?

> +			if (obj >= oldframe + protected_regs_size)
> +				return true;
> +			return false;
> +		}
> +		oldframe = frame;
> +		frame = *(const void * const *)frame;
> +	}
> +	return false;
> +}
> +#endif
> +
>  /*
>   * The exception table consists of pairs of addresses: the first is the
>   * address of an instruction that is allowed to fault, and the second is
>
> ...
>
> @@ -205,11 +209,30 @@ static inline unsigned long __must_check copy_from_user(void *to,
>  {
>  	int sz = __compiletime_object_size(to);
>  
> -	if (likely(sz == -1 || sz >= n))
> -		n = _copy_from_user(to, from, n);
> -	else
> +	if (likely(sz == -1 || sz >= n)) {
> +		if (kernel_access_ok(to, n))
> +			n = _copy_from_user(to, from, n);
> +	} else {
>  		copy_from_user_overflow();
> +	}
> +
> +	return n;
> +}
> +
> +#undef copy_from_user_uncheched

typo

> +static inline unsigned long __must_check copy_from_user_uncheched(void *to,

typo

> +					  const void __user *from,
> +					  unsigned long n)
> +{
> +	return _copy_from_user(to, from, n);
> +}
>  
> +#undef copy_to_user_uncheched

typo

> +static inline unsigned long copy_to_user_unchecked(void __user *to,
> +     const void *from, unsigned long n)
> +{
> +	if (access_ok(VERIFY_WRITE, to, n))
> +		n = __copy_to_user(to, from, n);
>  	return n;
>  }
>  
> diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
> index 1c66d30..10c5a0a 100644
> --- a/arch/x86/include/asm/uaccess_64.h
> +++ b/arch/x86/include/asm/uaccess_64.h
> @@ -50,8 +50,10 @@ static inline unsigned long __must_check copy_from_user(void *to,
>  	int sz = __compiletime_object_size(to);

size_t? (ssize_t?)

>  	might_fault();
> -	if (likely(sz == -1 || sz >= n))
> -		n = _copy_from_user(to, from, n);
> +	if (likely(sz == -1 || sz >= n)) {
> +		if (kernel_access_ok(to, n))
> +			n = _copy_from_user(to, from, n);
> +	}
>  #ifdef CONFIG_DEBUG_VM
>  	else
>  		WARN(1, "Buffer overflow detected!\n");
>
> ...
>
> --- a/mm/maccess.c
> +++ b/mm/maccess.c
> @@ -3,8 +3,11 @@
>   */
>  #include <linux/module.h>
>  #include <linux/mm.h>
> +#include <linux/sched.h>
>  #include <linux/uaccess.h>
>  
> +extern bool slab_access_ok(const void *ptr, unsigned long len);

no externs in .c - use a header

> +
>  /**
>   * probe_kernel_read(): safely attempt to read from a location
>   * @dst: pointer to the buffer that shall take the data
> @@ -60,3 +63,56 @@ long __probe_kernel_write(void *dst, const void *src, size_t size)
>  	return ret ? -EFAULT : 0;
>  }
>  EXPORT_SYMBOL_GPL(probe_kernel_write);
> +
> +#ifdef CONFIG_DEBUG_RUNTIME_USER_COPY_CHECKS
> +/*
> + * stack_access_ok() checks whether object is on the stack and
> + * whether it fits in a single stack frame (in case arch allows
> + * to learn this information).
> + *
> + * Returns true in cases:
> + * a) object is not a stack object at all
> + * b) object is located on the stack and fits in a single frame
> + *
> + * MUST be inline not to confuse arch_check_object_on_stack_frame.
> + */
> +inline static bool __attribute__((always_inline))

__always_inline

> +stack_access_ok(const void *obj, unsigned long len)
> +{
> +	const void * const stack = task_stack_page(current);
> +	const void * const stackend = stack + THREAD_SIZE;
> +
> +	/* Does obj+len overflow vm space? */
> +	if (unlikely(obj + len < obj))
> +		return false;
> +
> +	/* Does [obj; obj+len) at least touch our stack? */
> +	if (unlikely(obj + len <= stack || stackend <= obj))
> +		return true;
> +
> +	/* Does [obj; obj+len) overflow/underflow the stack? */
> +	if (unlikely(obj < stack || stackend < obj + len))
> +		return false;
> +
> +	return arch_check_object_on_stack_frame(stack, stackend, obj, len);
> +}
> +
> +noinline bool __kernel_access_ok(const void *ptr, unsigned long len)

noinline seems unneeded

> +{
> +	if (!slab_access_ok(ptr, len)) {
> +		pr_alert("slab_access_ok failed, ptr = %p, length = %lu\n",
> +			ptr, len);
> +		dump_stack();
> +		return false;
> +	}
> +	if (!stack_access_ok(ptr, len)) {
> +		pr_alert("stack_access_ok failed, ptr = %p, length = %lu\n",
> +			ptr, len);
> +		dump_stack();
> +		return false;
> +	}
> +
> +	return true;
> +}
>
> ...
>


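To make the quoted frame walk easier to follow, here is a rough user-space
sketch of the same idea: follow the chain of saved frame pointers while it
stays inside known stack bounds.  It assumes an x86-style frame layout, the
GCC __builtin_frame_address() builtin and a build with frame pointers
(e.g. gcc -fno-omit-frame-pointer); the stack bounds are approximated with
local variables rather than task_stack_page(), and the helper names are made
up for the example:

#include <stdio.h>

/*
 * Each frame begins with the caller's saved frame pointer, so
 * dereferencing the current frame address yields the next, older
 * frame.  The walk stops once the chain leaves [stack, stackend),
 * just as the kernel version stops at the thread stack limits.
 */
static void __attribute__((noinline)) walk_frames(const void *stackend)
{
	int lo;				/* low bound: a local of this frame */
	const void *stack = &lo;
	const void *frame = __builtin_frame_address(0);

	while (stack <= frame && frame < stackend) {
		printf("frame at %p\n", (void *)frame);
		frame = *(const void * const *)frame;
	}
}

static void __attribute__((noinline)) one_level_deeper(const void *stackend)
{
	walk_frames(stackend);
}

int main(void)
{
	int hi;				/* high bound: a local in main()'s frame */

	one_level_deeper(&hi);		/* prints the two callee frames, then stops */
	return 0;
}

The same bounds test is what lets the kernel version bail out once obj + len
extends past the last frame, as the quoted comment describes.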

* Re: [RFC v2] implement SL*B and stack usercopy runtime checks
  2011-07-18 18:39   ` [RFC v2] " Vasiliy Kulikov
  2011-07-18 18:52     ` Andrew Morton
@ 2011-07-18 19:08     ` Matt Mackall
  2011-07-18 19:24       ` Vasiliy Kulikov
  2011-07-18 21:18     ` Christoph Lameter
  2 siblings, 1 reply; 16+ messages in thread
From: Matt Mackall @ 2011-07-18 19:08 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: Linus Torvalds, kernel-hardening, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Arnd Bergmann, Christoph Lameter,
	Pekka Enberg, Andrew Morton, linux-kernel, linux-arch, linux-mm

On Mon, 2011-07-18 at 22:39 +0400, Vasiliy Kulikov wrote:
> This patch implements 2 additional checks for the data copied from
> kernelspace to userspace and vice versa (original PAX_USERCOPY from PaX
> patch).  Currently there are some very simple and cheap comparisons of
> supplied size and the size of a copied object known at the compile time
> in copy_* functions.  This patch enhances these checks to check against
> stack frame boundaries and against SL*B object sizes.
> 
> More precisely, it checks:
> 
> 1) if the data touches the stack, checks whether it fully fits in the stack
> and whether it fully fits in a single stack frame.  The latter is arch
> dependent, currently it is implemented for x86 with CONFIG_FRAME_POINTER=y
> only.  It limits infoleaks/overwrites to a single frame and local variables
> only, and prevents saved return instruction pointer overwriting.
> 
> 2) if the data is from the SL*B cache, checks whether it fully fits in a
> slab page and whether it overflows a slab object.  E.g. if the memory
> was allocated as kmalloc(64, GFP_KERNEL) and one tries to copy 150
> bytes, the copy would fail.

FYI, this should almost certainly be split into (at least) two patches:

- the stack check
- the SL*B check (probably one patch per allocator, preceded by one for
any shared infrastructure)

-- 
Mathematics is the supreme nostalgia of our time.




* Re: [RFC v2] implement SL*B and stack usercopy runtime checks
  2011-07-18 19:08     ` Matt Mackall
@ 2011-07-18 19:24       ` Vasiliy Kulikov
  0 siblings, 0 replies; 16+ messages in thread
From: Vasiliy Kulikov @ 2011-07-18 19:24 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Linus Torvalds, kernel-hardening, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Arnd Bergmann, Christoph Lameter,
	Pekka Enberg, Andrew Morton, linux-kernel, linux-arch, linux-mm

On Mon, Jul 18, 2011 at 14:08 -0500, Matt Mackall wrote:
> On Mon, 2011-07-18 at 22:39 +0400, Vasiliy Kulikov wrote:
> > This patch implements 2 additional checks for the data copied from
> > kernelspace to userspace and vice versa (original PAX_USERCOPY from PaX
> > patch).  Currently there are some very simple and cheap comparisons of
> > supplied size and the size of a copied object known at the compile time
> > in copy_* functions.  This patch enhances these checks to check against
> > stack frame boundaries and against SL*B object sizes.
> > 
> > More precisely, it checks:
> > 
> > 1) if the data touches the stack, checks whether it fully fits in the stack
> > and whether it fully fits in a single stack frame.  The latter is arch
> > dependent, currently it is implemented for x86 with CONFIG_FRAME_POINTER=y
> > only.  It limits infoleaks/overwrites to a single frame and local variables
> > only, and prevents saved return instruction pointer overwriting.
> > 
> > 2) if the data is from the SL*B cache, checks whether it fully fits in a
> > slab page and whether it overflows a slab object.  E.g. if the memory
> > was allocated as kmalloc(64, GFP_KERNEL) and one tries to copy 150
> > bytes, the copy would fail.
> 
> FYI, this should almost certainly be split into (at least) two patches:
> 
> - the stack check
> - the SL*B check (probably one patch per allocator, preceded by one for
> any shared infrastructure)

Sure, probably per architecture as well.  But I want to get comments
on the feature itself before splitting it up.

Thanks,

-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments


* Re: [kernel-hardening] Re: [RFC v2] implement SL*B and stack usercopy runtime checks
  2011-07-18 18:52     ` Andrew Morton
@ 2011-07-18 19:33       ` Vasiliy Kulikov
  2011-07-19  7:40       ` Vasiliy Kulikov
  1 sibling, 0 replies; 16+ messages in thread
From: Vasiliy Kulikov @ 2011-07-18 19:33 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Linus Torvalds, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, Arnd Bergmann, Christoph Lameter, Pekka Enberg,
	Matt Mackall, linux-kernel, linux-arch, linux-mm

On Mon, Jul 18, 2011 at 11:52 -0700, Andrew Morton wrote:
> On Mon, 18 Jul 2011 22:39:51 +0400
> Vasiliy Kulikov <segoon@openwall.com> wrote:
> 
> >   */
> >  #define access_ok(type, addr, size) (likely(__range_not_ok(addr, size) == 0))
> >  
> > +#if defined(CONFIG_FRAME_POINTER)
> 
> #ifdef is conventional in this case

OK.

> > +/*
> > + * MUST be always_inline to correctly count stack frame numbers.
> > + *
> > + * low ----------------------------------------------> high
> > + * [saved bp][saved ip][args][local vars][saved bp][saved ip]
> > + *		       ^----------------^
> > + *		  allow copies only within here
> > +*/
> > +#undef arch_check_object_on_stack_frame
> > +inline static __attribute__((always_inline))
> 
> static inline __always_inline

OK.

> > +bool arch_check_object_on_stack_frame(const void *stack,
> > +	     const void *stackend, const void *obj, unsigned long len)
> > +{
> > +	const void *frame = NULL;
> > +	const void *oldframe;
> > +
> > +	/*
> > +	 * Get the kernel_access_ok() caller frame.
> > +	 * __builtin_frame_address(0) returns kernel_access_ok() frame
> > +	 * as arch_ and stack_ are inline and kernel_ is noinline.
> > +	 */
> > +	oldframe = __builtin_frame_address(0);
> > +	if (oldframe)
> > +		frame = __builtin_frame_address(1);
> > +
> > +	while (stack <= frame && frame < stackend) {
> > +		/*
> > +		 * If obj + len extends past the last frame, this
> > +		 * check won't pass and the next frame will be 0,
> > +		 * causing us to bail out and correctly report
> > +		 * the copy as invalid.
> > +		 */
> > +		if (obj + len <= frame) {
> > +			/* EBP + EIP */
> > +			int protected_regs_size = 2*sizeof(void *);
> 
> size_t?

Yes, it looks better here.

> > +static inline unsigned long __must_check copy_from_user_uncheched(void *to,
> 
> typo

Oops, sure.

> > diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
> > index 1c66d30..10c5a0a 100644
> > --- a/arch/x86/include/asm/uaccess_64.h
> > +++ b/arch/x86/include/asm/uaccess_64.h
> > @@ -50,8 +50,10 @@ static inline unsigned long __must_check copy_from_user(void *to,
> >  	int sz = __compiletime_object_size(to);
> 
> size_t? (ssize_t?)

My patch doesn't touch that line; that said, ssize_t seems reasonable here.

> >  	might_fault();
> > -	if (likely(sz == -1 || sz >= n))
> > -		n = _copy_from_user(to, from, n);
> > +	if (likely(sz == -1 || sz >= n)) {
> > +		if (kernel_access_ok(to, n))
> > +			n = _copy_from_user(to, from, n);
> > +	}
> >  #ifdef CONFIG_DEBUG_VM
> >  	else
> >  		WARN(1, "Buffer overflow detected!\n");
> >
> > ...
> >
> > --- a/mm/maccess.c
> > +++ b/mm/maccess.c
> > @@ -3,8 +3,11 @@
> >   */
> >  #include <linux/module.h>
> >  #include <linux/mm.h>
> > +#include <linux/sched.h>
> >  #include <linux/uaccess.h>
> >  
> > +extern bool slab_access_ok(const void *ptr, unsigned long len);
> 
> no externs in .c - use a header

I thought it would make for less noise.  OK, will do.

> > +noinline bool __kernel_access_ok(const void *ptr, unsigned long len)
> 
> noinline seems unneeded

It is needed here because arch_check_object_on_stack_frame() needs the
precise number of frames it should skip.


Thank you!

-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

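A small sketch of the point about inlining and frame counting: with GCC's
__builtin_frame_address(), an always_inline helper evaluates in its caller's
frame, while a noinline helper gets a frame of its own.  This is a user-space
illustration only; the helper names are invented for the example:

#include <stdio.h>

/* Reports the frame it is inlined into, i.e. its caller's frame. */
static inline __attribute__((always_inline)) void *inlined_frame(void)
{
	return __builtin_frame_address(0);
}

/* Gets a stack frame of its own, so it reports that frame instead. */
static __attribute__((noinline)) void *own_frame(void)
{
	return __builtin_frame_address(0);
}

int main(void)
{
	printf("main:          %p\n", __builtin_frame_address(0));
	printf("always_inline: %p\n", inlined_frame());	/* same as main's    */
	printf("noinline:      %p\n", own_frame());	/* a different frame */
	return 0;
}

This is why the inline/noinline split in the patch matters: change it and the
frame the check starts walking from shifts by one level.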

* Re: [RFC v2] implement SL*B and stack usercopy runtime checks
  2011-07-18 18:39   ` [RFC v2] " Vasiliy Kulikov
  2011-07-18 18:52     ` Andrew Morton
  2011-07-18 19:08     ` Matt Mackall
@ 2011-07-18 21:18     ` Christoph Lameter
  2011-07-19  6:53       ` Vasiliy Kulikov
  2 siblings, 1 reply; 16+ messages in thread
From: Christoph Lameter @ 2011-07-18 21:18 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: Linus Torvalds, kernel-hardening, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Arnd Bergmann, Pekka Enberg, Matt Mackall,
	Andrew Morton, linux-kernel, linux-arch, linux-mm

On Mon, 18 Jul 2011, Vasiliy Kulikov wrote:

> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3844,6 +3844,40 @@ unsigned int kmem_cache_size(struct kmem_cache *cachep)
>  EXPORT_SYMBOL(kmem_cache_size);
>
>  /*
> + * Returns false if and only if [ptr; ptr+len) touches the slab,
> + * but breaks object boundaries.  It doesn't check whether the
> + * accessed object is actually allocated.
> + */
> +bool slab_access_ok(const void *ptr, unsigned long len)
> +{
> +	struct page *page;
> +	struct kmem_cache *cachep = NULL;

Why = NULL?

> +	struct slab *slabp;
> +	unsigned int objnr;
> +	unsigned long offset;
> +
> +	if (!len)
> +		return true;
> +	if (!virt_addr_valid(ptr))
> +		return true;
> +	page = virt_to_head_page(ptr);
> +	if (!PageSlab(page))
> +		return true;
> +
> +	cachep = page_get_cache(page);
> +	slabp = page_get_slab(page);
> +	objnr = obj_to_index(cachep, slabp, (void *)ptr);
> +	BUG_ON(objnr >= cachep->num);
> +	offset = (const char *)ptr - obj_offset(cachep) -
> +	    (const char *)index_to_obj(cachep, slabp, objnr);
> +	if (offset <= obj_size(cachep) && len <= obj_size(cachep) - offset)
> +		return true;
> +
> +	return false;
> +}
> +EXPORT_SYMBOL(slab_access_ok);
> +
> +/*
>   * This initializes kmem_list3 or resizes various caches for all nodes.
>   */
>  static int alloc_kmemlist(struct kmem_cache *cachep, gfp_t gfp)

> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2623,6 +2623,34 @@ unsigned int kmem_cache_size(struct kmem_cache *s)
>  }
>  EXPORT_SYMBOL(kmem_cache_size);
>
> +/*
> + * Returns false if and only if [ptr; ptr+len) touches the slab,
> + * but breaks object boundaries.  It doesn't check whether the
> + * accessed object is actually allocated.
> + */
> +bool slab_access_ok(const void *ptr, unsigned long len)
> +{
> +	struct page *page;
> +	struct kmem_cache *s = NULL;

No need to assign NULL.

> +	unsigned long offset;
> +
> +	if (len == 0)
> +		return true;
> +	if (!virt_addr_valid(ptr))
> +		return true;
> +	page = virt_to_head_page(ptr);
> +	if (!PageSlab(page))
> +		return true;
> +
> +	s = page->slab;
> +	offset = ((const char *)ptr - (const char *)page_address(page)) % s->size;

Are the casts necessary? Both are pointers to void *


> +	if (offset <= s->objsize && len <= s->objsize - offset)

If offset == s->objsize then we access the first byte after the object.


* Re: [RFC v2] implement SL*B and stack usercopy runtime checks
  2011-07-18 21:18     ` Christoph Lameter
@ 2011-07-19  6:53       ` Vasiliy Kulikov
  0 siblings, 0 replies; 16+ messages in thread
From: Vasiliy Kulikov @ 2011-07-19  6:53 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Linus Torvalds, kernel-hardening, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Arnd Bergmann, Pekka Enberg, Matt Mackall,
	Andrew Morton, linux-kernel, linux-arch, linux-mm

On Mon, Jul 18, 2011 at 16:18 -0500, Christoph Lameter wrote:
> On Mon, 18 Jul 2011, Vasiliy Kulikov wrote:
> 
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -3844,6 +3844,40 @@ unsigned int kmem_cache_size(struct kmem_cache *cachep)
> >  EXPORT_SYMBOL(kmem_cache_size);
> >
> >  /*
> > + * Returns false if and only if [ptr; ptr+len) touches the slab,
> > + * but breaks object boundaries.  It doesn't check whether the
> > + * accessed object is actually allocated.
> > + */
> > +bool slab_access_ok(const void *ptr, unsigned long len)
> > +{
> > +	struct page *page;
> > +	struct kmem_cache *cachep = NULL;
> 
> Why = NULL?

Indeed, redundant.

> > +	struct slab *slabp;
> > +	unsigned int objnr;
> > +	unsigned long offset;
> > +
> > +	if (!len)
> > +		return true;
> > +	if (!virt_addr_valid(ptr))
> > +		return true;
> > +	page = virt_to_head_page(ptr);
> > +	if (!PageSlab(page))
> > +		return true;
> > +
> > +	cachep = page_get_cache(page);
> > +	slabp = page_get_slab(page);
> > +	objnr = obj_to_index(cachep, slabp, (void *)ptr);
> > +	BUG_ON(objnr >= cachep->num);
> > +	offset = (const char *)ptr - obj_offset(cachep) -
> > +	    (const char *)index_to_obj(cachep, slabp, objnr);
> > +	if (offset <= obj_size(cachep) && len <= obj_size(cachep) - offset)
> > +		return true;
> > +
> > +	return false;
> > +}
> > +EXPORT_SYMBOL(slab_access_ok);
> > +
> > +/*
> >   * This initializes kmem_list3 or resizes various caches for all nodes.
> >   */
> >  static int alloc_kmemlist(struct kmem_cache *cachep, gfp_t gfp)
> 
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2623,6 +2623,34 @@ unsigned int kmem_cache_size(struct kmem_cache *s)
> >  }
> >  EXPORT_SYMBOL(kmem_cache_size);
> >
> > +/*
> > + * Returns false if and only if [ptr; ptr+len) touches the slab,
> > + * but breaks object boundaries.  It doesn't check whether the
> > + * accessed object is actually allocated.
> > + */
> > +bool slab_access_ok(const void *ptr, unsigned long len)
> > +{
> > +	struct page *page;
> > +	struct kmem_cache *s = NULL;
> 
> No need to assign NULL.

Ditto.

> > +	unsigned long offset;
> > +
> > +	if (len == 0)
> > +		return true;
> > +	if (!virt_addr_valid(ptr))
> > +		return true;
> > +	page = virt_to_head_page(ptr);
> > +	if (!PageSlab(page))
> > +		return true;
> > +
> > +	s = page->slab;
> > +	offset = ((const char *)ptr - (const char *)page_address(page)) % s->size;
> 
> Are the casts necessary? Both are pointers to void *

Is it normal kernel style to use void* pointer arithmetic?

> > +	if (offset <= s->objsize && len <= s->objsize - offset)
> 
> If offset == s->objsize then we access the first byte after the object.

Well, in that case objsize - offset == 0, so only len == 0 could pass the
right-hand side of the && check, and the (len == 0) case is already handled above.

But yes, for better readability it should be "<".
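
As a quick sketch with hypothetical numbers (objsize = 64, the check reduced
to its bare predicate):

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	unsigned long objsize = 64;	/* hypothetical s->objsize            */
	unsigned long offset  = 64;	/* copy starts right past the object  */
	unsigned long len     = 1;

	/*
	 * Same predicate as in slab_access_ok(): with offset == objsize
	 * the right-hand side becomes len <= 0, so only len == 0 could
	 * pass, and that case already returned true earlier.
	 */
	bool ok = offset <= objsize && len <= objsize - offset;

	printf("ok = %d\n", ok);	/* prints 0 */
	return 0;
}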


Thank you,

-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments


* Re: [kernel-hardening] Re: [RFC v2] implement SL*B and stack usercopy runtime checks
  2011-07-18 18:52     ` Andrew Morton
  2011-07-18 19:33       ` [kernel-hardening] " Vasiliy Kulikov
@ 2011-07-19  7:40       ` Vasiliy Kulikov
  1 sibling, 0 replies; 16+ messages in thread
From: Vasiliy Kulikov @ 2011-07-19  7:40 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Linus Torvalds, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, Arnd Bergmann, Christoph Lameter, Pekka Enberg,
	Matt Mackall, linux-kernel, linux-arch, linux-mm

On Mon, Jul 18, 2011 at 11:52 -0700, Andrew Morton wrote:
> > +noinline bool __kernel_access_ok(const void *ptr, unsigned long len)
> 
> noinline seems unneeded

Ah, now I see what you mean.  The function is defined in a .c file and its
callers live in other .c files, so it cannot be inlined into them anyway and
the noinline is indeed redundant.

Thanks!

-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments


end of thread (newest message: 2011-07-19  7:40 UTC)

Thread overview: 16+ messages
2011-07-03 11:10 [RFC v1] implement SL*B and stack usercopy runtime checks Vasiliy Kulikov
2011-07-03 18:27 ` Linus Torvalds
2011-07-03 18:57   ` Vasiliy Kulikov
2011-07-03 19:10     ` Linus Torvalds
2011-07-03 19:24       ` [kernel-hardening] " Vasiliy Kulikov
2011-07-03 19:37         ` Joe Perches
2011-07-03 19:53           ` Vasiliy Kulikov
2011-07-06  3:39   ` Jonathan Hawthorne
2011-07-18 18:39   ` [RFC v2] " Vasiliy Kulikov
2011-07-18 18:52     ` Andrew Morton
2011-07-18 19:33       ` [kernel-hardening] " Vasiliy Kulikov
2011-07-19  7:40       ` Vasiliy Kulikov
2011-07-18 19:08     ` Matt Mackall
2011-07-18 19:24       ` Vasiliy Kulikov
2011-07-18 21:18     ` Christoph Lameter
2011-07-19  6:53       ` Vasiliy Kulikov
