kernel-hardening.lists.openwall.com archive mirror
 help / color / mirror / Atom feed
* [RFC v1 0/4] static analysis of copy_to_user()
@ 2018-12-20 19:59 Tycho Andersen
  2018-12-20 19:59 ` [RFC v1 1/4] expression.h: update comment to include other cast types Tycho Andersen
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Tycho Andersen @ 2018-12-20 19:59 UTC (permalink / raw)
  To: linux-sparse, kernel-hardening; +Cc: Tycho Andersen

Hi all,

A while ago I talked with various people about whether some static
analsys of copy_to_user() could be productive in finding infoleaks.
Unfortunately, due to the various issues outlined in the patch notes, it
doesn't seem like it is. Perhaps these checks are useful to put in just
to future proof ourselves against these sorts of issues, though.

Anyway, here's the code. Thoughts welcome!

Tycho

Tycho Andersen (4):
  expression.h: update comment to include other cast types
  move name-based analysis before linearization
  add a check for copy_to_user() address spaces
  check copy_to_user() sizes

 expression.h                           |   2 +-
 sparse.c                               | 327 ++++++++++++++++++++++---
 validation/copy_to_user.c              |  31 +++
 validation/copy_to_user_sizes.c        |  53 ++++
 validation/copy_to_user_sizes_inline.c |  29 +++
 5 files changed, 405 insertions(+), 37 deletions(-)
 create mode 100644 validation/copy_to_user.c
 create mode 100644 validation/copy_to_user_sizes.c
 create mode 100644 validation/copy_to_user_sizes_inline.c

-- 
2.19.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC v1 1/4] expression.h: update comment to include other cast types
  2018-12-20 19:59 [RFC v1 0/4] static analysis of copy_to_user() Tycho Andersen
@ 2018-12-20 19:59 ` Tycho Andersen
  2018-12-20 19:59 ` [RFC v1 2/4] move name-based analysis before linearization Tycho Andersen
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Tycho Andersen @ 2018-12-20 19:59 UTC (permalink / raw)
  To: linux-sparse, kernel-hardening; +Cc: Tycho Andersen

This part of the union is used with other cast types as well, so let's
include those in the comment.

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
---
 expression.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/expression.h b/expression.h
index ba4157f..fece40d 100644
--- a/expression.h
+++ b/expression.h
@@ -187,7 +187,7 @@ struct expression {
 			struct expression *base;
 			unsigned r_bitpos, r_nrbits;
 		};
-		// EXPR_CAST and EXPR_SIZEOF
+		// EXPR_CAST, FORCE_CAST, EXPR_IMPLIED_CAST and EXPR_SIZEOF
 		struct /* cast_arg */ {
 			struct symbol *cast_type;
 			struct expression *cast_expression;
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC v1 2/4] move name-based analysis before linearization
  2018-12-20 19:59 [RFC v1 0/4] static analysis of copy_to_user() Tycho Andersen
  2018-12-20 19:59 ` [RFC v1 1/4] expression.h: update comment to include other cast types Tycho Andersen
@ 2018-12-20 19:59 ` Tycho Andersen
  2018-12-20 19:59 ` [RFC v1 3/4] add a check for copy_to_user() address spaces Tycho Andersen
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Tycho Andersen @ 2018-12-20 19:59 UTC (permalink / raw)
  To: linux-sparse, kernel-hardening; +Cc: Tycho Andersen

Based on [1], let's move the function name matching analysis to before
linearization so that we don't destroy type info. This has the added
benefit that we can call the same analysis on inlined functions, since they
haven't been totally inlined yet and still have their name information.

Unfortunately, this means that walking the AST is much more complicated,
because we need to handle all the various expression/statement cases. I've
done what I believe is a mostly complete implementation of this, but there
are a few TODOs here and there as well.

I ported the current size checks to this new point as well.

[1]: https://www.spinics.net/lists/linux-sparse/msg09867.html

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
---
 sparse.c | 193 ++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 156 insertions(+), 37 deletions(-)

diff --git a/sparse.c b/sparse.c
index 151eaf4..217a5bf 100644
--- a/sparse.c
+++ b/sparse.c
@@ -149,68 +149,188 @@ static void check_range_instruction(struct instruction *insn)
 	warning(insn->pos, "value out of range");
 }
 
-static void check_byte_count(struct instruction *insn, pseudo_t count)
+static void check_byte_count(struct position pos, struct expression *size)
 {
-	if (!count)
+	long long val;
+
+	if (!size)
 		return;
-	if (count->type == PSEUDO_VAL) {
-		unsigned long long val = count->value;
-		if (Wmemcpy_max_count && val > fmemcpy_max_count)
-			warning(insn->pos, "%s with byte count of %llu",
-				show_ident(insn->func->sym->ident), val);
+
+	val = get_expression_value_silent(size);
+	if (val == 0)
 		return;
-	}
+
+	if (Wmemcpy_max_count && val > fmemcpy_max_count)
+		warning(pos, "copy with byte count of %llu", val);
 	/* OK, we could try to do the range analysis here */
 }
 
-static pseudo_t argument(struct instruction *call, unsigned int argno)
+static void check_memset(struct position pos, struct expression_list *args)
 {
-	pseudo_t args[8];
-	struct ptr_list *arg_list = (struct ptr_list *) call->arguments;
-
-	argno--;
-	if (linearize_ptr_list(arg_list, (void *)args, 8) > argno)
-		return args[argno];
-	return NULL;
+	struct expression *size = ptr_list_nth_entry((struct ptr_list *)args, 2);
+	check_byte_count(pos, size);
 }
 
-static void check_memset(struct instruction *insn)
-{
-	check_byte_count(insn, argument(insn, 3));
 }
 
+
 #define check_memcpy check_memset
 #define check_ctu check_memset
 #define check_cfu check_memset
 
 struct checkfn {
 	struct ident *id;
-	void (*check)(struct instruction *insn);
+	void (*check)(struct position pos, struct expression_list *args);
+};
+
+static const struct checkfn check_fn[] = {
+	{ &memset_ident, check_memset },
+	{ &memcpy_ident, check_memcpy },
+	{ &copy_to_user_ident, check_ctu },
+	{ &copy_from_user_ident, check_cfu },
 };
 
-static void check_call_instruction(struct instruction *insn)
+static void check_one_statement(struct statement *);
+
+static void check_one_expression(struct position pos, struct expression *expr)
 {
-	pseudo_t fn = insn->func;
-	struct ident *ident;
-	static const struct checkfn check_fn[] = {
-		{ &memset_ident, check_memset },
-		{ &memcpy_ident, check_memcpy },
-		{ &copy_to_user_ident, check_ctu },
-		{ &copy_from_user_ident, check_cfu },
-	};
+	struct expression *call, *fn;
+	struct symbol *direct;
 	int i;
 
-	if (fn->type != PSEUDO_SYM)
+	if (!expr)
+		return;
+
+	if (expr->type == EXPR_STATEMENT) {
+		check_one_statement(expr->statement);
+		return;
+	}
+
+	if (expr->type != EXPR_CALL)
 		return;
-	ident = fn->sym->ident;
-	if (!ident)
+
+	call = expr;
+	fn = call->fn;
+
+	direct = NULL;
+	if (fn->type == EXPR_PREOP) {
+		if (fn->unop->type == EXPR_SYMBOL) {
+			struct symbol *sym = fn->unop->symbol;
+			if (sym->ctype.base_type->type == SYM_FN)
+				direct = sym;
+		}
+	}
+
+	if (!direct)
 		return;
+
 	for (i = 0; i < ARRAY_SIZE(check_fn); i++) {
-		if (check_fn[i].id != ident)
+		if (check_fn[i].id != direct->ident)
 			continue;
-		check_fn[i].check(insn);
+		check_fn[i].check(pos, call->args);
+		return;
+	}
+}
+
+static void check_one_statement(struct statement *stmt)
+{
+	struct ident *ident;
+	int i;
+
+	if (!stmt)
+		return;
+
+	switch (stmt->type) {
+	case STMT_DECLARATION: {
+		struct symbol *sym;
+		FOR_EACH_PTR(stmt->declaration, sym) {
+			check_one_expression(stmt->pos, sym->initializer);
+		} END_FOR_EACH_PTR(sym);
 		break;
 	}
+	case STMT_EXPRESSION:
+		check_one_expression(stmt->pos, stmt->expression);
+		break;
+	case STMT_COMPOUND: {
+		struct statement *s;
+
+		if (stmt->inline_fn) {
+			ident = stmt->inline_fn->ident;
+
+			for (i = 0; i < ARRAY_SIZE(check_fn); i++) {
+				struct symbol *sym;
+				struct expression_list *args = NULL;
+
+				if (check_fn[i].id != ident)
+					continue;
+
+				FOR_EACH_PTR(stmt->args->declaration, sym) {
+					add_expression(&args, sym->initializer);
+				} END_FOR_EACH_PTR(sym);
+
+				check_fn[i].check(stmt->pos, args);
+				free_ptr_list((struct ptr_list **) &args);
+				break;
+			}
+			break;
+		}
+
+		FOR_EACH_PTR(stmt->stmts, s) {
+			check_one_statement(s);
+		} END_FOR_EACH_PTR(s);
+		break;
+	}
+	case STMT_IF:
+		check_one_expression(stmt->pos, stmt->if_conditional);
+		check_one_statement(stmt->if_true);
+		check_one_statement(stmt->if_false);
+		break;
+	case STMT_CASE:
+		check_one_expression(stmt->pos, stmt->case_expression);
+		check_one_expression(stmt->pos, stmt->case_to);
+		check_one_statement(stmt->case_statement);
+		break;
+	case STMT_SWITCH:
+		check_one_expression(stmt->pos, stmt->switch_expression);
+		check_one_statement(stmt->switch_statement);
+		break;
+	case STMT_ITERATOR:
+		check_one_statement(stmt->iterator_pre_statement);
+		check_one_expression(stmt->pos, stmt->iterator_pre_condition);
+		check_one_statement(stmt->iterator_statement);
+		check_one_statement(stmt->iterator_post_statement);
+		check_one_expression(stmt->pos, stmt->iterator_post_condition);
+		break;
+	case STMT_LABEL:
+		check_one_statement(stmt->label_statement);
+		break;
+	case STMT_RANGE:
+		check_one_expression(stmt->pos, stmt->range_expression);
+		check_one_expression(stmt->pos, stmt->range_low);
+		check_one_expression(stmt->pos, stmt->range_high);
+		break;
+	/*
+	 * TODO: STMT_CONTEXT, GOTO, ASM; these could/should all be walked, but
+	 * don't seem super relevant for copy_{to,from}_user().
+	 */
+	default:
+		return;
+	}
+}
+
+static void check_symbol(struct symbol *sym)
+{
+	if (sym->type == SYM_NODE)
+		sym = sym->ctype.base_type;
+
+	switch (sym->type) {
+	case SYM_FN:
+		if (sym->stmt)
+			check_one_statement(sym->stmt);
+		break;
+	default:
+		return;
+	}
 }
 
 static void check_one_instruction(struct instruction *insn)
@@ -224,9 +344,6 @@ static void check_one_instruction(struct instruction *insn)
 	case OP_RANGE:
 		check_range_instruction(insn);
 		break;
-	case OP_CALL:
-		check_call_instruction(insn);
-		break;
 	default:
 		break;
 	}
@@ -314,6 +431,8 @@ static void check_symbols(struct symbol_list *list)
 		struct entrypoint *ep;
 
 		expand_symbol(sym);
+		check_symbol(sym);
+
 		ep = linearize_symbol(sym);
 		if (ep && ep->entry) {
 			if (dbg_entry)
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC v1 3/4] add a check for copy_to_user() address spaces
  2018-12-20 19:59 [RFC v1 0/4] static analysis of copy_to_user() Tycho Andersen
  2018-12-20 19:59 ` [RFC v1 1/4] expression.h: update comment to include other cast types Tycho Andersen
  2018-12-20 19:59 ` [RFC v1 2/4] move name-based analysis before linearization Tycho Andersen
@ 2018-12-20 19:59 ` Tycho Andersen
  2018-12-20 19:59 ` [RFC v1 4/4] check copy_to_user() sizes Tycho Andersen
  2018-12-21 22:47 ` [RFC v1 0/4] static analysis of copy_to_user() Luc Van Oostenryck
  4 siblings, 0 replies; 9+ messages in thread
From: Tycho Andersen @ 2018-12-20 19:59 UTC (permalink / raw)
  To: linux-sparse, kernel-hardening; +Cc: Tycho Andersen

Leaking kernel pointers to userspace is a bad idea, so let's try to do some
analysis for it.

The basic idea is that every pointer copied to userspace "should be" (but
isn't necessarily) annotated with __user, and if it is not, then it's a
potential infoleak. The motivation for this is stuff like [1], which is
exactly a case of this. Based on a subsequent manual analysis of the uapi
headers, I found [2].

Unfortunately in both of these cases, there is void * (for compat) trickery
that masks them from actually turning up in this case. But hey, I wrote it
and tried it out, so perhaps the code is useful for someone :)

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/block/floppy.c?id=65eea8edc315589d6c993cf12dbb5d0e9ef1fe4e
[2]: https://lkml.org/lkml/2018/11/3/142

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
---
 sparse.c                  | 93 ++++++++++++++++++++++++++++++++++++++-
 validation/copy_to_user.c | 31 +++++++++++++
 2 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/sparse.c b/sparse.c
index 217a5bf..8bcbe16 100644
--- a/sparse.c
+++ b/sparse.c
@@ -171,13 +171,104 @@ static void check_memset(struct position pos, struct expression_list *args)
 	check_byte_count(pos, size);
 }
 
+static struct symbol *resolve_arg_type(struct position pos, struct expression *arg)
+{
+	struct expression *uncast;
+
+	uncast = arg;
+	switch (arg->type) {
+	case EXPR_CAST:
+	case EXPR_FORCE_CAST:
+	case EXPR_IMPLIED_CAST:
+		/*
+		 * Undo any casting done by sparse to the function's
+		 * argument type.
+		 */
+		uncast = arg->cast_expression;
+		break;
+	case EXPR_SYMBOL:
+		break;
+	case EXPR_PREOP:
+		/*
+		 * handle derefs; these are really just the type of the
+		 * resulting expression.
+		 */
+		break;
+	case EXPR_BINOP:
+		/* TODO: resolve this pointer math if possible? */
+		return NULL;
+	default:
+		warning(pos, "huh? arg not a cast or symbol? %d", arg->type);
+		return NULL;
+	}
+
+	return uncast->ctype->ctype.base_type;
+}
+
+static void check_ptr_in_other_as(struct position pos, struct symbol *sym, int this_as)
+{
+	struct ident *ident = sym->ident;
+
+	if (sym->type == SYM_NODE)
+		sym = sym->ctype.base_type;
+
+	switch (sym->type) {
+	case SYM_ARRAY:
+	case SYM_PTR: {
+		if (sym->ctype.as != this_as)
+			warning(pos, "member %s is a kernel pointer copied to userspace", show_ident(ident));
+		check_ptr_in_other_as(pos, sym->ctype.base_type, this_as);
+		break;
+	}
+	case SYM_STRUCT:
+	case SYM_UNION: {
+		struct symbol *member;
+
+		FOR_EACH_PTR(sym->symbol_list, member) {
+			check_ptr_in_other_as(pos, member, this_as);
+		} END_FOR_EACH_PTR(member);
+		break;
+	}
+	default:
+		/*
+		 * scalar types are ok
+		 * TODO: what about SYM_LABEL/PREPROCESSOR?
+		 */
+		break;
+	}
+}
+
+static void check_no_kernel_pointers(struct position pos, struct expression_list *args)
+{
+	struct expression *src = ptr_list_nth_entry((struct ptr_list *)args, 1);
+	struct symbol *base = NULL;
+
+	if (!Waddress_space)
+		return;
+
+	/* get the type of src */
+	base = resolve_arg_type(pos, src);
+
+	/*
+	 * And deref it to *src; src will *always* be a kernel pointer, and
+	 * we're really after members of structures here, not the pointers
+	 * themselves. So we do this deref at the top level.
+	 */
+	base = base->ctype.base_type;
+
+	check_ptr_in_other_as(pos, base, 1);
 }
 
 
 #define check_memcpy check_memset
-#define check_ctu check_memset
 #define check_cfu check_memset
 
+void check_ctu(struct position pos, struct expression_list *args)
+{
+	check_memset(pos, args);
+	check_no_kernel_pointers(pos, args);
+}
+
 struct checkfn {
 	struct ident *id;
 	void (*check)(struct position pos, struct expression_list *args);
diff --git a/validation/copy_to_user.c b/validation/copy_to_user.c
new file mode 100644
index 0000000..d9cded6
--- /dev/null
+++ b/validation/copy_to_user.c
@@ -0,0 +1,31 @@
+#define __user __attribute__((address_space(1)))
+
+struct bar {
+	char *bar_kptr;
+};
+
+struct foo {
+	char *foo_kptr;
+	char __user *uptr;
+	struct bar bar;
+};
+
+static void copy_to_user(void __user *to, const void *from, unsigned long n)
+{
+}
+
+static void bar(void)
+{
+	struct foo f;
+	void __user *p = (void __user *)0;
+
+	copy_to_user(p, &f, sizeof(f));
+}
+/*
+ * check-name: copy_to_user arguments
+ *
+ * check-error-start
+copy_to_user.c:22:9: warning: member foo_kptr is a kernel pointer copied to userspace
+copy_to_user.c:22:9: warning: member bar_kptr is a kernel pointer copied to userspace
+ * check-error-end
+ */
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC v1 4/4] check copy_to_user() sizes
  2018-12-20 19:59 [RFC v1 0/4] static analysis of copy_to_user() Tycho Andersen
                   ` (2 preceding siblings ...)
  2018-12-20 19:59 ` [RFC v1 3/4] add a check for copy_to_user() address spaces Tycho Andersen
@ 2018-12-20 19:59 ` Tycho Andersen
  2018-12-21 22:47 ` [RFC v1 0/4] static analysis of copy_to_user() Luc Van Oostenryck
  4 siblings, 0 replies; 9+ messages in thread
From: Tycho Andersen @ 2018-12-20 19:59 UTC (permalink / raw)
  To: linux-sparse, kernel-hardening; +Cc: Tycho Andersen

The intent here is to be a smarter version of the size checking that's just
a hard coded constant. The insight is that we know the type of the thing
that's getting passed to copy_to_user(), so if we can evaluate the size, it
should be <= sizeof(*the thing being copied).

I've run this over x86 with allyes config, and it didn't find anything, so
perhaps it isn't particularly useful. Indeed, most copy_to_user()s look
like:

        copy_to_user(p, &foo, sizeof(foo))

so perhaps this isn't a surprise.

The one potentially interesting case is the hard coded array length case:
it's possible someone might change one place but not the other. See e.g.
the first copy_to_user() in drivers/ata/libata-scsi.c, which looks like:

        copy_to_user(dst, dev->id, ATA_ID_WORDS * sizeof(u16))

id here is a u16 id[ATA_ID_WORDS], so it would be possible to change this
in one place or not the other. Unfortunately, sparse seems to unify the
type of dev->id to a u16* instead of u16[], so we can't see through this.

Anyway, there are currently no violations of this in the kernel either,
although it does emit some false positives like,

fs/xfs/xfs_ioctl.c:1813:25: warning: copy_to_user() where size (13) is larger than src (1)

due to the above. I wrote the code, so perhaps someday it will be useful to
someone, so I'm posting it :)

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
---
 sparse.c                               | 45 ++++++++++++++++++++++
 validation/copy_to_user_sizes.c        | 53 ++++++++++++++++++++++++++
 validation/copy_to_user_sizes_inline.c | 29 ++++++++++++++
 3 files changed, 127 insertions(+)

diff --git a/sparse.c b/sparse.c
index 8bcbe16..eb7d5d7 100644
--- a/sparse.c
+++ b/sparse.c
@@ -31,6 +31,7 @@
 #include <ctype.h>
 #include <unistd.h>
 #include <fcntl.h>
+#include <limits.h>
 
 #include "lib.h"
 #include "allocate.h"
@@ -259,6 +260,49 @@ static void check_no_kernel_pointers(struct position pos, struct expression_list
 	check_ptr_in_other_as(pos, base, 1);
 }
 
+static void check_copy_size(struct position pos, struct expression_list *args)
+{
+	struct expression *src = ptr_list_nth_entry((struct ptr_list *)args, 1);
+	struct expression *size = ptr_list_nth_entry((struct ptr_list *)args, 2);
+	long long src_actual = LLONG_MAX;
+	long long size_actual;
+	struct symbol *base, *deref;
+
+	/* get the type of src */
+	base = resolve_arg_type(pos, src);
+
+	/* and deref it to *src */
+	deref = base->ctype.base_type;
+
+	/*
+	 * Note, this is never hit right now because sparse seems to unify
+	 * arrays that are arguments to type *.
+	 */
+	if (is_array_type(base)) {
+		long long array_size = get_expression_value_silent(base->array_size);
+		printf("%s is array type\n", show_ident(base->ident));
+
+		/* failed to evaluate */
+		if (array_size == 0)
+			return;
+		src_actual = array_size * bits_to_bytes(deref->bit_size);
+	} else {
+		src_actual = bits_to_bytes(deref->bit_size);
+	}
+
+	/* Evaluate size, if we can */
+	size_actual = get_expression_value_silent(size);
+	/*
+	 * size_actual == 0 means that get_expression_value failed; of
+	 * course we'll miss something if there is a 0 length copy, but
+	 * then nothing will leak anyway so...
+	 */
+	if (size_actual == 0)
+		return;
+
+	if (size_actual > src_actual)
+		warning(pos, "copy_to_user() where size (%lld) is larger than src (%lld)", size_actual, src_actual);
+}
 
 #define check_memcpy check_memset
 #define check_cfu check_memset
@@ -267,6 +311,7 @@ void check_ctu(struct position pos, struct expression_list *args)
 {
 	check_memset(pos, args);
 	check_no_kernel_pointers(pos, args);
+	check_copy_size(pos, args);
 }
 
 struct checkfn {
diff --git a/validation/copy_to_user_sizes.c b/validation/copy_to_user_sizes.c
new file mode 100644
index 0000000..f1f7b8e
--- /dev/null
+++ b/validation/copy_to_user_sizes.c
@@ -0,0 +1,53 @@
+static void copy_to_user(void *to, const void *from, unsigned long n)
+{
+}
+
+struct foo {
+	int bar;
+};
+
+static void main(void)
+{
+	void *p;
+	struct foo f;
+	int uninitialized;
+
+	copy_to_user(p, &f, sizeof(f));
+	copy_to_user(p, &f, sizeof(f)-1);
+	copy_to_user(p, &f, sizeof(f)+1);
+	copy_to_user(p, &f, 1);
+	copy_to_user(p, &f, 100);
+	copy_to_user(p, &f, uninitialized);
+}
+
+#if 0
+static void broken(void)
+{
+	void *p;
+	char arr[100];
+	struct foo foo_arr[2];
+
+	/*
+	 * These all fail right now, because sparse seems to unify the type of
+	 * arr/foo_arr to char * /struct foo *, instead of char[]/struct foo[].
+	 *
+	 * That's unfortunate, because it means that these generate false
+	 * positives in the kernel when using a static array length, and that
+	 * seems like a case where this type of check would be especially
+	 * useful.
+	 */
+	copy_to_user(p, arr, 100);
+	copy_to_user(p, arr, 101);
+	copy_to_user(p, foo_arr, sizeof(foo_arr));
+	copy_to_user(p, foo_arr, sizeof(foo_arr)-1);
+	copy_to_user(p, foo_arr, sizeof(foo_arr[0])*2);
+}
+#endif
+/*
+ * check-name: copy_to_user sizes
+ *
+ * check-error-start
+copy_to_user_sizes.c:17:9: warning: copy_to_user() where size (5) is larger than src (4)
+copy_to_user_sizes.c:19:9: warning: copy_to_user() where size (100) is larger than src (4)
+ * check-error-end
+ */
diff --git a/validation/copy_to_user_sizes_inline.c b/validation/copy_to_user_sizes_inline.c
new file mode 100644
index 0000000..bd3c3bd
--- /dev/null
+++ b/validation/copy_to_user_sizes_inline.c
@@ -0,0 +1,29 @@
+static inline void copy_to_user(void *to, const void *from, unsigned long n)
+{
+}
+
+struct foo {
+	int bar;
+};
+
+static void main(void)
+{
+	void *p;
+	struct foo f;
+	int uninitialized;
+
+	copy_to_user(p, &f, sizeof(f));
+	copy_to_user(p, &f, sizeof(f)-1);
+	copy_to_user(p, &f, sizeof(f)+1);
+	copy_to_user(p, &f, 1);
+	copy_to_user(p, &f, 100);
+	copy_to_user(p, &f, uninitialized);
+}
+/*
+ * check-name: copy_to_user sizes
+ *
+ * check-error-start
+copy_to_user_sizes_inline.c:17:21: warning: copy_to_user() where size (5) is larger than src (4)
+copy_to_user_sizes_inline.c:19:21: warning: copy_to_user() where size (100) is larger than src (4)
+ * check-error-end
+ */
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC v1 0/4] static analysis of copy_to_user()
  2018-12-20 19:59 [RFC v1 0/4] static analysis of copy_to_user() Tycho Andersen
                   ` (3 preceding siblings ...)
  2018-12-20 19:59 ` [RFC v1 4/4] check copy_to_user() sizes Tycho Andersen
@ 2018-12-21 22:47 ` Luc Van Oostenryck
  2019-01-20 19:05   ` Tycho Andersen
  4 siblings, 1 reply; 9+ messages in thread
From: Luc Van Oostenryck @ 2018-12-21 22:47 UTC (permalink / raw)
  To: Tycho Andersen; +Cc: linux-sparse, kernel-hardening

On Thu, Dec 20, 2018 at 12:59:27PM -0700, Tycho Andersen wrote:
> Hi all,
> 
> A while ago I talked with various people about whether some static
> analsys of copy_to_user() could be productive in finding infoleaks.
> Unfortunately, due to the various issues outlined in the patch notes, it
> doesn't seem like it is. Perhaps these checks are useful to put in just
> to future proof ourselves against these sorts of issues, though.
> 
> Anyway, here's the code. Thoughts welcome!

Hi,

I'm taking the first patch directly but I won't be able to look
closer at the other patches until next week.

Best regards,
-- Luc 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC v1 0/4] static analysis of copy_to_user()
  2018-12-21 22:47 ` [RFC v1 0/4] static analysis of copy_to_user() Luc Van Oostenryck
@ 2019-01-20 19:05   ` Tycho Andersen
  2019-01-21 21:41     ` Luc Van Oostenryck
  0 siblings, 1 reply; 9+ messages in thread
From: Tycho Andersen @ 2019-01-20 19:05 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: linux-sparse, kernel-hardening

Hey Luc,

On Fri, Dec 21, 2018 at 11:47:19PM +0100, Luc Van Oostenryck wrote:
> On Thu, Dec 20, 2018 at 12:59:27PM -0700, Tycho Andersen wrote:
> > Hi all,
> > 
> > A while ago I talked with various people about whether some static
> > analsys of copy_to_user() could be productive in finding infoleaks.
> > Unfortunately, due to the various issues outlined in the patch notes, it
> > doesn't seem like it is. Perhaps these checks are useful to put in just
> > to future proof ourselves against these sorts of issues, though.
> > 
> > Anyway, here's the code. Thoughts welcome!
> 
> Hi,
> 
> I'm taking the first patch directly but I won't be able to look
> closer at the other patches until next week.

Any chance you can take a peek at these?

Cheers,

Tycho

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC v1 0/4] static analysis of copy_to_user()
  2019-01-20 19:05   ` Tycho Andersen
@ 2019-01-21 21:41     ` Luc Van Oostenryck
  2019-01-24  3:15       ` Tycho Andersen
  0 siblings, 1 reply; 9+ messages in thread
From: Luc Van Oostenryck @ 2019-01-21 21:41 UTC (permalink / raw)
  To: Tycho Andersen; +Cc: linux-sparse, kernel-hardening

On Mon, Jan 21, 2019 at 08:05:10AM +1300, Tycho Andersen wrote:
> Hey Luc,
> 
> On Fri, Dec 21, 2018 at 11:47:19PM +0100, Luc Van Oostenryck wrote:
> > On Thu, Dec 20, 2018 at 12:59:27PM -0700, Tycho Andersen wrote:
> > > Hi all,
> > > 
> > > A while ago I talked with various people about whether some static
> > > analsys of copy_to_user() could be productive in finding infoleaks.
> > > Unfortunately, due to the various issues outlined in the patch notes, it
> > > doesn't seem like it is. Perhaps these checks are useful to put in just
> > > to future proof ourselves against these sorts of issues, though.
> > > 
> > > Anyway, here's the code. Thoughts welcome!
> > 
> > Hi,
> > 
> > I'm taking the first patch directly but I won't be able to look
> > closer at the other patches until next week.
> 
> Any chance you can take a peek at these?

Hi,

Sorry, I've had few available time the last weeks.
I had look at them shortly after you send them but
I haven't yet made my mind about them.

I'm quite reluctant to add complexity (the AST walking)
if it doesn't bring much benefit if any.

In, short the problems are:
1) duplication of the AST walking
2) unreliable type because of using void *
3) unreliable size because array to pointer degeneracy

There is some solutions, though:
1) what *could* be done is to add a method 'check'
   to struct symbol_op and call it, *for example*,
   just after op->expand() in expand_symbol_call()
   (and add a mechanism to set this method for the symbol
   corresponding to copy_to_user()).

   Otherwise, splitting the AST walking from sparse.c
   and making it something generic would be preferable.

   Another approach could be keep the check via OP_CALL
   but doing it just after linearization, before the
   optimization destroy the types (and add, if needed,
   some flag to force linearize_cast() keep absolutely
   all type info).

2) this one seems pretty hopeless

3) the current calls degenerate()/create_pointer()
   do indeed destroy the original type and (at first sight)
   no 'addressof' should exist anymore after evaluation.
   This is inconsistent with the existence of expand_addressof().
   By changing degenerate()/create_pointer() the original
   type should stay available.

Sorry again for the late reply,
-- Luc

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC v1 0/4] static analysis of copy_to_user()
  2019-01-21 21:41     ` Luc Van Oostenryck
@ 2019-01-24  3:15       ` Tycho Andersen
  0 siblings, 0 replies; 9+ messages in thread
From: Tycho Andersen @ 2019-01-24  3:15 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: linux-sparse, kernel-hardening

Hi Luc,

On Mon, Jan 21, 2019 at 10:41:28PM +0100, Luc Van Oostenryck wrote:
> On Mon, Jan 21, 2019 at 08:05:10AM +1300, Tycho Andersen wrote:
> > Hey Luc,
> > 
> > On Fri, Dec 21, 2018 at 11:47:19PM +0100, Luc Van Oostenryck wrote:
> > > On Thu, Dec 20, 2018 at 12:59:27PM -0700, Tycho Andersen wrote:
> > > > Hi all,
> > > > 
> > > > A while ago I talked with various people about whether some static
> > > > analsys of copy_to_user() could be productive in finding infoleaks.
> > > > Unfortunately, due to the various issues outlined in the patch notes, it
> > > > doesn't seem like it is. Perhaps these checks are useful to put in just
> > > > to future proof ourselves against these sorts of issues, though.
> > > > 
> > > > Anyway, here's the code. Thoughts welcome!
> > > 
> > > Hi,
> > > 
> > > I'm taking the first patch directly but I won't be able to look
> > > closer at the other patches until next week.
> > 
> > Any chance you can take a peek at these?
> 
> Hi,
> 
> Sorry, I've had few available time the last weeks.
> I had look at them shortly after you send them but
> I haven't yet made my mind about them.
> 
> I'm quite reluctant to add complexity (the AST walking)
> if it doesn't bring much benefit if any.

No problem :).

> In, short the problems are:
> 1) duplication of the AST walking
> 2) unreliable type because of using void *
> 3) unreliable size because array to pointer degeneracy
> 
> There is some solutions, though:
> 1) what *could* be done is to add a method 'check'
>    to struct symbol_op and call it, *for example*,
>    just after op->expand() in expand_symbol_call()
>    (and add a mechanism to set this method for the symbol
>    corresponding to copy_to_user()).
> 
>    Otherwise, splitting the AST walking from sparse.c
>    and making it something generic would be preferable.

Yeah, this sounds like a good option to me.

>    Another approach could be keep the check via OP_CALL
>    but doing it just after linearization, before the
>    optimization destroy the types (and add, if needed,
>    some flag to force linearize_cast() keep absolutely
>    all type info).
> 
> 2) this one seems pretty hopeless

I was hoping you might have some brilliant insight here. It seems like
these checks could catch real bugs at some point, so I'll give the
changes you've suggested a go over the next couple of weeks and see
about a v2.

> 3) the current calls degenerate()/create_pointer()
>    do indeed destroy the original type and (at first sight)
>    no 'addressof' should exist anymore after evaluation.
>    This is inconsistent with the existence of expand_addressof().
>    By changing degenerate()/create_pointer() the original
>    type should stay available.

Ah ha, thanks. I guess it's not necessary to change create_pointer()
for this, but degenerate() definitely looks important.

Thanks!

Tycho

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-01-24  3:15 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-20 19:59 [RFC v1 0/4] static analysis of copy_to_user() Tycho Andersen
2018-12-20 19:59 ` [RFC v1 1/4] expression.h: update comment to include other cast types Tycho Andersen
2018-12-20 19:59 ` [RFC v1 2/4] move name-based analysis before linearization Tycho Andersen
2018-12-20 19:59 ` [RFC v1 3/4] add a check for copy_to_user() address spaces Tycho Andersen
2018-12-20 19:59 ` [RFC v1 4/4] check copy_to_user() sizes Tycho Andersen
2018-12-21 22:47 ` [RFC v1 0/4] static analysis of copy_to_user() Luc Van Oostenryck
2019-01-20 19:05   ` Tycho Andersen
2019-01-21 21:41     ` Luc Van Oostenryck
2019-01-24  3:15       ` Tycho Andersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).