All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] API for inhibiting speculative arbitrary read primitives
@ 2018-01-03 22:38 Mark Rutland
  2018-01-03 22:38 ` [RFC PATCH 1/4] asm-generic/barrier: add generic nospec helpers Mark Rutland
                   ` (4 more replies)
  0 siblings, 5 replies; 66+ messages in thread
From: Mark Rutland @ 2018-01-03 22:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-arch, Mark Rutland

Recently, Google Project Zero discovered several classes of attack
against speculative execution. One of these, known as variant-1, allows
explicit bounds checks to be bypassed under speculation, providing an
arbitrary read gadget. Further details can be found on the GPZ blog [1]
and the Documentation patch in this series.

There are a number of potential gadgets in the Linux codebase, and
mitigations for these are architecture-specific.

This RFC attempts to provide a cross-architecture API for inhibiting
these primitives. Hopefully, architecture-specific mitigations can be
unified behind this. An arm64 implementation is provided following the
architecturally recommended sequence laid out in the Arm whitepaper [2].
The API is based on a proposed compiler intrinsic [3].

I've provided a patch to BPF as an example use of the API. I know that
this is incomplete and less than optimal. I'd appreciate feedback from
other affected architectures as to whether this API is suitable for
their required mitigation.

I've pushed the series to my kernel.org repo [4].

[1] https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
[2] https://developer.arm.com/support/security-update
[3] https://developer.arm.com/support/security-update/compiler-support-for-mitigations
[4] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git core/nospec

Thanks,
Mark.

Mark Rutland (4):
  asm-generic/barrier: add generic nospec helpers
  Documentation: document nospec helpers
  arm64: implement nospec_{load,ptr}()
  bpf: inhibit speculated out-of-bounds pointers

 Documentation/speculation.txt    | 99 ++++++++++++++++++++++++++++++++++++++++
 arch/arm64/include/asm/barrier.h | 61 +++++++++++++++++++++++++
 include/asm-generic/barrier.h    | 76 ++++++++++++++++++++++++++++++
 kernel/bpf/arraymap.c            | 21 ++++++---
 kernel/bpf/cpumap.c              |  8 ++--
 kernel/bpf/devmap.c              |  6 ++-
 kernel/bpf/sockmap.c             |  6 ++-
 7 files changed, 265 insertions(+), 12 deletions(-)
 create mode 100644 Documentation/speculation.txt

-- 
2.11.0

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH 1/4] asm-generic/barrier: add generic nospec helpers
  2018-01-03 22:38 [RFC PATCH 0/4] API for inhibiting speculative arbitrary read primitives Mark Rutland
@ 2018-01-03 22:38 ` Mark Rutland
  2018-01-04 12:00   ` Mark Rutland
  2018-01-03 22:38 ` [RFC PATCH 2/4] Documentation: document " Mark Rutland
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 66+ messages in thread
From: Mark Rutland @ 2018-01-03 22:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-arch, Mark Rutland, Will Deacon

Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.

This patch adds helpers which can be used to inhibit the use of
out-of-bounds pointers and/or valeus read from these under speculation.

A generic implementation is provided for compatibility, but does not
guarantee safety under speculation. Architectures are expected to
override these helpers as necessary.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 include/asm-generic/barrier.h | 76 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index fe297b599b0a..5eba6ae0c34e 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -54,6 +54,82 @@
 #define read_barrier_depends()		do { } while (0)
 #endif
 
+/**
+ * nospec_ptr() - Ensure a  pointer is bounded, even under speculation.
+ *
+ * @ptr: the pointer to test
+ * @lo: the lower valid bound for @ptr, inclusive
+ * @hi: the upper valid bound for @ptr, exclusive
+ *
+ * If @ptr falls in the interval [@lo, @i), returns @ptr, otherwise returns
+ * NULL.
+ *
+ * Architectures should override this to ensure that ptr falls in the [lo, hi)
+ * interval both under architectural execution and under speculation,
+ * preventing propagation of an out-of-bounds pointer to code which is
+ * speculatively executed.
+ */
+#ifndef nospec_ptr
+#define nospec_ptr(ptr, lo, hi)						\
+({									\
+	typeof (ptr) __ptr = (ptr);					\
+	typeof (ptr) __lo = (lo);					\
+	typeof (ptr) __hi = (hi);					\
+									\
+	(__lo <= __ptr && __ptr < __hi) ? __ptr : NULL;			\
+})
+#endif
+
+/**
+ * nospec_load() - Load a pointer, respecting bounds under speculation
+ *
+ * @ptr: the pointer to load
+ * @lo: the lower valid bound for @ptr, inclusive
+ * @hi: the upper valid bound for @ptr, exclusive
+ *
+ * If @ptr falls in the interval [@lo, @hi), returns the value at @ptr,
+ * otherwise returns (typeof(*ptr))0.
+ *
+ * Architectures should override this to ensure that ptr falls in the [lo, hi)
+ * interval both under architectural execution and under speculation,
+ * preventing speculative out-of-bounds reads.
+ */
+#ifndef nospec_load
+#define nospec_load(ptr, lo, hi)					\
+({									\
+	typeof (ptr) __ptr = (ptr);					\
+	typeof (ptr) __lo = (lo);					\
+	typeof (ptr) __hi = (hi);					\
+									\
+	(__lo <= __ptr && __ptr <= __hi) ?				\
+		*__ptr : 						\
+		(typeof(*__ptr))(unsigned long)0;			\
+})
+#endif
+
+/**
+ * nospec_array_load - Load an array entry, respecting bounds under speculation
+ *
+ * @arr: the base of the array
+ * @idx: the index of the element to load
+ * @sz: the number of elements in the array
+ *
+ * If @idx falls in the interval [0, @sz), returns the value at @arr[@idx],
+ * otherwise returns (typeof(*ptr))0.
+ *
+ * This is a wrapper around nospec_load(), provided for convenience.
+ * Architectures should implement nospec_load() to ensure this is the case
+ * under speculation.
+ */
+#define nospec_array_load(arr, idx, sz)					\
+({									\
+	typeof(*(arr)) *__arr = arr;					\
+	typeof(idx) __idx = idx;					\
+	typeof(sz) __sz = __sz;						\
+									\
+	nospec_load(__arr + __idx, __arr, __arr + __sz);		\
+})
+
 #ifndef __smp_mb
 #define __smp_mb()	mb()
 #endif
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [RFC PATCH 2/4] Documentation: document nospec helpers
  2018-01-03 22:38 [RFC PATCH 0/4] API for inhibiting speculative arbitrary read primitives Mark Rutland
  2018-01-03 22:38 ` [RFC PATCH 1/4] asm-generic/barrier: add generic nospec helpers Mark Rutland
@ 2018-01-03 22:38 ` Mark Rutland
  2018-01-03 22:38 ` [RFC PATCH 3/4] arm64: implement nospec_{load,ptr}() Mark Rutland
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 66+ messages in thread
From: Mark Rutland @ 2018-01-03 22:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-arch, Mark Rutland, Will Deacon

Document the rationale and usage of the new nospec*() helpers.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 Documentation/speculation.txt | 99 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 99 insertions(+)
 create mode 100644 Documentation/speculation.txt

diff --git a/Documentation/speculation.txt b/Documentation/speculation.txt
new file mode 100644
index 000000000000..0bec4ed5ac29
--- /dev/null
+++ b/Documentation/speculation.txt
@@ -0,0 +1,99 @@
+This document explains potential effects of speculation, and how undesirable
+effects can be mitigated portably using common APIs.
+
+===========
+Speculation
+===========
+
+To improve performance and minimize average latencies, many contemporary CPUs
+employ speculative execution techniques such as branch prediction, performing
+work which may be discarded at a later stage.
+
+Typically speculative execution cannot be observed from architectural state,
+such as the contents of registers. However, in some cases it is possible to
+observe its impact on microarchitectural state, such as the presence or
+absence of data in caches. Such state may form side-channels which can be
+observed to extract secret information.
+
+For example, in the presence of branch prediction, it is possible for bounds
+checks to be ignored by code which is speculatively executed. Consider the
+following code:
+
+	int load_array(int *array, unsigned int idx) {
+		if (idx >= MAX_ARRAY_ELEMS)
+			return 0;
+		else
+			return array[idx];
+	}
+
+Which, on arm64, may be compiled to an assembly sequence such as:
+
+	CMP	<idx>, #MAX_ARRAY_ELEMS
+	B.LT	less
+	MOV	<returnval>, #0
+	RET
+  less:
+	LDR	<returnval>, [<array>, <idx>]
+	RET
+
+It is possible that a CPU mis-predicts the conditional branch, and
+speculatively loads array[idx], even if idx >= MAX_ARRAY_ELEMS. This value
+will subsequently be discarded, but the speculated load may affect
+microarchitectural state which can be subsequently measured.
+
+More complex sequences involving multiple dependent memory accesses may result
+in sensitive information being leaked. Consider the following code, building on
+the prior example:
+
+	int load_dependent_arrays(int *arr1, int *arr2, int idx) {
+		int val1, val2,
+
+		val1 = load_array(arr1, idx);
+		val2 = load_array(arr2, val1);
+
+		return val2;
+	}
+
+Under speculation, the first call to load_array() may return the value of an
+out-of-bounds address, while the second call will influence microarchitectural
+state dependent on this value. This may provide an arbitrary read primitive.
+
+====================================
+Mitigating speculation side-channels
+====================================
+
+The kernel provides a generic API to ensure that bounds checks are respected
+even under speculation. Architectures which are affected by speculation-based
+side-channels are expected to implement these primitives.
+
+The following helpers found in <asm/barrier.h> can be used to prevent
+information from being leaked via side-channels.
+
+* nospec_load(ptr, lo, hi)
+
+  Returns the data at *ptr only if ptr falls in the [lo, hi) interval. When
+  ptr < lo or ptr >= hi, typeof(*ptr)0 is returned, even under speculation.
+
+  This does not prevent an out-of-bounds load from being speculated, but does
+  prevent its value from influencing code which is subsequently speculated,
+  preventing the value from being leaked.
+
+* nospec_array_load(arr, idx, sz)
+
+  Returns the data at arr[idx] only if idx falls in the [0, sz) interval. When
+  idx < 0 or idx > sz, typeof(*arr)0 is returned, even under speculation.
+
+  This is a wrapper around nospec_load() provided for convenience.
+
+* nospec_ptr(ptr, lo, hi)
+
+  Returns a sanitized pointer that is bounded by the [lo, hi) interval, even
+  under speculation. If ptr < lo, or ptr >= hi, NULL is returned.
+
+  This is expected to be used by code which computes a pointer to an element
+  of a data structure, or where multiple fields of a data structure will be
+  accessed.
+
+  Note that it is not safe to compare the returned value to the original
+  pointer, as compiler optimizations may infer that the original unsanitized
+  pointer is safe to use when the two compare equal.
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [RFC PATCH 3/4] arm64: implement nospec_{load,ptr}()
  2018-01-03 22:38 [RFC PATCH 0/4] API for inhibiting speculative arbitrary read primitives Mark Rutland
  2018-01-03 22:38 ` [RFC PATCH 1/4] asm-generic/barrier: add generic nospec helpers Mark Rutland
  2018-01-03 22:38 ` [RFC PATCH 2/4] Documentation: document " Mark Rutland
@ 2018-01-03 22:38 ` Mark Rutland
  2018-01-03 22:38 ` [RFC PATCH 4/4] bpf: inhibit speculated out-of-bounds pointers Mark Rutland
  2018-01-04  0:15 ` [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier Dan Williams
  4 siblings, 0 replies; 66+ messages in thread
From: Mark Rutland @ 2018-01-03 22:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-arch, Mark Rutland, Will Deacon

This patch implements nospec_load() and nospec_ptr() for arm64,
following the recommended architectural sequence.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/barrier.h | 61 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 77651c49ef44..5d17919e2688 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -40,6 +40,67 @@
 #define dma_rmb()	dmb(oshld)
 #define dma_wmb()	dmb(oshst)
 
+#define __load_no_speculate_n(ptr, lo, hi, failval, cmpptr, w, sz)	\
+({									\
+	typeof(*ptr)	__nln_val;					\
+	typeof(*ptr)	__failval = 					\
+		(typeof(*ptr))(unsigned long)(failval);			\
+									\
+	asm volatile (							\
+	"	cmp	%[c], %[l]\n"					\
+	"	ccmp	%[c], %[h], 2, cs\n"				\
+	"	b.cs	1f\n"						\
+	"	ldr" #sz " %" #w "[v], %[p]\n"				\
+	"1:	csel	%" #w "[v], %" #w "[v], %" #w "[f], cc\n"	\
+	"	hint	#0x14 // CSDB\n"				\
+	: [v] "=&r" (__nln_val)						\
+	: [p] "m" (*(ptr)), [l] "r" (lo), [h] "r" (hi),			\
+	  [f] "rZ" (__failval), [c] "r" (cmpptr)			\
+	: "cc");							\
+									\
+	__nln_val;							\
+})
+
+#define __load_no_speculate(ptr, lo, hi, failval, cmpptr)		\
+({									\
+	typeof(*(ptr)) __nl_val;					\
+									\
+	switch (sizeof(__nl_val)) {					\
+	case 1:								\
+		__nl_val = __load_no_speculate_n(ptr, lo, hi, failval,	\
+						 cmpptr, w, b);		\
+		break;							\
+	case 2:								\
+		__nl_val = __load_no_speculate_n(ptr, lo, hi, failval,	\
+						 cmpptr, w, h);		\
+		break;							\
+	case 4:								\
+		__nl_val = __load_no_speculate_n(ptr, lo, hi, failval,	\
+						 cmpptr, w, );		\
+		break;							\
+	case 8:								\
+		__nl_val = __load_no_speculate_n(ptr, lo, hi, failval,	\
+						 cmpptr, x, );		\
+		break;							\
+	default:							\
+		BUILD_BUG();						\
+	}								\
+									\
+	__nl_val;							\
+})
+
+#define nospec_load(ptr, lo, hi)					\
+({									\
+	typeof(ptr) __nl_ptr = (ptr);					\
+	__load_no_speculate(__nl_ptr, lo, hi, 0, __nl_ptr);		\
+})
+
+#define nospec_ptr(ptr, lo, hi)						\
+({									\
+	typeof(ptr) __np_ptr = (ptr);					\
+	__load_no_speculate(&__np_ptr, lo, hi, 0, __np_ptr);		\
+})
+
 #define __smp_mb()	dmb(ish)
 #define __smp_rmb()	dmb(ishld)
 #define __smp_wmb()	dmb(ishst)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [RFC PATCH 4/4] bpf: inhibit speculated out-of-bounds pointers
  2018-01-03 22:38 [RFC PATCH 0/4] API for inhibiting speculative arbitrary read primitives Mark Rutland
                   ` (2 preceding siblings ...)
  2018-01-03 22:38 ` [RFC PATCH 3/4] arm64: implement nospec_{load,ptr}() Mark Rutland
@ 2018-01-03 22:38 ` Mark Rutland
  2018-01-03 23:45   ` Peter Zijlstra
  2018-01-04  0:15 ` [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier Dan Williams
  4 siblings, 1 reply; 66+ messages in thread
From: Mark Rutland @ 2018-01-03 22:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-arch, Mark Rutland, Will Deacon

Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.

The EBPF map code has a number of such bounds-checks accesses in
map_lookup_elem implementations. This patch modifies these to use the
nospec helpers to inhibit such side channels.

The JITted lookup_elem implementations remain potentially vulnerable,
and are disabled (with JITted code falling back to the C
implementations).

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 kernel/bpf/arraymap.c | 21 ++++++++++++++-------
 kernel/bpf/cpumap.c   |  8 +++++---
 kernel/bpf/devmap.c   |  6 +++++-
 kernel/bpf/sockmap.c  |  6 +++++-
 4 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 7c25426d3cf5..5090636da2c1 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -117,15 +117,20 @@ static void *array_map_lookup_elem(struct bpf_map *map, void *key)
 {
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	u32 index = *(u32 *)key;
+	void *ptr, *high;
 
 	if (unlikely(index >= array->map.max_entries))
 		return NULL;
 
-	return array->value + array->elem_size * index;
+	ptr = array->value + array->elem_size * index;
+	high = array->value + array->elem_size * array->map.max_entries;
+
+	return nospec_ptr(ptr, array->value, high);
 }
 
 /* emit BPF instructions equivalent to C code of array_map_lookup_elem() */
-static u32 array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
+static u32 __maybe_unused array_map_gen_lookup(struct bpf_map *map,
+					       struct bpf_insn *insn_buf)
 {
 	struct bpf_insn *insn = insn_buf;
 	u32 elem_size = round_up(map->value_size, 8);
@@ -153,11 +158,15 @@ static void *percpu_array_map_lookup_elem(struct bpf_map *map, void *key)
 {
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	u32 index = *(u32 *)key;
+	void __percpu **pptrs, __percpu **high;
 
 	if (unlikely(index >= array->map.max_entries))
 		return NULL;
 
-	return this_cpu_ptr(array->pptrs[index]);
+	pptrs = array->pptrs + index;
+	high = array->pptrs + array->map.max_entries;
+
+	return this_cpu_ptr(nospec_load(pptrs, array->pptrs, high));
 }
 
 int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
@@ -302,7 +311,6 @@ const struct bpf_map_ops array_map_ops = {
 	.map_lookup_elem = array_map_lookup_elem,
 	.map_update_elem = array_map_update_elem,
 	.map_delete_elem = array_map_delete_elem,
-	.map_gen_lookup = array_map_gen_lookup,
 };
 
 const struct bpf_map_ops percpu_array_map_ops = {
@@ -610,8 +618,8 @@ static void *array_of_map_lookup_elem(struct bpf_map *map, void *key)
 	return READ_ONCE(*inner_map);
 }
 
-static u32 array_of_map_gen_lookup(struct bpf_map *map,
-				   struct bpf_insn *insn_buf)
+static u32 __maybe_unused array_of_map_gen_lookup(struct bpf_map *map,
+						  struct bpf_insn *insn_buf)
 {
 	u32 elem_size = round_up(map->value_size, 8);
 	struct bpf_insn *insn = insn_buf;
@@ -644,5 +652,4 @@ const struct bpf_map_ops array_of_maps_map_ops = {
 	.map_fd_get_ptr = bpf_map_fd_get_ptr,
 	.map_fd_put_ptr = bpf_map_fd_put_ptr,
 	.map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem,
-	.map_gen_lookup = array_of_map_gen_lookup,
 };
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index ce5b669003b2..52831b101d35 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -551,13 +551,15 @@ void cpu_map_free(struct bpf_map *map)
 struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key)
 {
 	struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
-	struct bpf_cpu_map_entry *rcpu;
+	struct bpf_cpu_map_entry **ptr, **high;
 
 	if (key >= map->max_entries)
 		return NULL;
 
-	rcpu = READ_ONCE(cmap->cpu_map[key]);
-	return rcpu;
+	ptr = cmap->cpu_map + key;
+	high = cmap->cpu_map + map->max_entries;
+
+	return READ_ONCE(*nospec_ptr(ptr, cmap->cpu_map, high));
 }
 
 static void *cpu_map_lookup_elem(struct bpf_map *map, void *key)
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index ebdef54bf7df..23b2b0547304 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -250,11 +250,15 @@ struct net_device  *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
 	struct bpf_dtab_netdev *dev;
+	struct bpf_dtab_netdev **ptr, **high;
 
 	if (key >= map->max_entries)
 		return NULL;
 
-	dev = READ_ONCE(dtab->netdev_map[key]);
+	ptr = dtab->netdev_map + key;
+	high = dtab->netdev_map + map->max_entries;
+
+	dev = READ_ONCE(*nospec_ptr(ptr, dtab->netdev_map, high));
 	return dev ? dev->dev : NULL;
 }
 
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 5ee2e41893d9..ea59f6737751 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -626,11 +626,15 @@ static int sock_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key)
 {
 	struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
+	struct sock **ptr, **high;
 
 	if (key >= map->max_entries)
 		return NULL;
 
-	return READ_ONCE(stab->sock_map[key]);
+	ptr = stab->sock_map + key;
+	high = stab->sock_map + map->max_entries;
+
+	return READ_ONCE(*nospec_ptr(ptr, stab->sock_map, high));
 }
 
 static int sock_map_delete_elem(struct bpf_map *map, void *key)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH 4/4] bpf: inhibit speculated out-of-bounds pointers
  2018-01-03 22:38 ` [RFC PATCH 4/4] bpf: inhibit speculated out-of-bounds pointers Mark Rutland
@ 2018-01-03 23:45   ` Peter Zijlstra
  2018-01-04 10:59     ` Mark Rutland
  0 siblings, 1 reply; 66+ messages in thread
From: Peter Zijlstra @ 2018-01-03 23:45 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-kernel, linux-arch, Will Deacon

On Wed, Jan 03, 2018 at 10:38:27PM +0000, Mark Rutland wrote:
> Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
> memory accesses under a bounds check may be speculated even if the
> bounds check fails, providing a primitive for building a side channel.
> 
> The EBPF map code has a number of such bounds-checks accesses in
> map_lookup_elem implementations. This patch modifies these to use the
> nospec helpers to inhibit such side channels.
> 
> The JITted lookup_elem implementations remain potentially vulnerable,
> and are disabled (with JITted code falling back to the C
> implementations).

Since this is now public, let me re-iterate that I don't particularly
like this approach. If you have to kill the JIT, could we please keep
that in the arch JIT implementation?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-03 22:38 [RFC PATCH 0/4] API for inhibiting speculative arbitrary read primitives Mark Rutland
                   ` (3 preceding siblings ...)
  2018-01-03 22:38 ` [RFC PATCH 4/4] bpf: inhibit speculated out-of-bounds pointers Mark Rutland
@ 2018-01-04  0:15 ` Dan Williams
  2018-01-04  0:39   ` Linus Torvalds
  4 siblings, 1 reply; 66+ messages in thread
From: Dan Williams @ 2018-01-04  0:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mark Rutland, linux-arch, Peter Zijlstra, Greg KH,
	Thomas Gleixner, Linus Torvalds, Elena Reshetova, Alan Cox

The 'if_nospec' primitive marks locations where the kernel is disabling
speculative execution that could potentially access privileged data. It
is expected to be paired with a 'nospec_{ptr,load}' where the user
controlled value is actually consumed. Architectures can optionally
implement a speculation barrier in 'if_nospec' or a speculation sync in
each 'nospec_{ptr,load}' depending what is best/feasible for the
architecture.

The pairing of if_nospec and nospec_load documents which branch
is sensitive to which load(s) in a code block. For example:

	if_nospec(foo < bar) {
		ptr = array_base + foo;
		array_max = array_base + bar;
		baz = nospec_load(ptr, array_base, array_max);
	}

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

This proposal builds on Mark's nospec_load RFC here:

    https://lkml.org/lkml/2018/1/3/754

...and combines it with a suggestion from Peter to introduce if_nospec.
The goal is to both document which loads must not be speculated behind
which branches, and allow architectures like x86 that can do a single
barrier after the branch to coexist with architectures like ARM that
want to instrument each load.

 include/asm-generic/barrier.h |   31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 5eba6ae0c34e..25c1b47f84b7 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -55,6 +55,27 @@
 #endif
 
 /**
+ * if_nospec() - block speculative execution of this branch
+ *
+ * @cond: condition that if true should barrier speculation
+ *
+ * Architectures should override the definition of nospec_barrier() to
+ * inject a speculative execution barrier for any conditional branches
+ * that might speculatively execute reads based on user controlled
+ * value.  Architectures that can more efficiently flush speculation via
+ * nospec_load can leave nospec_barrier as a nop.
+ *
+ * The expectation is that nospec_{load,ptr} are always used inside an
+ * if_nospec block so that architectures that can use a single barrier
+ * after the branch can do that once rather than per access.
+ */
+#define if_nospec(cond)	if (({ bool ret = (cond); nospec_barrier(); ret }))
+
+#ifndef nospec_barrier
+#define nospec_barrier() do { } while (0)
+#endif
+
+/**
  * nospec_ptr() - Ensure a  pointer is bounded, even under speculation.
  *
  * @ptr: the pointer to test
@@ -68,6 +89,11 @@
  * interval both under architectural execution and under speculation,
  * preventing propagation of an out-of-bounds pointer to code which is
  * speculatively executed.
+ *
+ * Architectures that can more efficiently flush speculation via
+ * nospec_barrier can define nospec_ptr as a nop.
+ *
+ * The expectation is that if_nospec and nospec_ptr are always paired.
  */
 #ifndef nospec_ptr
 #define nospec_ptr(ptr, lo, hi)						\
@@ -93,6 +119,11 @@
  * Architectures should override this to ensure that ptr falls in the [lo, hi)
  * interval both under architectural execution and under speculation,
  * preventing speculative out-of-bounds reads.
+ *
+ * Architectures that can more efficiently flush speculation via
+ * nospec_barrier can define nospec_load as a nop.
+ *
+ * The expectation is that if_nospec and nospec_load are always paired.
  */
 #ifndef nospec_load
 #define nospec_load(ptr, lo, hi)					\

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  0:15 ` [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier Dan Williams
@ 2018-01-04  0:39   ` Linus Torvalds
  2018-01-04  1:07     ` Alan Cox
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2018-01-04  0:39 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova,
	Alan Cox

On Wed, Jan 3, 2018 at 4:15 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> The 'if_nospec' primitive marks locations where the kernel is disabling
> speculative execution that could potentially access privileged data. It
> is expected to be paired with a 'nospec_{ptr,load}' where the user
> controlled value is actually consumed.

I'm much less worried about these "nospec_load/if" macros, than I am
about having a sane way to determine when they should be needed.

Is there such a sane model right now, or are we talking "people will
randomly add these based on strong feelings"?

                    Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  0:39   ` Linus Torvalds
@ 2018-01-04  1:07     ` Alan Cox
  2018-01-04  1:13       ` Dan Williams
  2018-01-04  1:27       ` Jiri Kosina
  0 siblings, 2 replies; 66+ messages in thread
From: Alan Cox @ 2018-01-04  1:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dan Williams, Linux Kernel Mailing List, Mark Rutland,
	linux-arch, Peter Zijlstra, Greg KH, Thomas Gleixner,
	Elena Reshetova, Alan Cox

On Wed, 3 Jan 2018 16:39:31 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, Jan 3, 2018 at 4:15 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> > The 'if_nospec' primitive marks locations where the kernel is disabling
> > speculative execution that could potentially access privileged data. It
> > is expected to be paired with a 'nospec_{ptr,load}' where the user
> > controlled value is actually consumed.  
> 
> I'm much less worried about these "nospec_load/if" macros, than I am
> about having a sane way to determine when they should be needed.
> 
> Is there such a sane model right now, or are we talking "people will
> randomly add these based on strong feelings"?

There are people trying to tune coverity and other tool rules to identify
cases, and some of the work so far was done that way. For x86 we didn't
find too many so far so either the needed pattern is uncommon or ....  8)

Given you can execute over a hundred basic instructions in a speculation
window it does need to be a tool that can explore not just in function
but across functions. That's really tough for the compiler itself to do
without help.

What remains to be seen is if there are other patterns that affect
different processors.

In the longer term the compiler itself needs to know what is and isn't
safe (ie you need to be able to write things like

void foo(tainted __user int *x)

and have the compiler figure out what level of speculation it can do and
(on processors with those features like IA64) when it can and can't do
various kinds of non-trapping loads.

Alan

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  1:07     ` Alan Cox
@ 2018-01-04  1:13       ` Dan Williams
  2018-01-04  6:28         ` Julia Lawall
  2018-01-04  1:27       ` Jiri Kosina
  1 sibling, 1 reply; 66+ messages in thread
From: Dan Williams @ 2018-01-04  1:13 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Linux Kernel Mailing List, Mark Rutland,
	linux-arch, Peter Zijlstra, Greg KH, Thomas Gleixner,
	Elena Reshetova, Alan Cox, Dan Carpenter, Julia Lawall

[ adding Julia and Dan ]

On Wed, Jan 3, 2018 at 5:07 PM, Alan Cox <gnomes@lxorguk.ukuu.org.uk> wrote:
> On Wed, 3 Jan 2018 16:39:31 -0800
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>> On Wed, Jan 3, 2018 at 4:15 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> > The 'if_nospec' primitive marks locations where the kernel is disabling
>> > speculative execution that could potentially access privileged data. It
>> > is expected to be paired with a 'nospec_{ptr,load}' where the user
>> > controlled value is actually consumed.
>>
>> I'm much less worried about these "nospec_load/if" macros, than I am
>> about having a sane way to determine when they should be needed.
>>
>> Is there such a sane model right now, or are we talking "people will
>> randomly add these based on strong feelings"?
>
> There are people trying to tune coverity and other tool rules to identify
> cases, and some of the work so far was done that way. For x86 we didn't
> find too many so far so either the needed pattern is uncommon or ....  8)
>
> Given you can execute over a hundred basic instructions in a speculation
> window it does need to be a tool that can explore not just in function
> but across functions. That's really tough for the compiler itself to do
> without help.
>
> What remains to be seen is if there are other patterns that affect
> different processors.
>
> In the longer term the compiler itself needs to know what is and isn't
> safe (ie you need to be able to write things like
>
> void foo(tainted __user int *x)
>
> and have the compiler figure out what level of speculation it can do and
> (on processors with those features like IA64) when it can and can't do
> various kinds of non-trapping loads.
>

It would be great if coccinelle and/or smatch could be taught to catch
some of these case at least as a first pass "please audit this code
block" type of notification.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  1:07     ` Alan Cox
  2018-01-04  1:13       ` Dan Williams
@ 2018-01-04  1:27       ` Jiri Kosina
  2018-01-04  1:41         ` Alan Cox
  2018-01-04  1:51         ` Dan Williams
  1 sibling, 2 replies; 66+ messages in thread
From: Jiri Kosina @ 2018-01-04  1:27 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Dan Williams, Linux Kernel Mailing List,
	Mark Rutland, linux-arch, Peter Zijlstra, Greg KH,
	Thomas Gleixner, Elena Reshetova, Alan Cox

On Thu, 4 Jan 2018, Alan Cox wrote:

> There are people trying to tune coverity and other tool rules to identify
> cases, 

Yeah, but that (and *especially* Coverity) is so inconvenient to be 
applied to each and every patch ... that this is not the way to go.

If the CPU speculation can cause these kinds of side-effects, it just must 
not happen, full stop. OS trying to work it around is just a whack-a-mole 
(which we can apply for old silicon, sure ... but not something that 
becomes a new standard) that's never going to lead to any ultimate 
solution.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  1:27       ` Jiri Kosina
@ 2018-01-04  1:41         ` Alan Cox
  2018-01-04  1:47           ` Jiri Kosina
  2018-01-04  1:51         ` Dan Williams
  1 sibling, 1 reply; 66+ messages in thread
From: Alan Cox @ 2018-01-04  1:41 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Linus Torvalds, Dan Williams, Linux Kernel Mailing List,
	Mark Rutland, linux-arch, Peter Zijlstra, Greg KH,
	Thomas Gleixner, Elena Reshetova

On Thu, 4 Jan 2018 02:27:54 +0100 (CET)
Jiri Kosina <jikos@kernel.org> wrote:

> On Thu, 4 Jan 2018, Alan Cox wrote:
> 
> > There are people trying to tune coverity and other tool rules to identify
> > cases,   
> 
> Yeah, but that (and *especially* Coverity) is so inconvenient to be 
> applied to each and every patch ... that this is not the way to go.

Agreed enitely - and coverity is non-free which is even worse as a
dependancy. Right now we are in the 'what could be done quickly by a few
people' space. The papers are now published, so the world can work on
better solutions and extending more convenient tooling.

> If the CPU speculation can cause these kinds of side-effects, it just must 
> not happen, full stop. 

At which point your performance will resemble that of a 2012 atom
processor at best.

> OS trying to work it around is just a whack-a-mole 
> (which we can apply for old silicon, sure ... but not something that 
> becomes a new standard) that's never going to lead to any ultimate 
> solution.

In the ideal world it becomes possible for future processors to resolve
such markings down to no-ops. Will that be possible or will we get more
explicit ways to tell the processor what is unsafe - I don't
personally know but I do know that turning off speculation is not the
answer.

Clearly if the CPU must be told then C is going to have to grow some
syntax for it and some other languages are going to see 'taint' moving
from a purely software construct to a real processor one.

Alan

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  1:41         ` Alan Cox
@ 2018-01-04  1:47           ` Jiri Kosina
  2018-01-04 19:39             ` Pavel Machek
  0 siblings, 1 reply; 66+ messages in thread
From: Jiri Kosina @ 2018-01-04  1:47 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Dan Williams, Linux Kernel Mailing List,
	Mark Rutland, linux-arch, Peter Zijlstra, Greg KH,
	Thomas Gleixner, Elena Reshetova

On Thu, 4 Jan 2018, Alan Cox wrote:

> > If the CPU speculation can cause these kinds of side-effects, it just must 
> > not happen, full stop. 
> 
> At which point your performance will resemble that of a 2012 atom
> processor at best.

You know what? I'd be completely fine with that, if it's traded for "my 
ssh and internet banking keys are JUST MINE, ok?" :)

> Clearly if the CPU must be told then C is going to have to grow some
> syntax for it and some other languages are going to see 'taint' moving
> from a purely software construct to a real processor one.

Again, what is the problem with "yeah, it turned out to speed things up in 
the past, but hey, it has security issues, so we stopped doing it" 
aproach?

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  1:27       ` Jiri Kosina
  2018-01-04  1:41         ` Alan Cox
@ 2018-01-04  1:51         ` Dan Williams
  2018-01-04  1:54           ` Linus Torvalds
  2018-01-04  1:59           ` Jiri Kosina
  1 sibling, 2 replies; 66+ messages in thread
From: Dan Williams @ 2018-01-04  1:51 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Alan Cox, Linus Torvalds, Linux Kernel Mailing List,
	Mark Rutland, linux-arch, Peter Zijlstra, Greg KH,
	Thomas Gleixner, Elena Reshetova, Alan Cox

On Wed, Jan 3, 2018 at 5:27 PM, Jiri Kosina <jikos@kernel.org> wrote:
> On Thu, 4 Jan 2018, Alan Cox wrote:
>
>> There are people trying to tune coverity and other tool rules to identify
>> cases,
>
> Yeah, but that (and *especially* Coverity) is so inconvenient to be
> applied to each and every patch ... that this is not the way to go.
>
> If the CPU speculation can cause these kinds of side-effects, it just must
> not happen, full stop. OS trying to work it around is just a whack-a-mole
> (which we can apply for old silicon, sure ... but not something that
> becomes a new standard) that's never going to lead to any ultimate
> solution.

Speaking from a purely Linux kernel maintenance process perspective we
play wack-a-mole with missed endian conversions and other bugs that
coccinelle, sparse, etc help us catch. So this is in that same
category, but yes, it's inconvenient.

Alan already mentioned potential compiler changes and annotating
variables so that the compiler can help in the future, but until then
we need these explicit checks.

Elena has done the work of auditing static analysis reports to a dozen
or so locations that need some 'nospec' handling.

The point of this RFC is to level set across architectures on the
macros/names/mechanisms that should be used for the first round of
fixes.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  1:51         ` Dan Williams
@ 2018-01-04  1:54           ` Linus Torvalds
  2018-01-04  3:10             ` Williams, Dan J
  2018-01-04  1:59           ` Jiri Kosina
  1 sibling, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2018-01-04  1:54 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jiri Kosina, Alan Cox, Linux Kernel Mailing List, Mark Rutland,
	linux-arch, Peter Zijlstra, Greg KH, Thomas Gleixner,
	Elena Reshetova, Alan Cox

On Wed, Jan 3, 2018 at 5:51 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>
> Elena has done the work of auditing static analysis reports to a dozen
> or so locations that need some 'nospec' handling.

I'd love to see that patch, just to see how bad things look.

Because I think that really is very relevant to the interface too.

If we're talking "a dozen locations" that are fairly well constrained,
that's very different from having thousands all over the place.

                  Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  1:51         ` Dan Williams
  2018-01-04  1:54           ` Linus Torvalds
@ 2018-01-04  1:59           ` Jiri Kosina
  2018-01-04  2:15             ` Alan Cox
  2018-01-04 20:40             ` Pavel Machek
  1 sibling, 2 replies; 66+ messages in thread
From: Jiri Kosina @ 2018-01-04  1:59 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jiri Kosina, Alan Cox, Linus Torvalds, Linux Kernel Mailing List,
	Mark Rutland, linux-arch, Peter Zijlstra, Greg KH,
	Thomas Gleixner, Elena Reshetova, Alan Cox

On Wed, 3 Jan 2018, Dan Williams wrote:

> Speaking from a purely Linux kernel maintenance process perspective we
> play wack-a-mole with missed endian conversions and other bugs that
> coccinelle, sparse, etc help us catch. 

Fully agreed.

> So this is in that same category, but yes, it's inconvenient.

Disagreed, violently. CPU has to execute the instructions I ask it to 
execute, and if it executes *anything* else that reveals any information 
about the instructions that have *not* been executed, it's flawed.

> Elena has done the work of auditing static analysis reports to a dozen
> or so locations that need some 'nospec' handling.

How exactly is that related (especially in longer-term support terms) to 
BPF anyway?

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  1:59           ` Jiri Kosina
@ 2018-01-04  2:15             ` Alan Cox
  2018-01-04  3:12               ` Alexei Starovoitov
  2018-01-04 20:40             ` Pavel Machek
  1 sibling, 1 reply; 66+ messages in thread
From: Alan Cox @ 2018-01-04  2:15 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Dan Williams, Linus Torvalds, Linux Kernel Mailing List,
	Mark Rutland, linux-arch, Peter Zijlstra, Greg KH,
	Thomas Gleixner, Elena Reshetova

> Disagreed, violently. CPU has to execute the instructions I ask it to 
> execute, and if it executes *anything* else that reveals any information 
> about the instructions that have *not* been executed, it's flawed.

Then stick to in order processors. Just don't be in a hurry to get your
computation finished.

> > Elena has done the work of auditing static analysis reports to a dozen
> > or so locations that need some 'nospec' handling.  
> 
> How exactly is that related (especially in longer-term support terms) to 
> BPF anyway?

If you read the papers you need a very specific construct in order to not
only cause a speculative load of an address you choose but also to then
manage to cause a second operation that in some way reveals bits of data
or allows you to ask questions.

BPF allows you to construct those sequences relatively easily and it's
the one case where a user space application can fairly easily place code
it wants to execute in the kernel. Without BPF you have to find the right
construct in the kernel, prime all the right predictions and measure the
result without getting killed off. There are places you can do that but
they are not so easy and we don't (at this point) think there are that
many.

The same situation occurs in user space with interpreters and JITs,hence
the paper talking about javascript. Any JIT with the ability to do timing
is particularly vulnerable to versions of this specific attack because
the attacker gets to create the code pattern rather than have to find it.

Alan

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  1:54           ` Linus Torvalds
@ 2018-01-04  3:10             ` Williams, Dan J
  2018-01-04  4:44               ` Al Viro
                                 ` (2 more replies)
  0 siblings, 3 replies; 66+ messages in thread
From: Williams, Dan J @ 2018-01-04  3:10 UTC (permalink / raw)
  To: torvalds
  Cc: linux-kernel, peterz, tglx, alan, Reshetova, Elena, mark.rutland,
	gnomes, gregkh, jikos, linux-arch

On Wed, 2018-01-03 at 17:54 -0800, Linus Torvalds wrote:
> On Wed, Jan 3, 2018 at 5:51 PM, Dan Williams <dan.j.williams@intel.co
> m> wrote:
> > 
> > Elena has done the work of auditing static analysis reports to a
> > dozen
> > or so locations that need some 'nospec' handling.
> 
> I'd love to see that patch, just to see how bad things look.
> 
> Because I think that really is very relevant to the interface too.
> 
> If we're talking "a dozen locations" that are fairly well
> constrained,
> that's very different from having thousands all over the place.
> 

Note, the following is Elena's work, I'm just helping poke the upstream
discussion along while she's offline.

Elena audited the static analysis reports down to the following
locations where userspace provides a value for a conditional branch and
then later creates a dependent load on that same userspace controlled
value.

'osb()', observable memory barrier, resolves to an lfence on x86. My
concern with these changes is that it is not clear what content within
the branch block is of concern. Peter's 'if_nospec' proposal combined
with Mark's 'nospec_load' would seem to clear that up, from a self
documenting code perspective, and let archs optionally protect entire
conditional blocks or individual loads within those blocks.

Note that these are "a human looked at static analysis reports and
could not rationalize that these are false positives". Specific domain
knowledge about these paths may find that some of them are indeed false
positives.

The change to m_start in kernel/user_namespace.c is interesting because
that's an example where the nospec_load() approach by itself would need
to barrier speculation twice whereas if_nospec can do it once for the
whole block.

[ forgive any whitespace damage ]

8<---
diff --git a/drivers/media/usb/uvc/uvc_v4l2.c b/drivers/media/usb/uvc/uvc_v4l2.c
index 3e7e283a44a8..65175bbe805f 100644
--- a/drivers/media/usb/uvc/uvc_v4l2.c
+++ b/drivers/media/usb/uvc/uvc_v4l2.c
@@ -821,6 +821,7 @@ static int uvc_ioctl_enum_input(struct file *file, void *fh,
 		}
 		pin = iterm->id;
 	} else if (index < selector->bNrInPins) {
+		osb();
 		pin = selector->baSourceID[index];
 		list_for_each_entry(iterm, &chain->entities, chain) {
 			if (!UVC_ENTITY_IS_ITERM(iterm))
diff --git a/drivers/net/wireless/ath/carl9170/main.c b/drivers/net/wireless/ath/carl9170/main.c
index 988c8857d78c..cf267b709af6 100644
--- a/drivers/net/wireless/ath/carl9170/main.c
+++ b/drivers/net/wireless/ath/carl9170/main.c
@@ -1388,6 +1388,7 @@ static int carl9170_op_conf_tx(struct ieee80211_hw *hw,
 
 	mutex_lock(&ar->mutex);
 	if (queue < ar->hw->queues) {
+		osb();
 		memcpy(&ar->edcf[ar9170_qmap[queue]], param, sizeof(*param));
 		ret = carl9170_set_qos(ar);
 	} else {
diff --git a/drivers/net/wireless/intersil/p54/main.c b/drivers/net/wireless/intersil/p54/main.c
index ab6d39e12069..f024f1ad81ff 100644
--- a/drivers/net/wireless/intersil/p54/main.c
+++ b/drivers/net/wireless/intersil/p54/main.c
@@ -415,6 +415,7 @@ static int p54_conf_tx(struct ieee80211_hw *dev,
 
 	mutex_lock(&priv->conf_mutex);
 	if (queue < dev->queues) {
+		osb();
 		P54_SET_QUEUE(priv->qos_params[queue], params->aifs,
 			params->cw_min, params->cw_max, params->txop);
 		ret = p54_set_edcf(priv);
diff --git a/drivers/net/wireless/st/cw1200/sta.c b/drivers/net/wireless/st/cw1200/sta.c
index 38678e9a0562..b4a2a7ba04e8 100644
--- a/drivers/net/wireless/st/cw1200/sta.c
+++ b/drivers/net/wireless/st/cw1200/sta.c
@@ -619,6 +619,7 @@ int cw1200_conf_tx(struct ieee80211_hw *dev, struct ieee80211_vif *vif,
 	mutex_lock(&priv->conf_mutex);
 
 	if (queue < dev->queues) {
+		osb();
 		old_uapsd_flags = le16_to_cpu(priv->uapsd_info.uapsd_flags);
 
 		WSM_TX_QUEUE_SET(&priv->tx_queue_params, queue, 0, 0, 0);
diff --git a/drivers/scsi/qla2xxx/qla_mr.c b/drivers/scsi/qla2xxx/qla_mr.c
index d5da3981cefe..dec8b6e087e3 100644
--- a/drivers/scsi/qla2xxx/qla_mr.c
+++ b/drivers/scsi/qla2xxx/qla_mr.c
@@ -2304,10 +2304,12 @@ qlafx00_status_entry(scsi_qla_host_t *vha, struct rsp_que *rsp, void *pkt)
 	req = ha->req_q_map[que];
 
 	/* Validate handle. */
-	if (handle < req->num_outstanding_cmds)
+	if (handle < req->num_outstanding_cmds) {
+		osb();
 		sp = req->outstanding_cmds[handle];
-	else
+	} else {
 		sp = NULL;
+	}
 
 	if (sp == NULL) {
 		ql_dbg(ql_dbg_io, vha, 0x3034,
@@ -2655,10 +2657,12 @@ qlafx00_multistatus_entry(struct scsi_qla_host *vha,
 		req = ha->req_q_map[que];
 
 		/* Validate handle. */
-		if (handle < req->num_outstanding_cmds)
+		if (handle < req->num_outstanding_cmds) {
+			osb();
 			sp = req->outstanding_cmds[handle];
-		else
+		} else {
 			sp = NULL;
+		}
 
 		if (sp == NULL) {
 			ql_dbg(ql_dbg_io, vha, 0x3044,
diff --git a/drivers/thermal/int340x_thermal/int340x_thermal_zone.c b/drivers/thermal/int340x_thermal/int340x_thermal_zone.c
index 145a5c53ff5c..d732b34e120d 100644
--- a/drivers/thermal/int340x_thermal/int340x_thermal_zone.c
+++ b/drivers/thermal/int340x_thermal/int340x_thermal_zone.c
@@ -57,15 +57,16 @@ static int int340x_thermal_get_trip_temp(struct thermal_zone_device *zone,
 	if (d->override_ops && d->override_ops->get_trip_temp)
 		return d->override_ops->get_trip_temp(zone, trip, temp);
 
-	if (trip < d->aux_trip_nr)
+	if (trip < d->aux_trip_nr) {
+		osb();
 		*temp = d->aux_trips[trip];
-	else if (trip == d->crt_trip_id)
+	} else if (trip == d->crt_trip_id) {
 		*temp = d->crt_temp;
-	else if (trip == d->psv_trip_id)
+	} else if (trip == d->psv_trip_id) {
 		*temp = d->psv_temp;
-	else if (trip == d->hot_trip_id)
+	} else if (trip == d->hot_trip_id) {
 		*temp = d->hot_temp;
-	else {
+	} else {
 		for (i = 0; i < INT340X_THERMAL_MAX_ACT_TRIP_COUNT; i++) {
 			if (d->act_trips[i].valid &&
 			    d->act_trips[i].id == trip) {
diff --git a/fs/udf/misc.c b/fs/udf/misc.c
index 401e64cde1be..c5394760d26b 100644
--- a/fs/udf/misc.c
+++ b/fs/udf/misc.c
@@ -104,6 +104,8 @@ struct genericFormat *udf_add_extendedattr(struct inode *inode, uint32_t size,
 					iinfo->i_lenEAttr) {
 				uint32_t aal =
 					le32_to_cpu(eahd->appAttrLocation);
+
+				osb();
 				memmove(&ea[offset - aal + size],
 					&ea[aal], offset - aal);
 				offset -= aal;
@@ -114,6 +116,8 @@ struct genericFormat *udf_add_extendedattr(struct inode *inode, uint32_t size,
 					iinfo->i_lenEAttr) {
 				uint32_t ial =
 					le32_to_cpu(eahd->impAttrLocation);
+
+				osb();
 				memmove(&ea[offset - ial + size],
 					&ea[ial], offset - ial);
 				offset -= ial;
@@ -125,6 +129,8 @@ struct genericFormat *udf_add_extendedattr(struct inode *inode, uint32_t size,
 					iinfo->i_lenEAttr) {
 				uint32_t aal =
 					le32_to_cpu(eahd->appAttrLocation);
+
+				osb();
 				memmove(&ea[offset - aal + size],
 					&ea[aal], offset - aal);
 				offset -= aal;
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 1c65817673db..dbc12007da51 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -82,8 +82,10 @@ static inline struct file *__fcheck_files(struct files_struct *files, unsigned i
 {
 	struct fdtable *fdt = rcu_dereference_raw(files->fdt);
 
-	if (fd < fdt->max_fds)
+	if (fd < fdt->max_fds) {
+		osb();
 		return rcu_dereference_raw(fdt->fd[fd]);
+	}
 	return NULL;
 }
 
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 246d4d4ce5c7..aa0be8cef2d4 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -648,15 +648,18 @@ static void *m_start(struct seq_file *seq, loff_t *ppos,
 {
 	loff_t pos = *ppos;
 	unsigned extents = map->nr_extents;
-	smp_rmb();
 
-	if (pos >= extents)
-		return NULL;
+	/* paired with smp_wmb in map_write */
+	smp_rmb();
 
-	if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
-		return &map->extent[pos];
+	if (pos < extents) {
+		osb();
+		if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
+			return &map->extent[pos];
+		return &map->forward[pos];
+	}
 
-	return &map->forward[pos];
+	return NULL;
 }
 
 static void *uid_m_start(struct seq_file *seq, loff_t *ppos)
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 125c1eab3eaa..56909c8fa025 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -476,6 +476,7 @@ static int raw_getfrag(void *from, char *to, int offset, int len, int odd,
 	if (offset < rfv->hlen) {
 		int copy = min(rfv->hlen - offset, len);
 
+		osb();
 		if (skb->ip_summed == CHECKSUM_PARTIAL)
 			memcpy(to, rfv->hdr.c + offset, copy);
 		else
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 761a473a07c5..0942784f3f8d 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -729,6 +729,7 @@ static int raw6_getfrag(void *from, char *to, int offset, int len, int odd,
 	if (offset < rfv->hlen) {
 		int copy = min(rfv->hlen - offset, len);
 
+		osb();
 		if (skb->ip_summed == CHECKSUM_PARTIAL)
 			memcpy(to, rfv->c + offset, copy);
 		else
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 8ca9915befc8..7f83abdea255 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -81,6 +81,8 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
 	if (index < net->mpls.platform_labels) {
 		struct mpls_route __rcu **platform_label =
 			rcu_dereference(net->mpls.platform_label);
+
+		osb();
 		rt = rcu_dereference(platform_label[index]);
 	}
 	return rt;

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  2:15             ` Alan Cox
@ 2018-01-04  3:12               ` Alexei Starovoitov
  2018-01-04  9:16                 ` Reshetova, Elena
  0 siblings, 1 reply; 66+ messages in thread
From: Alexei Starovoitov @ 2018-01-04  3:12 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jiri Kosina, Dan Williams, Linus Torvalds,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova,
	netdev, Daniel Borkmann, David S. Miller

On Thu, Jan 04, 2018 at 02:15:53AM +0000, Alan Cox wrote:
> 
> > > Elena has done the work of auditing static analysis reports to a dozen
> > > or so locations that need some 'nospec' handling.  
> > 
> > How exactly is that related (especially in longer-term support terms) to 
> > BPF anyway?
> 
> If you read the papers you need a very specific construct in order to not
> only cause a speculative load of an address you choose but also to then
> manage to cause a second operation that in some way reveals bits of data
> or allows you to ask questions.
> 
> BPF allows you to construct those sequences relatively easily and it's
> the one case where a user space application can fairly easily place code
> it wants to execute in the kernel. Without BPF you have to find the right
> construct in the kernel, prime all the right predictions and measure the
> result without getting killed off. There are places you can do that but
> they are not so easy and we don't (at this point) think there are that
> many.

for BPF in particular we're thinking to do a different fix.
Instead of killing speculation we can let cpu speculate.
The fix will include rounding up bpf maps to nearest power of 2 and
inserting bpf_and operation on the index after bounds check,
so cpu can continue speculate beyond bounds check, but it will
load from zero-ed memory.
So this nospec arch dependent hack won't be necessary for bpf side,
but may still be needed in other parts of the kernel.

Also note that variant 1 is talking about exploiting prog_array
bpf feature which had 64-bit access prior to
commit 90caccdd8cc0 ("bpf: fix bpf_tail_call() x64 JIT")
That was a fix for JIT and not related to this cpu issue, but
I believe it breaks the existing exploit.

Since it's not clear whether it's still possible to use bpf
with 32-bit speculation only, we're going to do this rounding fix
for unpriv just in case.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  3:10             ` Williams, Dan J
@ 2018-01-04  4:44               ` Al Viro
  2018-01-04  5:44                 ` Dan Williams
  2018-01-04  5:01                 ` Eric W. Biederman
  2018-01-04 11:47                 ` Mark Rutland
  2 siblings, 1 reply; 66+ messages in thread
From: Al Viro @ 2018-01-04  4:44 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	mark.rutland, gnomes, gregkh, jikos, linux-arch

On Thu, Jan 04, 2018 at 03:10:51AM +0000, Williams, Dan J wrote:

> diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
> index 1c65817673db..dbc12007da51 100644
> --- a/include/linux/fdtable.h
> +++ b/include/linux/fdtable.h
> @@ -82,8 +82,10 @@ static inline struct file *__fcheck_files(struct files_struct *files, unsigned i
>  {
>  	struct fdtable *fdt = rcu_dereference_raw(files->fdt);
>  
> -	if (fd < fdt->max_fds)
> +	if (fd < fdt->max_fds) {
> +		osb();
>  		return rcu_dereference_raw(fdt->fd[fd]);
> +	}
>  	return NULL;
>  }

... and the point of that would be?  Possibly revealing the value of files->fdt?
Why would that be a threat, assuming you manage to extract the information in
question in the first place?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  3:10             ` Williams, Dan J
@ 2018-01-04  5:01                 ` Eric W. Biederman
  2018-01-04  5:01                 ` Eric W. Biederman
  2018-01-04 11:47                 ` Mark Rutland
  2 siblings, 0 replies; 66+ messages in thread
From: Eric W. Biederman @ 2018-01-04  5:01 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	mark.rutland, gnomes, gregkh, jikos, linux-arch

"Williams, Dan J" <dan.j.williams@intel.com> writes:



> Note that these are "a human looked at static analysis reports and
> could not rationalize that these are false positives". Specific domain
> knowledge about these paths may find that some of them are indeed false
> positives.
>
> The change to m_start in kernel/user_namespace.c is interesting because
> that's an example where the nospec_load() approach by itself would need
> to barrier speculation twice whereas if_nospec can do it once for the
> whole block.


This user_namespace.c change is very convoluted for what it is trying to
do.  It simplifies to a one liner that just adds osb() after pos >=
extents. AKA:

	if (pos >= extents)
        	return NULL;
+ 	osb();

Is the intent to hide which branch branch we take based on extents,
after the pos check?

I suspect this implies that using a user namespace and a crafted uid
map you can hit this in stat, on the fast path.

At which point I suspect we will be better off extending struct
user_namespace by a few pointers, so there is no union and remove the
need for blocking speculation entirely.

> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 246d4d4ce5c7..aa0be8cef2d4 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -648,15 +648,18 @@ static void *m_start(struct seq_file *seq, loff_t *ppos,
>  {
>  	loff_t pos = *ppos;
>  	unsigned extents = map->nr_extents;
> -	smp_rmb();
>  
> -	if (pos >= extents)
> -		return NULL;
> +	/* paired with smp_wmb in map_write */
> +	smp_rmb();
>  
> -	if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
> -		return &map->extent[pos];
> +	if (pos < extents) {
> +		osb();
> +		if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
> +			return &map->extent[pos];
> +		return &map->forward[pos];
> +	}
>  
> -	return &map->forward[pos];
> +	return NULL;
>  }
>  
>  static void *uid_m_start(struct seq_file *seq, loff_t *ppos)



> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 8ca9915befc8..7f83abdea255 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -81,6 +81,8 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
>  	if (index < net->mpls.platform_labels) {
>  		struct mpls_route __rcu **platform_label =
>  			rcu_dereference(net->mpls.platform_label);
> +
> +		osb();
>  		rt = rcu_dereference(platform_label[index]);
>  	}
>  	return rt;

Ouch!  This adds a barrier in the middle of an rcu lookup, on the
fast path for routing mpls packets.  Which if memory serves will
noticably slow down software processing of mpls packets.

Why does osb() fall after the branch for validity?  So that we allow
speculation up until then? 

I suspect it would be better to have those barriers in the tun/tap
interfaces where userspace can inject packets and thus time them.  Then
the code could still speculate and go fast for remote packets.

Or does the speculation stomping have to be immediately at the place
where we use data from userspace to perform a table lookup?

Eric

p.s. osb() seems to be mispelled. obs() would make much more sense.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
@ 2018-01-04  5:01                 ` Eric W. Biederman
  0 siblings, 0 replies; 66+ messages in thread
From: Eric W. Biederman @ 2018-01-04  5:01 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	mark.rutland, gnomes, gregkh, jikos, linux-arch

"Williams, Dan J" <dan.j.williams@intel.com> writes:



> Note that these are "a human looked at static analysis reports and
> could not rationalize that these are false positives". Specific domain
> knowledge about these paths may find that some of them are indeed false
> positives.
>
> The change to m_start in kernel/user_namespace.c is interesting because
> that's an example where the nospec_load() approach by itself would need
> to barrier speculation twice whereas if_nospec can do it once for the
> whole block.


This user_namespace.c change is very convoluted for what it is trying to
do.  It simplifies to a one liner that just adds osb() after pos >=
extents. AKA:

	if (pos >= extents)
        	return NULL;
+ 	osb();

Is the intent to hide which branch branch we take based on extents,
after the pos check?

I suspect this implies that using a user namespace and a crafted uid
map you can hit this in stat, on the fast path.

At which point I suspect we will be better off extending struct
user_namespace by a few pointers, so there is no union and remove the
need for blocking speculation entirely.

> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 246d4d4ce5c7..aa0be8cef2d4 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -648,15 +648,18 @@ static void *m_start(struct seq_file *seq, loff_t *ppos,
>  {
>  	loff_t pos = *ppos;
>  	unsigned extents = map->nr_extents;
> -	smp_rmb();
>  
> -	if (pos >= extents)
> -		return NULL;
> +	/* paired with smp_wmb in map_write */
> +	smp_rmb();
>  
> -	if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
> -		return &map->extent[pos];
> +	if (pos < extents) {
> +		osb();
> +		if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
> +			return &map->extent[pos];
> +		return &map->forward[pos];
> +	}
>  
> -	return &map->forward[pos];
> +	return NULL;
>  }
>  
>  static void *uid_m_start(struct seq_file *seq, loff_t *ppos)



> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 8ca9915befc8..7f83abdea255 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -81,6 +81,8 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
>  	if (index < net->mpls.platform_labels) {
>  		struct mpls_route __rcu **platform_label =
>  			rcu_dereference(net->mpls.platform_label);
> +
> +		osb();
>  		rt = rcu_dereference(platform_label[index]);
>  	}
>  	return rt;

Ouch!  This adds a barrier in the middle of an rcu lookup, on the
fast path for routing mpls packets.  Which if memory serves will
noticably slow down software processing of mpls packets.

Why does osb() fall after the branch for validity?  So that we allow
speculation up until then? 

I suspect it would be better to have those barriers in the tun/tap
interfaces where userspace can inject packets and thus time them.  Then
the code could still speculate and go fast for remote packets.

Or does the speculation stomping have to be immediately at the place
where we use data from userspace to perform a table lookup?

Eric

p.s. osb() seems to be mispelled. obs() would make much more sense.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  4:44               ` Al Viro
@ 2018-01-04  5:44                 ` Dan Williams
  2018-01-04  5:49                   ` Dave Hansen
  2018-01-04  5:50                   ` Al Viro
  0 siblings, 2 replies; 66+ messages in thread
From: Dan Williams @ 2018-01-04  5:44 UTC (permalink / raw)
  To: Al Viro
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	mark.rutland, gnomes, gregkh, jikos, linux-arch

On Wed, Jan 3, 2018 at 8:44 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Jan 04, 2018 at 03:10:51AM +0000, Williams, Dan J wrote:
>
>> diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
>> index 1c65817673db..dbc12007da51 100644
>> --- a/include/linux/fdtable.h
>> +++ b/include/linux/fdtable.h
>> @@ -82,8 +82,10 @@ static inline struct file *__fcheck_files(struct files_struct *files, unsigned i
>>  {
>>       struct fdtable *fdt = rcu_dereference_raw(files->fdt);
>>
>> -     if (fd < fdt->max_fds)
>> +     if (fd < fdt->max_fds) {
>> +             osb();
>>               return rcu_dereference_raw(fdt->fd[fd]);
>> +     }
>>       return NULL;
>>  }
>
> ... and the point of that would be?  Possibly revealing the value of files->fdt?
> Why would that be a threat, assuming you manage to extract the information in
> question in the first place?

No, the concern is that an fd value >= fdt->max_fds may cause the cpu
to read arbitrary memory addresses relative to files->fdt and
userspace can observe that it got loaded. With the barrier the
speculation stops and never allows that speculative read to issue.
With the change, the cpu only issues a read for fdt->fd[fd] when fd is
valid.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  5:44                 ` Dan Williams
@ 2018-01-04  5:49                   ` Dave Hansen
  2018-01-04  5:50                   ` Al Viro
  1 sibling, 0 replies; 66+ messages in thread
From: Dave Hansen @ 2018-01-04  5:49 UTC (permalink / raw)
  To: Dan Williams, Al Viro
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	mark.rutland, gnomes, gregkh, jikos, linux-arch

On 01/03/2018 09:44 PM, Dan Williams wrote:
> No, the concern is that an fd value >= fdt->max_fds may cause the cpu
> to read arbitrary memory addresses relative to files->fdt and
> userspace can observe that it got loaded.

Yep, it potentially tells you someting about memory after fdt->fd[].
For instance, you might be able to observe if some random bit of memory
after the actual fd[] array had 'mask' set because the CPU is running
this code with a 'file' that actually fails the "fd < fdt->max_fds" check:

                file = __fcheck_files(files, fd);
                if (!file || unlikely(file->f_mode & mask))
                        return 0;
                return (unsigned long)file;

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  5:44                 ` Dan Williams
  2018-01-04  5:49                   ` Dave Hansen
@ 2018-01-04  5:50                   ` Al Viro
  2018-01-04  5:55                     ` Al Viro
  1 sibling, 1 reply; 66+ messages in thread
From: Al Viro @ 2018-01-04  5:50 UTC (permalink / raw)
  To: Dan Williams
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	mark.rutland, gnomes, gregkh, jikos, linux-arch

On Wed, Jan 03, 2018 at 09:44:33PM -0800, Dan Williams wrote:
> On Wed, Jan 3, 2018 at 8:44 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Thu, Jan 04, 2018 at 03:10:51AM +0000, Williams, Dan J wrote:
> >
> >> diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
> >> index 1c65817673db..dbc12007da51 100644
> >> --- a/include/linux/fdtable.h
> >> +++ b/include/linux/fdtable.h
> >> @@ -82,8 +82,10 @@ static inline struct file *__fcheck_files(struct files_struct *files, unsigned i
> >>  {
> >>       struct fdtable *fdt = rcu_dereference_raw(files->fdt);
> >>
> >> -     if (fd < fdt->max_fds)
> >> +     if (fd < fdt->max_fds) {
> >> +             osb();
> >>               return rcu_dereference_raw(fdt->fd[fd]);
> >> +     }
> >>       return NULL;
> >>  }
> >
> > ... and the point of that would be?  Possibly revealing the value of files->fdt?
> > Why would that be a threat, assuming you manage to extract the information in
> > question in the first place?
> 
> No, the concern is that an fd value >= fdt->max_fds may cause the cpu
> to read arbitrary memory addresses relative to files->fdt and
> userspace can observe that it got loaded.

Yes.  And all that might reveal is the value of files->fdt.  Who cares?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  5:50                   ` Al Viro
@ 2018-01-04  5:55                     ` Al Viro
  2018-01-04  6:42                       ` Dan Williams
  0 siblings, 1 reply; 66+ messages in thread
From: Al Viro @ 2018-01-04  5:55 UTC (permalink / raw)
  To: Dan Williams
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	mark.rutland, gnomes, gregkh, jikos, linux-arch

On Thu, Jan 04, 2018 at 05:50:12AM +0000, Al Viro wrote:
> On Wed, Jan 03, 2018 at 09:44:33PM -0800, Dan Williams wrote:
> > On Wed, Jan 3, 2018 at 8:44 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > On Thu, Jan 04, 2018 at 03:10:51AM +0000, Williams, Dan J wrote:
> > >
> > >> diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
> > >> index 1c65817673db..dbc12007da51 100644
> > >> --- a/include/linux/fdtable.h
> > >> +++ b/include/linux/fdtable.h
> > >> @@ -82,8 +82,10 @@ static inline struct file *__fcheck_files(struct files_struct *files, unsigned i
> > >>  {
> > >>       struct fdtable *fdt = rcu_dereference_raw(files->fdt);
> > >>
> > >> -     if (fd < fdt->max_fds)
> > >> +     if (fd < fdt->max_fds) {
> > >> +             osb();
> > >>               return rcu_dereference_raw(fdt->fd[fd]);
> > >> +     }
> > >>       return NULL;
> > >>  }
> > >
> > > ... and the point of that would be?  Possibly revealing the value of files->fdt?
> > > Why would that be a threat, assuming you manage to extract the information in
> > > question in the first place?
> > 
> > No, the concern is that an fd value >= fdt->max_fds may cause the cpu
> > to read arbitrary memory addresses relative to files->fdt and
> > userspace can observe that it got loaded.
> 
> Yes.  And all that might reveal is the value of files->fdt.  Who cares?

Sorry, s/files->fdt/files->fdt->fd/.  Still the same question - what information
would that extract and how would attacker use that?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  1:13       ` Dan Williams
@ 2018-01-04  6:28         ` Julia Lawall
  2018-01-04 17:58           ` Dan Williams
  0 siblings, 1 reply; 66+ messages in thread
From: Julia Lawall @ 2018-01-04  6:28 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alan Cox, Linus Torvalds, Linux Kernel Mailing List,
	Mark Rutland, linux-arch, Peter Zijlstra, Greg KH,
	Thomas Gleixner, Elena Reshetova, Alan Cox, Dan Carpenter



On Wed, 3 Jan 2018, Dan Williams wrote:

> [ adding Julia and Dan ]
>
> On Wed, Jan 3, 2018 at 5:07 PM, Alan Cox <gnomes@lxorguk.ukuu.org.uk> wrote:
> > On Wed, 3 Jan 2018 16:39:31 -0800
> > Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >
> >> On Wed, Jan 3, 2018 at 4:15 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> >> > The 'if_nospec' primitive marks locations where the kernel is disabling
> >> > speculative execution that could potentially access privileged data. It
> >> > is expected to be paired with a 'nospec_{ptr,load}' where the user
> >> > controlled value is actually consumed.
> >>
> >> I'm much less worried about these "nospec_load/if" macros, than I am
> >> about having a sane way to determine when they should be needed.
> >>
> >> Is there such a sane model right now, or are we talking "people will
> >> randomly add these based on strong feelings"?
> >
> > There are people trying to tune coverity and other tool rules to identify
> > cases, and some of the work so far was done that way. For x86 we didn't
> > find too many so far so either the needed pattern is uncommon or ....  8)
> >
> > Given you can execute over a hundred basic instructions in a speculation
> > window it does need to be a tool that can explore not just in function
> > but across functions. That's really tough for the compiler itself to do
> > without help.
> >
> > What remains to be seen is if there are other patterns that affect
> > different processors.
> >
> > In the longer term the compiler itself needs to know what is and isn't
> > safe (ie you need to be able to write things like
> >
> > void foo(tainted __user int *x)
> >
> > and have the compiler figure out what level of speculation it can do and
> > (on processors with those features like IA64) when it can and can't do
> > various kinds of non-trapping loads.
> >
>
> It would be great if coccinelle and/or smatch could be taught to catch
> some of these case at least as a first pass "please audit this code
> block" type of notification.
>

What should one be looking for.  Do you have a typical example?

julia

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  5:01                 ` Eric W. Biederman
  (?)
@ 2018-01-04  6:32                 ` Dan Williams
  2018-01-04 14:54                     ` Eric W. Biederman
  -1 siblings, 1 reply; 66+ messages in thread
From: Dan Williams @ 2018-01-04  6:32 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	mark.rutland, gnomes, gregkh, jikos, linux-arch

On Wed, Jan 3, 2018 at 9:01 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> "Williams, Dan J" <dan.j.williams@intel.com> writes:
>
>
>
>> Note that these are "a human looked at static analysis reports and
>> could not rationalize that these are false positives". Specific domain
>> knowledge about these paths may find that some of them are indeed false
>> positives.
>>
>> The change to m_start in kernel/user_namespace.c is interesting because
>> that's an example where the nospec_load() approach by itself would need
>> to barrier speculation twice whereas if_nospec can do it once for the
>> whole block.
>
>
> This user_namespace.c change is very convoluted for what it is trying to
> do.

Sorry this was my rebase on top of commit d5e7b3c5f51f "userns: Don't
read extents twice in m_start" the original change from Elena was
simpler. Part of the complexity arises from converting the common
kernel pattern of

if (<invalid condition>)
   return NULL;
do_stuff;

...to:

if (<valid conidtion>) {
   barrier();
   do_stuff;
}

> It simplifies to a one liner that just adds osb() after pos >=
> extents. AKA:
>
>         if (pos >= extents)
>                 return NULL;
> +       osb();
>
> Is the intent to hide which branch branch we take based on extents,
> after the pos check?

The intent is to prevent speculative execution from triggering any
reads when 'pos' is invalid.

> I suspect this implies that using a user namespace and a crafted uid
> map you can hit this in stat, on the fast path.
>
> At which point I suspect we will be better off extending struct
> user_namespace by a few pointers, so there is no union and remove the
> need for blocking speculation entirely.

How does this help prevent a speculative read with an invalid 'pos'
reading arbitrary kernel addresses?

>
>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> index 246d4d4ce5c7..aa0be8cef2d4 100644
>> --- a/kernel/user_namespace.c
>> +++ b/kernel/user_namespace.c
>> @@ -648,15 +648,18 @@ static void *m_start(struct seq_file *seq, loff_t *ppos,
>>  {
>>       loff_t pos = *ppos;
>>       unsigned extents = map->nr_extents;
>> -     smp_rmb();
>>
>> -     if (pos >= extents)
>> -             return NULL;
>> +     /* paired with smp_wmb in map_write */
>> +     smp_rmb();
>>
>> -     if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
>> -             return &map->extent[pos];
>> +     if (pos < extents) {
>> +             osb();
>> +             if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
>> +                     return &map->extent[pos];
>> +             return &map->forward[pos];
>> +     }
>>
>> -     return &map->forward[pos];
>> +     return NULL;
>>  }
>>
>>  static void *uid_m_start(struct seq_file *seq, loff_t *ppos)
>
>
>
>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>> index 8ca9915befc8..7f83abdea255 100644
>> --- a/net/mpls/af_mpls.c
>> +++ b/net/mpls/af_mpls.c
>> @@ -81,6 +81,8 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
>>       if (index < net->mpls.platform_labels) {
>>               struct mpls_route __rcu **platform_label =
>>                       rcu_dereference(net->mpls.platform_label);
>> +
>> +             osb();
>>               rt = rcu_dereference(platform_label[index]);
>>       }
>>       return rt;
>
> Ouch!  This adds a barrier in the middle of an rcu lookup, on the
> fast path for routing mpls packets.  Which if memory serves will
> noticably slow down software processing of mpls packets.
>
> Why does osb() fall after the branch for validity?  So that we allow
> speculation up until then?

It falls there so that the cpu only issues reads with known good 'index' values.

> I suspect it would be better to have those barriers in the tun/tap
> interfaces where userspace can inject packets and thus time them.  Then
> the code could still speculate and go fast for remote packets.
>
> Or does the speculation stomping have to be immediately at the place
> where we use data from userspace to perform a table lookup?

The speculation stomping barrier has to be between where we validate
the input and when we may speculate on invalid input. So, yes, moving
the user controllable input validation earlier and out of the fast
path would be preferred. Think of this patch purely as a static
analysis warning that something might need to be done to resolve the
report.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  5:55                     ` Al Viro
@ 2018-01-04  6:42                       ` Dan Williams
  0 siblings, 0 replies; 66+ messages in thread
From: Dan Williams @ 2018-01-04  6:42 UTC (permalink / raw)
  To: Al Viro
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	mark.rutland, gnomes, gregkh, jikos, linux-arch

On Wed, Jan 3, 2018 at 9:55 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Jan 04, 2018 at 05:50:12AM +0000, Al Viro wrote:
>> On Wed, Jan 03, 2018 at 09:44:33PM -0800, Dan Williams wrote:
>> > On Wed, Jan 3, 2018 at 8:44 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> > > On Thu, Jan 04, 2018 at 03:10:51AM +0000, Williams, Dan J wrote:
>> > >
>> > >> diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
>> > >> index 1c65817673db..dbc12007da51 100644
>> > >> --- a/include/linux/fdtable.h
>> > >> +++ b/include/linux/fdtable.h
>> > >> @@ -82,8 +82,10 @@ static inline struct file *__fcheck_files(struct files_struct *files, unsigned i
>> > >>  {
>> > >>       struct fdtable *fdt = rcu_dereference_raw(files->fdt);
>> > >>
>> > >> -     if (fd < fdt->max_fds)
>> > >> +     if (fd < fdt->max_fds) {
>> > >> +             osb();
>> > >>               return rcu_dereference_raw(fdt->fd[fd]);
>> > >> +     }
>> > >>       return NULL;
>> > >>  }
>> > >
>> > > ... and the point of that would be?  Possibly revealing the value of files->fdt?
>> > > Why would that be a threat, assuming you manage to extract the information in
>> > > question in the first place?
>> >
>> > No, the concern is that an fd value >= fdt->max_fds may cause the cpu
>> > to read arbitrary memory addresses relative to files->fdt and
>> > userspace can observe that it got loaded.
>>
>> Yes.  And all that might reveal is the value of files->fdt.  Who cares?
>
> Sorry, s/files->fdt/files->fdt->fd/.  Still the same question - what information
> would that extract and how would attacker use that?

The question is if userspace can ex-filtrate any data from the kernel
that would otherwise be blocked by a bounds check should the kernel
close that hole?

For these patches I do not think the bar should be "can I prove an
information leak is exploitable" it should be "can I prove that a leak
is not exploitable", especially when possibly combined with other leak
sites.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  3:12               ` Alexei Starovoitov
@ 2018-01-04  9:16                 ` Reshetova, Elena
  0 siblings, 0 replies; 66+ messages in thread
From: Reshetova, Elena @ 2018-01-04  9:16 UTC (permalink / raw)
  To: Alexei Starovoitov, Alan Cox
  Cc: Jiri Kosina, Williams, Dan J, Linus Torvalds,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, netdev,
	Daniel Borkmann, David S. Miller

> On Thu, Jan 04, 2018 at 02:15:53AM +0000, Alan Cox wrote:
> >
> > > > Elena has done the work of auditing static analysis reports to a dozen
> > > > or so locations that need some 'nospec' handling.
> > >
> > > How exactly is that related (especially in longer-term support terms) to
> > > BPF anyway?
> >
> > If you read the papers you need a very specific construct in order to not
> > only cause a speculative load of an address you choose but also to then
> > manage to cause a second operation that in some way reveals bits of data
> > or allows you to ask questions.
> >
> > BPF allows you to construct those sequences relatively easily and it's
> > the one case where a user space application can fairly easily place code
> > it wants to execute in the kernel. Without BPF you have to find the right
> > construct in the kernel, prime all the right predictions and measure the
> > result without getting killed off. There are places you can do that but
> > they are not so easy and we don't (at this point) think there are that
> > many.
> 
> for BPF in particular we're thinking to do a different fix.
> Instead of killing speculation we can let cpu speculate.
> The fix will include rounding up bpf maps to nearest power of 2 and
> inserting bpf_and operation on the index after bounds check,
> so cpu can continue speculate beyond bounds check, but it will
> load from zero-ed memory.
> So this nospec arch dependent hack won't be necessary for bpf side,
> but may still be needed in other parts of the kernel.

I think this is a much better approach than what we have been doing internally so
far. My initial fix back in August for this was to insert a minimal set of lfence
barriers (osb() barrier below) that prevent speculation just based on how attack was using constants
to index eBPF maps:

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
#define BPF_R0	regs[BPF_REG_0]
@@ -939,6 +940,7 @@ static unsigned int ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn,
 		DST = IMM;
 		CONT;
 	LD_IMM_DW:
+		osb();
 		DST = (u64) (u32) insn[0].imm | ((u64) (u32) insn[1].imm) << 32;
 		insn++;
 		CONT;
@@ -1200,6 +1202,7 @@ static unsigned int ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn,
 		*(SIZE *)(unsigned long) (DST + insn->off) = IMM;	\
 		CONT;							\
 	LDX_MEM_##SIZEOP:						\
+		osb();							\
 		DST = *(SIZE *)(unsigned long) (SRC + insn->off);	\
 		CONT;
 


And similar stuff also for x86 JIT (blinding dependent):

@@ -400,7 +413,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 			case BPF_ADD: b2 = 0x01; break;
 			case BPF_SUB: b2 = 0x29; break;
 			case BPF_AND: b2 = 0x21; break;
-			case BPF_OR: b2 = 0x09; break;
+			case BPF_OR: b2 = 0x09; emit_memory_barrier(&prog); break;
 			case BPF_XOR: b2 = 0x31; break;
 			}
 			if (BPF_CLASS(insn->code) == BPF_ALU64)
@@ -647,6 +660,16 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 		case BPF_ALU64 | BPF_RSH | BPF_X:
 		case BPF_ALU64 | BPF_ARSH | BPF_X:
 
+			/* If blinding is enabled, each
+			 * BPF_LD | BPF_IMM | BPF_DW instruction
+			 * is converted to 4 eBPF instructions with
+			 * BPF_ALU64_IMM(BPF_LSH, BPF_REG_AX, 32)
+			 * always present(number 3). Detect such cases
+			 * and insert memory barriers. */
+			if ((BPF_CLASS(insn->code) == BPF_ALU64)
+				&& (BPF_OP(insn->code) == BPF_LSH)
+				&& (src_reg == BPF_REG_AX))
+				emit_memory_barrier(&prog);
 			/* check for bad case when dst_reg == rcx */
 			if (dst_reg == BPF_REG_4) {
 				/* mov r11, dst_reg */

But this is really ugly fix IMO to prevent speculation from happen, so with your
approach I think it is really much better. 

If you need help in testing the fixes, I can do it unless you already have the
attack code yourself. 

> 
> Also note that variant 1 is talking about exploiting prog_array
> bpf feature which had 64-bit access prior to
> commit 90caccdd8cc0 ("bpf: fix bpf_tail_call() x64 JIT")
> That was a fix for JIT and not related to this cpu issue, but
> I believe it breaks the existing exploit.

Actually I could not get existing exploit working on anything past 4.11
but at that time I could not see any fundamental change in eBPF that
would prevent it. 

Best Regards,
Elena.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH 4/4] bpf: inhibit speculated out-of-bounds pointers
  2018-01-03 23:45   ` Peter Zijlstra
@ 2018-01-04 10:59     ` Mark Rutland
  0 siblings, 0 replies; 66+ messages in thread
From: Mark Rutland @ 2018-01-04 10:59 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, linux-arch, Will Deacon

On Thu, Jan 04, 2018 at 12:45:29AM +0100, Peter Zijlstra wrote:
> On Wed, Jan 03, 2018 at 10:38:27PM +0000, Mark Rutland wrote:
> > Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
> > memory accesses under a bounds check may be speculated even if the
> > bounds check fails, providing a primitive for building a side channel.
> > 
> > The EBPF map code has a number of such bounds-checks accesses in
> > map_lookup_elem implementations. This patch modifies these to use the
> > nospec helpers to inhibit such side channels.
> > 
> > The JITted lookup_elem implementations remain potentially vulnerable,
> > and are disabled (with JITted code falling back to the C
> > implementations).
> 
> Since this is now public, let me re-iterate that I don't particularly
> like this approach. If you have to kill the JIT, could we please keep
> that in the arch JIT implementation?

Hopefully, killing the JIT is a temporary bodge. I agree that we want the arch
backends to JIT safe sequences somehow; I just haven't figured out exactly what
we need to do to make that happen.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  3:10             ` Williams, Dan J
  2018-01-04  4:44               ` Al Viro
@ 2018-01-04 11:47                 ` Mark Rutland
  2018-01-04 11:47                 ` Mark Rutland
  2 siblings, 0 replies; 66+ messages in thread
From: Mark Rutland @ 2018-01-04 11:47 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	gnomes, gregkh, jikos, linux-arch

Hi Dan,

Thanks for these examples.

On Thu, Jan 04, 2018 at 03:10:51AM +0000, Williams, Dan J wrote:
> Note, the following is Elena's work, I'm just helping poke the upstream
> discussion along while she's offline.
> 
> Elena audited the static analysis reports down to the following
> locations where userspace provides a value for a conditional branch and
> then later creates a dependent load on that same userspace controlled
> value.
> 
> 'osb()', observable memory barrier, resolves to an lfence on x86. My
> concern with these changes is that it is not clear what content within
> the branch block is of concern. Peter's 'if_nospec' proposal combined
> with Mark's 'nospec_load' would seem to clear that up, from a self
> documenting code perspective, and let archs optionally protect entire
> conditional blocks or individual loads within those blocks.

I'm a little concerned by having to use two helpers that need to be paired. It
means that we have to duplicate the bounds information, and I suspect in
practice that they'll often be paired improperly.

I think that we can avoid those problems if we use nospec_ptr() on its own. It
returns NULL if the pointer would be out-of-bounds, so we can use it in the
if-statement.

On x86, that can contain the bounds checks followed be an OSB / lfence, like
if_nospec(). On arm we can use our architected sequence. In either case,
nospec_ptr() returns NULL if the pointer would be out-of-bounds.

Then, rather than sequences like:

	if_nospec(idx < max) {
		val = nospec_array_load(array, idx, max);
		...
	}

... we could have:

	if ((elem_ptr = nospec_array_ptr(array, idx, max)) {
		val = *elem_ptr;
		...
	}

... which also means we don't have to annotate every single load in the branch
if the element is a structure with multiple fields.

Does that sound workable to you?

>From a quick scan, it looks like it would fit all of the example cases below.

Thanks,
Mark.

> Note that these are "a human looked at static analysis reports and
> could not rationalize that these are false positives". Specific domain
> knowledge about these paths may find that some of them are indeed false
> positives.
> 
> The change to m_start in kernel/user_namespace.c is interesting because
> that's an example where the nospec_load() approach by itself would need
> to barrier speculation twice whereas if_nospec can do it once for the
> whole block.
> 
> [ forgive any whitespace damage ]
> 
> 8<---
> diff --git a/drivers/media/usb/uvc/uvc_v4l2.c b/drivers/media/usb/uvc/uvc_v4l2.c
> index 3e7e283a44a8..65175bbe805f 100644
> --- a/drivers/media/usb/uvc/uvc_v4l2.c
> +++ b/drivers/media/usb/uvc/uvc_v4l2.c
> @@ -821,6 +821,7 @@ static int uvc_ioctl_enum_input(struct file *file, void *fh,
>  		}
>  		pin = iterm->id;
>  	} else if (index < selector->bNrInPins) {
> +		osb();
>  		pin = selector->baSourceID[index];
>  		list_for_each_entry(iterm, &chain->entities, chain) {
>  			if (!UVC_ENTITY_IS_ITERM(iterm))
> diff --git a/drivers/net/wireless/ath/carl9170/main.c b/drivers/net/wireless/ath/carl9170/main.c
> index 988c8857d78c..cf267b709af6 100644
> --- a/drivers/net/wireless/ath/carl9170/main.c
> +++ b/drivers/net/wireless/ath/carl9170/main.c
> @@ -1388,6 +1388,7 @@ static int carl9170_op_conf_tx(struct ieee80211_hw *hw,
>  
>  	mutex_lock(&ar->mutex);
>  	if (queue < ar->hw->queues) {
> +		osb();
>  		memcpy(&ar->edcf[ar9170_qmap[queue]], param, sizeof(*param));

>  		ret = carl9170_set_qos(ar);
>  	} else {
> diff --git a/drivers/net/wireless/intersil/p54/main.c b/drivers/net/wireless/intersil/p54/main.c
> index ab6d39e12069..f024f1ad81ff 100644
> --- a/drivers/net/wireless/intersil/p54/main.c
> +++ b/drivers/net/wireless/intersil/p54/main.c
> @@ -415,6 +415,7 @@ static int p54_conf_tx(struct ieee80211_hw *dev,
>  
>  	mutex_lock(&priv->conf_mutex);
>  	if (queue < dev->queues) {
> +		osb();
>  		P54_SET_QUEUE(priv->qos_params[queue], params->aifs,
>  			params->cw_min, params->cw_max, params->txop);
>  		ret = p54_set_edcf(priv);
> diff --git a/drivers/net/wireless/st/cw1200/sta.c b/drivers/net/wireless/st/cw1200/sta.c
> index 38678e9a0562..b4a2a7ba04e8 100644
> --- a/drivers/net/wireless/st/cw1200/sta.c
> +++ b/drivers/net/wireless/st/cw1200/sta.c
> @@ -619,6 +619,7 @@ int cw1200_conf_tx(struct ieee80211_hw *dev, struct ieee80211_vif *vif,
>  	mutex_lock(&priv->conf_mutex);
>  
>  	if (queue < dev->queues) {
> +		osb();
>  		old_uapsd_flags = le16_to_cpu(priv->uapsd_info.uapsd_flags);
>  
>  		WSM_TX_QUEUE_SET(&priv->tx_queue_params, queue, 0, 0, 0);
> diff --git a/drivers/scsi/qla2xxx/qla_mr.c b/drivers/scsi/qla2xxx/qla_mr.c
> index d5da3981cefe..dec8b6e087e3 100644
> --- a/drivers/scsi/qla2xxx/qla_mr.c
> +++ b/drivers/scsi/qla2xxx/qla_mr.c
> @@ -2304,10 +2304,12 @@ qlafx00_status_entry(scsi_qla_host_t *vha, struct rsp_que *rsp, void *pkt)
>  	req = ha->req_q_map[que];
>  
>  	/* Validate handle. */
> -	if (handle < req->num_outstanding_cmds)
> +	if (handle < req->num_outstanding_cmds) {
> +		osb();
>  		sp = req->outstanding_cmds[handle];
> -	else
> +	} else {
>  		sp = NULL;
> +	}
>  
>  	if (sp == NULL) {
>  		ql_dbg(ql_dbg_io, vha, 0x3034,
> @@ -2655,10 +2657,12 @@ qlafx00_multistatus_entry(struct scsi_qla_host *vha,
>  		req = ha->req_q_map[que];
>  
>  		/* Validate handle. */
> -		if (handle < req->num_outstanding_cmds)
> +		if (handle < req->num_outstanding_cmds) {
> +			osb();
>  			sp = req->outstanding_cmds[handle];
> -		else
> +		} else {
>  			sp = NULL;
> +		}
>  
>  		if (sp == NULL) {
>  			ql_dbg(ql_dbg_io, vha, 0x3044,
> diff --git a/drivers/thermal/int340x_thermal/int340x_thermal_zone.c b/drivers/thermal/int340x_thermal/int340x_thermal_zone.c
> index 145a5c53ff5c..d732b34e120d 100644
> --- a/drivers/thermal/int340x_thermal/int340x_thermal_zone.c
> +++ b/drivers/thermal/int340x_thermal/int340x_thermal_zone.c
> @@ -57,15 +57,16 @@ static int int340x_thermal_get_trip_temp(struct thermal_zone_device *zone,
>  	if (d->override_ops && d->override_ops->get_trip_temp)
>  		return d->override_ops->get_trip_temp(zone, trip, temp);
>  
> -	if (trip < d->aux_trip_nr)
> +	if (trip < d->aux_trip_nr) {
> +		osb();
>  		*temp = d->aux_trips[trip];
> -	else if (trip == d->crt_trip_id)
> +	} else if (trip == d->crt_trip_id) {
>  		*temp = d->crt_temp;
> -	else if (trip == d->psv_trip_id)
> +	} else if (trip == d->psv_trip_id) {
>  		*temp = d->psv_temp;
> -	else if (trip == d->hot_trip_id)
> +	} else if (trip == d->hot_trip_id) {
>  		*temp = d->hot_temp;
> -	else {
> +	} else {
>  		for (i = 0; i < INT340X_THERMAL_MAX_ACT_TRIP_COUNT; i++) {
>  			if (d->act_trips[i].valid &&
>  			    d->act_trips[i].id == trip) {
> diff --git a/fs/udf/misc.c b/fs/udf/misc.c
> index 401e64cde1be..c5394760d26b 100644
> --- a/fs/udf/misc.c
> +++ b/fs/udf/misc.c
> @@ -104,6 +104,8 @@ struct genericFormat *udf_add_extendedattr(struct inode *inode, uint32_t size,
>  					iinfo->i_lenEAttr) {
>  				uint32_t aal =
>  					le32_to_cpu(eahd->appAttrLocation);
> +
> +				osb();
>  				memmove(&ea[offset - aal + size],
>  					&ea[aal], offset - aal);
>  				offset -= aal;
> @@ -114,6 +116,8 @@ struct genericFormat *udf_add_extendedattr(struct inode *inode, uint32_t size,
>  					iinfo->i_lenEAttr) {
>  				uint32_t ial =
>  					le32_to_cpu(eahd->impAttrLocation);
> +
> +				osb();
>  				memmove(&ea[offset - ial + size],
>  					&ea[ial], offset - ial);
>  				offset -= ial;
> @@ -125,6 +129,8 @@ struct genericFormat *udf_add_extendedattr(struct inode *inode, uint32_t size,
>  					iinfo->i_lenEAttr) {
>  				uint32_t aal =
>  					le32_to_cpu(eahd->appAttrLocation);
> +
> +				osb();
>  				memmove(&ea[offset - aal + size],
>  					&ea[aal], offset - aal);
>  				offset -= aal;
> diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
> index 1c65817673db..dbc12007da51 100644
> --- a/include/linux/fdtable.h
> +++ b/include/linux/fdtable.h
> @@ -82,8 +82,10 @@ static inline struct file *__fcheck_files(struct files_struct *files, unsigned i
>  {
>  	struct fdtable *fdt = rcu_dereference_raw(files->fdt);
>  
> -	if (fd < fdt->max_fds)
> +	if (fd < fdt->max_fds) {
> +		osb();
>  		return rcu_dereference_raw(fdt->fd[fd]);
> +	}
>  	return NULL;
>  }
>  
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 246d4d4ce5c7..aa0be8cef2d4 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -648,15 +648,18 @@ static void *m_start(struct seq_file *seq, loff_t *ppos,
>  {
>  	loff_t pos = *ppos;
>  	unsigned extents = map->nr_extents;
> -	smp_rmb();
>  
> -	if (pos >= extents)
> -		return NULL;
> +	/* paired with smp_wmb in map_write */
> +	smp_rmb();
>  
> -	if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
> -		return &map->extent[pos];
> +	if (pos < extents) {
> +		osb();
> +		if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
> +			return &map->extent[pos];
> +		return &map->forward[pos];
> +	}
>  
> -	return &map->forward[pos];
> +	return NULL;
>  }
>  
>  static void *uid_m_start(struct seq_file *seq, loff_t *ppos)
> diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
> index 125c1eab3eaa..56909c8fa025 100644
> --- a/net/ipv4/raw.c
> +++ b/net/ipv4/raw.c
> @@ -476,6 +476,7 @@ static int raw_getfrag(void *from, char *to, int offset, int len, int odd,
>  	if (offset < rfv->hlen) {
>  		int copy = min(rfv->hlen - offset, len);
>  
> +		osb();
>  		if (skb->ip_summed == CHECKSUM_PARTIAL)
>  			memcpy(to, rfv->hdr.c + offset, copy);
>  		else
> diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
> index 761a473a07c5..0942784f3f8d 100644
> --- a/net/ipv6/raw.c
> +++ b/net/ipv6/raw.c
> @@ -729,6 +729,7 @@ static int raw6_getfrag(void *from, char *to, int offset, int len, int odd,
>  	if (offset < rfv->hlen) {
>  		int copy = min(rfv->hlen - offset, len);
>  
> +		osb();
>  		if (skb->ip_summed == CHECKSUM_PARTIAL)
>  			memcpy(to, rfv->c + offset, copy);
>  		else
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 8ca9915befc8..7f83abdea255 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -81,6 +81,8 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
>  	if (index < net->mpls.platform_labels) {
>  		struct mpls_route __rcu **platform_label =
>  			rcu_dereference(net->mpls.platform_label);
> +
> +		osb();
>  		rt = rcu_dereference(platform_label[index]);
>  	}
>  	return rt;

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
@ 2018-01-04 11:47                 ` Mark Rutland
  0 siblings, 0 replies; 66+ messages in thread
From: Mark Rutland @ 2018-01-04 11:47 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	gnomes, gregkh, jikos, linux-arch

Hi Dan,

Thanks for these examples.

On Thu, Jan 04, 2018 at 03:10:51AM +0000, Williams, Dan J wrote:
> Note, the following is Elena's work, I'm just helping poke the upstream
> discussion along while she's offline.
> 
> Elena audited the static analysis reports down to the following
> locations where userspace provides a value for a conditional branch and
> then later creates a dependent load on that same userspace controlled
> value.
> 
> 'osb()', observable memory barrier, resolves to an lfence on x86. My
> concern with these changes is that it is not clear what content within
> the branch block is of concern. Peter's 'if_nospec' proposal combined
> with Mark's 'nospec_load' would seem to clear that up, from a self
> documenting code perspective, and let archs optionally protect entire
> conditional blocks or individual loads within those blocks.

I'm a little concerned by having to use two helpers that need to be paired. It
means that we have to duplicate the bounds information, and I suspect in
practice that they'll often be paired improperly.

I think that we can avoid those problems if we use nospec_ptr() on its own. It
returns NULL if the pointer would be out-of-bounds, so we can use it in the
if-statement.

On x86, that can contain the bounds checks followed be an OSB / lfence, like
if_nospec(). On arm we can use our architected sequence. In either case,
nospec_ptr() returns NULL if the pointer would be out-of-bounds.

Then, rather than sequences like:

	if_nospec(idx < max) {
		val = nospec_array_load(array, idx, max);
		...
	}

... we could have:

	if ((elem_ptr = nospec_array_ptr(array, idx, max)) {
		val = *elem_ptr;
		...
	}

... which also means we don't have to annotate every single load in the branch
if the element is a structure with multiple fields.

Does that sound workable to you?

From a quick scan, it looks like it would fit all of the example cases below.

Thanks,
Mark.

> Note that these are "a human looked at static analysis reports and
> could not rationalize that these are false positives". Specific domain
> knowledge about these paths may find that some of them are indeed false
> positives.
> 
> The change to m_start in kernel/user_namespace.c is interesting because
> that's an example where the nospec_load() approach by itself would need
> to barrier speculation twice whereas if_nospec can do it once for the
> whole block.
> 
> [ forgive any whitespace damage ]
> 
> 8<---
> diff --git a/drivers/media/usb/uvc/uvc_v4l2.c b/drivers/media/usb/uvc/uvc_v4l2.c
> index 3e7e283a44a8..65175bbe805f 100644
> --- a/drivers/media/usb/uvc/uvc_v4l2.c
> +++ b/drivers/media/usb/uvc/uvc_v4l2.c
> @@ -821,6 +821,7 @@ static int uvc_ioctl_enum_input(struct file *file, void *fh,
>  		}
>  		pin = iterm->id;
>  	} else if (index < selector->bNrInPins) {
> +		osb();
>  		pin = selector->baSourceID[index];
>  		list_for_each_entry(iterm, &chain->entities, chain) {
>  			if (!UVC_ENTITY_IS_ITERM(iterm))
> diff --git a/drivers/net/wireless/ath/carl9170/main.c b/drivers/net/wireless/ath/carl9170/main.c
> index 988c8857d78c..cf267b709af6 100644
> --- a/drivers/net/wireless/ath/carl9170/main.c
> +++ b/drivers/net/wireless/ath/carl9170/main.c
> @@ -1388,6 +1388,7 @@ static int carl9170_op_conf_tx(struct ieee80211_hw *hw,
>  
>  	mutex_lock(&ar->mutex);
>  	if (queue < ar->hw->queues) {
> +		osb();
>  		memcpy(&ar->edcf[ar9170_qmap[queue]], param, sizeof(*param));

>  		ret = carl9170_set_qos(ar);
>  	} else {
> diff --git a/drivers/net/wireless/intersil/p54/main.c b/drivers/net/wireless/intersil/p54/main.c
> index ab6d39e12069..f024f1ad81ff 100644
> --- a/drivers/net/wireless/intersil/p54/main.c
> +++ b/drivers/net/wireless/intersil/p54/main.c
> @@ -415,6 +415,7 @@ static int p54_conf_tx(struct ieee80211_hw *dev,
>  
>  	mutex_lock(&priv->conf_mutex);
>  	if (queue < dev->queues) {
> +		osb();
>  		P54_SET_QUEUE(priv->qos_params[queue], params->aifs,
>  			params->cw_min, params->cw_max, params->txop);
>  		ret = p54_set_edcf(priv);
> diff --git a/drivers/net/wireless/st/cw1200/sta.c b/drivers/net/wireless/st/cw1200/sta.c
> index 38678e9a0562..b4a2a7ba04e8 100644
> --- a/drivers/net/wireless/st/cw1200/sta.c
> +++ b/drivers/net/wireless/st/cw1200/sta.c
> @@ -619,6 +619,7 @@ int cw1200_conf_tx(struct ieee80211_hw *dev, struct ieee80211_vif *vif,
>  	mutex_lock(&priv->conf_mutex);
>  
>  	if (queue < dev->queues) {
> +		osb();
>  		old_uapsd_flags = le16_to_cpu(priv->uapsd_info.uapsd_flags);
>  
>  		WSM_TX_QUEUE_SET(&priv->tx_queue_params, queue, 0, 0, 0);
> diff --git a/drivers/scsi/qla2xxx/qla_mr.c b/drivers/scsi/qla2xxx/qla_mr.c
> index d5da3981cefe..dec8b6e087e3 100644
> --- a/drivers/scsi/qla2xxx/qla_mr.c
> +++ b/drivers/scsi/qla2xxx/qla_mr.c
> @@ -2304,10 +2304,12 @@ qlafx00_status_entry(scsi_qla_host_t *vha, struct rsp_que *rsp, void *pkt)
>  	req = ha->req_q_map[que];
>  
>  	/* Validate handle. */
> -	if (handle < req->num_outstanding_cmds)
> +	if (handle < req->num_outstanding_cmds) {
> +		osb();
>  		sp = req->outstanding_cmds[handle];
> -	else
> +	} else {
>  		sp = NULL;
> +	}
>  
>  	if (sp == NULL) {
>  		ql_dbg(ql_dbg_io, vha, 0x3034,
> @@ -2655,10 +2657,12 @@ qlafx00_multistatus_entry(struct scsi_qla_host *vha,
>  		req = ha->req_q_map[que];
>  
>  		/* Validate handle. */
> -		if (handle < req->num_outstanding_cmds)
> +		if (handle < req->num_outstanding_cmds) {
> +			osb();
>  			sp = req->outstanding_cmds[handle];
> -		else
> +		} else {
>  			sp = NULL;
> +		}
>  
>  		if (sp == NULL) {
>  			ql_dbg(ql_dbg_io, vha, 0x3044,
> diff --git a/drivers/thermal/int340x_thermal/int340x_thermal_zone.c b/drivers/thermal/int340x_thermal/int340x_thermal_zone.c
> index 145a5c53ff5c..d732b34e120d 100644
> --- a/drivers/thermal/int340x_thermal/int340x_thermal_zone.c
> +++ b/drivers/thermal/int340x_thermal/int340x_thermal_zone.c
> @@ -57,15 +57,16 @@ static int int340x_thermal_get_trip_temp(struct thermal_zone_device *zone,
>  	if (d->override_ops && d->override_ops->get_trip_temp)
>  		return d->override_ops->get_trip_temp(zone, trip, temp);
>  
> -	if (trip < d->aux_trip_nr)
> +	if (trip < d->aux_trip_nr) {
> +		osb();
>  		*temp = d->aux_trips[trip];
> -	else if (trip == d->crt_trip_id)
> +	} else if (trip == d->crt_trip_id) {
>  		*temp = d->crt_temp;
> -	else if (trip == d->psv_trip_id)
> +	} else if (trip == d->psv_trip_id) {
>  		*temp = d->psv_temp;
> -	else if (trip == d->hot_trip_id)
> +	} else if (trip == d->hot_trip_id) {
>  		*temp = d->hot_temp;
> -	else {
> +	} else {
>  		for (i = 0; i < INT340X_THERMAL_MAX_ACT_TRIP_COUNT; i++) {
>  			if (d->act_trips[i].valid &&
>  			    d->act_trips[i].id == trip) {
> diff --git a/fs/udf/misc.c b/fs/udf/misc.c
> index 401e64cde1be..c5394760d26b 100644
> --- a/fs/udf/misc.c
> +++ b/fs/udf/misc.c
> @@ -104,6 +104,8 @@ struct genericFormat *udf_add_extendedattr(struct inode *inode, uint32_t size,
>  					iinfo->i_lenEAttr) {
>  				uint32_t aal =
>  					le32_to_cpu(eahd->appAttrLocation);
> +
> +				osb();
>  				memmove(&ea[offset - aal + size],
>  					&ea[aal], offset - aal);
>  				offset -= aal;
> @@ -114,6 +116,8 @@ struct genericFormat *udf_add_extendedattr(struct inode *inode, uint32_t size,
>  					iinfo->i_lenEAttr) {
>  				uint32_t ial =
>  					le32_to_cpu(eahd->impAttrLocation);
> +
> +				osb();
>  				memmove(&ea[offset - ial + size],
>  					&ea[ial], offset - ial);
>  				offset -= ial;
> @@ -125,6 +129,8 @@ struct genericFormat *udf_add_extendedattr(struct inode *inode, uint32_t size,
>  					iinfo->i_lenEAttr) {
>  				uint32_t aal =
>  					le32_to_cpu(eahd->appAttrLocation);
> +
> +				osb();
>  				memmove(&ea[offset - aal + size],
>  					&ea[aal], offset - aal);
>  				offset -= aal;
> diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
> index 1c65817673db..dbc12007da51 100644
> --- a/include/linux/fdtable.h
> +++ b/include/linux/fdtable.h
> @@ -82,8 +82,10 @@ static inline struct file *__fcheck_files(struct files_struct *files, unsigned i
>  {
>  	struct fdtable *fdt = rcu_dereference_raw(files->fdt);
>  
> -	if (fd < fdt->max_fds)
> +	if (fd < fdt->max_fds) {
> +		osb();
>  		return rcu_dereference_raw(fdt->fd[fd]);
> +	}
>  	return NULL;
>  }
>  
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 246d4d4ce5c7..aa0be8cef2d4 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -648,15 +648,18 @@ static void *m_start(struct seq_file *seq, loff_t *ppos,
>  {
>  	loff_t pos = *ppos;
>  	unsigned extents = map->nr_extents;
> -	smp_rmb();
>  
> -	if (pos >= extents)
> -		return NULL;
> +	/* paired with smp_wmb in map_write */
> +	smp_rmb();
>  
> -	if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
> -		return &map->extent[pos];
> +	if (pos < extents) {
> +		osb();
> +		if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
> +			return &map->extent[pos];
> +		return &map->forward[pos];
> +	}
>  
> -	return &map->forward[pos];
> +	return NULL;
>  }
>  
>  static void *uid_m_start(struct seq_file *seq, loff_t *ppos)
> diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
> index 125c1eab3eaa..56909c8fa025 100644
> --- a/net/ipv4/raw.c
> +++ b/net/ipv4/raw.c
> @@ -476,6 +476,7 @@ static int raw_getfrag(void *from, char *to, int offset, int len, int odd,
>  	if (offset < rfv->hlen) {
>  		int copy = min(rfv->hlen - offset, len);
>  
> +		osb();
>  		if (skb->ip_summed == CHECKSUM_PARTIAL)
>  			memcpy(to, rfv->hdr.c + offset, copy);
>  		else
> diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
> index 761a473a07c5..0942784f3f8d 100644
> --- a/net/ipv6/raw.c
> +++ b/net/ipv6/raw.c
> @@ -729,6 +729,7 @@ static int raw6_getfrag(void *from, char *to, int offset, int len, int odd,
>  	if (offset < rfv->hlen) {
>  		int copy = min(rfv->hlen - offset, len);
>  
> +		osb();
>  		if (skb->ip_summed == CHECKSUM_PARTIAL)
>  			memcpy(to, rfv->c + offset, copy);
>  		else
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 8ca9915befc8..7f83abdea255 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -81,6 +81,8 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
>  	if (index < net->mpls.platform_labels) {
>  		struct mpls_route __rcu **platform_label =
>  			rcu_dereference(net->mpls.platform_label);
> +
> +		osb();
>  		rt = rcu_dereference(platform_label[index]);
>  	}
>  	return rt;

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
@ 2018-01-04 11:47                 ` Mark Rutland
  0 siblings, 0 replies; 66+ messages in thread
From: Mark Rutland @ 2018-01-04 11:47 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	gnomes, gregkh, jikos, linux-arch

Hi Dan,

Thanks for these examples.

On Thu, Jan 04, 2018 at 03:10:51AM +0000, Williams, Dan J wrote:
> Note, the following is Elena's work, I'm just helping poke the upstream
> discussion along while she's offline.
> 
> Elena audited the static analysis reports down to the following
> locations where userspace provides a value for a conditional branch and
> then later creates a dependent load on that same userspace controlled
> value.
> 
> 'osb()', observable memory barrier, resolves to an lfence on x86. My
> concern with these changes is that it is not clear what content within
> the branch block is of concern. Peter's 'if_nospec' proposal combined
> with Mark's 'nospec_load' would seem to clear that up, from a self
> documenting code perspective, and let archs optionally protect entire
> conditional blocks or individual loads within those blocks.

I'm a little concerned by having to use two helpers that need to be paired. It
means that we have to duplicate the bounds information, and I suspect in
practice that they'll often be paired improperly.

I think that we can avoid those problems if we use nospec_ptr() on its own. It
returns NULL if the pointer would be out-of-bounds, so we can use it in the
if-statement.

On x86, that can contain the bounds checks followed be an OSB / lfence, like
if_nospec(). On arm we can use our architected sequence. In either case,
nospec_ptr() returns NULL if the pointer would be out-of-bounds.

Then, rather than sequences like:

	if_nospec(idx < max) {
		val = nospec_array_load(array, idx, max);
		...
	}

... we could have:

	if ((elem_ptr = nospec_array_ptr(array, idx, max)) {
		val = *elem_ptr;
		...
	}

... which also means we don't have to annotate every single load in the branch
if the element is a structure with multiple fields.

Does that sound workable to you?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH 1/4] asm-generic/barrier: add generic nospec helpers
  2018-01-03 22:38 ` [RFC PATCH 1/4] asm-generic/barrier: add generic nospec helpers Mark Rutland
@ 2018-01-04 12:00   ` Mark Rutland
  2018-01-05  4:21     ` Dan Williams
  0 siblings, 1 reply; 66+ messages in thread
From: Mark Rutland @ 2018-01-04 12:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-arch, Will Deacon

On Wed, Jan 03, 2018 at 10:38:24PM +0000, Mark Rutland wrote:
> +#define nospec_array_load(arr, idx, sz)					\
> +({									\
> +	typeof(*(arr)) *__arr = arr;					\
> +	typeof(idx) __idx = idx;					\
> +	typeof(sz) __sz = __sz;						\

Whoops. The second __sz should be sz here.

Mark.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  6:32                 ` Dan Williams
@ 2018-01-04 14:54                     ` Eric W. Biederman
  0 siblings, 0 replies; 66+ messages in thread
From: Eric W. Biederman @ 2018-01-04 14:54 UTC (permalink / raw)
  To: Dan Williams
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	mark.rutland, gnomes, gregkh, jikos, linux-arch

Dan Williams <dan.j.williams@intel.com> writes:

> On Wed, Jan 3, 2018 at 9:01 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> "Williams, Dan J" <dan.j.williams@intel.com> writes:
>>
>>
>>
>>> Note that these are "a human looked at static analysis reports and
>>> could not rationalize that these are false positives". Specific domain
>>> knowledge about these paths may find that some of them are indeed false
>>> positives.
>>>
>>> The change to m_start in kernel/user_namespace.c is interesting because
>>> that's an example where the nospec_load() approach by itself would need
>>> to barrier speculation twice whereas if_nospec can do it once for the
>>> whole block.
>>
>>
>> This user_namespace.c change is very convoluted for what it is trying to
>> do.
>
> Sorry this was my rebase on top of commit d5e7b3c5f51f "userns: Don't
> read extents twice in m_start" the original change from Elena was
> simpler. Part of the complexity arises from converting the common
> kernel pattern of
>
> if (<invalid condition>)
>    return NULL;
> do_stuff;
>
> ...to:
>
> if (<valid conidtion>) {
>    barrier();
>    do_stuff;
> }
>
>> It simplifies to a one liner that just adds osb() after pos >=
>> extents. AKA:
>>
>>         if (pos >= extents)
>>                 return NULL;
>> +       osb();
>>
>> Is the intent to hide which branch branch we take based on extents,
>> after the pos check?
>
> The intent is to prevent speculative execution from triggering any
> reads when 'pos' is invalid.

If that is the intent I think the patch you posted is woefully
inadequate.  We have many many more seq files in proc than just
/proc/<pid>/uid_map.

>> I suspect this implies that using a user namespace and a crafted uid
>> map you can hit this in stat, on the fast path.
>>
>> At which point I suspect we will be better off extending struct
>> user_namespace by a few pointers, so there is no union and remove the
>> need for blocking speculation entirely.
>
> How does this help prevent a speculative read with an invalid 'pos'
> reading arbitrary kernel addresses?

I though the concern was extents.

I am now convinced that collectively we need a much better description
of the problem than currently exists.

Either the patch you presented missed a whole lot like 90%+ of the
user/kernel interface or there is some mitigating factor that I am not
seeing.  Either way until reasonable people can read the code and
agree on the potential exploitability of it, I will be nacking these
patches.

>>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>>> index 246d4d4ce5c7..aa0be8cef2d4 100644
>>> --- a/kernel/user_namespace.c
>>> +++ b/kernel/user_namespace.c
>>> @@ -648,15 +648,18 @@ static void *m_start(struct seq_file *seq, loff_t *ppos,
>>>  {
>>>       loff_t pos = *ppos;
>>>       unsigned extents = map->nr_extents;
>>> -     smp_rmb();
>>>
>>> -     if (pos >= extents)
>>> -             return NULL;
>>> +     /* paired with smp_wmb in map_write */
>>> +     smp_rmb();
>>>
>>> -     if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
>>> -             return &map->extent[pos];
>>> +     if (pos < extents) {
>>> +             osb();
>>> +             if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
>>> +                     return &map->extent[pos];
>>> +             return &map->forward[pos];
>>> +     }
>>>
>>> -     return &map->forward[pos];
>>> +     return NULL;
>>>  }
>>>
>>>  static void *uid_m_start(struct seq_file *seq, loff_t *ppos)
>>
>>
>>
>>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>>> index 8ca9915befc8..7f83abdea255 100644
>>> --- a/net/mpls/af_mpls.c
>>> +++ b/net/mpls/af_mpls.c
>>> @@ -81,6 +81,8 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
>>>       if (index < net->mpls.platform_labels) {
>>>               struct mpls_route __rcu **platform_label =
>>>                       rcu_dereference(net->mpls.platform_label);
>>> +
>>> +             osb();
>>>               rt = rcu_dereference(platform_label[index]);
>>>       }
>>>       return rt;
>>
>> Ouch!  This adds a barrier in the middle of an rcu lookup, on the
>> fast path for routing mpls packets.  Which if memory serves will
>> noticably slow down software processing of mpls packets.
>>
>> Why does osb() fall after the branch for validity?  So that we allow
>> speculation up until then?
>
> It falls there so that the cpu only issues reads with known good 'index' values.
>
>> I suspect it would be better to have those barriers in the tun/tap
>> interfaces where userspace can inject packets and thus time them.  Then
>> the code could still speculate and go fast for remote packets.
>>
>> Or does the speculation stomping have to be immediately at the place
>> where we use data from userspace to perform a table lookup?
>
> The speculation stomping barrier has to be between where we validate
> the input and when we may speculate on invalid input.

So a serializing instruction at the kernel/user boundary (like say
loading cr3) is not enough?  That would seem to break any chance of a
controlled timing.

> So, yes, moving
> the user controllable input validation earlier and out of the fast
> path would be preferred. Think of this patch purely as a static
> analysis warning that something might need to be done to resolve the
> report.

That isn't what I was suggesting.  I was just suggesting a serialization
instruction earlier in the pipeline.

Given what I have seen in other parts of the thread I think an and
instruction that just limits the index to a sane range is generally
applicable, and should be cheap enough to not care about.  Further
it seems to apply to the pattern the static checkers were catching,
so I suspect that is the pattern we want to stress for limiting
speculation.  Assuming of course the compiler won't just optimize the
and of the index out.

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
@ 2018-01-04 14:54                     ` Eric W. Biederman
  0 siblings, 0 replies; 66+ messages in thread
From: Eric W. Biederman @ 2018-01-04 14:54 UTC (permalink / raw)
  To: Dan Williams
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	mark.rutland, gnomes, gregkh, jikos, linux-arch

Dan Williams <dan.j.williams@intel.com> writes:

> On Wed, Jan 3, 2018 at 9:01 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> "Williams, Dan J" <dan.j.williams@intel.com> writes:
>>
>>
>>
>>> Note that these are "a human looked at static analysis reports and
>>> could not rationalize that these are false positives". Specific domain
>>> knowledge about these paths may find that some of them are indeed false
>>> positives.
>>>
>>> The change to m_start in kernel/user_namespace.c is interesting because
>>> that's an example where the nospec_load() approach by itself would need
>>> to barrier speculation twice whereas if_nospec can do it once for the
>>> whole block.
>>
>>
>> This user_namespace.c change is very convoluted for what it is trying to
>> do.
>
> Sorry this was my rebase on top of commit d5e7b3c5f51f "userns: Don't
> read extents twice in m_start" the original change from Elena was
> simpler. Part of the complexity arises from converting the common
> kernel pattern of
>
> if (<invalid condition>)
>    return NULL;
> do_stuff;
>
> ...to:
>
> if (<valid conidtion>) {
>    barrier();
>    do_stuff;
> }
>
>> It simplifies to a one liner that just adds osb() after pos >=
>> extents. AKA:
>>
>>         if (pos >= extents)
>>                 return NULL;
>> +       osb();
>>
>> Is the intent to hide which branch branch we take based on extents,
>> after the pos check?
>
> The intent is to prevent speculative execution from triggering any
> reads when 'pos' is invalid.

If that is the intent I think the patch you posted is woefully
inadequate.  We have many many more seq files in proc than just
/proc/<pid>/uid_map.

>> I suspect this implies that using a user namespace and a crafted uid
>> map you can hit this in stat, on the fast path.
>>
>> At which point I suspect we will be better off extending struct
>> user_namespace by a few pointers, so there is no union and remove the
>> need for blocking speculation entirely.
>
> How does this help prevent a speculative read with an invalid 'pos'
> reading arbitrary kernel addresses?

I though the concern was extents.

I am now convinced that collectively we need a much better description
of the problem than currently exists.

Either the patch you presented missed a whole lot like 90%+ of the
user/kernel interface or there is some mitigating factor that I am not
seeing.  Either way until reasonable people can read the code and
agree on the potential exploitability of it, I will be nacking these
patches.

>>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>>> index 246d4d4ce5c7..aa0be8cef2d4 100644
>>> --- a/kernel/user_namespace.c
>>> +++ b/kernel/user_namespace.c
>>> @@ -648,15 +648,18 @@ static void *m_start(struct seq_file *seq, loff_t *ppos,
>>>  {
>>>       loff_t pos = *ppos;
>>>       unsigned extents = map->nr_extents;
>>> -     smp_rmb();
>>>
>>> -     if (pos >= extents)
>>> -             return NULL;
>>> +     /* paired with smp_wmb in map_write */
>>> +     smp_rmb();
>>>
>>> -     if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
>>> -             return &map->extent[pos];
>>> +     if (pos < extents) {
>>> +             osb();
>>> +             if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
>>> +                     return &map->extent[pos];
>>> +             return &map->forward[pos];
>>> +     }
>>>
>>> -     return &map->forward[pos];
>>> +     return NULL;
>>>  }
>>>
>>>  static void *uid_m_start(struct seq_file *seq, loff_t *ppos)
>>
>>
>>
>>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>>> index 8ca9915befc8..7f83abdea255 100644
>>> --- a/net/mpls/af_mpls.c
>>> +++ b/net/mpls/af_mpls.c
>>> @@ -81,6 +81,8 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
>>>       if (index < net->mpls.platform_labels) {
>>>               struct mpls_route __rcu **platform_label =
>>>                       rcu_dereference(net->mpls.platform_label);
>>> +
>>> +             osb();
>>>               rt = rcu_dereference(platform_label[index]);
>>>       }
>>>       return rt;
>>
>> Ouch!  This adds a barrier in the middle of an rcu lookup, on the
>> fast path for routing mpls packets.  Which if memory serves will
>> noticably slow down software processing of mpls packets.
>>
>> Why does osb() fall after the branch for validity?  So that we allow
>> speculation up until then?
>
> It falls there so that the cpu only issues reads with known good 'index' values.
>
>> I suspect it would be better to have those barriers in the tun/tap
>> interfaces where userspace can inject packets and thus time them.  Then
>> the code could still speculate and go fast for remote packets.
>>
>> Or does the speculation stomping have to be immediately at the place
>> where we use data from userspace to perform a table lookup?
>
> The speculation stomping barrier has to be between where we validate
> the input and when we may speculate on invalid input.

So a serializing instruction at the kernel/user boundary (like say
loading cr3) is not enough?  That would seem to break any chance of a
controlled timing.

> So, yes, moving
> the user controllable input validation earlier and out of the fast
> path would be preferred. Think of this patch purely as a static
> analysis warning that something might need to be done to resolve the
> report.

That isn't what I was suggesting.  I was just suggesting a serialization
instruction earlier in the pipeline.

Given what I have seen in other parts of the thread I think an and
instruction that just limits the index to a sane range is generally
applicable, and should be cheap enough to not care about.  Further
it seems to apply to the pattern the static checkers were catching,
so I suspect that is the pattern we want to stress for limiting
speculation.  Assuming of course the compiler won't just optimize the
and of the index out.

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 14:54                     ` Eric W. Biederman
  (?)
@ 2018-01-04 16:39                     ` Mark Rutland
  -1 siblings, 0 replies; 66+ messages in thread
From: Mark Rutland @ 2018-01-04 16:39 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Dan Williams, torvalds, linux-kernel, peterz, tglx, alan,
	Reshetova, Elena, gnomes, gregkh, jikos, linux-arch

On Thu, Jan 04, 2018 at 08:54:11AM -0600, Eric W. Biederman wrote:
> Dan Williams <dan.j.williams@intel.com> writes:
> > On Wed, Jan 3, 2018 at 9:01 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> "Williams, Dan J" <dan.j.williams@intel.com> writes:
> Either the patch you presented missed a whole lot like 90%+ of the
> user/kernel interface or there is some mitigating factor that I am not
> seeing.  Either way until reasonable people can read the code and
> agree on the potential exploitability of it, I will be nacking these
> patches.

As Dan mentioned, this is the result of auditing some static analysis reports.
I don't think it was claimed that this was complete, just that these are
locations that we're fairly certain need attention.

Auditing the entire user/kernel interface is going to take time, and I don't
think we should ignore this corpus in the mean time (though we certainly want
to avoid a whack-a-mole game).

[...]

> >>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> >>> index 8ca9915befc8..7f83abdea255 100644
> >>> --- a/net/mpls/af_mpls.c
> >>> +++ b/net/mpls/af_mpls.c
> >>> @@ -81,6 +81,8 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
> >>>       if (index < net->mpls.platform_labels) {
> >>>               struct mpls_route __rcu **platform_label =
> >>>                       rcu_dereference(net->mpls.platform_label);
> >>> +
> >>> +             osb();
> >>>               rt = rcu_dereference(platform_label[index]);
> >>>       }
> >>>       return rt;
> >>
> >> Ouch!  This adds a barrier in the middle of an rcu lookup, on the
> >> fast path for routing mpls packets.  Which if memory serves will
> >> noticably slow down software processing of mpls packets.
> >>
> >> Why does osb() fall after the branch for validity?  So that we allow
> >> speculation up until then?
> >
> > It falls there so that the cpu only issues reads with known good 'index' values.
> >
> >> I suspect it would be better to have those barriers in the tun/tap
> >> interfaces where userspace can inject packets and thus time them.  Then
> >> the code could still speculate and go fast for remote packets.
> >>
> >> Or does the speculation stomping have to be immediately at the place
> >> where we use data from userspace to perform a table lookup?
> >
> > The speculation stomping barrier has to be between where we validate
> > the input and when we may speculate on invalid input.
> 
> So a serializing instruction at the kernel/user boundary (like say
> loading cr3) is not enough?  That would seem to break any chance of a
> controlled timing.

Unfortunately, it isn't sufficient to do this at the kernel/user boundary. Any
subsequent bounds check can be mis-speculated regardless of prior
serialization.

Such serialization has to occur *after* the relevant bounds check, but *before*
use of the value that was checked.

Where it's possible to audit user-provided values up front, we may be able to
batch checks to amortize the cost of such serialization, but typically bounds
checks are spread arbitrarily deep in the kernel.

[...]

> Given what I have seen in other parts of the thread I think an and
> instruction that just limits the index to a sane range is generally
> applicable, and should be cheap enough to not care about.

Where feasible, this sounds good to me.

However, since many places have dynamic bounds which aren't necessarily
powers-of-two, I'm not sure how applicable this is.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  6:28         ` Julia Lawall
@ 2018-01-04 17:58           ` Dan Williams
  2018-01-04 19:26             ` Pavel Machek
  0 siblings, 1 reply; 66+ messages in thread
From: Dan Williams @ 2018-01-04 17:58 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Alan Cox, Linus Torvalds, Linux Kernel Mailing List,
	Mark Rutland, linux-arch, Peter Zijlstra, Greg KH,
	Thomas Gleixner, Elena Reshetova, Alan Cox, Dan Carpenter

On Wed, Jan 3, 2018 at 10:28 PM, Julia Lawall <julia.lawall@lip6.fr> wrote:
>
>
> On Wed, 3 Jan 2018, Dan Williams wrote:
>
>> [ adding Julia and Dan ]
>>
>> On Wed, Jan 3, 2018 at 5:07 PM, Alan Cox <gnomes@lxorguk.ukuu.org.uk> wrote:
>> > On Wed, 3 Jan 2018 16:39:31 -0800
>> > Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> >
>> >> On Wed, Jan 3, 2018 at 4:15 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> >> > The 'if_nospec' primitive marks locations where the kernel is disabling
>> >> > speculative execution that could potentially access privileged data. It
>> >> > is expected to be paired with a 'nospec_{ptr,load}' where the user
>> >> > controlled value is actually consumed.
>> >>
>> >> I'm much less worried about these "nospec_load/if" macros, than I am
>> >> about having a sane way to determine when they should be needed.
>> >>
>> >> Is there such a sane model right now, or are we talking "people will
>> >> randomly add these based on strong feelings"?
>> >
>> > There are people trying to tune coverity and other tool rules to identify
>> > cases, and some of the work so far was done that way. For x86 we didn't
>> > find too many so far so either the needed pattern is uncommon or ....  8)
>> >
>> > Given you can execute over a hundred basic instructions in a speculation
>> > window it does need to be a tool that can explore not just in function
>> > but across functions. That's really tough for the compiler itself to do
>> > without help.
>> >
>> > What remains to be seen is if there are other patterns that affect
>> > different processors.
>> >
>> > In the longer term the compiler itself needs to know what is and isn't
>> > safe (ie you need to be able to write things like
>> >
>> > void foo(tainted __user int *x)
>> >
>> > and have the compiler figure out what level of speculation it can do and
>> > (on processors with those features like IA64) when it can and can't do
>> > various kinds of non-trapping loads.
>> >
>>
>> It would be great if coccinelle and/or smatch could be taught to catch
>> some of these case at least as a first pass "please audit this code
>> block" type of notification.
>>
>
> What should one be looking for.  Do you have a typical example?
>

See "Exploiting Conditional Branch Misprediction" from the paper [1].

The typical example is an attacker controlled index used to trigger a
dependent read near a branch. Where an example of "near" from the
paper is "up to 188 simple instructions inserted in the source code
between the ‘if’ statement and the line accessing array...".

if (attacker_controlled_index < bound)
     val = array[attacker_controlled_index];
else
    return error;

...when the cpu speculates that the 'index < bound' branch is taken it
reads index and uses that value to read array[index]. The result of an
'array' relative read is potentially observable in the cache.

[1]: https://spectreattack.com/spectre.pdf

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 17:58           ` Dan Williams
@ 2018-01-04 19:26             ` Pavel Machek
  2018-01-04 21:43               ` Dan Williams
  0 siblings, 1 reply; 66+ messages in thread
From: Pavel Machek @ 2018-01-04 19:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: Julia Lawall, Alan Cox, Linus Torvalds,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova,
	Alan Cox, Dan Carpenter

[-- Attachment #1: Type: text/plain, Size: 1754 bytes --]

Hi!

> >> > What remains to be seen is if there are other patterns that affect
> >> > different processors.
> >> >
> >> > In the longer term the compiler itself needs to know what is and isn't
> >> > safe (ie you need to be able to write things like
> >> >
> >> > void foo(tainted __user int *x)
> >> >
> >> > and have the compiler figure out what level of speculation it can do and
> >> > (on processors with those features like IA64) when it can and can't do
> >> > various kinds of non-trapping loads.
> >> >
> >>
> >> It would be great if coccinelle and/or smatch could be taught to catch
> >> some of these case at least as a first pass "please audit this code
> >> block" type of notification.
> >>
> >
> > What should one be looking for.  Do you have a typical example?
> >
> 
> See "Exploiting Conditional Branch Misprediction" from the paper [1].
> 
> The typical example is an attacker controlled index used to trigger a
> dependent read near a branch. Where an example of "near" from the
> paper is "up to 188 simple instructions inserted in the source code
> between the ‘if’ statement and the line accessing array...".
> 
> if (attacker_controlled_index < bound)
>      val = array[attacker_controlled_index];
> else
>     return error;
> 
> ...when the cpu speculates that the 'index < bound' branch is taken it
> reads index and uses that value to read array[index]. The result of an
> 'array' relative read is potentially observable in the cache.

You still need

	(void) array2[val];

after that to get something observable, right?

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  1:47           ` Jiri Kosina
@ 2018-01-04 19:39             ` Pavel Machek
  2018-01-04 20:32               ` Alan Cox
  0 siblings, 1 reply; 66+ messages in thread
From: Pavel Machek @ 2018-01-04 19:39 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Alan Cox, Linus Torvalds, Dan Williams,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova

[-- Attachment #1: Type: text/plain, Size: 882 bytes --]

On Thu 2018-01-04 02:47:51, Jiri Kosina wrote:
> On Thu, 4 Jan 2018, Alan Cox wrote:
> 
> > > If the CPU speculation can cause these kinds of side-effects, it just must 
> > > not happen, full stop. 
> > 
> > At which point your performance will resemble that of a 2012 atom
> > processor at best.
> 
> You know what? I'd be completely fine with that, if it's traded for "my 
> ssh and internet banking keys are JUST MINE, ok?" :)

Agreed.

For kernel, we may be able to annonate "tainted" pointers. But then
there's quite a lot of code in userspace... What will need to be
modified? Just JITs? Setuid programs?

And we can get part of the performance back by adding more of
SMT... AFAICT.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 19:39             ` Pavel Machek
@ 2018-01-04 20:32               ` Alan Cox
  2018-01-04 20:39                 ` Jiri Kosina
  0 siblings, 1 reply; 66+ messages in thread
From: Alan Cox @ 2018-01-04 20:32 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jiri Kosina, Linus Torvalds, Dan Williams,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova

> For kernel, we may be able to annonate "tainted" pointers. But then
> there's quite a lot of code in userspace... What will need to be
> modified? Just JITs? Setuid programs?

You never go from one user process to another except via the kernel. We
have no hardware scheduling going on. That means that if the kernel
and/or CPU imposes the correct speculation barriers you can't attack
anyone but yourself.

Sandboxes and JITs are the critical components where the issue matters.
That IMHO is also an argument for why once the basics are in place there
is a good argument for a prctl and maybe cgroup support to run some
groups of processes with different trust levels.

For example there's not much point protecting a user process from root,
or a process from another process that could use ptrace on it instead.

> And we can get part of the performance back by adding more of
> SMT... AFAICT.

The current evidence is not because most workloads are not sufficiently
parallelisable. Likewise doing the speculation in the compiler doesn't
appear to have been the success people hoped for (IA64).

Alan

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 20:32               ` Alan Cox
@ 2018-01-04 20:39                 ` Jiri Kosina
  2018-01-04 21:23                   ` Alan Cox
  0 siblings, 1 reply; 66+ messages in thread
From: Jiri Kosina @ 2018-01-04 20:39 UTC (permalink / raw)
  To: Alan Cox
  Cc: Pavel Machek, Linus Torvalds, Dan Williams,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova

On Thu, 4 Jan 2018, Alan Cox wrote:

> You never go from one user process to another except via the kernel. We
> have no hardware scheduling going on. That means that if the kernel
> and/or CPU imposes the correct speculation barriers you can't attack
> anyone but yourself.

So how does this work on HT with the shared BTB? There is no context 
switch (and hence no IBPB) happening between the threads sharing it.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04  1:59           ` Jiri Kosina
  2018-01-04  2:15             ` Alan Cox
@ 2018-01-04 20:40             ` Pavel Machek
  1 sibling, 0 replies; 66+ messages in thread
From: Pavel Machek @ 2018-01-04 20:40 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Dan Williams, Alan Cox, Linus Torvalds,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova,
	Alan Cox

[-- Attachment #1: Type: text/plain, Size: 1329 bytes --]

Hi!

> > So this is in that same category, but yes, it's inconvenient.
> 
> Disagreed, violently. CPU has to execute the instructions I ask it to 
> execute, and if it executes *anything* else that reveals any information 
> about the instructions that have *not* been executed, it's flawed.

I agree that's a flaw. Unfortunately... CPUs do execute instructions
you did not ask them to execute all the time.

Plus CPU designers forgot that cache state (and active row in DRAM) is
actually observable side-effect. ....and that's where we are today.

If you want, I have two systems with AMD Geode. One is PC. Neither is
very fast.

All the other general purpose CPUs I have -- and that includes
smartphones -- are likely out-of-order, and that means flawed.

So... situation is bad.

CPUs do execute intruction you did not ask them to execute. I don't
think that's reasonable to change.

I believe "right" fix would be for CPUs to treat even DRAM read as
side-effects, and adjust speculation accordingly. I'm not sure Intel/AMD
is going to do the right thing here.

Oh, I have an FPGA, too, if you want to play with RISC-V :-).

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 14:54                     ` Eric W. Biederman
  (?)
  (?)
@ 2018-01-04 20:56                     ` Pavel Machek
  -1 siblings, 0 replies; 66+ messages in thread
From: Pavel Machek @ 2018-01-04 20:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Dan Williams, torvalds, linux-kernel, peterz, tglx, alan,
	Reshetova, Elena, mark.rutland, gnomes, gregkh, jikos,
	linux-arch

[-- Attachment #1: Type: text/plain, Size: 1113 bytes --]

Hi!


> > It falls there so that the cpu only issues reads with known good 'index' values.
> >
> >> I suspect it would be better to have those barriers in the tun/tap
> >> interfaces where userspace can inject packets and thus time them.  Then
> >> the code could still speculate and go fast for remote packets.
> >>
> >> Or does the speculation stomping have to be immediately at the place
> >> where we use data from userspace to perform a table lookup?
> >
> > The speculation stomping barrier has to be between where we validate
> > the input and when we may speculate on invalid input.
> 
> So a serializing instruction at the kernel/user boundary (like say
> loading cr3) is not enough?  That would seem to break any chance of a
> controlled timing.

Timing attack is not as straightforward as this.

You can assume cache snooping from second CPU _while_ kernel is
executing. Yes, that will mean timing, but....

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 20:39                 ` Jiri Kosina
@ 2018-01-04 21:23                   ` Alan Cox
  2018-01-04 21:48                     ` Pavel Machek
  0 siblings, 1 reply; 66+ messages in thread
From: Alan Cox @ 2018-01-04 21:23 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Pavel Machek, Linus Torvalds, Dan Williams,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova

On Thu, 4 Jan 2018 21:39:24 +0100 (CET)
Jiri Kosina <jikos@kernel.org> wrote:

> On Thu, 4 Jan 2018, Alan Cox wrote:
> 
> > You never go from one user process to another except via the kernel. We
> > have no hardware scheduling going on. That means that if the kernel
> > and/or CPU imposes the correct speculation barriers you can't attack
> > anyone but yourself.  
> 
> So how does this work on HT with the shared BTB? There is no context 
> switch (and hence no IBPB) happening between the threads sharing it.
> 

If you are paranoid in that case you either need to schedule things that
trust each other together or disable the speculation while that situation
occurs. However the kernel is always in the position to make that
decision.

Alan

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 19:26             ` Pavel Machek
@ 2018-01-04 21:43               ` Dan Williams
  2018-01-04 22:20                 ` Linus Torvalds
  2018-01-04 22:44                 ` Pavel Machek
  0 siblings, 2 replies; 66+ messages in thread
From: Dan Williams @ 2018-01-04 21:43 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Julia Lawall, Alan Cox, Linus Torvalds,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova,
	Alan Cox, Dan Carpenter

On Thu, Jan 4, 2018 at 11:26 AM, Pavel Machek <pavel@ucw.cz> wrote:
> Hi!
>
>> >> > What remains to be seen is if there are other patterns that affect
>> >> > different processors.
>> >> >
>> >> > In the longer term the compiler itself needs to know what is and isn't
>> >> > safe (ie you need to be able to write things like
>> >> >
>> >> > void foo(tainted __user int *x)
>> >> >
>> >> > and have the compiler figure out what level of speculation it can do and
>> >> > (on processors with those features like IA64) when it can and can't do
>> >> > various kinds of non-trapping loads.
>> >> >
>> >>
>> >> It would be great if coccinelle and/or smatch could be taught to catch
>> >> some of these case at least as a first pass "please audit this code
>> >> block" type of notification.
>> >>
>> >
>> > What should one be looking for.  Do you have a typical example?
>> >
>>
>> See "Exploiting Conditional Branch Misprediction" from the paper [1].
>>
>> The typical example is an attacker controlled index used to trigger a
>> dependent read near a branch. Where an example of "near" from the
>> paper is "up to 188 simple instructions inserted in the source code
>> between the ‘if’ statement and the line accessing array...".
>>
>> if (attacker_controlled_index < bound)
>>      val = array[attacker_controlled_index];
>> else
>>     return error;
>>
>> ...when the cpu speculates that the 'index < bound' branch is taken it
>> reads index and uses that value to read array[index]. The result of an
>> 'array' relative read is potentially observable in the cache.
>
> You still need
>
>         (void) array2[val];
>
> after that to get something observable, right?

As far as I understand the presence of array2[val] discloses more
information, but in terms of the cpu taking an action that it is
observable in the cache that's already occurred when "val =
array[attacker_controlled_index];" is speculated. Lets err on the side
of caution and shut down all the observable actions that are already
explicitly gated by an input validation check. In other words, a low
bandwidth information leak is still a leak.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 21:23                   ` Alan Cox
@ 2018-01-04 21:48                     ` Pavel Machek
  0 siblings, 0 replies; 66+ messages in thread
From: Pavel Machek @ 2018-01-04 21:48 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jiri Kosina, Linus Torvalds, Dan Williams,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova

[-- Attachment #1: Type: text/plain, Size: 1214 bytes --]

On Thu 2018-01-04 21:23:59, Alan Cox wrote:
> On Thu, 4 Jan 2018 21:39:24 +0100 (CET)
> Jiri Kosina <jikos@kernel.org> wrote:
> 
> > On Thu, 4 Jan 2018, Alan Cox wrote:
> > 
> > > You never go from one user process to another except via the kernel. We
> > > have no hardware scheduling going on. That means that if the kernel
> > > and/or CPU imposes the correct speculation barriers you can't attack
> > > anyone but yourself.  
> > 
> > So how does this work on HT with the shared BTB? There is no context 
> > switch (and hence no IBPB) happening between the threads sharing it.
> > 
> 
> If you are paranoid in that case you either need to schedule things that
> trust each other together or disable the speculation while that situation
> occurs. However the kernel is always in the position to make that
> decision.

Actually... I'm not paranoid but would like to run flightgear on one
core (smt cpu #0), with smt cpu#1 being idle, while running
compilations on second core (smt cpus #2 and #3).

Is there easy way to do that?
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 11:47                 ` Mark Rutland
  (?)
  (?)
@ 2018-01-04 22:09                 ` Dan Williams
  2018-01-05 14:40                   ` Mark Rutland
  -1 siblings, 1 reply; 66+ messages in thread
From: Dan Williams @ 2018-01-04 22:09 UTC (permalink / raw)
  To: Mark Rutland
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	gnomes, gregkh, jikos, linux-arch

On Thu, Jan 4, 2018 at 3:47 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi Dan,
>
> Thanks for these examples.
>
> On Thu, Jan 04, 2018 at 03:10:51AM +0000, Williams, Dan J wrote:
>> Note, the following is Elena's work, I'm just helping poke the upstream
>> discussion along while she's offline.
>>
>> Elena audited the static analysis reports down to the following
>> locations where userspace provides a value for a conditional branch and
>> then later creates a dependent load on that same userspace controlled
>> value.
>>
>> 'osb()', observable memory barrier, resolves to an lfence on x86. My
>> concern with these changes is that it is not clear what content within
>> the branch block is of concern. Peter's 'if_nospec' proposal combined
>> with Mark's 'nospec_load' would seem to clear that up, from a self
>> documenting code perspective, and let archs optionally protect entire
>> conditional blocks or individual loads within those blocks.
>
> I'm a little concerned by having to use two helpers that need to be paired. It
> means that we have to duplicate the bounds information, and I suspect in
> practice that they'll often be paired improperly.

Hmm, will they be often mispaired? All of the examples to date the
load occurs in the same code block as the bound checking if()
statement.

> I think that we can avoid those problems if we use nospec_ptr() on its own. It
> returns NULL if the pointer would be out-of-bounds, so we can use it in the
> if-statement.
>
> On x86, that can contain the bounds checks followed be an OSB / lfence, like
> if_nospec(). On arm we can use our architected sequence. In either case,
> nospec_ptr() returns NULL if the pointer would be out-of-bounds.
>
> Then, rather than sequences like:
>
>         if_nospec(idx < max) {
>                 val = nospec_array_load(array, idx, max);
>                 ...
>         }
>
> ... we could have:
>
>         if ((elem_ptr = nospec_array_ptr(array, idx, max)) {
>                 val = *elem_ptr;
>                 ...
>         }
>
> ... which also means we don't have to annotate every single load in the branch
> if the element is a structure with multiple fields.

We wouldn't need to annotate each load in that case, right? Just the
instance that uses idx to calculate an address.

if_nospec (idx < max_idx) {
   elem_ptr = nospec_array_ptr(array, idx, max);
   val = elem_ptr->val;
   val2 = elem_ptr->val2;
}

> Does that sound workable to you?

Relative to the urgency of getting fixes upstream I'm ok with whatever
approach generates the least debate from sub-system maintainers.

One concern is that on x86 the:

    if ((elem_ptr = nospec_array_ptr(array, idx, max)) {

...approach produces two conditional branches whereas:

    if_nospec (idx < max_idx) {
        elem_ptr = nospec_array_ptr(array, idx, max);

....can be optimized to one conditional with the barrier.

Is that convincing enough to proceed with if_nospec + nospec_* combination?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 21:43               ` Dan Williams
@ 2018-01-04 22:20                 ` Linus Torvalds
  2018-01-04 22:23                   ` Linus Torvalds
  2018-01-04 22:55                   ` Alan Cox
  2018-01-04 22:44                 ` Pavel Machek
  1 sibling, 2 replies; 66+ messages in thread
From: Linus Torvalds @ 2018-01-04 22:20 UTC (permalink / raw)
  To: Dan Williams
  Cc: Pavel Machek, Julia Lawall, Alan Cox, Linux Kernel Mailing List,
	Mark Rutland, linux-arch, Peter Zijlstra, Greg KH,
	Thomas Gleixner, Elena Reshetova, Alan Cox, Dan Carpenter

On Thu, Jan 4, 2018 at 1:43 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>
> As far as I understand the presence of array2[val] discloses more
> information, but in terms of the cpu taking an action that it is
> observable in the cache that's already occurred when "val =
> array[attacker_controlled_index];" is speculated. Lets err on the side
> of caution and shut down all the observable actions that are already
> explicitly gated by an input validation check. In other words, a low
> bandwidth information leak is still a leak.

I do think that the change to __fcheck_files() is interesting, because
that one is potentially rather performance-sensitive.

And the data access speculation annotations we presumably won't be
able to get rid of eventually with newer hardware, so to some degree
this is a bigger deal than the random hacks that affect _current_
hardware but hopefully hw designers will fix in the future.

That does make me think that we should strive to perhaps use the same
model that the BPF people are looking at: making the address
generation part of the computation, rather than using a barrier. I'm
not sure how expensive lfence is, but from every single time ever in
the history of man, barriers have been a bad idea. Which makes me
suspect that lfence is a bad idea too.

If one common pattern is going to be bounces checking array accesses
like this, then we probably should strive to turn

    if (index < bound)
       val = array[index];

into making sure we actually have that data dependency on the address
instead. Whether that is done with a "select" instruction or by using
an "and" with -1/0 is then a small detail.

I wonder if we can make the interface be something really straightforward:

 - specify a special

        array_access(array, index, max)

    operation, and say that the access is guaranteed to not
speculatively access outside the array, and return 0 (of the type
"array entry") if outside the range.

 - further specify that it's safe as READ_ONCE() and/or RCU access
(and only defined for native data types)

That would turn that existing __fcheck_files() from

        if (fd < fdt->max_fds)
                return rcu_dereference_raw(fdt->fd[fd]);
        return NULL;

into just

        return array_access(fdt->fd, fd, ft->max_fds);

and the *implementation* might be architecture-specific, but one
particular implementation would be something like simply

#define array_access(base, idx, max) ({                         \
        union { typeof(base[0]) _val; unsigned long _bit; } __u;\
        unsigned long _i = (idx);                               \
        unsigned long _m = (max);                               \
        unsigned long _mask = _i < _m ? ~0 : 0;                 \
        OPTIMIZER_HIDE_VAR(_mask);                              \
        __u._val = base[_i & _mask];                            \
        __u._bit &= _mask;                                      \
        __u._val; })

(Ok, the above is not exhaustively tested, but you get the idea, and
gcc doesn't seem to mess it up *too* badly).

No barriers anywhere, except for the compiler barrier to make sure the
compiler doesn't see how it all depends on that comparison, and
actually uses two "and" instructions to fix the address and the data.

How slow is lfence? Maybe it's not slow, but the above looks like it
*might* be better..

                     Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 22:20                 ` Linus Torvalds
@ 2018-01-04 22:23                   ` Linus Torvalds
  2018-01-04 22:55                   ` Alan Cox
  1 sibling, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2018-01-04 22:23 UTC (permalink / raw)
  To: Dan Williams
  Cc: Pavel Machek, Julia Lawall, Alan Cox, Linux Kernel Mailing List,
	Mark Rutland, linux-arch, Peter Zijlstra, Greg KH,
	Thomas Gleixner, Elena Reshetova, Alan Cox, Dan Carpenter

On Thu, Jan 4, 2018 at 2:20 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> #define array_access(base, idx, max) ({                         \
>         union { typeof(base[0]) _val; unsigned long _bit; } __u;\
>         unsigned long _i = (idx);                               \
>         unsigned long _m = (max);                               \
>         unsigned long _mask = _i < _m ? ~0 : 0;                 \
>         OPTIMIZER_HIDE_VAR(_mask);                              \
>         __u._val = base[_i & _mask];                            \
>         __u._bit &= _mask;                                      \
>         __u._val; })

That

    __u._val = base[_i & _mask];

thing would have to be READ_ONCE() to make it all ok for the special
cases without locking etc.

              Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 21:43               ` Dan Williams
  2018-01-04 22:20                 ` Linus Torvalds
@ 2018-01-04 22:44                 ` Pavel Machek
  2018-01-04 23:12                   ` Dan Williams
  1 sibling, 1 reply; 66+ messages in thread
From: Pavel Machek @ 2018-01-04 22:44 UTC (permalink / raw)
  To: Dan Williams
  Cc: Julia Lawall, Alan Cox, Linus Torvalds,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova,
	Alan Cox, Dan Carpenter

[-- Attachment #1: Type: text/plain, Size: 2194 bytes --]

Hi!

> >> > What should one be looking for.  Do you have a typical example?
> >> >
> >>
> >> See "Exploiting Conditional Branch Misprediction" from the paper [1].
> >>
> >> The typical example is an attacker controlled index used to trigger a
> >> dependent read near a branch. Where an example of "near" from the
> >> paper is "up to 188 simple instructions inserted in the source code
> >> between the ‘if’ statement and the line accessing array...".
> >>
> >> if (attacker_controlled_index < bound)
> >>      val = array[attacker_controlled_index];
> >> else
> >>     return error;
> >>
> >> ...when the cpu speculates that the 'index < bound' branch is taken it
> >> reads index and uses that value to read array[index]. The result of an
> >> 'array' relative read is potentially observable in the cache.
> >
> > You still need
> >
> >         (void) array2[val];
> >
> > after that to get something observable, right?
> 
> As far as I understand the presence of array2[val] discloses more
> information, but in terms of the cpu taking an action that it is
> observable in the cache that's already occurred when "val =
> array[attacker_controlled_index];" is speculated. Lets err on the

Well yes, attacker can observe val = 
array[attacker_controlled_index]; . But that's not something he's
interested in. So the CPU cheats and attacker has a proof. But he knew
that before.

>side
> of caution and shut down all the observable actions that are already
> explicitly gated by an input validation check. In other words, a low
> bandwidth information leak is still a leak.

What did it leak? Nothing. Attacker had to know
array+attacker_controlled_index, and he now knows
(array+attacker_controlled_index)%CACHELINE_SIZE.

With (void) array2[val];, the attack gets interesting -- I now know
*(array+attacker_controlled_index) % CACHELINE_SIZE ... allowing me to
get information from arbitrary place in memory -- which is useful for
.. reading ssh keys, for example.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 22:20                 ` Linus Torvalds
  2018-01-04 22:23                   ` Linus Torvalds
@ 2018-01-04 22:55                   ` Alan Cox
  2018-01-04 23:06                     ` Linus Torvalds
  1 sibling, 1 reply; 66+ messages in thread
From: Alan Cox @ 2018-01-04 22:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dan Williams, Pavel Machek, Julia Lawall,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova,
	Alan Cox, Dan Carpenter

> and the *implementation* might be architecture-specific, but one
> particular implementation would be something like simply
> 
> #define array_access(base, idx, max) ({                         \
>         union { typeof(base[0]) _val; unsigned long _bit; } __u;\
>         unsigned long _i = (idx);                               \
>         unsigned long _m = (max);                               \
>         unsigned long _mask = _i < _m ? ~0 : 0;                 \
>         OPTIMIZER_HIDE_VAR(_mask);                              \
>         __u._val = base[_i & _mask];                            \
>         __u._bit &= _mask;                                      \
>         __u._val; })
> 
> (Ok, the above is not exhaustively tested, but you get the idea, and
> gcc doesn't seem to mess it up *too* badly).

How do you ensure that the CPU doesn't speculate j < _m  ? ~0 : 0 pick the
wrong mask and then reference base[] ?

It's a nice idea but I think we'd need to run it past the various CPU
vendors especially before it was implemented as a generic solution.
Anding with a constant works because the constant doesn't get speculated
and nor does the and with a constant, but you've got a whole additional
conditional path in your macro.

Alan

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 22:55                   ` Alan Cox
@ 2018-01-04 23:06                     ` Linus Torvalds
  2018-01-04 23:11                       ` Alan Cox
  2018-01-05  0:24                       ` Dan Williams
  0 siblings, 2 replies; 66+ messages in thread
From: Linus Torvalds @ 2018-01-04 23:06 UTC (permalink / raw)
  To: Alan Cox
  Cc: Dan Williams, Pavel Machek, Julia Lawall,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova,
	Alan Cox, Dan Carpenter

On Thu, Jan 4, 2018 at 2:55 PM, Alan Cox <gnomes@lxorguk.ukuu.org.uk> wrote:
>
> How do you ensure that the CPU doesn't speculate j < _m  ? ~0 : 0 pick the
> wrong mask and then reference base[] ?

.. yeah, that's exactly where we want to make sure that the compiler
uses a select or 'setb'.

That's what gcc does for me in testing:

        xorl    %eax, %eax
        setbe   %al
        negq    %rax

 but yes, we'd need to guarantee it somehow.

Presumably that is where we end up having some arch-specific stuff.
Possibly there is some gcc builtin. I wanted to avoid actually writing
architecture-specific asm.

> Anding with a constant works because the constant doesn't get speculated
> and nor does the and with a constant, but you've got a whole additional
> conditional path in your macro.

Absolutely. Think of it as an example, not "the solution".

It's also possible that x86 'lfence' really is so fast that it doesn't
make sense to try to do this. Agner Fog claims that it's single-cycle
(well, except for P4, surprise, surprise), but I suspect that his
timings are simply for 'lfence' in a loop or something. Which may not
show the real cost of actually halting things until they are stable.

Also, maybe that __fcheck_files() pattern where getting a NULL pointer
happens to be the right thing for out-of-range is so unusual as to be
useless, and most people end up having to have that limit check for
other reasons anyway.

          Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 23:06                     ` Linus Torvalds
@ 2018-01-04 23:11                       ` Alan Cox
  2018-01-05  0:24                       ` Dan Williams
  1 sibling, 0 replies; 66+ messages in thread
From: Alan Cox @ 2018-01-04 23:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dan Williams, Pavel Machek, Julia Lawall,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova,
	Alan Cox, Dan Carpenter

> It's also possible that x86 'lfence' really is so fast that it doesn't
> make sense to try to do this.

If you've got a lot going on it's not 1 cycle.

Alan

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 22:44                 ` Pavel Machek
@ 2018-01-04 23:12                   ` Dan Williams
  2018-01-04 23:21                     ` Alan Cox
  2018-01-04 23:33                     ` Pavel Machek
  0 siblings, 2 replies; 66+ messages in thread
From: Dan Williams @ 2018-01-04 23:12 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Julia Lawall, Alan Cox, Linus Torvalds,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova,
	Alan Cox, Dan Carpenter

On Thu, Jan 4, 2018 at 2:44 PM, Pavel Machek <pavel@ucw.cz> wrote:
> Hi!
>
>> >> > What should one be looking for.  Do you have a typical example?
>> >> >
>> >>
>> >> See "Exploiting Conditional Branch Misprediction" from the paper [1].
>> >>
>> >> The typical example is an attacker controlled index used to trigger a
>> >> dependent read near a branch. Where an example of "near" from the
>> >> paper is "up to 188 simple instructions inserted in the source code
>> >> between the ‘if’ statement and the line accessing array...".
>> >>
>> >> if (attacker_controlled_index < bound)
>> >>      val = array[attacker_controlled_index];
>> >> else
>> >>     return error;
>> >>
>> >> ...when the cpu speculates that the 'index < bound' branch is taken it
>> >> reads index and uses that value to read array[index]. The result of an
>> >> 'array' relative read is potentially observable in the cache.
>> >
>> > You still need
>> >
>> >         (void) array2[val];
>> >
>> > after that to get something observable, right?
>>
>> As far as I understand the presence of array2[val] discloses more
>> information, but in terms of the cpu taking an action that it is
>> observable in the cache that's already occurred when "val =
>> array[attacker_controlled_index];" is speculated. Lets err on the
>
> Well yes, attacker can observe val =
> array[attacker_controlled_index]; . But that's not something he's
> interested in. So the CPU cheats and attacker has a proof. But he knew
> that before.
>
>>side
>> of caution and shut down all the observable actions that are already
>> explicitly gated by an input validation check. In other words, a low
>> bandwidth information leak is still a leak.
>
> What did it leak? Nothing. Attacker had to know
> array+attacker_controlled_index, and he now knows
> (array+attacker_controlled_index)%CACHELINE_SIZE.
>
> With (void) array2[val];, the attack gets interesting -- I now know
> *(array+attacker_controlled_index) % CACHELINE_SIZE ... allowing me to
> get information from arbitrary place in memory -- which is useful for
> .. reading ssh keys, for example.

Right, but how far away from "val = array[attacker_controlled_index];"
in the instruction stream do you need to look before you're
comfortable there's no 'val' dependent reads in the speculation window
on all possible architectures. Until we have variable annotations and
compiler help my guess is that static analysis has an easier time
pointing us to the first potentially bad speculative access.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 23:12                   ` Dan Williams
@ 2018-01-04 23:21                     ` Alan Cox
  2018-01-04 23:33                     ` Pavel Machek
  1 sibling, 0 replies; 66+ messages in thread
From: Alan Cox @ 2018-01-04 23:21 UTC (permalink / raw)
  To: Dan Williams
  Cc: Pavel Machek, Julia Lawall, Linus Torvalds,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova,
	Alan Cox, Dan Carpenter

> Right, but how far away from "val = array[attacker_controlled_index];"
> in the instruction stream do you need to look before you're
> comfortable there's no 'val' dependent reads in the speculation window
> on all possible architectures. Until we have variable annotations and
> compiler help my guess is that static analysis has an easier time
> pointing us to the first potentially bad speculative access.

On x86 the researchers are saying up to 180 or so simple instructions.
I've not seen any Intel or AMD official quoted value. It's enough that
you need to peer across functions.

Alan

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 23:12                   ` Dan Williams
  2018-01-04 23:21                     ` Alan Cox
@ 2018-01-04 23:33                     ` Pavel Machek
  2018-01-05  8:11                       ` Julia Lawall
  1 sibling, 1 reply; 66+ messages in thread
From: Pavel Machek @ 2018-01-04 23:33 UTC (permalink / raw)
  To: Dan Williams
  Cc: Julia Lawall, Alan Cox, Linus Torvalds,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova,
	Alan Cox, Dan Carpenter

[-- Attachment #1: Type: text/plain, Size: 1337 bytes --]

Hi!

> > What did it leak? Nothing. Attacker had to know
> > array+attacker_controlled_index, and he now knows
> > (array+attacker_controlled_index)%CACHELINE_SIZE.
> >
> > With (void) array2[val];, the attack gets interesting -- I now know
> > *(array+attacker_controlled_index) % CACHELINE_SIZE ... allowing me to
> > get information from arbitrary place in memory -- which is useful for
> > .. reading ssh keys, for example.
> 
> Right, but how far away from "val = array[attacker_controlled_index];"
> in the instruction stream do you need to look before you're
> comfortable there's no 'val' dependent reads in the speculation window
> on all possible architectures. Until we have variable annotations and
> compiler help my guess is that static analysis has an easier time
> pointing us to the first potentially bad speculative access.

Well, you are already scanning for if (attacker_controlled_index <
limit) .... array[attacker_controlled_index] and those can already be
far away from each other....

Anyway, likely in the end human should be creating the patch, and if
there's no array2[val], we do not need the patch after all.

Best regards,

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 23:06                     ` Linus Torvalds
  2018-01-04 23:11                       ` Alan Cox
@ 2018-01-05  0:24                       ` Dan Williams
  1 sibling, 0 replies; 66+ messages in thread
From: Dan Williams @ 2018-01-05  0:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Pavel Machek, Julia Lawall, Linux Kernel Mailing List,
	Mark Rutland, linux-arch, Peter Zijlstra, Greg KH,
	Thomas Gleixner, Elena Reshetova, Alan Cox, Dan Carpenter

On Thu, Jan 4, 2018 at 3:06 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jan 4, 2018 at 2:55 PM, Alan Cox <gnomes@lxorguk.ukuu.org.uk> wrote:
>>
>> How do you ensure that the CPU doesn't speculate j < _m  ? ~0 : 0 pick the
>> wrong mask and then reference base[] ?
>
> .. yeah, that's exactly where we want to make sure that the compiler
> uses a select or 'setb'.
>
> That's what gcc does for me in testing:
>
>         xorl    %eax, %eax
>         setbe   %al
>         negq    %rax
>
>  but yes, we'd need to guarantee it somehow.
>
> Presumably that is where we end up having some arch-specific stuff.
> Possibly there is some gcc builtin. I wanted to avoid actually writing
> architecture-specific asm.
>
>> Anding with a constant works because the constant doesn't get speculated
>> and nor does the and with a constant, but you've got a whole additional
>> conditional path in your macro.
>
> Absolutely. Think of it as an example, not "the solution".
>
> It's also possible that x86 'lfence' really is so fast that it doesn't
> make sense to try to do this. Agner Fog claims that it's single-cycle
> (well, except for P4, surprise, surprise), but I suspect that his
> timings are simply for 'lfence' in a loop or something. Which may not
> show the real cost of actually halting things until they are stable.
>
> Also, maybe that __fcheck_files() pattern where getting a NULL pointer
> happens to be the right thing for out-of-range is so unusual as to be
> useless, and most people end up having to have that limit check for
> other reasons anyway.

This potential barrier avoidance optimization technique is something
that could fit in the nospec_{ptr,load,array_load} interface that Mark
defined for ARM, and is constructed around a proposed compiler
builtin. Although, lfence is not a full serializing instruction, so
before we spend too much effort trying to kill it we should measure
how bad it is in practice in these paths.

At this point I'll go ahead with rewriting the osb() patches into
using Mark's nospec_* accessors plus the if_nospec macro. We can kill
the barrier in if_nospec once we are sure the compiler will always "Do
the Right Thing" with the array_access() approach, and can otherwise
make the array_access() approach the 'best-effort' default fallback.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH 1/4] asm-generic/barrier: add generic nospec helpers
  2018-01-04 12:00   ` Mark Rutland
@ 2018-01-05  4:21     ` Dan Williams
  2018-01-05  9:15       ` Mark Rutland
  0 siblings, 1 reply; 66+ messages in thread
From: Dan Williams @ 2018-01-05  4:21 UTC (permalink / raw)
  To: Mark Rutland; +Cc: Linux Kernel Mailing List, linux-arch, Will Deacon

On Thu, Jan 4, 2018 at 4:00 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Wed, Jan 03, 2018 at 10:38:24PM +0000, Mark Rutland wrote:
>> +#define nospec_array_load(arr, idx, sz)                                      \
>> +({                                                                   \
>> +     typeof(*(arr)) *__arr = arr;                                    \
>> +     typeof(idx) __idx = idx;                                        \
>> +     typeof(sz) __sz = __sz;                                         \
>
> Whoops. The second __sz should be sz here.

Those should all have parenthesis on the args too, right?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 23:33                     ` Pavel Machek
@ 2018-01-05  8:11                       ` Julia Lawall
  0 siblings, 0 replies; 66+ messages in thread
From: Julia Lawall @ 2018-01-05  8:11 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Dan Williams, Julia Lawall, Alan Cox, Linus Torvalds,
	Linux Kernel Mailing List, Mark Rutland, linux-arch,
	Peter Zijlstra, Greg KH, Thomas Gleixner, Elena Reshetova,
	Alan Cox, Dan Carpenter

It looks like the problem in terms of detection is to find values that
should be annotated as __user.  Poking around a bit, it seems like this
tool is doing just that:

http://www.cs.umd.edu/~jfoster/cqual/

It dates from 2004, but perhaps the developer could be motivated to pick
it up again.

I don't think Coccinelle would be good for doing this (ie implementing
taint analysis) because the dataflow is too complicated.

julia

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH 1/4] asm-generic/barrier: add generic nospec helpers
  2018-01-05  4:21     ` Dan Williams
@ 2018-01-05  9:15       ` Mark Rutland
  0 siblings, 0 replies; 66+ messages in thread
From: Mark Rutland @ 2018-01-05  9:15 UTC (permalink / raw)
  To: Dan Williams; +Cc: Linux Kernel Mailing List, linux-arch, Will Deacon

On Thu, Jan 04, 2018 at 08:21:56PM -0800, Dan Williams wrote:
> On Thu, Jan 4, 2018 at 4:00 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Wed, Jan 03, 2018 at 10:38:24PM +0000, Mark Rutland wrote:
> >> +#define nospec_array_load(arr, idx, sz)                                      \
> >> +({                                                                   \
> >> +     typeof(*(arr)) *__arr = arr;                                    \
> >> +     typeof(idx) __idx = idx;                                        \
> >> +     typeof(sz) __sz = __sz;                                         \
> >
> > Whoops. The second __sz should be sz here.
> 
> Those should all have parenthesis on the args too, right?

Probably, yes.

I've added those to the version in my core/nospec branch.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-04 22:09                 ` Dan Williams
@ 2018-01-05 14:40                   ` Mark Rutland
  2018-01-05 16:44                     ` Dan Williams
  0 siblings, 1 reply; 66+ messages in thread
From: Mark Rutland @ 2018-01-05 14:40 UTC (permalink / raw)
  To: Dan Williams
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	gnomes, gregkh, jikos, linux-arch

On Thu, Jan 04, 2018 at 02:09:52PM -0800, Dan Williams wrote:
> On Thu, Jan 4, 2018 at 3:47 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> > Hi Dan,
> >
> > Thanks for these examples.
> >
> > On Thu, Jan 04, 2018 at 03:10:51AM +0000, Williams, Dan J wrote:
> >> Note, the following is Elena's work, I'm just helping poke the upstream
> >> discussion along while she's offline.
> >>
> >> Elena audited the static analysis reports down to the following
> >> locations where userspace provides a value for a conditional branch and
> >> then later creates a dependent load on that same userspace controlled
> >> value.
> >>
> >> 'osb()', observable memory barrier, resolves to an lfence on x86. My
> >> concern with these changes is that it is not clear what content within
> >> the branch block is of concern. Peter's 'if_nospec' proposal combined
> >> with Mark's 'nospec_load' would seem to clear that up, from a self
> >> documenting code perspective, and let archs optionally protect entire
> >> conditional blocks or individual loads within those blocks.
> >
> > I'm a little concerned by having to use two helpers that need to be paired. It
> > means that we have to duplicate the bounds information, and I suspect in
> > practice that they'll often be paired improperly.
> 
> Hmm, will they be often mispaired? All of the examples to date the
> load occurs in the same code block as the bound checking if()
> statement.

Practically speaking, barriers get misused all the time, and having a
single helper minimizes the risk or misuse.

> > I think that we can avoid those problems if we use nospec_ptr() on its own. It
> > returns NULL if the pointer would be out-of-bounds, so we can use it in the
> > if-statement.
> >
> > On x86, that can contain the bounds checks followed be an OSB / lfence, like
> > if_nospec(). On arm we can use our architected sequence. In either case,
> > nospec_ptr() returns NULL if the pointer would be out-of-bounds.
> >
> > Then, rather than sequences like:
> >
> >         if_nospec(idx < max) {
> >                 val = nospec_array_load(array, idx, max);
> >                 ...
> >         }
> >
> > ... we could have:
> >
> >         if ((elem_ptr = nospec_array_ptr(array, idx, max)) {
> >                 val = *elem_ptr;
> >                 ...
> >         }
> >
> > ... which also means we don't have to annotate every single load in the branch
> > if the element is a structure with multiple fields.
> 
> We wouldn't need to annotate each load in that case, right? Just the
> instance that uses idx to calculate an address.

Correct.

> if_nospec (idx < max_idx) {
>    elem_ptr = nospec_array_ptr(array, idx, max);
>    val = elem_ptr->val;
>    val2 = elem_ptr->val2;
> }
> 
> > Does that sound workable to you?
> 
> Relative to the urgency of getting fixes upstream I'm ok with whatever
> approach generates the least debate from sub-system maintainers.
> 
> One concern is that on x86 the:
> 
>     if ((elem_ptr = nospec_array_ptr(array, idx, max)) {
> 
> ...approach produces two conditional branches whereas:
> 
>     if_nospec (idx < max_idx) {
>         elem_ptr = nospec_array_ptr(array, idx, max);
> 
> ....can be optimized to one conditional with the barrier.

Do you mean because the NULL check is redundant? I was hoping that the
compiler would have the necessary visibility to fold that with the
bounds check (on the assumption that the array base must not be NULL).

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-05 14:40                   ` Mark Rutland
@ 2018-01-05 16:44                     ` Dan Williams
  2018-01-05 18:05                       ` Dan Williams
  0 siblings, 1 reply; 66+ messages in thread
From: Dan Williams @ 2018-01-05 16:44 UTC (permalink / raw)
  To: Mark Rutland
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	gnomes, gregkh, jikos, linux-arch

On Fri, Jan 5, 2018 at 6:40 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Thu, Jan 04, 2018 at 02:09:52PM -0800, Dan Williams wrote:
>> On Thu, Jan 4, 2018 at 3:47 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>> > Hi Dan,
>> >
>> > Thanks for these examples.
>> >
>> > On Thu, Jan 04, 2018 at 03:10:51AM +0000, Williams, Dan J wrote:
>> >> Note, the following is Elena's work, I'm just helping poke the upstream
>> >> discussion along while she's offline.
>> >>
>> >> Elena audited the static analysis reports down to the following
>> >> locations where userspace provides a value for a conditional branch and
>> >> then later creates a dependent load on that same userspace controlled
>> >> value.
>> >>
>> >> 'osb()', observable memory barrier, resolves to an lfence on x86. My
>> >> concern with these changes is that it is not clear what content within
>> >> the branch block is of concern. Peter's 'if_nospec' proposal combined
>> >> with Mark's 'nospec_load' would seem to clear that up, from a self
>> >> documenting code perspective, and let archs optionally protect entire
>> >> conditional blocks or individual loads within those blocks.
>> >
>> > I'm a little concerned by having to use two helpers that need to be paired. It
>> > means that we have to duplicate the bounds information, and I suspect in
>> > practice that they'll often be paired improperly.
>>
>> Hmm, will they be often mispaired? All of the examples to date the
>> load occurs in the same code block as the bound checking if()
>> statement.
>
> Practically speaking, barriers get misused all the time, and having a
> single helper minimizes the risk or misuse.

I agree, but this is why if_nospec hides the barrier / makes it implicit.

>
>> > I think that we can avoid those problems if we use nospec_ptr() on its own. It
>> > returns NULL if the pointer would be out-of-bounds, so we can use it in the
>> > if-statement.
>> >
>> > On x86, that can contain the bounds checks followed be an OSB / lfence, like
>> > if_nospec(). On arm we can use our architected sequence. In either case,
>> > nospec_ptr() returns NULL if the pointer would be out-of-bounds.
>> >
>> > Then, rather than sequences like:
>> >
>> >         if_nospec(idx < max) {
>> >                 val = nospec_array_load(array, idx, max);
>> >                 ...
>> >         }
>> >
>> > ... we could have:
>> >
>> >         if ((elem_ptr = nospec_array_ptr(array, idx, max)) {
>> >                 val = *elem_ptr;
>> >                 ...
>> >         }
>> >
>> > ... which also means we don't have to annotate every single load in the branch
>> > if the element is a structure with multiple fields.
>>
>> We wouldn't need to annotate each load in that case, right? Just the
>> instance that uses idx to calculate an address.
>
> Correct.
>
>> if_nospec (idx < max_idx) {
>>    elem_ptr = nospec_array_ptr(array, idx, max);
>>    val = elem_ptr->val;
>>    val2 = elem_ptr->val2;
>> }
>>
>> > Does that sound workable to you?
>>
>> Relative to the urgency of getting fixes upstream I'm ok with whatever
>> approach generates the least debate from sub-system maintainers.
>>
>> One concern is that on x86 the:
>>
>>     if ((elem_ptr = nospec_array_ptr(array, idx, max)) {
>>
>> ...approach produces two conditional branches whereas:
>>
>>     if_nospec (idx < max_idx) {
>>         elem_ptr = nospec_array_ptr(array, idx, max);
>>
>> ....can be optimized to one conditional with the barrier.
>
> Do you mean because the NULL check is redundant? I was hoping that the
> compiler would have the necessary visibility to fold that with the
> bounds check (on the assumption that the array base must not be NULL).

...but there's legitimately 2 conditionals one to control the
non-speculative flow, and one to sink the speculation ala the
array_access() approach from Linus. Keeping those separate seems to
lead to less change in the suspected areas. In any event I'll post the
reworked patches and we can iterate from there.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
  2018-01-05 16:44                     ` Dan Williams
@ 2018-01-05 18:05                       ` Dan Williams
  0 siblings, 0 replies; 66+ messages in thread
From: Dan Williams @ 2018-01-05 18:05 UTC (permalink / raw)
  To: Mark Rutland
  Cc: torvalds, linux-kernel, peterz, tglx, alan, Reshetova, Elena,
	gnomes, gregkh, jikos, linux-arch

On Fri, Jan 5, 2018 at 8:44 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Fri, Jan 5, 2018 at 6:40 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>> On Thu, Jan 04, 2018 at 02:09:52PM -0800, Dan Williams wrote:
>>> On Thu, Jan 4, 2018 at 3:47 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>>> > Hi Dan,
>>> >
>>> > Thanks for these examples.
>>> >
>>> > On Thu, Jan 04, 2018 at 03:10:51AM +0000, Williams, Dan J wrote:
>>> >> Note, the following is Elena's work, I'm just helping poke the upstream
>>> >> discussion along while she's offline.
>>> >>
>>> >> Elena audited the static analysis reports down to the following
>>> >> locations where userspace provides a value for a conditional branch and
>>> >> then later creates a dependent load on that same userspace controlled
>>> >> value.
>>> >>
>>> >> 'osb()', observable memory barrier, resolves to an lfence on x86. My
>>> >> concern with these changes is that it is not clear what content within
>>> >> the branch block is of concern. Peter's 'if_nospec' proposal combined
>>> >> with Mark's 'nospec_load' would seem to clear that up, from a self
>>> >> documenting code perspective, and let archs optionally protect entire
>>> >> conditional blocks or individual loads within those blocks.
>>> >
>>> > I'm a little concerned by having to use two helpers that need to be paired. It
>>> > means that we have to duplicate the bounds information, and I suspect in
>>> > practice that they'll often be paired improperly.
>>>
>>> Hmm, will they be often mispaired? All of the examples to date the
>>> load occurs in the same code block as the bound checking if()
>>> statement.
>>
>> Practically speaking, barriers get misused all the time, and having a
>> single helper minimizes the risk or misuse.
>
> I agree, but this is why if_nospec hides the barrier / makes it implicit.
>
>>
>>> > I think that we can avoid those problems if we use nospec_ptr() on its own. It
>>> > returns NULL if the pointer would be out-of-bounds, so we can use it in the
>>> > if-statement.
>>> >
>>> > On x86, that can contain the bounds checks followed be an OSB / lfence, like
>>> > if_nospec(). On arm we can use our architected sequence. In either case,
>>> > nospec_ptr() returns NULL if the pointer would be out-of-bounds.
>>> >
>>> > Then, rather than sequences like:
>>> >
>>> >         if_nospec(idx < max) {
>>> >                 val = nospec_array_load(array, idx, max);
>>> >                 ...
>>> >         }
>>> >
>>> > ... we could have:
>>> >
>>> >         if ((elem_ptr = nospec_array_ptr(array, idx, max)) {
>>> >                 val = *elem_ptr;
>>> >                 ...
>>> >         }
>>> >
>>> > ... which also means we don't have to annotate every single load in the branch
>>> > if the element is a structure with multiple fields.
>>>
>>> We wouldn't need to annotate each load in that case, right? Just the
>>> instance that uses idx to calculate an address.
>>
>> Correct.
>>
>>> if_nospec (idx < max_idx) {
>>>    elem_ptr = nospec_array_ptr(array, idx, max);
>>>    val = elem_ptr->val;
>>>    val2 = elem_ptr->val2;
>>> }
>>>
>>> > Does that sound workable to you?
>>>
>>> Relative to the urgency of getting fixes upstream I'm ok with whatever
>>> approach generates the least debate from sub-system maintainers.
>>>
>>> One concern is that on x86 the:
>>>
>>>     if ((elem_ptr = nospec_array_ptr(array, idx, max)) {
>>>
>>> ...approach produces two conditional branches whereas:
>>>
>>>     if_nospec (idx < max_idx) {
>>>         elem_ptr = nospec_array_ptr(array, idx, max);
>>>
>>> ....can be optimized to one conditional with the barrier.
>>
>> Do you mean because the NULL check is redundant? I was hoping that the
>> compiler would have the necessary visibility to fold that with the
>> bounds check (on the assumption that the array base must not be NULL).
>
> ...but there's legitimately 2 conditionals one to control the
> non-speculative flow, and one to sink the speculation ala the
> array_access() approach from Linus. Keeping those separate seems to
> lead to less change in the suspected areas. In any event I'll post the
> reworked patches and we can iterate from there.

Disregard this, I'm going ahead with your new nospec_array_ptr()
helper as it falls out nicely and seems to be equally self documenting
as 'if_nospec'. Since I was already done with the if_nospec +
nospec_array_load conversions it's not much more work to go back and
roll the nospec_array_ptr conversion on top.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
@ 2018-01-05 10:33 Alexey Dobriyan
  0 siblings, 0 replies; 66+ messages in thread
From: Alexey Dobriyan @ 2018-01-05 10:33 UTC (permalink / raw)
  To: viro
  Cc: linux-kernel, dave.hansen, dan.j.williams, ebiederm, gnomes,
	jikos, torvalds

Al Viro wrote:
> > > No, the concern is that an fd value >= fdt->max_fds may cause the cpu
> > > to read arbitrary memory addresses relative to files->fdt and
> > > userspace can observe that it got loaded.
> >
> > Yes.  And all that might reveal is the value of files->fdt.  Who cares?
>
> Sorry, s/files->fdt/files->fdt->fd/.  Still the same question - what information
> would that extract and how would attacker use that?

Al, paper exploit requires second data dependent load but they only do it
for easy demonstration.

    struct file *file = (fd < fdt->max_fds) ? fdt->fd[fd] : NULL;
    if (file && (file->f_mode & mask))
        ...

Speculative "struct file *" can be anything.
If ->f_mode access happens cacheline will be primed.
If pointer is userspace address nothing will happen because of SMAP.

Now you know that some data past fdtable looks like canonical kernel
address.

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2018-01-05 18:05 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-03 22:38 [RFC PATCH 0/4] API for inhibiting speculative arbitrary read primitives Mark Rutland
2018-01-03 22:38 ` [RFC PATCH 1/4] asm-generic/barrier: add generic nospec helpers Mark Rutland
2018-01-04 12:00   ` Mark Rutland
2018-01-05  4:21     ` Dan Williams
2018-01-05  9:15       ` Mark Rutland
2018-01-03 22:38 ` [RFC PATCH 2/4] Documentation: document " Mark Rutland
2018-01-03 22:38 ` [RFC PATCH 3/4] arm64: implement nospec_{load,ptr}() Mark Rutland
2018-01-03 22:38 ` [RFC PATCH 4/4] bpf: inhibit speculated out-of-bounds pointers Mark Rutland
2018-01-03 23:45   ` Peter Zijlstra
2018-01-04 10:59     ` Mark Rutland
2018-01-04  0:15 ` [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier Dan Williams
2018-01-04  0:39   ` Linus Torvalds
2018-01-04  1:07     ` Alan Cox
2018-01-04  1:13       ` Dan Williams
2018-01-04  6:28         ` Julia Lawall
2018-01-04 17:58           ` Dan Williams
2018-01-04 19:26             ` Pavel Machek
2018-01-04 21:43               ` Dan Williams
2018-01-04 22:20                 ` Linus Torvalds
2018-01-04 22:23                   ` Linus Torvalds
2018-01-04 22:55                   ` Alan Cox
2018-01-04 23:06                     ` Linus Torvalds
2018-01-04 23:11                       ` Alan Cox
2018-01-05  0:24                       ` Dan Williams
2018-01-04 22:44                 ` Pavel Machek
2018-01-04 23:12                   ` Dan Williams
2018-01-04 23:21                     ` Alan Cox
2018-01-04 23:33                     ` Pavel Machek
2018-01-05  8:11                       ` Julia Lawall
2018-01-04  1:27       ` Jiri Kosina
2018-01-04  1:41         ` Alan Cox
2018-01-04  1:47           ` Jiri Kosina
2018-01-04 19:39             ` Pavel Machek
2018-01-04 20:32               ` Alan Cox
2018-01-04 20:39                 ` Jiri Kosina
2018-01-04 21:23                   ` Alan Cox
2018-01-04 21:48                     ` Pavel Machek
2018-01-04  1:51         ` Dan Williams
2018-01-04  1:54           ` Linus Torvalds
2018-01-04  3:10             ` Williams, Dan J
2018-01-04  4:44               ` Al Viro
2018-01-04  5:44                 ` Dan Williams
2018-01-04  5:49                   ` Dave Hansen
2018-01-04  5:50                   ` Al Viro
2018-01-04  5:55                     ` Al Viro
2018-01-04  6:42                       ` Dan Williams
2018-01-04  5:01               ` Eric W. Biederman
2018-01-04  5:01                 ` Eric W. Biederman
2018-01-04  6:32                 ` Dan Williams
2018-01-04 14:54                   ` Eric W. Biederman
2018-01-04 14:54                     ` Eric W. Biederman
2018-01-04 16:39                     ` Mark Rutland
2018-01-04 20:56                     ` Pavel Machek
2018-01-04 11:47               ` Mark Rutland
2018-01-04 11:47                 ` Mark Rutland
2018-01-04 11:47                 ` Mark Rutland
2018-01-04 22:09                 ` Dan Williams
2018-01-05 14:40                   ` Mark Rutland
2018-01-05 16:44                     ` Dan Williams
2018-01-05 18:05                       ` Dan Williams
2018-01-04  1:59           ` Jiri Kosina
2018-01-04  2:15             ` Alan Cox
2018-01-04  3:12               ` Alexei Starovoitov
2018-01-04  9:16                 ` Reshetova, Elena
2018-01-04 20:40             ` Pavel Machek
2018-01-05 10:33 Alexey Dobriyan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.