* [PATCH v4.1 00/10] spectre variant1 mitigations for tip/x86/pti
From: Dan Williams @ 2018-01-20 21:05 UTC (permalink / raw)
  To: tglx
  Cc: Mark Rutland, kernel-hardening, Peter Zijlstra, Catalin Marinas,
	Will Deacon, H. Peter Anvin, Elena Reshetova, linux-arch,
	Andi Kleen, Jonathan Corbet, x86, Russell King, Ingo Molnar,
	Andrew Honig, alan, Tom Lendacky, Kees Cook, Al Viro,
	Andy Lutomirski, Jim Mattson, Christian Lamparter, gregkh,
	linux-wireless, Paolo Bonzini, Johannes Berg, torvalds,
	David S. Miller

Hi Thomas,

Please consider taking this collection of spectre-v1 mitigations through
the tip/x86/pti branch for 4.16 inclusion. Review feedback has dropped
off considerably, so I believe these patches are ready for -tip. However,
"nl80211: sanitize array index in parse_txq_params" stands out as a patch
that should get an ack from the net/wireless folks before moving forward.

Also a heads up that x86/pti is missing commit 75f139aaf896 "KVM: x86:
Add memory barrier on vmcs field lookup", so you will hit a trivial
conflict merging "kvm, x86: update spectre-v1 mitigation" from this set
against latest mainline.

The infrastructure includes:

* __uaccess_begin_nospec: similar to __uaccess_begin, this invokes
'stac', but it also includes an 'ifence'. After an 'access_ok' check
has speculatively succeeded, that result needs to be retired before the
user pointer is de-referenced. '__get_user' can't use the pointer
sanitization approach without redoing the 'access_ok' check, so per
Linus [1] just use 'ifence'.

* MASK_NOSPEC: an assembler macro for x86 'get_user' and syscall entry
that sanitizes a user-controlled pointer or array index to zero after a
'cmp %limit, %val' instruction sets the CF flag.

* array_ptr: When dereferencing a kernel pointer with a user-controlled
index, sanitize the pointer to either NULL or a valid address under
speculation, eliminating a precondition for Spectre variant1 attacks.
It uses a mask generation technique that does not involve speculative
control flows on either x86 or ARM64 [2] (a usage sketch follows the
reference links below).

* x86 array_ptr_mask: Achieve the same effect as the default
'array_ptr_mask' in fewer instructions. This approach does not have the
same "array index and limit must be less than LONG_MAX" constraint as
the default mask.

* array_idx: Similar to 'array_ptr', use a mask to return either a valid
pointer to an array index variable or NULL. An example where we need
this is the wireless driver stack, where the core sanitizes user input
and the actual use of the array index is in a different compilation
unit in the low-level driver.

[1]: https://lkml.org/lkml/2018/1/17/929
[2]: https://www.spinics.net/lists/netdev/msg477542.html
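
As a usage sketch (hypothetical 'cmd_table' / 'NR_CMDS' names, not taken
from any patch in this series), a call-site conversion looks roughly
like:

	/* before: the bounds check alone does not constrain speculation */
	int cmd_lookup(unsigned long idx)
	{
		if (idx >= NR_CMDS)
			return -EINVAL;
		return cmd_table[idx];
	}

	/* after: under speculation the returned pointer is either valid
	 * or NULL, so a mis-speculated dependent load cannot leak data */
	int cmd_lookup(unsigned long idx)
	{
		int *elem = array_ptr(cmd_table, idx, NR_CMDS);

		return elem ? *elem : -EINVAL;
	}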

---

Dan Williams (9):
      asm/nospec, array_ptr: sanitize speculative array de-references
      x86: implement array_ptr_mask()
      x86: introduce __uaccess_begin_nospec and ifence
      x86, __get_user: use __uaccess_begin_nospec
      x86, get_user: use pointer masking to limit speculation
      x86: narrow out of bounds syscalls to sys_read under speculation
      vfs, fdtable: prevent bounds-check bypass via speculative execution
      kvm, x86: update spectre-v1 mitigation
      nl80211: sanitize array index in parse_txq_params

Mark Rutland (1):
      Documentation: document array_ptr


 Documentation/speculation.txt     |  143 +++++++++++++++++++++++++++++++++++++
 arch/x86/entry/entry_64.S         |    2 +
 arch/x86/include/asm/barrier.h    |   28 +++++++
 arch/x86/include/asm/msr.h        |    3 -
 arch/x86/include/asm/smap.h       |   24 ++++++
 arch/x86/include/asm/uaccess.h    |   15 +++-
 arch/x86/include/asm/uaccess_32.h |    6 +-
 arch/x86/include/asm/uaccess_64.h |   12 ++-
 arch/x86/kvm/vmx.c                |   11 ++-
 arch/x86/lib/getuser.S            |    5 +
 arch/x86/lib/usercopy_32.c        |    8 +-
 include/linux/fdtable.h           |    7 +-
 include/linux/nospec.h            |   65 +++++++++++++++++
 net/wireless/nl80211.c            |   10 ++-
 14 files changed, 312 insertions(+), 27 deletions(-)
 create mode 100644 Documentation/speculation.txt
 create mode 100644 include/linux/nospec.h

* [PATCH v4.1 01/10] Documentation: document array_ptr
From: Dan Williams @ 2018-01-20 21:06 UTC (permalink / raw)
  To: tglx
  Cc: Mark Rutland, linux-arch, Kees Cook, kernel-hardening,
	Peter Zijlstra, gregkh, Jonathan Corbet, Will Deacon, torvalds,
	alan

From: Mark Rutland <mark.rutland@arm.com>

Document the rationale and usage of the new array_ptr() helper.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/speculation.txt |  143 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 143 insertions(+)
 create mode 100644 Documentation/speculation.txt

diff --git a/Documentation/speculation.txt b/Documentation/speculation.txt
new file mode 100644
index 000000000000..a47fbffe0dab
--- /dev/null
+++ b/Documentation/speculation.txt
@@ -0,0 +1,143 @@
+This document explains potential effects of speculation, and how undesirable
+effects can be mitigated portably using common APIs.
+
+===========
+Speculation
+===========
+
+To improve performance and minimize average latencies, many contemporary CPUs
+employ speculative execution techniques such as branch prediction, performing
+work which may be discarded at a later stage.
+
+Typically speculative execution cannot be observed from architectural state,
+such as the contents of registers. However, in some cases it is possible to
+observe its impact on microarchitectural state, such as the presence or
+absence of data in caches. Such state may form side-channels which can be
+observed to extract secret information.
+
+For example, in the presence of branch prediction, it is possible for bounds
+checks to be ignored by code which is speculatively executed. Consider the
+following code:
+
+	int load_array(int *array, unsigned int idx)
+	{
+		if (idx >= MAX_ARRAY_ELEMS)
+			return 0;
+		else
+			return array[idx];
+	}
+
+On arm64, this may be compiled to an assembly sequence such as:
+
+	CMP	<idx>, #MAX_ARRAY_ELEMS
+	B.LT	less
+	MOV	<returnval>, #0
+	RET
+  less:
+	LDR	<returnval>, [<array>, <idx>]
+	RET
+
+It is possible that a CPU mis-predicts the conditional branch, and
+speculatively loads array[idx], even if idx >= MAX_ARRAY_ELEMS. This value
+will subsequently be discarded, but the speculated load may affect
+microarchitectural state which can be subsequently measured.
+
+More complex sequences involving multiple dependent memory accesses may result
+in sensitive information being leaked. Consider the following code, building
+on the prior example:
+
+	int load_dependent_arrays(int *arr1, int *arr2, int idx)
+	{
+		int val1, val2;
+
+		val1 = load_array(arr1, idx);
+		val2 = load_array(arr2, val1);
+
+		return val2;
+	}
+
+Under speculation, the first call to load_array() may return the value of an
+out-of-bounds address, while the second call will influence microarchitectural
+state dependent on this value. This may provide an arbitrary read primitive.
+
+====================================
+Mitigating speculation side-channels
+====================================
+
+The kernel provides a generic API to ensure that bounds checks are respected
+even under speculation. Architectures which are affected by speculation-based
+side-channels are expected to implement these primitives.
+
+The array_ptr() helper in <linux/nospec.h> can be used to prevent
+information from being leaked via side-channels.
+
+A call to array_ptr(arr, idx, sz) returns a sanitized pointer to
+arr[idx] only if idx falls in the [0, sz) interval. When idx < 0 or
+idx >= sz, NULL is returned, and an out-of-bounds pointer is not
+propagated to code which is speculatively executed.
+
+This can be used to protect the earlier load_array() example:
+
+	int load_array(int *array, unsigned int idx)
+	{
+		int *elem;
+
+		elem = array_ptr(array, idx, MAX_ARRAY_ELEMS);
+		if (elem)
+			return *elem;
+		else
+			return 0;
+	}
+
+This can also be used in situations where multiple fields on a structure are
+accessed:
+
+	struct foo array[SIZE];
+	int a, b;
+
+	void do_thing(int idx)
+	{
+		struct foo *elem;
+
+		elem = array_ptr(array, idx, SIZE);
+		if (elem) {
+			a = elem->field_a;
+			b = elem->field_b;
+		}
+	}
+
+It is imperative that the returned pointer is used. Pointers which are
+generated separately are subject to a number of potential CPU and compiler
+optimizations, and may still be used speculatively. For example, this means
+that the following sequence is unsafe:
+
+	struct foo array[SIZE];
+	int a, b;
+
+	void do_thing(int idx)
+	{
+		if (array_ptr(array, idx, SIZE) != NULL) {
+			// unsafe as wrong pointer is used
+			a = array[idx].field_a;
+			b = array[idx].field_b;
+		}
+	}
+
+Similarly, it is unsafe to compare the returned pointer with other pointers,
+as this may permit the compiler to substitute one pointer with another,
+permitting speculation. For example, the following sequence is unsafe:
+
+	struct foo array[SIZE];
+	int a, b;
+
+	void do_thing(int idx)
+	{
+		struct foo *elem = array_ptr(array, idx, SIZE);
+
+		// unsafe due to pointer substitution
+		if (elem == &array[idx]) {
+			a = elem->field_a;
+			b = elem->field_b;
+		}
+	}
+

* [PATCH v4.1 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
From: Dan Williams @ 2018-01-20 21:06 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, kernel-hardening, Peter Zijlstra, Catalin Marinas,
	x86, Will Deacon, Russell King, Ingo Molnar, gregkh,
	H. Peter Anvin, torvalds, alan

'array_ptr' is proposed as a generic mechanism to mitigate
Spectre-variant-1 attacks, i.e. attacks that bypass bounds checks via
speculative execution. The 'array_ptr' implementation is expected to be
safe for current generation cpus across multiple architectures (ARM,
x86).

Based on an original implementation by Linus Torvalds, tweaked to remove
speculative flows by Alexei Starovoitov, and tweaked again by Linus to
introduce an x86 assembly implementation for the mask generation.
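
As an illustration only (not part of the patch), the default mask can be
checked by hand, assuming a 64-bit long:

	/* idx = 2, sz = 4: 2 | (4 - 1 - 2) = 3, sign bit clear, so the
	 * bitwise-not has the sign bit set and the arithmetic shift by
	 * 63 yields ~0UL: the pointer is preserved.
	 * idx = 5, sz = 4: (4 - 1 - 5) wraps to -2, the OR has the sign
	 * bit set, the bitwise-not clears it, and the shift yields 0:
	 * array_ptr() returns NULL. */
	unsigned long ok  = array_ptr_mask(2, 4);	/* ~0UL */
	unsigned long oob = array_ptr_mask(5, 4);	/* 0    */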

Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Co-developed-by: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/nospec.h |   44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)
 create mode 100644 include/linux/nospec.h

diff --git a/include/linux/nospec.h b/include/linux/nospec.h
new file mode 100644
index 000000000000..dd3aa05fab87
--- /dev/null
+++ b/include/linux/nospec.h
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright(c) 2018 Intel Corporation. All rights reserved.
+
+#ifndef __NOSPEC_H__
+#define __NOSPEC_H__
+
+#include <linux/jump_label.h>
+#include <asm/barrier.h>
+
+/*
+ * If idx is negative or if idx >= sz then the sign bit is set in
+ * (idx | (sz - 1 - idx)); inverting that value and arithmetic-shifting
+ * it right yields a mask of zero, in which case array_ptr() returns NULL.
+ */
+#ifndef array_ptr_mask
+static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
+{
+	return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
+}
+#endif
+
+/**
+ * array_ptr - Generate a pointer to an array element, ensuring
+ * that under speculation the pointer is either valid or NULL.
+ *
+ * @base: the base of the array
+ * @idx: the index of the element, must be less than LONG_MAX
+ * @sz: the number of elements in the array, must be less than LONG_MAX
+ *
+ * If @idx falls in the interval [0, @sz), returns the pointer to
+ * @base[@idx], otherwise returns NULL.
+ */
+#define array_ptr(base, idx, sz)					\
+({									\
+	union { typeof(*(base)) *_ptr; unsigned long _bit; } __u;	\
+	typeof(*(base)) *_arr = (base);					\
+	unsigned long _i = (idx);					\
+	unsigned long _mask = array_ptr_mask(_i, (sz));			\
+									\
+	__u._ptr = _arr + _i;						\
+	__u._bit &= _mask;						\
+	__u._ptr;							\
+})
+#endif /* __NOSPEC_H__ */

* [PATCH v4.1 03/10] x86: implement array_ptr_mask()
From: Dan Williams @ 2018-01-20 21:06 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, kernel-hardening, gregkh, x86, Ingo Molnar,
	H. Peter Anvin, torvalds, alan

'array_ptr' uses a mask to sanitize user controllable pointers. The x86
'array_ptr_mask' is an assembler-optimized way to generate that mask: 0
if the array index is out-of-bounds, ~0 if it is in-bounds.
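
For reference, a C sketch of what the cmp/sbb pair computes (illustration
only, hypothetical 'array_ptr_mask_c' name; the point of the asm is to
produce the mask without a conditional branch):

	static inline unsigned long array_ptr_mask_c(unsigned long idx,
						     unsigned long sz)
	{
		/* 'cmp sz, idx' sets CF when idx < sz (borrow);
		 * 'sbb mask, mask' then evaluates to 0 - CF */
		return idx < sz ? ~0UL : 0UL;
	}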

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/barrier.h |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 01727dbc294a..e8fd92008eab 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -24,6 +24,30 @@
 #define wmb()	asm volatile("sfence" ::: "memory")
 #endif
 
+/**
+ * array_ptr_mask - generate a mask for array_ptr() that is ~0UL when
+ * the bounds check succeeds and 0 otherwise
+ */
+#define array_ptr_mask array_ptr_mask
+static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
+{
+	unsigned long mask;
+
+	/*
+	 * The cmp computes index - size; if that result is >= 0 (no
+	 * borrow) the index is invalid and the mask is 0, else ~0
+	 */
+#ifdef CONFIG_X86_32
+	asm ("cmpl %1,%2; sbbl %0,%0;"
+#else
+	asm ("cmpq %1,%2; sbbq %0,%0;"
+#endif
+			:"=r" (mask)
+			:"r"(sz),"r" (idx)
+			:"cc");
+	return mask;
+}
+
 #ifdef CONFIG_X86_PPRO_FENCE
 #define dma_rmb()	rmb()
 #else

* [PATCH v4.1 04/10] x86: introduce __uaccess_begin_nospec and ifence
From: Dan Williams @ 2018-01-20 21:06 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, Tom Lendacky, Andi Kleen, Kees Cook,
	kernel-hardening, gregkh, x86, Ingo Molnar, Al Viro,
	H. Peter Anvin, torvalds, alan

For '__get_user' paths, do not allow the kernel to speculate on the
value of a user controlled pointer. In addition to the 'stac'
instruction for Supervisor Mode Access Protection, an 'ifence' causes
the 'access_ok' result to resolve in the pipeline before the cpu can
take any speculative action on the pointer value.

Since __get_user is a major kernel interface that deals with user
controlled pointers, the '__uaccess_begin_nospec' mechanism will prevent
speculative execution past an 'access_ok' permission check. While
speculative execution past 'access_ok' is not enough to lead to a kernel
memory leak, it is a necessary precondition.

To be clear, '__uaccess_begin_nospec' is addressing a class of potential
problems near '__get_user' usages.

Note that while ifence is used to protect '__get_user', pointer masking
will be used for 'get_user' since it incorporates a bounds check near
the usage.

There are no functional changes in this patch.
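
For reference, a hedged sketch of the call pattern that the follow-on
patches switch to (mirroring the '__get_user' internals converted in the
next patch; 'uptr' and 'val' are illustrative):

	int err;
	unsigned long val;

	__uaccess_begin_nospec();	/* stac; ifence */
	__get_user_size(val, uptr, sizeof(val), err, -EFAULT);
	__uaccess_end();		/* clac */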

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/barrier.h |    4 ++++
 arch/x86/include/asm/msr.h     |    3 +--
 arch/x86/include/asm/uaccess.h |    9 +++++++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index e8fd92008eab..6c9c81e8049f 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -48,6 +48,10 @@ static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
 	return mask;
 }
 
+/* prevent speculative execution past this barrier */
+#define ifence() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \
+				   "lfence", X86_FEATURE_LFENCE_RDTSC)
+
 #ifdef CONFIG_X86_PPRO_FENCE
 #define dma_rmb()	rmb()
 #else
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 07962f5f6fba..e426d2a33ff3 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -214,8 +214,7 @@ static __always_inline unsigned long long rdtsc_ordered(void)
 	 * that some other imaginary CPU is updating continuously with a
 	 * time stamp.
 	 */
-	alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
-			  "lfence", X86_FEATURE_LFENCE_RDTSC);
+	ifence();
 	return rdtsc();
 }
 
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 574dff4d2913..626caf58183a 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -124,6 +124,11 @@ extern int __get_user_bad(void);
 
 #define __uaccess_begin() stac()
 #define __uaccess_end()   clac()
+#define __uaccess_begin_nospec()	\
+({					\
+	stac();				\
+	ifence();			\
+})
 
 /*
  * This is a type: either unsigned long, if the argument fits into
@@ -487,6 +492,10 @@ struct __large_struct { unsigned long buf[100]; };
 	__uaccess_begin();						\
 	barrier();
 
+#define uaccess_try_nospec do {						\
+	current->thread.uaccess_err = 0;				\
+	__uaccess_begin_nospec();					\
+
 #define uaccess_catch(err)						\
 	__uaccess_end();						\
 	(err) |= (current->thread.uaccess_err ? -EFAULT : 0);		\

* [PATCH v4.1 05/10] x86, __get_user: use __uaccess_begin_nospec
From: Dan Williams @ 2018-01-20 21:06 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, Andi Kleen, Kees Cook, kernel-hardening, gregkh, x86,
	Ingo Molnar, Al Viro, H. Peter Anvin, torvalds, alan

Quoting Linus:

    I do think that it would be a good idea to very expressly document
    the fact that it's not that the user access itself is unsafe. I do
    agree that things like "get_user()" want to be protected, but not
    because of any direct bugs or problems with get_user() and friends,
    but simply because get_user() is an excellent source of a pointer
    that is obviously controlled from a potentially attacking user
    space. So it's a prime candidate for then finding _subsequent_
    accesses that can then be used to perturb the cache.

'__uaccess_begin_nospec' covers '__get_user' and 'copy_from_iter' where
the limit check is far away from the user pointer de-reference. In those
cases, an 'lfence' prevents the cpu from speculating with a potential
pointer to privileged memory.
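
A minimal sketch of the "limit check far away" case this targets
(hypothetical caller, illustration only):

	/* the caller validates the user range up front... */
	if (!access_ok(VERIFY_READ, uptr, len))
		return -EFAULT;

	/* ...while the dereference happens later, possibly in another
	 * function, with no limit value left to mask the pointer against,
	 * so raw_copy_from_user() now opens with __uaccess_begin_nospec() */
	ret = raw_copy_from_user(dst, uptr, len);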

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/uaccess.h    |    6 +++---
 arch/x86/include/asm/uaccess_32.h |    6 +++---
 arch/x86/include/asm/uaccess_64.h |   12 ++++++------
 arch/x86/lib/usercopy_32.c        |    8 ++++----
 4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 626caf58183a..a930585fa3b5 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -450,7 +450,7 @@ do {									\
 ({									\
 	int __gu_err;							\
 	__inttype(*(ptr)) __gu_val;					\
-	__uaccess_begin();						\
+	__uaccess_begin_nospec();					\
 	__get_user_size(__gu_val, (ptr), (size), __gu_err, -EFAULT);	\
 	__uaccess_end();						\
 	(x) = (__force __typeof__(*(ptr)))__gu_val;			\
@@ -557,7 +557,7 @@ struct __large_struct { unsigned long buf[100]; };
  *	get_user_ex(...);
  * } get_user_catch(err)
  */
-#define get_user_try		uaccess_try
+#define get_user_try		uaccess_try_nospec
 #define get_user_catch(err)	uaccess_catch(err)
 
 #define get_user_ex(x, ptr)	do {					\
@@ -591,7 +591,7 @@ extern void __cmpxchg_wrong_size(void)
 	__typeof__(ptr) __uval = (uval);				\
 	__typeof__(*(ptr)) __old = (old);				\
 	__typeof__(*(ptr)) __new = (new);				\
-	__uaccess_begin();						\
+	__uaccess_begin_nospec();					\
 	switch (size) {							\
 	case 1:								\
 	{								\
diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 72950401b223..ba2dc1930630 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -29,21 +29,21 @@ raw_copy_from_user(void *to, const void __user *from, unsigned long n)
 		switch (n) {
 		case 1:
 			ret = 0;
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_asm_nozero(*(u8 *)to, from, ret,
 					      "b", "b", "=q", 1);
 			__uaccess_end();
 			return ret;
 		case 2:
 			ret = 0;
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_asm_nozero(*(u16 *)to, from, ret,
 					      "w", "w", "=r", 2);
 			__uaccess_end();
 			return ret;
 		case 4:
 			ret = 0;
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_asm_nozero(*(u32 *)to, from, ret,
 					      "l", "k", "=r", 4);
 			__uaccess_end();
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index f07ef3c575db..62546b3a398e 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -55,31 +55,31 @@ raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
 		return copy_user_generic(dst, (__force void *)src, size);
 	switch (size) {
 	case 1:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u8 *)dst, (u8 __user *)src,
 			      ret, "b", "b", "=q", 1);
 		__uaccess_end();
 		return ret;
 	case 2:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u16 *)dst, (u16 __user *)src,
 			      ret, "w", "w", "=r", 2);
 		__uaccess_end();
 		return ret;
 	case 4:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u32 *)dst, (u32 __user *)src,
 			      ret, "l", "k", "=r", 4);
 		__uaccess_end();
 		return ret;
 	case 8:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
 			      ret, "q", "", "=r", 8);
 		__uaccess_end();
 		return ret;
 	case 10:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
 			       ret, "q", "", "=r", 10);
 		if (likely(!ret))
@@ -89,7 +89,7 @@ raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
 		__uaccess_end();
 		return ret;
 	case 16:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
 			       ret, "q", "", "=r", 16);
 		if (likely(!ret))
diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index 1b377f734e64..7add8ba06887 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -331,12 +331,12 @@ do {									\
 
 unsigned long __copy_user_ll(void *to, const void *from, unsigned long n)
 {
-	stac();
+	__uaccess_begin_nospec();
 	if (movsl_is_ok(to, from, n))
 		__copy_user(to, from, n);
 	else
 		n = __copy_user_intel(to, from, n);
-	clac();
+	__uaccess_end();
 	return n;
 }
 EXPORT_SYMBOL(__copy_user_ll);
@@ -344,7 +344,7 @@ EXPORT_SYMBOL(__copy_user_ll);
 unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from,
 					unsigned long n)
 {
-	stac();
+	__uaccess_begin_nospec();
 #ifdef CONFIG_X86_INTEL_USERCOPY
 	if (n > 64 && static_cpu_has(X86_FEATURE_XMM2))
 		n = __copy_user_intel_nocache(to, from, n);
@@ -353,7 +353,7 @@ unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *fr
 #else
 	__copy_user(to, from, n);
 #endif
-	clac();
+	__uaccess_end();
 	return n;
 }
 EXPORT_SYMBOL(__copy_from_user_ll_nocache_nozero);

* [PATCH v4.1 06/10] x86, get_user: use pointer masking to limit speculation
From: Dan Williams @ 2018-01-20 21:06 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, Kees Cook, kernel-hardening, gregkh, x86,
	Ingo Molnar, Al Viro, H. Peter Anvin, torvalds, alan

Quoting Linus:

    I do think that it would be a good idea to very expressly document
    the fact that it's not that the user access itself is unsafe. I do
    agree that things like "get_user()" want to be protected, but not
    because of any direct bugs or problems with get_user() and friends,
    but simply because get_user() is an excellent source of a pointer
    that is obviously controlled from a potentially attacking user
    space. So it's a prime candidate for then finding _subsequent_
    accesses that can then be used to perturb the cache.

Unlike the '__get_user' case, 'get_user' includes the address limit check
near the pointer de-reference. With that locality, the speculation can be
mitigated with pointer narrowing rather than a barrier. The narrowing is
performed by:

	cmp %limit, %ptr
	sbb %mask, %mask
	and %mask, %ptr

With respect to speculation, the value of %ptr is either less than %limit
or NULL.
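
Expressed in C the narrowing amounts to (illustration only; the
MASK_NOSPEC macro below stays branch-free via 'sbb'):

	/* mask is ~0UL when ptr is below the limit, 0 otherwise, so the
	 * 'and' leaves either the original pointer or NULL */
	mask = ptr < limit ? ~0UL : 0UL;	/* cmp + sbb */
	ptr &= mask;				/* and       */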

Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/smap.h |   17 +++++++++++++++++
 arch/x86/lib/getuser.S      |    5 +++++
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index db333300bd4b..2b4ad4c6a226 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -25,6 +25,23 @@
 
 #include <asm/alternative-asm.h>
 
+/*
+ * MASK_NOSPEC - sanitize a user controlled pointer value with
+ * respect to speculation
+ *
+ * In the get_user path, once we have determined that the pointer is
+ * below the current address limit, sanitize its value with respect to
+ * speculation. When the pointer is above the address limit, this
+ * directs the cpu to speculate with a NULL ptr rather than something
+ * targeting kernel memory.
+ *
+ * assumes CF is set from a previous 'cmp TASK_addr_limit, %ptr'
+ */
+.macro MASK_NOSPEC mask val
+	sbb \mask, \mask
+	and \mask, \val
+.endm
+
 #ifdef CONFIG_X86_SMAP
 
 #define ASM_CLAC \
diff --git a/arch/x86/lib/getuser.S b/arch/x86/lib/getuser.S
index c97d935a29e8..07d0e8a28b17 100644
--- a/arch/x86/lib/getuser.S
+++ b/arch/x86/lib/getuser.S
@@ -40,6 +40,7 @@ ENTRY(__get_user_1)
 	mov PER_CPU_VAR(current_task), %_ASM_DX
 	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
 	jae bad_get_user
+	MASK_NOSPEC %_ASM_DX, %_ASM_AX
 	ASM_STAC
 1:	movzbl (%_ASM_AX),%edx
 	xor %eax,%eax
@@ -54,6 +55,7 @@ ENTRY(__get_user_2)
 	mov PER_CPU_VAR(current_task), %_ASM_DX
 	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
 	jae bad_get_user
+	MASK_NOSPEC %_ASM_DX, %_ASM_AX
 	ASM_STAC
 2:	movzwl -1(%_ASM_AX),%edx
 	xor %eax,%eax
@@ -68,6 +70,7 @@ ENTRY(__get_user_4)
 	mov PER_CPU_VAR(current_task), %_ASM_DX
 	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
 	jae bad_get_user
+	MASK_NOSPEC %_ASM_DX, %_ASM_AX
 	ASM_STAC
 3:	movl -3(%_ASM_AX),%edx
 	xor %eax,%eax
@@ -83,6 +86,7 @@ ENTRY(__get_user_8)
 	mov PER_CPU_VAR(current_task), %_ASM_DX
 	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
 	jae bad_get_user
+	MASK_NOSPEC %_ASM_DX, %_ASM_AX
 	ASM_STAC
 4:	movq -7(%_ASM_AX),%rdx
 	xor %eax,%eax
@@ -94,6 +98,7 @@ ENTRY(__get_user_8)
 	mov PER_CPU_VAR(current_task), %_ASM_DX
 	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
 	jae bad_get_user_8
+	MASK_NOSPEC %_ASM_DX, %_ASM_AX
 	ASM_STAC
 4:	movl -7(%_ASM_AX),%edx
 5:	movl -3(%_ASM_AX),%ecx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4.1 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-01-20 21:05 ` Dan Williams
@ 2018-01-20 21:06   ` Dan Williams
  -1 siblings, 0 replies; 48+ messages in thread
From: Dan Williams @ 2018-01-20 21:06 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, kernel-hardening, gregkh, x86, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, torvalds, alan

The syscall table base is a user controlled function pointer in kernel
space. Like 'get_user', use 'MASK_NOSPEC' to prevent any out of bounds
speculation. While retpoline prevents speculating into the user
controlled target, it does not stop the pointer de-reference; the
concern is leaking memory relative to the syscall table base.
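
A rough C model of the intent (a sketch; 'NR_syscalls' stands for the
size of sys_call_table, the helper name is illustrative, and the real
code derives the mask from CF left by the preceding cmpl rather than
from a C comparison):

	static inline unsigned long narrow_syscall_nr(unsigned long nr)
	{
		/* all-ones for an in-range syscall number, all-zeroes otherwise */
		unsigned long mask = 0UL - (unsigned long)(nr < NR_syscalls);

		/* an out-of-range nr indexes entry 0 (sys_read) instead of
		 * memory past the end of sys_call_table */
		return nr & mask;
	}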

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/entry/entry_64.S   |    2 ++
 arch/x86/include/asm/smap.h |    9 ++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 63f4320602a3..584f6d2236b3 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -35,6 +35,7 @@
 #include <asm/asm.h>
 #include <asm/smap.h>
 #include <asm/pgtable_types.h>
+#include <asm/smap.h>
 #include <asm/export.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
@@ -260,6 +261,7 @@ entry_SYSCALL_64_fastpath:
 	cmpl	$__NR_syscall_max, %eax
 #endif
 	ja	1f				/* return -ENOSYS (already in pt_regs->ax) */
+	MASK_NOSPEC %r11 %rax			/* sanitize syscall_nr wrt speculation */
 	movq	%r10, %rcx
 
 	/*
diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index 2b4ad4c6a226..3b5b2cf58dc6 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -35,7 +35,14 @@
  * this directs the cpu to speculate with a NULL ptr rather than
  * something targeting kernel memory.
  *
- * assumes CF is set from a previous 'cmp TASK_addr_limit, %ptr'
+ * In the syscall entry path it is possible to speculate past the
+ * validation of the system call number. Use MASK_NOSPEC to sanitize the
+ * syscall array index to zero (sys_read) rather than an arbitrary
+ * target.
+ *
+ * assumes CF is set from a previous 'cmp' i.e.:
+ *     cmp TASK_addr_limit, %ptr
+ *     cmp __NR_syscall_max, %idx
  */
 .macro MASK_NOSPEC mask val
 	sbb \mask, \mask

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4.1 08/10] vfs, fdtable: prevent bounds-check bypass via speculative execution
  2018-01-20 21:05 ` Dan Williams
@ 2018-01-20 21:06   ` Dan Williams
  -1 siblings, 0 replies; 48+ messages in thread
From: Dan Williams @ 2018-01-20 21:06 UTC (permalink / raw)
  To: tglx; +Cc: linux-arch, kernel-hardening, gregkh, Al Viro, torvalds, alan

'fd' is a user controlled value that is used as a data dependency to
read from the 'fdt->fd' array.  In order to avoid potential leaks of
kernel memory values, sanitize the array de-reference so that
speculative execution cannot issue reads based on an invalid 'file *'
returned from __fcheck_files().
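
The conversion below roughly amounts to the following (a sketch that
ignores the union plumbing inside the real array_ptr() macro):

	struct file __rcu **fdp = &fdt->fd[fd];
	unsigned long mask = array_ptr_mask(fd, fdt->max_fds);

	/* unchanged for a valid fd, NULL for an out-of-bounds fd, and the
	 * same distinction holds under speculation */
	fdp = (struct file __rcu **)((unsigned long)fdp & mask);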

Cc: Al Viro <viro@zeniv.linux.org.uk>
Co-developed-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/fdtable.h |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 1c65817673db..9731f1a255db 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -10,6 +10,7 @@
 #include <linux/compiler.h>
 #include <linux/spinlock.h>
 #include <linux/rcupdate.h>
+#include <linux/nospec.h>
 #include <linux/types.h>
 #include <linux/init.h>
 #include <linux/fs.h>
@@ -81,9 +82,11 @@ struct dentry;
 static inline struct file *__fcheck_files(struct files_struct *files, unsigned int fd)
 {
 	struct fdtable *fdt = rcu_dereference_raw(files->fdt);
+	struct file __rcu **fdp;
 
-	if (fd < fdt->max_fds)
-		return rcu_dereference_raw(fdt->fd[fd]);
+	fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
+	if (fdp)
+		return rcu_dereference_raw(*fdp);
 	return NULL;
 }
 

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4.1 09/10] kvm, x86: update spectre-v1 mitigation
  2018-01-20 21:05 ` Dan Williams
@ 2018-01-20 21:06   ` Dan Williams
  -1 siblings, 0 replies; 48+ messages in thread
From: Dan Williams @ 2018-01-20 21:06 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, kernel-hardening, Andrew Honig, gregkh,
	Paolo Bonzini, alan, torvalds, Jim Mattson

Commit 75f139aaf896 "KVM: x86: Add memory barrier on vmcs field lookup"
added a raw 'asm("lfence");' to prevent a bounds check bypass of
'vmcs_field_to_offset_table'. We can save an lfence in this path and
just use the common 'array_ptr' helper designed for these types of
fixes.

Cc: Andrew Honig <ahonig@google.com>
Cc: Jim Mattson <jmattson@google.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/kvm/vmx.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d1e25dba3112..62af077cd975 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -34,6 +34,7 @@
 #include <linux/tboot.h>
 #include <linux/hrtimer.h>
 #include <linux/frame.h>
+#include <linux/nospec.h>
 #include "kvm_cache_regs.h"
 #include "x86.h"
 
@@ -883,13 +884,15 @@ static const unsigned short vmcs_field_to_offset_table[] = {
 
 static inline short vmcs_field_to_offset(unsigned long field)
 {
+	const unsigned short *offset;
+
 	BUILD_BUG_ON(ARRAY_SIZE(vmcs_field_to_offset_table) > SHRT_MAX);
 
-	if (field >= ARRAY_SIZE(vmcs_field_to_offset_table) ||
-	    vmcs_field_to_offset_table[field] == 0)
+	offset = array_ptr(vmcs_field_to_offset_table, field,
+			ARRAY_SIZE(vmcs_field_to_offset_table));
+	if (!offset || *offset == 0)
 		return -ENOENT;
-
-	return vmcs_field_to_offset_table[field];
+	return *offset;
 }
 
 static inline struct vmcs12 *get_vmcs12(struct kvm_vcpu *vcpu)

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4.1 10/10] nl80211: sanitize array index in parse_txq_params
  2018-01-20 21:05 ` Dan Williams
@ 2018-01-20 21:06   ` Dan Williams
  -1 siblings, 0 replies; 48+ messages in thread
From: Dan Williams @ 2018-01-20 21:06 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, kernel-hardening, gregkh, Christian Lamparter,
	linux-wireless, Johannes Berg, torvalds, David S. Miller,
	Elena Reshetova, alan

Wireless drivers rely on parse_txq_params to validate that
txq_params->ac is less than NL80211_NUM_ACS by the time the low-level
driver's ->conf_tx() handler is called. Use a new helper, 'array_idx',
to sanitize txq_params->ac with respect to speculation. I.e. ensure that
any speculation into ->conf_tx() handlers is done with a value of
txq_params->ac that is within the bounds of [0, NL80211_NUM_ACS).
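
array_idx() hands back a pointer to the index variable itself, &ac when
the value is in bounds and NULL otherwise, so the value that eventually
reaches a driver's ->conf_tx() is always read back through a sanitized
pointer. Roughly (a sketch of the semantics, mirroring the hunk below):

	u8 *idx = array_idx(ac, NL80211_NUM_ACS);

	if (!idx)			/* ac >= NL80211_NUM_ACS */
		return -EINVAL;
	/* even if the branch above is speculatively bypassed, this load
	 * goes through a NULL pointer rather than forwarding the raw
	 * out-of-range 'ac' to the driver */
	txq_params->ac = *idx;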

Reported-by: Christian Lamparter <chunkeey@gmail.com>
Reported-by: Elena Reshetova <elena.reshetova@intel.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/nospec.h |   21 +++++++++++++++++++++
 net/wireless/nl80211.c |   10 +++++++---
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/include/linux/nospec.h b/include/linux/nospec.h
index dd3aa05fab87..b8a9222e34d1 100644
--- a/include/linux/nospec.h
+++ b/include/linux/nospec.h
@@ -41,4 +41,25 @@ static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
 	__u._bit &= _mask;						\
 	__u._ptr;							\
 })
+
+/**
+ * array_idx - Generate a pointer to an array index, ensuring the
+ * pointer is bounded under speculation to NULL.
+ *
+ * @idx: the index of the element, must be less than LONG_MAX
+ * @sz: the number of elements in the array, must be less than LONG_MAX
+ *
+ * If @idx falls in the interval [0, @sz), returns &@idx otherwise
+ * returns NULL.
+ */
+#define array_idx(idx, sz)						\
+({									\
+	union { typeof((idx)) *_ptr; unsigned long _bit; } __u;		\
+	typeof(idx) *_i = &(idx);					\
+	unsigned long _mask = array_ptr_mask(*_i, (sz));		\
+									\
+	__u._ptr = _i;							\
+	__u._bit &= _mask;						\
+	__u._ptr;							\
+})
 #endif /* __NOSPEC_H__ */
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index d396cb61a280..e3b3527c16d4 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -16,6 +16,7 @@
 #include <linux/nl80211.h>
 #include <linux/rtnetlink.h>
 #include <linux/netlink.h>
+#include <linux/nospec.h>
 #include <linux/etherdevice.h>
 #include <net/net_namespace.h>
 #include <net/genetlink.h>
@@ -2056,20 +2057,23 @@ static const struct nla_policy txq_params_policy[NL80211_TXQ_ATTR_MAX + 1] = {
 static int parse_txq_params(struct nlattr *tb[],
 			    struct ieee80211_txq_params *txq_params)
 {
+	u8 ac, *idx;
+
 	if (!tb[NL80211_TXQ_ATTR_AC] || !tb[NL80211_TXQ_ATTR_TXOP] ||
 	    !tb[NL80211_TXQ_ATTR_CWMIN] || !tb[NL80211_TXQ_ATTR_CWMAX] ||
 	    !tb[NL80211_TXQ_ATTR_AIFS])
 		return -EINVAL;
 
-	txq_params->ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
+	ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
 	txq_params->txop = nla_get_u16(tb[NL80211_TXQ_ATTR_TXOP]);
 	txq_params->cwmin = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMIN]);
 	txq_params->cwmax = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMAX]);
 	txq_params->aifs = nla_get_u8(tb[NL80211_TXQ_ATTR_AIFS]);
 
-	if (txq_params->ac >= NL80211_NUM_ACS)
+	idx = array_idx(ac, NL80211_NUM_ACS);
+	if (!idx)
 		return -EINVAL;
-
+	txq_params->ac = *idx;
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v4.1 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
  2018-01-20 21:06   ` [kernel-hardening] " Dan Williams
@ 2018-01-21 10:40     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 48+ messages in thread
From: Russell King - ARM Linux @ 2018-01-21 10:40 UTC (permalink / raw)
  To: Dan Williams
  Cc: tglx, linux-arch, kernel-hardening, Peter Zijlstra,
	Catalin Marinas, x86, Will Deacon, Ingo Molnar, gregkh,
	H. Peter Anvin, torvalds, alan

On Sat, Jan 20, 2018 at 01:06:09PM -0800, Dan Williams wrote:
> +/*
> + * If idx is negative or if idx > size then bit 63 is set in the mask,
> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
> + * failed, array_ptr will return NULL.

The more times I see this the more times I'm unhappy with this comment:

1. does this really mean "idx > size" or "idx >= size" ?  The code
   implements the latter not the former.

2. is "bit 63" relevant here - what if longs are 32-bit?  "the top bit"
   or "the sign bit" would be better.

3. "and the value of ~(-1L) is zero."  So does this mean that when
   0 <= idx < size, somehow the rules of logic change and ~(-1L)
   magically becomes no longer zero!

I'd suggest changing the description to something like:

  * If 0 <= idx < size, return a mask of ~0UL, otherwise return zero.

or:

  * When idx is out of bounds (iow, is negative or idx >= sz), the sign
  * bit will be set. Extend the sign bit to all bits and invert, giving
  * a result of zero for an out of bounds idx, or ~0UL if within bounds.

depending on how deeply you want to describe what's going on here.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Re: [PATCH v4.1 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
  2018-01-21 10:40     ` [kernel-hardening] " Russell King - ARM Linux
@ 2018-01-21 15:07       ` Jann Horn
  -1 siblings, 0 replies; 48+ messages in thread
From: Jann Horn @ 2018-01-21 15:07 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Dan Williams, Thomas Gleixner, linux-arch, kernel-hardening,
	Peter Zijlstra, Catalin Marinas, x86, Will Deacon, Ingo Molnar,
	gregkh, H. Peter Anvin, torvalds, alan

[-- Attachment #1: Type: text/plain, Size: 2201 bytes --]

On Sun, Jan 21, 2018, 11:40 Russell King - ARM Linux <linux@armlinux.org.uk>
wrote:

> On Sat, Jan 20, 2018 at 01:06:09PM -0800, Dan Williams wrote:
> > +/*
> > + * If idx is negative or if idx > size then bit 63 is set in the mask,
> > + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
> > + * failed, array_ptr will return NULL.
>
> The more times I see this the more times I'm unhappy with this comment:
>
> 1. does this really mean "idx > size" or "idx >= size" ?  The code
>    implements the latter not the former.
>

Copying the code here for context:
return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);

That part of the condition (ignoring the overflow edgecases) is equivalent
to "!(idx > sz - 1)", which is equivalent to "idx <= sz - 1", which is
(ignoring overflow edgecases) equivalent to "idx < sz".

Handling of edgecases:
idx>=2^(BITS_PER_LONG-1) will cause a NULL return through the first part of
the condition.
Hmm... a problematic case might be "sz==0 && idx==2^(BITS_PER_LONG-1)-1".
The first part of the expression wouldn't trigger, the second part would be
"2^(BITS_PER_LONG)-1-(2^(BITS_PER_LONG-1)-1) ==
2^(BITS_PER_LONG)-2^(BITS_PER_LONG-1)
== 2^(BITS_PER_LONG-1)", which also wouldn't trigger, I think?
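
A quick standalone check of those cases (assuming 64-bit long and the
arithmetic right shift of negative values that the kernel relies on):

	#include <stdio.h>

	#define BITS_PER_LONG 64

	static unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
	{
		return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
	}

	int main(void)
	{
		printf("%lx\n", array_ptr_mask(0, 4));		/* ffffffffffffffff */
		printf("%lx\n", array_ptr_mask(3, 4));		/* ffffffffffffffff */
		printf("%lx\n", array_ptr_mask(4, 4));		/* 0 */
		printf("%lx\n", array_ptr_mask(~0UL, 4));	/* 0 */
		/* the sz==0, idx==2^(BITS_PER_LONG-1)-1 case above */
		printf("%lx\n", array_ptr_mask((1UL << 63) - 1, 0));	/* 0 */
		return 0;
	}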

> 2. is "bit 63" relevant here - what if longs are 32-bit?  "the top bit"
>    or "the sign bit" would be better.
>
> 3. "and the value of ~(-1L) is zero."  So does this mean that when
>    0 <= idx < size, somehow the rules of logic change and ~(-1L)
>    magically becomes no longer zero!
>
> I'd suggest changing the description to something like:
>
>   * If 0 <= idx < size, return a mask of ~0UL, otherwise return zero.
>
> or:
>
>   * When idx is out of bounds (iow, is negative or idx >= sz), the sign
>   * bit will be set. Extend the sign bit to all bits and invert, giving
>   * a result of zero for an out of bounds idx, or ~0UL if within bounds.
>
> depending on how deeply you want to describe what's going on here.
>
> --
> RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps
> up
> According to speedtest.net: 8.21Mbps down 510kbps up
>

[-- Attachment #2: Type: text/html, Size: 3688 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4.1 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-01-20 21:06   ` [kernel-hardening] " Dan Williams
@ 2018-01-21 19:16     ` Andy Lutomirski
  -1 siblings, 0 replies; 48+ messages in thread
From: Andy Lutomirski @ 2018-01-21 19:16 UTC (permalink / raw)
  To: Dan Williams
  Cc: tglx, linux-arch, kernel-hardening, gregkh, x86, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, torvalds, alan



> On Jan 20, 2018, at 1:06 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> 
> The syscall table base is a user controlled function pointer in kernel
> space. Like, 'get_user, use 'MASK_NOSPEC' to prevent any out of bounds
> speculation. While retpoline prevents speculating into the user
> controlled target it does not stop the pointer de-reference, the concern
> is leaking memory relative to the syscall table base.
> 
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> arch/x86/entry/entry_64.S   |    2 ++
> arch/x86/include/asm/smap.h |    9 ++++++++-
> 2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 63f4320602a3..584f6d2236b3 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -35,6 +35,7 @@
> #include <asm/asm.h>
> #include <asm/smap.h>
> #include <asm/pgtable_types.h>
> +#include <asm/smap.h>
> #include <asm/export.h>
> #include <asm/frame.h>
> #include <asm/nospec-branch.h>
> @@ -260,6 +261,7 @@ entry_SYSCALL_64_fastpath:
>    cmpl    $__NR_syscall_max, %eax
> #endif
>    ja    1f                /* return -ENOSYS (already in pt_regs->ax) */
> +    MASK_NOSPEC %r11 %rax            /* sanitize syscall_nr wrt speculation */

What's the threat you're protecting against?  No one does any computation or data dependent loads based on the value being read.  Or are you worried about ASLR leaks due to merely speculatively loading from a bogus address?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4.1 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-01-21 19:16     ` [kernel-hardening] " Andy Lutomirski
@ 2018-01-21 19:25       ` Andy Lutomirski
  -1 siblings, 0 replies; 48+ messages in thread
From: Andy Lutomirski @ 2018-01-21 19:25 UTC (permalink / raw)
  To: Dan Williams
  Cc: tglx, linux-arch, kernel-hardening, gregkh, x86, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, torvalds, alan



> On Jan 21, 2018, at 11:16 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> 
> 
> 
>> On Jan 20, 2018, at 1:06 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> 
>> The syscall table base is a user controlled function pointer in kernel
>> space. Like, 'get_user, use 'MASK_NOSPEC' to prevent any out of bounds
>> speculation. While retpoline prevents speculating into the user
>> controlled target it does not stop the pointer de-reference, the concern
>> is leaking memory relative to the syscall table base.
>> 
>> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: x86@kernel.org
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>> arch/x86/entry/entry_64.S   |    2 ++
>> arch/x86/include/asm/smap.h |    9 ++++++++-
>> 2 files changed, 10 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index 63f4320602a3..584f6d2236b3 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -35,6 +35,7 @@
>> #include <asm/asm.h>
>> #include <asm/smap.h>
>> #include <asm/pgtable_types.h>
>> +#include <asm/smap.h>
>> #include <asm/export.h>
>> #include <asm/frame.h>
>> #include <asm/nospec-branch.h>
>> @@ -260,6 +261,7 @@ entry_SYSCALL_64_fastpath:
>>   cmpl    $__NR_syscall_max, %eax
>> #endif
>>   ja    1f                /* return -ENOSYS (already in pt_regs->ax) */
>> +    MASK_NOSPEC %r11 %rax            /* sanitize syscall_nr wrt speculation */
> 
> What's the threat you're protecting against?  No one does any computation or data dependent loads based on the value being read.  Or are you worried about ASLR leaks due to merely speculatively loading from a bogus address.

Also, the actual guard you're using is unfortunate.  At the very least, it's a maintenance disaster.  Get rid of the macro and open code it.  It's two instructions and it has crazy clobber requirements *and* a flag dependency.

But I think it's also just a bad way to do it.  You're making the dependency chain longer in a place where it matters, and the retpoline prevents the CPU from hiding that latency.  Just use an immediate mask with andq.  Realistically, you should be fine rounding the table size up to a power of two without even extending the table, since I think this, at best, protects against ASLR bypass.
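
The immediate-mask alternative would look roughly like this (a sketch of
the idea in C; the real thing would be a single 'andq $imm, %rax' in the
entry path):

	/* constant power-of-two mask, no data dependency on the earlier
	 * comparison; a speculated out-of-range nr can then only reach a
	 * bounded window at or just past sys_call_table, which is enough
	 * if the goal is only to stop far out-of-bounds (ASLR probing)
	 * loads */
	nr &= roundup_pow_of_two(NR_syscalls) - 1;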

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [kernel-hardening] Re: [PATCH v4.1 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-01-21 19:16     ` [kernel-hardening] " Andy Lutomirski
  (?)
  (?)
@ 2018-01-21 19:31     ` Jann Horn
  2018-01-22  1:38       ` Andy Lutomirski
  -1 siblings, 1 reply; 48+ messages in thread
From: Jann Horn @ 2018-01-21 19:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dan Williams, Thomas Gleixner, linux-arch, Kernel Hardening,
	Greg Kroah-Hartman, the arch/x86 maintainers, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Linus Torvalds, alan

On Sun, Jan 21, 2018 at 8:16 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Jan 20, 2018, at 1:06 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> The syscall table base is a user controlled function pointer in kernel
>> space. Like, 'get_user, use 'MASK_NOSPEC' to prevent any out of bounds
>> speculation. While retpoline prevents speculating into the user
>> controlled target it does not stop the pointer de-reference, the concern
>> is leaking memory relative to the syscall table base.
>>
>> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: x86@kernel.org
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>> arch/x86/entry/entry_64.S   |    2 ++
>> arch/x86/include/asm/smap.h |    9 ++++++++-
>> 2 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index 63f4320602a3..584f6d2236b3 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -35,6 +35,7 @@
>> #include <asm/asm.h>
>> #include <asm/smap.h>
>> #include <asm/pgtable_types.h>
>> +#include <asm/smap.h>
>> #include <asm/export.h>
>> #include <asm/frame.h>
>> #include <asm/nospec-branch.h>
>> @@ -260,6 +261,7 @@ entry_SYSCALL_64_fastpath:
>>    cmpl    $__NR_syscall_max, %eax
>> #endif
>>    ja    1f                /* return -ENOSYS (already in pt_regs->ax) */
>> +    MASK_NOSPEC %r11 %rax            /* sanitize syscall_nr wrt speculation */
>
> What's the threat you're protecting against?  No one does any computation or data dependent loads based on the value being read.  Or are you worried about ASLR leaks due to merely speculatively loading from a bogus address.

AFAIU the threat is that you could potentially use variant 1 to get
speculative RIP control using the indirect call through the syscall
table. (I haven't tested myself whether that'd work, and I don't know
how well that would work especially when the limit for the comparison
is coming from an immediate, but apparently Dan thinks it's a
potential problem?)
In other words, this would again be an attack against the indirect
call "call *sys_call_table(, %rax, 8)", but this time, the attack
would rely on the indirect branch being resolved *properly* based on
the address read from memory instead of relying on a misprediction;
but using an index into the syscall array that is out of bounds, after
mispredicting the conditional jump for verifying that the syscall
number is in bounds. This could still work even with retpolines if the
CPU can correct mispredictions within speculative execution.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Re: [PATCH v4.1 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
@ 2018-01-21 20:15       ` Jann Horn
  0 siblings, 0 replies; 48+ messages in thread
From: Jann Horn @ 2018-01-21 20:15 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Dan Williams, Thomas Gleixner, linux-arch, Kernel Hardening,
	Peter Zijlstra, Catalin Marinas, the arch/x86 maintainers,
	Will Deacon, Ingo Molnar, Greg Kroah-Hartman, H. Peter Anvin,
	Linus Torvalds, alan

[whoops, resending as non-HTML mail]

On Sun, Jan 21, 2018 at 11:40 AM, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Sat, Jan 20, 2018 at 01:06:09PM -0800, Dan Williams wrote:
>> +/*
>> + * If idx is negative or if idx > size then bit 63 is set in the mask,
>> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
>> + * failed, array_ptr will return NULL.
>
> The more times I see this the more times I'm unhappy with this comment:
>
> 1. does this really mean "idx > size" or "idx >= size" ?  The code
>    implements the latter not the former.

Copying the code here for context:
return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);

That part of the condition (ignoring the overflow edgecases) is
equivalent to "!(idx > sz - 1)", which is equivalent to "idx <= sz -
1", which is (ignoring overflow edgecases) equivalent to "idx < sz".

Handling of edgecases:
idx>=2^(BITS_PER_LONG-1) will cause a NULL return through the first
part of the condition.
Hmm... a problematic case might be "sz==0 &&
idx==2^(BITS_PER_LONG-1)-1". The first part of the expression wouldn't
trigger, the second part would be
"2^(BITS_PER_LONG)-1-(2^(BITS_PER_LONG-1)-1) ==
2^(BITS_PER_LONG)-2^(BITS_PER_LONG-1) == 2^(BITS_PER_LONG-1)", which
also wouldn't trigger, I think?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [kernel-hardening] Re: [PATCH v4.1 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
  2018-01-21 20:15       ` [kernel-hardening] " Jann Horn
  (?)
@ 2018-01-21 20:24       ` Jann Horn
  -1 siblings, 0 replies; 48+ messages in thread
From: Jann Horn @ 2018-01-21 20:24 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Dan Williams, Thomas Gleixner, linux-arch, Kernel Hardening,
	Peter Zijlstra, Catalin Marinas, the arch/x86 maintainers,
	Will Deacon, Ingo Molnar, Greg Kroah-Hartman, H. Peter Anvin,
	Linus Torvalds, alan

On Sun, Jan 21, 2018 at 9:15 PM, Jann Horn <jannh@google.com> wrote:
> [whoops, resending as non-HTML mail]
>
> On Sun, Jan 21, 2018 at 11:40 AM, Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
>> On Sat, Jan 20, 2018 at 01:06:09PM -0800, Dan Williams wrote:
>>> +/*
>>> + * If idx is negative or if idx > size then bit 63 is set in the mask,
>>> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
>>> + * failed, array_ptr will return NULL.
>>
>> The more times I see this the more times I'm unhappy with this comment:
>>
>> 1. does this really mean "idx > size" or "idx >= size" ?  The code
>>    implements the latter not the former.
>
> Copying the code here for context:
> return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
>
> That part of the condition (ignoring the overflow edgecases) is
> equivalent to "!(idx > sz - 1)", which is equivalent to "idx <= sz -
> 1", which is (ignoring overflow edgecases) equivalent to "idx < sz".
>
> Handling of edgecases:
> idx>=2^(BITS_PER_LONG-1) will cause a NULL return through the first
> part of the condition.
> Hmm... a problematic case might be "sz==0 &&
> idx==2^(BITS_PER_LONG-1)-1". The first part of the expression wouldn't
> trigger, the second part would be
> "2^(BITS_PER_LONG)-1-(2^(BITS_PER_LONG-1)-1) ==
> 2^(BITS_PER_LONG)-2^(BITS_PER_LONG-1) == 2^(BITS_PER_LONG-1)", which
> also wouldn't trigger, I think?

Er, of course 2^(BITS_PER_LONG-1) still has the high bit set.
Sorry for the noise.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [kernel-hardening] Re: [PATCH v4.1 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-01-21 19:31     ` Jann Horn
@ 2018-01-22  1:38       ` Andy Lutomirski
  2018-01-22  2:04         ` Linus Torvalds
  0 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2018-01-22  1:38 UTC (permalink / raw)
  To: Jann Horn
  Cc: Dan Williams, Thomas Gleixner, linux-arch, Kernel Hardening,
	Greg Kroah-Hartman, the arch/x86 maintainers, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Linus Torvalds, Alan Cox

On Sun, Jan 21, 2018 at 11:31 AM, Jann Horn <jannh@google.com> wrote:
> On Sun, Jan 21, 2018 at 8:16 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Jan 20, 2018, at 1:06 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>> The syscall table base is a user controlled function pointer in kernel
>>> space. Like, 'get_user, use 'MASK_NOSPEC' to prevent any out of bounds
>>> speculation. While retpoline prevents speculating into the user
>>> controlled target it does not stop the pointer de-reference, the concern
>>> is leaking memory relative to the syscall table base.
>>>
>>> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Ingo Molnar <mingo@redhat.com>
>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>> Cc: x86@kernel.org
>>> Cc: Andy Lutomirski <luto@kernel.org>
>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>>> ---
>>> arch/x86/entry/entry_64.S   |    2 ++
>>> arch/x86/include/asm/smap.h |    9 ++++++++-
>>> 2 files changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>>> index 63f4320602a3..584f6d2236b3 100644
>>> --- a/arch/x86/entry/entry_64.S
>>> +++ b/arch/x86/entry/entry_64.S
>>> @@ -35,6 +35,7 @@
>>> #include <asm/asm.h>
>>> #include <asm/smap.h>
>>> #include <asm/pgtable_types.h>
>>> +#include <asm/smap.h>
>>> #include <asm/export.h>
>>> #include <asm/frame.h>
>>> #include <asm/nospec-branch.h>
>>> @@ -260,6 +261,7 @@ entry_SYSCALL_64_fastpath:
>>>    cmpl    $__NR_syscall_max, %eax
>>> #endif
>>>    ja    1f                /* return -ENOSYS (already in pt_regs->ax) */
>>> +    MASK_NOSPEC %r11 %rax            /* sanitize syscall_nr wrt speculation */
>>
>> What's the threat you're protecting against?  No one does any computation or data dependent loads based on the value being read.  Or are you worried about ASLR leaks due to merely speculatively loading from a bogus address.
>
> AFAIU the threat is that you could potentially use variant 1 to get
> speculative RIP control using the indirect call through the syscall
> table. (I haven't tested myself whether that'd work, and I don't know
> how well that would work especially when the limit for the comparison
> is coming from an immediate, but apparently Dan thinks it's a
> potential problem?)
> In other words, this would again be an attack against the indirect
> call "call *sys_call_table(, %rax, 8)", but this time, the attack
> would rely on the indirect branch being resolved *properly* based on
> the address read from memory instead of relying on a misprediction;
> but using an index into the syscall array that is out of bounds, after
> mispredicting the conditional jump for verifying that the syscall
> number is in bounds. This could still work even with retpolines if the
> CPU can correct mispredictions within speculative execution.

Fair enough.  That being said:

1. Intel needs to document this crap.  I don't want a situation where
we keep adding random mitigations in the hopes that we got them all.

2. Why on Earth is MASK_NOSPEC in smap.h?

3. What's with sbb; and?  I can see two sane ways to do this.  One is
cmovaq [something safe], %rax, *in the #ifdef CONFIG_RETPOLINE block*.
(Or is cmov subject to incorrect speculation, too?  And, if cmov is,
why wouldn't sbb be just as bad?  At least using cmov here is two
whole ALU ops shorter.)  The other is andq $(syscall table size), %rax
(again inside the ifdef block) and to round the syscall table to one
less than a power of two.

In my mental model of how a CPU works, the cmov approach makes more
sense, since we're about to jump somewhere that depends on the syscall
nr, and *the whole point* of the mitigation here is to avoid
jumping anywhere until the CPU has fully caught up with reality wrt
the syscall number.
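
As a rough C rendering of the two alternatives sketched above (the helper
names and the padded table size are invented here, not taken from the
series): note that the ternary below is only the conceptual shape, since
whether a compiler actually emits a branch-free cmp/cmov pair rather than a
conditional branch is not guaranteed, which is one reason the real
mitigation is written in assembly.

    /* Alternative 1: clamp an out-of-range syscall nr to a "safe" slot,
     * the C analogue of "cmp; cmovaq [something safe], %rax". */
    static inline unsigned long clamp_nr_cmov(unsigned long nr, unsigned long max)
    {
            return nr > max ? 0 : nr;       /* 0 stands in for the safe value */
    }

    /* Alternative 2: bound the index with a single AND, which only works
     * if the syscall table is padded out to a power-of-two entry count. */
    #define NR_SYSCALLS_POW2 512            /* hypothetical padded size */

    static inline unsigned long clamp_nr_and(unsigned long nr)
    {
            return nr & (NR_SYSCALLS_POW2 - 1);
    }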

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [kernel-hardening] Re: [PATCH v4.1 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-01-22  1:38       ` Andy Lutomirski
@ 2018-01-22  2:04         ` Linus Torvalds
  2018-01-22  2:23           ` Andy Lutomirski
  2018-01-22 11:59           ` Samuel Neves
  0 siblings, 2 replies; 48+ messages in thread
From: Linus Torvalds @ 2018-01-22  2:04 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jann Horn, Dan Williams, Thomas Gleixner, linux-arch,
	Kernel Hardening, Greg Kroah-Hartman, the arch/x86 maintainers,
	Ingo Molnar, H. Peter Anvin, Alan Cox

On Sun, Jan 21, 2018 at 5:38 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
> 3. What's with sbb; and?  I can see two sane ways to do this.  One is
> cmovaq [something safe], %rax,

Heh. I think it's partly about being old-fashioned. sbb has always
been around, and is the traditional trick for 0/-1.

Also, my original suggested thing did the *access* too, and masked the
result with the same mask.

But I guess we could use cmov instead. It has very similar performance
(ie it was relatively slow on P4, but so was sbb).

However, I suspect it actually has a slightly higher register
pressure, since you'd need to have that zero register (zero being the
"safe" value), and the only good way to get a zero value is the xor
thing, which affects flags and thus needs to be before the cmp.

In contrast, the sbb trick has no early inputs needed.

So on the whole, 'cmov' may be more natural on a conceptual level, but
the sbb trick really is a very "traditional x86 thing" to do.

               Linus
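
For readers unfamiliar with the idiom, a minimal inline-asm sketch (AT&T
syntax; an illustration only, not the kernel's MASK_NOSPEC macro) of the
cmp/sbb/and sequence being discussed:

    /* Return idx when idx < sz and 0 otherwise, with no conditional
     * branch: sbb turns the carry left by cmp into a 0 / ~0UL mask,
     * which is then ANDed into the index so that even a mispredicted
     * bounds check cannot feed an out-of-range value to a later load. */
    static inline unsigned long nospec_clamp_idx(unsigned long idx, unsigned long sz)
    {
            unsigned long mask;

            asm ("cmp %2, %1\n\t"   /* idx - sz: sets CF when idx < sz */
                 "sbb %0, %0\n\t"   /* mask = CF ? ~0UL : 0 */
                 "and %0, %1"       /* idx &= mask */
                 : "=&r" (mask), "+r" (idx)
                 : "r" (sz)
                 : "cc");
            return idx;
    }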

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [kernel-hardening] Re: [PATCH v4.1 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-01-22  2:04         ` Linus Torvalds
@ 2018-01-22  2:23           ` Andy Lutomirski
  2018-01-22 11:59           ` Samuel Neves
  1 sibling, 0 replies; 48+ messages in thread
From: Andy Lutomirski @ 2018-01-22  2:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Jann Horn, Dan Williams, Thomas Gleixner,
	linux-arch, Kernel Hardening, Greg Kroah-Hartman,
	the arch/x86 maintainers, Ingo Molnar, H. Peter Anvin, Alan Cox

On Sun, Jan 21, 2018 at 6:04 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sun, Jan 21, 2018 at 5:38 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> 3. What's with sbb; and?  I can see two sane ways to do this.  One is
>> cmovaq [something safe], %rax,
>
> Heh. I think it's partly about being old-fashioned. sbb has always
> been around, and is the traditional trick for 0/-1.
>
> Also, my original suggested thing did the *access* too, and masked the
> result with the same mask.
>
> But I guess we could use cmov instead. It has very similar performance
> (ie it was relatively slow on P4, but so was sbb).
>
> However, I suspect it actually has a slightly higher register
> pressure, since you'd need to have that zero register (zero being the
> "safe" value), and the only good way to get a zero value is the xor
> thing, which affects flags and thus needs to be before the cmp.
>
> In contrast, the sbb trick has no early inputs needed.
>
> So on the whole, 'cmov' may be more natural on a conceptual level, but
> the sbb trick really is a very "traditional x86 thing" to do.

Fair enough.

That being said, what I *actually* want to do is to nuke this thing
entirely.  I just wrote a patch to turn off the SYSCALL64 fast path
entirely when retpolines are on.  Then this issue can be dealt with in
C.  I assume someone has a brilliant way to make gcc automatically do
something intelligent about guarded array access in C. </snicker>
Seriously, though, the retpolined fast path is actually slower than
the slow path on a "minimal" retpoline kernel (which is what I'm using
because Fedora hasn't pushed out a retpoline compiler yet), and I
doubt it's more than the tiniest speedup on a full retpoline kernel.

I've read a bunch of emails flying around saying that retpolines
aren't that bad.  In very informal benchmarking, a single mispredicted
ret (which is what a retpoline is) seems to take 20-30 cycles on
Skylake.  That's pretty far from "not bad".  Is IBRS somehow doing
something that adversely affects code that doesn't use indirect
branches?  Because I'm having a bit of a hard time imagining IBRS
hurting indirect branches worse than retpolines do.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [kernel-hardening] Re: [PATCH v4.1 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-01-22  2:04         ` Linus Torvalds
  2018-01-22  2:23           ` Andy Lutomirski
@ 2018-01-22 11:59           ` Samuel Neves
  1 sibling, 0 replies; 48+ messages in thread
From: Samuel Neves @ 2018-01-22 11:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Jann Horn, Dan Williams, Thomas Gleixner,
	linux-arch, Kernel Hardening, Greg Kroah-Hartman,
	the arch/x86 maintainers, Ingo Molnar, H. Peter Anvin, Alan Cox

On Mon, Jan 22, 2018 at 2:04 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> However, I suspect it actually has a slightly higher register
> pressure, since you'd need to have that zero register (zero being the
> "safe" value), and the only good way to get a zero value is the xor
> thing, which affects flags and thus needs to be before the cmp.
>
> In contrast, the sbb trick has no early inputs needed.

On the flipside, sbb carries a false dependency [*] on the destination
register. Imagine something like

divq %rcx
...
cmpq %rdi, %rsi
sbbq %rax,%rax

sbb needs to wait not only for the comparison, but also for the
earlier unrelated slow division. On the other hand, zeroing with mov
or xor breaks dependencies on the destination register, making the
computation independent of what came before.

[*] Recent AMD chips are smart enough to understand the sbb r,r idiom
and ignore the value of r, but as far as I know none of the Intel
chips do.
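
To make the trade-off concrete, a hedged inline-asm sketch (AT&T syntax,
illustration only, not proposed kernel code) of the cmov-based variant: the
zeroing of 'mask' is where the compiler would place a mov/xor, which breaks
the dependency on the register's old contents but clobbers the flags and so
has to land before the cmp.

    /* cmov-based 0 / ~0UL mask. Unlike sbb %r,%r, which (per the note
     * above) Intel parts do not special-case and so still waits on the
     * register's previous writer, the zeroing idiom starts a fresh
     * dependency chain. */
    static inline unsigned long nospec_mask_cmov(unsigned long idx, unsigned long sz)
    {
            unsigned long mask = 0;         /* compiler emits the zeroing here */
            unsigned long all_ones = ~0UL;

            asm ("cmp %2, %1\n\t"   /* idx - sz: sets CF when idx < sz */
                 "cmovb %3, %0"     /* mask = (idx < sz) ? ~0UL : 0 */
                 : "+r" (mask)
                 : "r" (idx), "r" (sz), "r" (all_ones)
                 : "cc");
            return mask;
    }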

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4.1 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
  2018-01-21 10:40     ` [kernel-hardening] " Russell King - ARM Linux
@ 2018-01-22 18:07       ` Dan Williams
  -1 siblings, 0 replies; 48+ messages in thread
From: Dan Williams @ 2018-01-22 18:07 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Thomas Gleixner, linux-arch, Kernel Hardening, Peter Zijlstra,
	Catalin Marinas, X86 ML, Will Deacon, Ingo Molnar, Greg KH,
	H. Peter Anvin, Linus Torvalds, Alan Cox

On Sun, Jan 21, 2018 at 2:40 AM, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Sat, Jan 20, 2018 at 01:06:09PM -0800, Dan Williams wrote:
>> +/*
>> + * If idx is negative or if idx > size then bit 63 is set in the mask,
>> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
>> + * failed, array_ptr will return NULL.
>
> The more times I see this the more times I'm unhappy with this comment:
>
> 1. does this really mean "idx > size" or "idx >= size" ?  The code
>    implements the latter not the former.
>
> 2. is "bit 63" relevant here - what if longs are 32-bit?  "the top bit"
>    or "the sign bit" would be better.
>
> 3. "and the value of ~(-1L) is zero."  So does this mean that when
>    0 <= idx < size, somehow the rules of logic change and ~(-1L)
>    magically becomes no longer zero!
>
> I'd suggest changing the description to something like:
>
>   * If 0 <= idx < size, return a mask of ~0UL, otherwise return zero.
>
> or:
>
>   * When idx is out of bounds (iow, is negative or idx >= sz), the sign
>   * bit will be set. Extend the sign bit to all bits and invert, giving
>   * a result of zero for an out of bounds idx, or ~0UL if within bounds.
>
> depending on how deeply you want to describe what's going on here.

Sounds good Russell, I'll fold the longer description in for the next
version to save the next person needing to decode this technique.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4.1 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
  2018-01-22 18:07       ` [kernel-hardening] " Dan Williams
@ 2018-01-22 18:37         ` Linus Torvalds
  -1 siblings, 0 replies; 48+ messages in thread
From: Linus Torvalds @ 2018-01-22 18:37 UTC (permalink / raw)
  To: Dan Williams
  Cc: Russell King - ARM Linux, Thomas Gleixner, linux-arch,
	Kernel Hardening, Peter Zijlstra, Catalin Marinas, X86 ML,
	Will Deacon, Ingo Molnar, Greg KH, H. Peter Anvin, Alan Cox

On Mon, Jan 22, 2018 at 10:07 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Sun, Jan 21, 2018 at 2:40 AM, Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
>>
>> I'd suggest changing the description to something like:
>>
>>   * If 0 <= idx < size, return a mask of ~0UL, otherwise return zero.

Yes. HOWEVER.

We should make it clear that the signedness of the comparisons is
undefined, and that 'size' should always be positive for the above to
be well-specified.

Why?

Because one valid implementation of the above is:

   idx U< size ? ~0UL : 0

as a single unsigned comparison (I'm using "U<" as a "unsigned less than").

But an equally valid implementation of it might be

   idx S>= 0 && idx S< size ? ~0UL : 0

which does two signed comparisons instead.

And they are not exactly the same. They are the same _IFF_ 'size' is
positive in 'long'. But if size is an unsigned value so big as to be
negative in 'long', they give different results for 'idx'.

In fact, the ALU-only version actually did a third version of the
above that only uses "signed comparison against zero":

   idx S>= 0 && (size - idx - 1) S>= 0 ? ~0UL : 0

which again is equivalent only when 'size' is positive in 'long'.

Note that 'idx' itself can have any range (that's kind of the _point_,
after all: checking that idx is in some range). But the logic only
works for a positive array size.

Which honestly is not really a limitation, but it's worth spelling out, I think.

In particular, if you have a user pointer, and a 3G:1G split in
user:kernel address space, you can *not* do something like

    uptr = array_ptr(NULL, userptr, TASK_SIZE);

to get a sanitized user pointer, because TASK_SIZE is not positive in 'long'.

Hmm?

                     Linus
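
A small standalone C sketch (purely illustrative) of the two readings being
contrasted above, showing the one case where they disagree: a size whose top
bit is set. For any sz that is positive as a long, both helpers return the
same mask.

    #include <stdio.h>
    #include <limits.h>

    static unsigned long mask_unsigned(unsigned long idx, unsigned long sz)
    {
            return idx < sz ? ~0UL : 0;                     /* idx U< size */
    }

    static unsigned long mask_signed(unsigned long idx, unsigned long sz)
    {
            return (long)idx >= 0 && (long)idx < (long)sz ? ~0UL : 0;
    }

    int main(void)
    {
            unsigned long sz  = (unsigned long)LONG_MAX + 2; /* "negative in long" */
            unsigned long idx = (unsigned long)LONG_MAX + 1;

            /* prints ffffffffffffffff vs 0: in bounds for one reading,
             * out of bounds for the other */
            printf("%lx vs %lx\n", mask_unsigned(idx, sz), mask_signed(idx, sz));
            return 0;
    }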

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4.1 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
  2018-01-22 18:37         ` [kernel-hardening] " Linus Torvalds
@ 2018-01-22 18:52           ` Dan Williams
  -1 siblings, 0 replies; 48+ messages in thread
From: Dan Williams @ 2018-01-22 18:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Russell King - ARM Linux, Thomas Gleixner, linux-arch,
	Kernel Hardening, Peter Zijlstra, Catalin Marinas, X86 ML,
	Will Deacon, Ingo Molnar, Greg KH, H. Peter Anvin, Alan Cox

On Mon, Jan 22, 2018 at 10:37 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Jan 22, 2018 at 10:07 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Sun, Jan 21, 2018 at 2:40 AM, Russell King - ARM Linux
>> <linux@armlinux.org.uk> wrote:
>>>
>>> I'd suggest changing the description to something like:
>>>
>>>   * If 0 <= idx < size, return a mask of ~0UL, otherwise return zero.
>
> Yes. HOWEVER.
>
> We should make it clear that the signedness of the comparisons is
> undefined, and that 'size' should always be positive for the above to
> be well-specified.
>
> Why?
>
> Because one valid implementation of the above is:
>
>    idx U< size ? ~0UL : 0
>
> as a single unsigned comparison (I'm using "U<" as a "unsigned less than").
>
> But an equally valid implementation of it might be
>
>    idx S>= 0 && idx S< size ? ~0UL : 0
>
> which does two signed comparisons instead.
>
> And they are not exactly the same. They are the same _IFF_ 'size' is
> positive in 'long'. But if size is an unsigned value so big as to be
> negative in 'long', they give different results for 'idx'.
>
> In fact, the ALU-only version actually did a third version of the
> above that only uses "signed comparison against zero":
>
>    idx S>= 0 && (size - idx - 1) S>= 0 ? ~0UL : 0
>
> which again is equivalent only when 'size' is positive in 'long'.
>
> Note that 'idx' itself can have any range (that's kind of the _point_,
> after all: checking that idx is in some range). But the logic only
> works for a positive array size.
>
> Which honestly is not really a limitation, but it's worth spelling out, I think.
>
> In particular, if you have a user pointer, and a 3G:1G split in
> user:kernel address space, you can *not* do something like
>
>     uptr = array_ptr(NULL, userptr, TASK_SIZE);
>
> to get a sanitized user pointer, because TASK_SIZE is not positive in 'long'.
>
> Hmm?

We could at least have a BUILD_BUG_ON when 'size' is a builtin
constant and greater than LONG_MAX. Alternatively we could
require archs to provide the equivalent of the x86 array_ptr_mask()
that does not have the LONG_MAX limitation?
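
One possible shape for the compile-time check mentioned above, as a purely
hypothetical sketch (the helper name and placement are invented here;
BUILD_BUG_ON and __builtin_constant_p are the existing kernel facilities
being referred to, and BUILD_BUG_ON only fires for a compile-time constant
condition):

    /* Reject a constant 'size' that the generic mask cannot handle;
     * a non-constant 'size' is deliberately left unchecked. */
    #define array_ptr_check_sz(sz)                                  \
    do {                                                            \
            if (__builtin_constant_p(sz))                           \
                    BUILD_BUG_ON((unsigned long)(sz) > LONG_MAX);   \
    } while (0)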

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [kernel-hardening] [PATCH v4.1 01/10] Documentation: document array_ptr
  2018-01-20 21:06   ` [kernel-hardening] " Dan Williams
  (?)
@ 2018-01-22 19:00   ` Jann Horn
  -1 siblings, 0 replies; 48+ messages in thread
From: Jann Horn @ 2018-01-22 19:00 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Gleixner, Mark Rutland, linux-arch, Kees Cook,
	Kernel Hardening, Peter Zijlstra, Greg Kroah-Hartman,
	Jonathan Corbet, Will Deacon, Linus Torvalds, alan

On Sat, Jan 20, 2018 at 10:06 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> From: Mark Rutland <mark.rutland@arm.com>
>
> Document the rationale and usage of the new array_ptr() helper.
[...]
> +A call to array_ptr(arr, idx, sz) returns a sanitized pointer to
> +arr[idx] only if idx falls in the [0, sz) interval. When idx < 0 or idx > sz,

"or idx >= sz"?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4.1 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
  2018-01-22 18:52           ` [kernel-hardening] " Dan Williams
@ 2018-01-23 10:17             ` Will Deacon
  -1 siblings, 0 replies; 48+ messages in thread
From: Will Deacon @ 2018-01-23 10:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linus Torvalds, Russell King - ARM Linux, Thomas Gleixner,
	linux-arch, Kernel Hardening, Peter Zijlstra, Catalin Marinas,
	X86 ML, Ingo Molnar, Greg KH, H. Peter Anvin, Alan Cox

On Mon, Jan 22, 2018 at 10:52:54AM -0800, Dan Williams wrote:
> On Mon, Jan 22, 2018 at 10:37 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > Note that 'idx' itself can have any range (that's kind of the _point_,
> > after all: checking that idx is in some range). But the logic only
> > works for a positive array size.
> >
> > Which honestly is not really a limitation, but it's worth spelling out, I think.

Agreed. We should be absolutely clear about when this thing works and when
it doesn't.

> > In particular, if you have a user pointer, and a 3G:1G split in
> > user:kernel address space, you can *not* do something like
> >
> >     uptr = array_ptr(NULL, userptr, TASK_SIZE);
> >
> > to get a sanitized user pointer, because TASK_SIZE is not positive in 'long'.
> >
> > Hmm?
> 
> We could at least have a BUILD_BUG_ON when 'size' is a builtin
> constant and greater than LONG_MAX. Alternatively we could
> require archs to provide the equivalent of the x86 array_ptr_mask()
> that does not have the LONG_MAX limitation?

I'd rather avoid imposing that limitation and instead just treat uaccess
specially. It looks like we can satisfy the current definition of
array_ptr_mask with three instructions, but I'm not sure how we could get it
working on arm64 if we needed to deal with arbitrary sizes (particularly as
we're trying to avoid conditional instructions).

Will

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2018-01-23 10:17 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-20 21:05 [PATCH v4.1 00/10] spectre variant1 mitigations for tip/x86/pti Dan Williams
2018-01-20 21:05 ` [kernel-hardening] " Dan Williams
2018-01-20 21:05 ` Dan Williams
2018-01-20 21:06 ` [PATCH v4.1 01/10] Documentation: document array_ptr Dan Williams
2018-01-20 21:06   ` [kernel-hardening] " Dan Williams
2018-01-22 19:00   ` Jann Horn
2018-01-20 21:06 ` [PATCH v4.1 02/10] asm/nospec, array_ptr: sanitize speculative array de-references Dan Williams
2018-01-20 21:06   ` [kernel-hardening] " Dan Williams
2018-01-21 10:40   ` Russell King - ARM Linux
2018-01-21 10:40     ` [kernel-hardening] " Russell King - ARM Linux
2018-01-21 15:07     ` Jann Horn
2018-01-21 15:07       ` [kernel-hardening] " Jann Horn
2018-01-21 20:15     ` Jann Horn
2018-01-21 20:15       ` [kernel-hardening] " Jann Horn
2018-01-21 20:24       ` Jann Horn
2018-01-22 18:07     ` Dan Williams
2018-01-22 18:07       ` [kernel-hardening] " Dan Williams
2018-01-22 18:37       ` Linus Torvalds
2018-01-22 18:37         ` [kernel-hardening] " Linus Torvalds
2018-01-22 18:52         ` Dan Williams
2018-01-22 18:52           ` [kernel-hardening] " Dan Williams
2018-01-23 10:17           ` Will Deacon
2018-01-23 10:17             ` [kernel-hardening] " Will Deacon
2018-01-20 21:06 ` [PATCH v4.1 03/10] x86: implement array_ptr_mask() Dan Williams
2018-01-20 21:06   ` [kernel-hardening] " Dan Williams
2018-01-20 21:06 ` [PATCH v4.1 04/10] x86: introduce __uaccess_begin_nospec and ifence Dan Williams
2018-01-20 21:06   ` [kernel-hardening] " Dan Williams
2018-01-20 21:06 ` [PATCH v4.1 05/10] x86, __get_user: use __uaccess_begin_nospec Dan Williams
2018-01-20 21:06   ` [kernel-hardening] " Dan Williams
2018-01-20 21:06 ` [PATCH v4.1 06/10] x86, get_user: use pointer masking to limit speculation Dan Williams
2018-01-20 21:06   ` [kernel-hardening] " Dan Williams
2018-01-20 21:06 ` [PATCH v4.1 07/10] x86: narrow out of bounds syscalls to sys_read under speculation Dan Williams
2018-01-20 21:06   ` [kernel-hardening] " Dan Williams
2018-01-21 19:16   ` Andy Lutomirski
2018-01-21 19:16     ` [kernel-hardening] " Andy Lutomirski
2018-01-21 19:25     ` Andy Lutomirski
2018-01-21 19:25       ` [kernel-hardening] " Andy Lutomirski
2018-01-21 19:31     ` Jann Horn
2018-01-22  1:38       ` Andy Lutomirski
2018-01-22  2:04         ` Linus Torvalds
2018-01-22  2:23           ` Andy Lutomirski
2018-01-22 11:59           ` Samuel Neves
2018-01-20 21:06 ` [PATCH v4.1 08/10] vfs, fdtable: prevent bounds-check bypass via speculative execution Dan Williams
2018-01-20 21:06   ` [kernel-hardening] " Dan Williams
2018-01-20 21:06 ` [PATCH v4.1 09/10] kvm, x86: update spectre-v1 mitigation Dan Williams
2018-01-20 21:06   ` [kernel-hardening] " Dan Williams
2018-01-20 21:06 ` [PATCH v4.1 10/10] nl80211: sanitize array index in parse_txq_params Dan Williams
2018-01-20 21:06   ` [kernel-hardening] " Dan Williams
