* [PATCH v4 00/10] prevent bounds-check bypass via speculative execution
@ 2018-01-19  0:01 ` Dan Williams
  0 siblings, 0 replies; 70+ messages in thread
From: Dan Williams @ 2018-01-19  0:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mark Rutland, kernel-hardening, Peter Zijlstra, Catalin Marinas,
	Will Deacon, H. Peter Anvin, Elena Reshetova, linux-arch,
	Andi Kleen, Jonathan Corbet, x86, Russell King, Ingo Molnar,
	Andrew Honig, alan, Tom Lendacky, Kees Cook, Al Viro,
	Andy Lutomirski, tglx, akpm, Jim Mattson, Christian Lamparter,
	gregkh, linux-wireless, stable, Paolo Bonzini, Johannes Berg,
	torvalds, David S. Miller

Changes since v3 [1]
* Drop 'ifence_array_ptr' and associated compile-time + run-time
  switching and just use the masking approach all the time.

* Convert 'get_user' to use pointer sanitization via masking rather than
  lfence. '__get_user' and associated paths still rely on
  lfence. (Linus)

      "Basically, the rule is trivial: find all 'stac' users, and use
       address masking if those users already integrate the limit
       check, and lfence [if] they don't."

* At syscall entry sanitize the syscall number under speculation
  to remove a user controlled pointer de-reference in kernel
  space.  (Linus)

* Fix a raw lfence in the kvm code (added for v4.15-rc8) to use
  'array_ptr'.

* Propose 'array_idx' as a way to sanitize user input that is
  later used as an array index, but where the validation happens
  in a different code block than the array reference.
  (Christian)

* Fix grammar in speculation.txt (Kees)

---

Quoting Mark's original RFC:

"Recently, Google Project Zero discovered several classes of attack
against speculative execution. One of these, known as variant-1, allows
explicit bounds checks to be bypassed under speculation, providing an
arbitrary read gadget. Further details can be found on the GPZ blog [2]
and the Documentation patch in this series."

A precondition of using this attack on the kernel is to get a
user-controlled pointer de-referenced (under speculation) in privileged
code. The primary source of user-controlled pointers in the kernel is
the arguments passed to 'get_user' and '__get_user'. Another source is
user-controlled array indices and pointer offsets.

Better tooling is needed to find more arrays / pointers with user
controlled indices / offsets that can be converted to use 'array_ptr' or
'array_idx'. A few such conversions are included in this set; they are
not expected to be exhaustive. That said, the 'get_user' protections
raise the bar on finding a vulnerable gadget in the kernel.

These patches are also available via the 'nospec-v4' git branch here:

    git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4

Note that the BPF fix for Spectre variant-1 is merged for 4.15-rc8.

[2]: https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html

---

Dan Williams (9):
      asm/nospec, array_ptr: sanitize speculative array de-references
      x86: implement array_ptr_mask()
      x86: introduce __uaccess_begin_nospec and ifence
      x86, __get_user: use __uaccess_begin_nospec
      x86, get_user: use pointer masking to limit speculation
      x86: narrow out of bounds syscalls to sys_read under speculation
      vfs, fdtable: prevent bounds-check bypass via speculative execution
      kvm, x86: fix spectre-v1 mitigation
      nl80211: sanitize array index in parse_txq_params

Mark Rutland (1):
      Documentation: document array_ptr


 Documentation/speculation.txt     |  143 +++++++++++++++++++++++++++++++++++++
 arch/x86/entry/entry_64.S         |    2 +
 arch/x86/include/asm/barrier.h    |   28 +++++++
 arch/x86/include/asm/msr.h        |    3 -
 arch/x86/include/asm/smap.h       |   24 ++++++
 arch/x86/include/asm/uaccess.h    |   15 +++-
 arch/x86/include/asm/uaccess_32.h |    6 +-
 arch/x86/include/asm/uaccess_64.h |   12 ++-
 arch/x86/kvm/vmx.c                |   19 ++---
 arch/x86/lib/getuser.S            |    5 +
 arch/x86/lib/usercopy_32.c        |    8 +-
 include/linux/fdtable.h           |    7 +-
 include/linux/nospec.h            |   65 +++++++++++++++++
 net/wireless/nl80211.c            |   10 ++-
 14 files changed, 312 insertions(+), 35 deletions(-)
 create mode 100644 Documentation/speculation.txt
 create mode 100644 include/linux/nospec.h

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 01/10] Documentation: document array_ptr
  2018-01-19  0:01 ` Dan Williams
@ 2018-01-19  0:01   ` Dan Williams
  -1 siblings, 0 replies; 70+ messages in thread
From: Dan Williams @ 2018-01-19  0:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mark Rutland, linux-arch, Kees Cook, kernel-hardening,
	Peter Zijlstra, gregkh, Jonathan Corbet, Will Deacon, tglx,
	torvalds, akpm, alan

From: Mark Rutland <mark.rutland@arm.com>

Document the rationale and usage of the new array_ptr() helper.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/speculation.txt |  143 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 143 insertions(+)
 create mode 100644 Documentation/speculation.txt

diff --git a/Documentation/speculation.txt b/Documentation/speculation.txt
new file mode 100644
index 000000000000..a47fbffe0dab
--- /dev/null
+++ b/Documentation/speculation.txt
@@ -0,0 +1,143 @@
+This document explains potential effects of speculation, and how undesirable
+effects can be mitigated portably using common APIs.
+
+===========
+Speculation
+===========
+
+To improve performance and minimize average latencies, many contemporary CPUs
+employ speculative execution techniques such as branch prediction, performing
+work which may be discarded at a later stage.
+
+Typically speculative execution cannot be observed from architectural state,
+such as the contents of registers. However, in some cases it is possible to
+observe its impact on microarchitectural state, such as the presence or
+absence of data in caches. Such state may form side-channels which can be
+observed to extract secret information.
+
+For example, in the presence of branch prediction, it is possible for bounds
+checks to be ignored by code which is speculatively executed. Consider the
+following code:
+
+	int load_array(int *array, unsigned int idx)
+	{
+		if (idx >= MAX_ARRAY_ELEMS)
+			return 0;
+		else
+			return array[idx];
+	}
+
+On arm64, this may be compiled to an assembly sequence such as:
+
+	CMP	<idx>, #MAX_ARRAY_ELEMS
+	B.LT	less
+	MOV	<returnval>, #0
+	RET
+  less:
+	LDR	<returnval>, [<array>, <idx>]
+	RET
+
+It is possible that a CPU mis-predicts the conditional branch, and
+speculatively loads array[idx], even if idx >= MAX_ARRAY_ELEMS. This value
+will subsequently be discarded, but the speculated load may affect
+microarchitectural state which can be subsequently measured.
+
+More complex sequences involving multiple dependent memory accesses may result
+in sensitive information being leaked. Consider the following code, building
+on the prior example:
+
+	int load_dependent_arrays(int *arr1, int *arr2, int idx)
+	{
+		int val1, val2;
+
+		val1 = load_array(arr1, idx);
+		val2 = load_array(arr2, val1);
+
+		return val2;
+	}
+
+Under speculation, the first call to load_array() may return the value of an
+out-of-bounds address, while the second call will influence microarchitectural
+state dependent on this value. This may provide an arbitrary read primitive.
+
+====================================
+Mitigating speculation side-channels
+====================================
+
+The kernel provides a generic API to ensure that bounds checks are respected
+even under speculation. Architectures which are affected by speculation-based
+side-channels are expected to implement these primitives.
+
+The array_ptr() helper in <asm/barrier.h> can be used to prevent
+information from being leaked via side-channels.
+
+A call to array_ptr(arr, idx, sz) returns a sanitized pointer to
+arr[idx] only if idx falls in the [0, sz) interval; when idx < 0 or
+idx >= sz, NULL is returned. Additionally, an out-of-bounds pointer is
+not propagated to code which is speculatively executed.
+
+This can be used to protect the earlier load_array() example:
+
+	int load_array(int *array, unsigned int idx)
+	{
+		int *elem;
+
+		elem = array_ptr(array, idx, MAX_ARRAY_ELEMS);
+		if (elem)
+			return *elem;
+		else
+			return 0;
+	}
+
+This can also be used in situations where multiple fields on a structure are
+accessed:
+
+	struct foo array[SIZE];
+	int a, b;
+
+	void do_thing(int idx)
+	{
+		struct foo *elem;
+
+		elem = array_ptr(array, idx, SIZE);
+		if (elem) {
+			a = elem->field_a;
+			b = elem->field_b;
+		}
+	}
+
+It is imperative that the returned pointer is used. Pointers which are
+generated separately are subject to a number of potential CPU and compiler
+optimizations, and may still be used speculatively. For example, this means
+that the following sequence is unsafe:
+
+	struct foo array[SIZE];
+	int a, b;
+
+	void do_thing(int idx)
+	{
+		if (array_ptr(array, idx, SIZE) != NULL) {
+			// unsafe as wrong pointer is used
+			a = array[idx].field_a;
+			b = array[idx].field_b;
+		}
+	}
+
+Similarly, it is unsafe to compare the returned pointer with other
+pointers, as this may allow the compiler to substitute one pointer for
+another, re-enabling speculation. For example, the following sequence
+is unsafe:
+
+	struct foo array[SIZE];
+	int a, b;
+
+	void do_thing(int idx)
+	{
+		struct foo *elem = array_ptr(array, idx, SIZE);
+
+		// unsafe due to pointer substitution
+		if (elem == &array[idx]) {
+			a = elem->field_a;
+			b = elem->field_b;
+		}
+	}
+

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
  2018-01-19  0:01 ` Dan Williams
@ 2018-01-19  0:01   ` Dan Williams
  -1 siblings, 0 replies; 70+ messages in thread
From: Dan Williams @ 2018-01-19  0:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, kernel-hardening, Catalin Marinas, x86, Will Deacon,
	Russell King, Ingo Molnar, gregkh, H. Peter Anvin, tglx,
	torvalds, akpm, alan

'array_ptr' is proposed as a generic mechanism to mitigate against
Spectre-variant-1 attacks (i.e. attacks that bypass boundary checks via
speculative execution). The 'array_ptr' implementation is expected to
be safe for current-generation CPUs across multiple architectures (ARM,
x86).

Based on an original implementation by Linus Torvalds, tweaked to remove
speculative flows by Alexei Starovoitov, and tweaked again by Linus to
introduce an x86 assembly implementation for the mask generation.

Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Co-developed-by: Alexei Starovoitov <ast@kernel.org>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/nospec.h |   44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)
 create mode 100644 include/linux/nospec.h

diff --git a/include/linux/nospec.h b/include/linux/nospec.h
new file mode 100644
index 000000000000..f841c11f3f1f
--- /dev/null
+++ b/include/linux/nospec.h
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright(c) 2018 Intel Corporation. All rights reserved.
+
+#ifndef __NOSPEC_H__
+#define __NOSPEC_H__
+
+#include <linux/jump_label.h>
+#include <asm/barrier.h>
+
+/*
+ * If idx is negative or if idx >= sz then the sign bit of
+ * (idx | (sz - 1 - idx)) is set and the arithmetic right shift yields
+ * a mask of zero; otherwise the mask is ~0UL. When the mask is zero
+ * the bounds check failed and array_ptr() will return NULL.
+ */
+#ifndef array_ptr_mask
+static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
+{
+	return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
+}
+#endif
+
+/**
+ * array_ptr - Generate a pointer to an array element, ensuring
+ * the pointer is bounded under speculation to NULL.
+ *
+ * @base: the base of the array
+ * @idx: the index of the element, must be less than LONG_MAX
+ * @sz: the number of elements in the array, must be less than LONG_MAX
+ *
+ * If @idx falls in the interval [0, @sz), returns the pointer to
+ * @base[@idx], otherwise returns NULL.
+ */
+#define array_ptr(base, idx, sz)					\
+({									\
+	union { typeof(*(base)) *_ptr; unsigned long _bit; } __u;	\
+	typeof(*(base)) *_arr = (base);					\
+	unsigned long _i = (idx);					\
+	unsigned long _mask = array_ptr_mask(_i, (sz));			\
+									\
+	__u._ptr = _arr + (_i & _mask);					\
+	__u._bit &= _mask;						\
+	__u._ptr;							\
+})
+#endif /* __NOSPEC_H__ */

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 03/10] x86: implement array_ptr_mask()
  2018-01-19  0:01 ` Dan Williams
@ 2018-01-19  0:01   ` Dan Williams
  -1 siblings, 0 replies; 70+ messages in thread
From: Dan Williams @ 2018-01-19  0:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, kernel-hardening, gregkh, x86, Ingo Molnar,
	H. Peter Anvin, tglx, torvalds, akpm, alan

'array_ptr' uses a mask to sanitize user-controllable pointers. The x86
'array_ptr_mask' is an assembler-optimized way to generate a ~0 mask
when an array index is in bounds and a 0 mask when it is out of bounds.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/barrier.h |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 7fb336210e1b..67f6d4707a2c 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -24,6 +24,30 @@
 #define wmb()	asm volatile("sfence" ::: "memory")
 #endif
 
+/**
+ * array_ptr_mask - generate a mask for array_ptr() that is ~0UL when
+ * the bounds check succeeds and 0 otherwise
+ */
+#define array_ptr_mask array_ptr_mask
+static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
+{
+	unsigned long mask;
+
+	/*
+	 * mask = index - size; if that result is >= 0 then the index is
+	 * invalid and the mask is 0, else the mask is ~0
+	 */
+#ifdef CONFIG_X86_32
+	asm ("cmpl %1,%2; sbbl %0,%0;"
+#else
+	asm ("cmpq %1,%2; sbbq %0,%0;"
+#endif
+			:"=r" (mask)
+			:"r"(sz),"r" (idx)
+			:"cc");
+	return mask;
+}
+
 #ifdef CONFIG_X86_PPRO_FENCE
 #define dma_rmb()	rmb()
 #else



* [PATCH v4 04/10] x86: introduce __uaccess_begin_nospec and ifence
  2018-01-19  0:01 ` Dan Williams
@ 2018-01-19  0:01   ` Dan Williams
  -1 siblings, 0 replies; 70+ messages in thread
From: Dan Williams @ 2018-01-19  0:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Tom Lendacky, Andi Kleen, Kees Cook,
	kernel-hardening, gregkh, x86, Ingo Molnar, Al Viro,
	H. Peter Anvin, tglx, torvalds, akpm, alan

For '__get_user' paths, do not allow the kernel to speculate on the
value of a user controlled pointer. In addition to the 'stac'
instruction for Supervisor Mode Access Protection, an 'ifence' causes
the 'access_ok' result to resolve in the pipeline before the CPU can
take any speculative action on the pointer value.

Since __get_user is a major kernel interface that deals with user
controlled pointers, the '__uaccess_begin_nospec' mechanism will prevent
speculative execution past an 'access_ok' permission check. While
speculative execution past 'access_ok' is not enough to lead to a kernel
memory leak, it is a necessary precondition.

To be clear, '__uaccess_begin_nospec' is addressing a class of potential
problems near '__get_user' usages.

Note that, while 'ifence' is used to protect '__get_user', pointer
masking will be used for 'get_user' since it incorporates a bounds
check near the usage.

There are no functional changes in this patch.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/barrier.h |    4 ++++
 arch/x86/include/asm/msr.h     |    3 +--
 arch/x86/include/asm/uaccess.h |    9 +++++++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 67f6d4707a2c..0f48c832d1fb 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -48,6 +48,10 @@ static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
 	return mask;
 }
 
+/* prevent speculative execution past this barrier */
+#define ifence() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \
+				   "lfence", X86_FEATURE_LFENCE_RDTSC)
+
 #ifdef CONFIG_X86_PPRO_FENCE
 #define dma_rmb()	rmb()
 #else
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 07962f5f6fba..e426d2a33ff3 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -214,8 +214,7 @@ static __always_inline unsigned long long rdtsc_ordered(void)
 	 * that some other imaginary CPU is updating continuously with a
 	 * time stamp.
 	 */
-	alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
-			  "lfence", X86_FEATURE_LFENCE_RDTSC);
+	ifence();
 	return rdtsc();
 }
 
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 574dff4d2913..626caf58183a 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -124,6 +124,11 @@ extern int __get_user_bad(void);
 
 #define __uaccess_begin() stac()
 #define __uaccess_end()   clac()
+#define __uaccess_begin_nospec()	\
+({					\
+	stac();				\
+	ifence();			\
+})
 
 /*
  * This is a type: either unsigned long, if the argument fits into
@@ -487,6 +492,10 @@ struct __large_struct { unsigned long buf[100]; };
 	__uaccess_begin();						\
 	barrier();
 
+#define uaccess_try_nospec do {						\
+	current->thread.uaccess_err = 0;				\
+	__uaccess_begin_nospec();					\
+
 #define uaccess_catch(err)						\
 	__uaccess_end();						\
 	(err) |= (current->thread.uaccess_err ? -EFAULT : 0);		\



* [PATCH v4 05/10] x86, __get_user: use __uaccess_begin_nospec
  2018-01-19  0:01 ` Dan Williams
@ 2018-01-19  0:02   ` Dan Williams
  -1 siblings, 0 replies; 70+ messages in thread
From: Dan Williams @ 2018-01-19  0:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Andi Kleen, Kees Cook, kernel-hardening, gregkh, x86,
	Ingo Molnar, Al Viro, H. Peter Anvin, tglx, torvalds, akpm, alan

Quoting Linus:

    I do think that it would be a good idea to very expressly document
    the fact that it's not that the user access itself is unsafe. I do
    agree that things like "get_user()" want to be protected, but not
    because of any direct bugs or problems with get_user() and friends,
    but simply because get_user() is an excellent source of a pointer
    that is obviously controlled from a potentially attacking user
    space. So it's a prime candidate for then finding _subsequent_
    accesses that can then be used to perturb the cache.

'__uaccess_begin_nospec' covers '__get_user' and 'copy_from_iter' where
the limit check is far away from the user pointer de-reference. In those
cases an 'lfence' prevents speculation with a potential pointer to
privileged memory.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/uaccess.h    |    6 +++---
 arch/x86/include/asm/uaccess_32.h |    6 +++---
 arch/x86/include/asm/uaccess_64.h |   12 ++++++------
 arch/x86/lib/usercopy_32.c        |    8 ++++----
 4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 626caf58183a..a930585fa3b5 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -450,7 +450,7 @@ do {									\
 ({									\
 	int __gu_err;							\
 	__inttype(*(ptr)) __gu_val;					\
-	__uaccess_begin();						\
+	__uaccess_begin_nospec();					\
 	__get_user_size(__gu_val, (ptr), (size), __gu_err, -EFAULT);	\
 	__uaccess_end();						\
 	(x) = (__force __typeof__(*(ptr)))__gu_val;			\
@@ -557,7 +557,7 @@ struct __large_struct { unsigned long buf[100]; };
  *	get_user_ex(...);
  * } get_user_catch(err)
  */
-#define get_user_try		uaccess_try
+#define get_user_try		uaccess_try_nospec
 #define get_user_catch(err)	uaccess_catch(err)
 
 #define get_user_ex(x, ptr)	do {					\
@@ -591,7 +591,7 @@ extern void __cmpxchg_wrong_size(void)
 	__typeof__(ptr) __uval = (uval);				\
 	__typeof__(*(ptr)) __old = (old);				\
 	__typeof__(*(ptr)) __new = (new);				\
-	__uaccess_begin();						\
+	__uaccess_begin_nospec();					\
 	switch (size) {							\
 	case 1:								\
 	{								\
diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 72950401b223..ba2dc1930630 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -29,21 +29,21 @@ raw_copy_from_user(void *to, const void __user *from, unsigned long n)
 		switch (n) {
 		case 1:
 			ret = 0;
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_asm_nozero(*(u8 *)to, from, ret,
 					      "b", "b", "=q", 1);
 			__uaccess_end();
 			return ret;
 		case 2:
 			ret = 0;
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_asm_nozero(*(u16 *)to, from, ret,
 					      "w", "w", "=r", 2);
 			__uaccess_end();
 			return ret;
 		case 4:
 			ret = 0;
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_asm_nozero(*(u32 *)to, from, ret,
 					      "l", "k", "=r", 4);
 			__uaccess_end();
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index f07ef3c575db..62546b3a398e 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -55,31 +55,31 @@ raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
 		return copy_user_generic(dst, (__force void *)src, size);
 	switch (size) {
 	case 1:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u8 *)dst, (u8 __user *)src,
 			      ret, "b", "b", "=q", 1);
 		__uaccess_end();
 		return ret;
 	case 2:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u16 *)dst, (u16 __user *)src,
 			      ret, "w", "w", "=r", 2);
 		__uaccess_end();
 		return ret;
 	case 4:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u32 *)dst, (u32 __user *)src,
 			      ret, "l", "k", "=r", 4);
 		__uaccess_end();
 		return ret;
 	case 8:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
 			      ret, "q", "", "=r", 8);
 		__uaccess_end();
 		return ret;
 	case 10:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
 			       ret, "q", "", "=r", 10);
 		if (likely(!ret))
@@ -89,7 +89,7 @@ raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
 		__uaccess_end();
 		return ret;
 	case 16:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
 			       ret, "q", "", "=r", 16);
 		if (likely(!ret))
diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index 1b377f734e64..7add8ba06887 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -331,12 +331,12 @@ do {									\
 
 unsigned long __copy_user_ll(void *to, const void *from, unsigned long n)
 {
-	stac();
+	__uaccess_begin_nospec();
 	if (movsl_is_ok(to, from, n))
 		__copy_user(to, from, n);
 	else
 		n = __copy_user_intel(to, from, n);
-	clac();
+	__uaccess_end();
 	return n;
 }
 EXPORT_SYMBOL(__copy_user_ll);
@@ -344,7 +344,7 @@ EXPORT_SYMBOL(__copy_user_ll);
 unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from,
 					unsigned long n)
 {
-	stac();
+	__uaccess_begin_nospec();
 #ifdef CONFIG_X86_INTEL_USERCOPY
 	if (n > 64 && static_cpu_has(X86_FEATURE_XMM2))
 		n = __copy_user_intel_nocache(to, from, n);
@@ -353,7 +353,7 @@ unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *fr
 #else
 	__copy_user(to, from, n);
 #endif
-	clac();
+	__uaccess_end();
 	return n;
 }
 EXPORT_SYMBOL(__copy_from_user_ll_nocache_nozero);



* [PATCH v4 06/10] x86, get_user: use pointer masking to limit speculation
  2018-01-19  0:01 ` Dan Williams
@ 2018-01-19  0:02   ` Dan Williams
  -1 siblings, 0 replies; 70+ messages in thread
From: Dan Williams @ 2018-01-19  0:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Kees Cook, kernel-hardening, gregkh, x86,
	Ingo Molnar, Al Viro, H. Peter Anvin, tglx, torvalds, akpm, alan

Quoting Linus:

    I do think that it would be a good idea to very expressly document
    the fact that it's not that the user access itself is unsafe. I do
    agree that things like "get_user()" want to be protected, but not
    because of any direct bugs or problems with get_user() and friends,
    but simply because get_user() is an excellent source of a pointer
    that is obviously controlled from a potentially attacking user
    space. So it's a prime candidate for then finding _subsequent_
    accesses that can then be used to perturb the cache.

Unlike the '__get_user' case, 'get_user' includes the address limit
check near the pointer de-reference. With that locality the speculation
can be mitigated with pointer narrowing rather than a barrier. The
narrowing is performed by:

	cmp %limit, %ptr
	sbb %mask, %mask
	and %mask, %ptr

With respect to speculation the value of %ptr is either less than %limit
or NULL.

Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/smap.h |   17 +++++++++++++++++
 arch/x86/lib/getuser.S      |    5 +++++
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index db333300bd4b..2b4ad4c6a226 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -25,6 +25,23 @@
 
 #include <asm/alternative-asm.h>
 
+/*
+ * MASK_NOSPEC - sanitize a user controlled value with respect
+ * to speculation
+ *
+ * In the get_user path once we have determined that the pointer is
+ * below the current address limit sanitize its value with respect to
+ * speculation. In the case when the pointer is above the address limit
+ * this directs the cpu to speculate with a NULL ptr rather than
+ * something targeting kernel memory.
+ *
+ * assumes CF is set from a previous 'cmp TASK_addr_limit, %ptr'
+ */
+.macro MASK_NOSPEC mask val
+	sbb \mask, \mask
+	and \mask, \val
+.endm
+
 #ifdef CONFIG_X86_SMAP
 
 #define ASM_CLAC \
diff --git a/arch/x86/lib/getuser.S b/arch/x86/lib/getuser.S
index c97d935a29e8..07d0e8a28b17 100644
--- a/arch/x86/lib/getuser.S
+++ b/arch/x86/lib/getuser.S
@@ -40,6 +40,7 @@ ENTRY(__get_user_1)
 	mov PER_CPU_VAR(current_task), %_ASM_DX
 	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
 	jae bad_get_user
+	MASK_NOSPEC %_ASM_DX, %_ASM_AX
 	ASM_STAC
 1:	movzbl (%_ASM_AX),%edx
 	xor %eax,%eax
@@ -54,6 +55,7 @@ ENTRY(__get_user_2)
 	mov PER_CPU_VAR(current_task), %_ASM_DX
 	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
 	jae bad_get_user
+	MASK_NOSPEC %_ASM_DX, %_ASM_AX
 	ASM_STAC
 2:	movzwl -1(%_ASM_AX),%edx
 	xor %eax,%eax
@@ -68,6 +70,7 @@ ENTRY(__get_user_4)
 	mov PER_CPU_VAR(current_task), %_ASM_DX
 	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
 	jae bad_get_user
+	MASK_NOSPEC %_ASM_DX, %_ASM_AX
 	ASM_STAC
 3:	movl -3(%_ASM_AX),%edx
 	xor %eax,%eax
@@ -83,6 +86,7 @@ ENTRY(__get_user_8)
 	mov PER_CPU_VAR(current_task), %_ASM_DX
 	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
 	jae bad_get_user
+	MASK_NOSPEC %_ASM_DX, %_ASM_AX
 	ASM_STAC
 4:	movq -7(%_ASM_AX),%rdx
 	xor %eax,%eax
@@ -94,6 +98,7 @@ ENTRY(__get_user_8)
 	mov PER_CPU_VAR(current_task), %_ASM_DX
 	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
 	jae bad_get_user_8
+	MASK_NOSPEC %_ASM_DX, %_ASM_AX
 	ASM_STAC
 4:	movl -7(%_ASM_AX),%edx
 5:	movl -3(%_ASM_AX),%ecx



* [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-01-19  0:01 ` Dan Williams
@ 2018-01-19  0:02   ` Dan Williams
  -1 siblings, 0 replies; 70+ messages in thread
From: Dan Williams @ 2018-01-19  0:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, kernel-hardening, gregkh, x86, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, tglx, torvalds, akpm, alan

The syscall number is a user controlled index into a table of function
pointers in kernel space. Like 'get_user', use 'MASK_NOSPEC' to prevent
any out of bounds speculation. While retpoline prevents speculating into
the user controlled target it does not stop the pointer de-reference;
the concern is leaking memory relative to the syscall table base.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/entry/entry_64.S   |    2 ++
 arch/x86/include/asm/smap.h |    9 ++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4f8e1d35a97c..2320017077d4 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -35,6 +35,7 @@
 #include <asm/asm.h>
 #include <asm/smap.h>
 #include <asm/pgtable_types.h>
+#include <asm/smap.h>
 #include <asm/export.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
@@ -264,6 +265,7 @@ entry_SYSCALL_64_fastpath:
 	cmpl	$__NR_syscall_max, %eax
 #endif
 	ja	1f				/* return -ENOSYS (already in pt_regs->ax) */
+	MASK_NOSPEC %r11 %rax			/* sanitize syscall_nr wrt speculation */
 	movq	%r10, %rcx
 
 	/*
diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index 2b4ad4c6a226..3b5b2cf58dc6 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -35,7 +35,14 @@
  * this directs the cpu to speculate with a NULL ptr rather than
  * something targeting kernel memory.
  *
- * assumes CF is set from a previous 'cmp TASK_addr_limit, %ptr'
+ * In the syscall entry path it is possible to speculate past the
+ * validation of the system call number. Use MASK_NOSPEC to sanitize the
+ * syscall array index to zero (sys_read) rather than an arbitrary
+ * target.
+ *
+ * assumes CF is set from a previous 'cmp' i.e.:
+ *     cmp TASK_addr_limit, %ptr
+ *     cmp __NR_syscall_max, %idx
  */
 .macro MASK_NOSPEC mask val
 	sbb \mask, \mask

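For readers following along, a rough user-space model of what the masking
does (a sketch, not the kernel macro itself): after 'cmp limit, %val', CF
is set iff val < limit, and 'sbb \mask, \mask' materializes that carry as
an all-ones or all-zero mask. The 'and' step is assumed from the macro's
call sites; names below are illustrative.

```c
#include <assert.h>

/*
 * User-space sketch of the MASK_NOSPEC effect. The real macro is the
 * x86 'sbb mask, mask' (plus a mask of the value), which has no branch
 * for the cpu to mispredict; the ternary here is only a functional
 * model of the resulting mask.
 */
static unsigned long mask_nospec(unsigned long val, unsigned long limit)
{
	unsigned long mask = (val < limit) ? ~0UL : 0UL;

	/* in-bounds value passes through; out-of-bounds becomes 0,
	 * i.e. a NULL pointer for get_user or sys_read for a syscall nr */
	return val & mask;
}
```

Note that for the syscall path a value equal to the limit also masks to
zero, which is why out of bounds syscall numbers narrow to entry 0
(sys_read) rather than an arbitrary target.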
^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 08/10] vfs, fdtable: prevent bounds-check bypass via speculative execution
  2018-01-19  0:01 ` Dan Williams
@ 2018-01-19  0:02   ` Dan Williams
  -1 siblings, 0 replies; 70+ messages in thread
From: Dan Williams @ 2018-01-19  0:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, kernel-hardening, gregkh, Al Viro, tglx, torvalds,
	akpm, alan

'fd' is a user controlled value that is used as a data dependency to
read from the 'fdt->fd' array.  In order to avoid potential leaks of
kernel memory values, block speculative execution of the instruction
stream that could issue reads based on an invalid 'file *' returned from
__fcheck_files.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Co-developed-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/fdtable.h |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 1c65817673db..9731f1a255db 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -10,6 +10,7 @@
 #include <linux/compiler.h>
 #include <linux/spinlock.h>
 #include <linux/rcupdate.h>
+#include <linux/nospec.h>
 #include <linux/types.h>
 #include <linux/init.h>
 #include <linux/fs.h>
@@ -81,9 +82,11 @@ struct dentry;
 static inline struct file *__fcheck_files(struct files_struct *files, unsigned int fd)
 {
 	struct fdtable *fdt = rcu_dereference_raw(files->fdt);
+	struct file __rcu **fdp;
 
-	if (fd < fdt->max_fds)
-		return rcu_dereference_raw(fdt->fd[fd]);
+	fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
+	if (fdp)
+		return rcu_dereference_raw(*fdp);
 	return NULL;
 }
 

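For reference, a user-space sketch of the 'array_ptr' pattern used here.
The helper itself is introduced earlier in this series (its mask
expression is quoted later in this thread); the monomorphic function
below stands in for the kernel's type-generic macro.

```c
#include <assert.h>
#include <stddef.h>

/* all-ones when 0 <= idx < sz (both below LONG_MAX), else zero;
 * assumes arithmetic right shift of a negative long, as the kernel does */
static unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
{
	return ~(long)(idx | (sz - 1 - idx)) >> (sizeof(long) * 8 - 1);
}

/*
 * Monomorphic stand-in for array_ptr(): returns &base[idx] when idx is
 * in bounds, NULL otherwise, without branching on the index itself.
 */
static int *array_ptr_int(int *base, unsigned long idx, unsigned long sz)
{
	unsigned long mask = array_ptr_mask(idx, sz);
	union { int *ptr; unsigned long bit; } u;

	u.ptr = base + (idx & mask);	/* clamp the index under speculation */
	u.bit &= mask;			/* force the pointer to NULL on failure */
	return u.ptr;
}
```

This mirrors the __fcheck_files() change above: the caller checks the
returned pointer for NULL instead of comparing the index to max_fds.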
^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 09/10] kvm, x86: fix spectre-v1 mitigation
  2018-01-19  0:01 ` Dan Williams
@ 2018-01-19  0:02   ` Dan Williams
  -1 siblings, 0 replies; 70+ messages in thread
From: Dan Williams @ 2018-01-19  0:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, kernel-hardening, Andrew Honig, stable, gregkh,
	Paolo Bonzini, tglx, alan, torvalds, akpm, Jim Mattson

Commit 75f139aaf896 "KVM: x86: Add memory barrier on vmcs field lookup"
added a raw 'asm("lfence");' to prevent a bounds check bypass of
'vmcs_field_to_offset_table'. This does not work for some AMD cpus, see
the 'ifence' helper, and it otherwise does not use the common
'array_ptr' helper designed for these types of fixes. Convert this to
use 'array_ptr'.

Cc: Andrew Honig <ahonig@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/kvm/vmx.c |   19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c829d89e2e63..20b9b0b5e336 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -34,6 +34,7 @@
 #include <linux/tboot.h>
 #include <linux/hrtimer.h>
 #include <linux/frame.h>
+#include <linux/nospec.h>
 #include "kvm_cache_regs.h"
 #include "x86.h"
 
@@ -898,21 +899,15 @@ static const unsigned short vmcs_field_to_offset_table[] = {
 
 static inline short vmcs_field_to_offset(unsigned long field)
 {
-	BUILD_BUG_ON(ARRAY_SIZE(vmcs_field_to_offset_table) > SHRT_MAX);
-
-	if (field >= ARRAY_SIZE(vmcs_field_to_offset_table))
-		return -ENOENT;
+	const unsigned short *offset;
 
-	/*
-	 * FIXME: Mitigation for CVE-2017-5753.  To be replaced with a
-	 * generic mechanism.
-	 */
-	asm("lfence");
+	BUILD_BUG_ON(ARRAY_SIZE(vmcs_field_to_offset_table) > SHRT_MAX);
 
-	if (vmcs_field_to_offset_table[field] == 0)
+	offset = array_ptr(vmcs_field_to_offset_table, field,
+			ARRAY_SIZE(vmcs_field_to_offset_table));
+	if (!offset || *offset == 0)
 		return -ENOENT;
-
-	return vmcs_field_to_offset_table[field];
+	return *offset;
 }
 
 static inline struct vmcs12 *get_vmcs12(struct kvm_vcpu *vcpu)

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v4 10/10] nl80211: sanitize array index in parse_txq_params
@ 2018-01-19  0:02   ` Dan Williams
  0 siblings, 0 replies; 70+ messages in thread
From: Dan Williams @ 2018-01-19  0:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, akpm, kernel-hardening, gregkh, Christian Lamparter,
	linux-wireless, tglx, Johannes Berg, torvalds, David S. Miller,
	Elena Reshetova, alan

Wireless drivers rely on parse_txq_params to validate that
txq_params->ac is less than NL80211_NUM_ACS by the time the low-level
driver's ->conf_tx() handler is called. Use a new helper, 'array_idx',
to sanitize txq_params->ac with respect to speculation. I.e. ensure that
any speculation into ->conf_tx() handlers is done with a value of
txq_params->ac that is within the bounds of [0, NL80211_NUM_ACS).

Reported-by: Christian Lamparter <chunkeey@gmail.com>
Reported-by: Elena Reshetova <elena.reshetova@intel.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/nospec.h |   21 +++++++++++++++++++++
 net/wireless/nl80211.c |   10 +++++++---
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/include/linux/nospec.h b/include/linux/nospec.h
index f841c11f3f1f..8af35be1869e 100644
--- a/include/linux/nospec.h
+++ b/include/linux/nospec.h
@@ -41,4 +41,25 @@ static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
 	__u._bit &= _mask;						\
 	__u._ptr;							\
 })
+
+/**
+ * array_idx - Generate a pointer to an array index, ensuring the
+ * pointer is bounded under speculation to NULL.
+ *
+ * @idx: the index of the element, must be less than LONG_MAX
+ * @sz: the number of elements in the array, must be less than LONG_MAX
+ *
+ * If @idx falls in the interval [0, @sz), returns &@idx; otherwise
+ * returns NULL.
+ */
+#define array_idx(idx, sz)						\
+({									\
+	union { typeof((idx)) *_ptr; unsigned long _bit; } __u;		\
+	typeof(idx) *_i = &(idx);					\
+	unsigned long _mask = array_ptr_mask(*_i, (sz));		\
+									\
+	__u._ptr = _i;							\
+	__u._bit &= _mask;						\
+	__u._ptr;							\
+})
 #endif /* __NOSPEC_H__ */
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 2b3dbcd40e46..202cb1dc03ee 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -16,6 +16,7 @@
 #include <linux/nl80211.h>
 #include <linux/rtnetlink.h>
 #include <linux/netlink.h>
+#include <linux/nospec.h>
 #include <linux/etherdevice.h>
 #include <net/net_namespace.h>
 #include <net/genetlink.h>
@@ -2056,20 +2057,23 @@ static const struct nla_policy txq_params_policy[NL80211_TXQ_ATTR_MAX + 1] = {
 static int parse_txq_params(struct nlattr *tb[],
 			    struct ieee80211_txq_params *txq_params)
 {
+	u8 ac, *idx;
+
 	if (!tb[NL80211_TXQ_ATTR_AC] || !tb[NL80211_TXQ_ATTR_TXOP] ||
 	    !tb[NL80211_TXQ_ATTR_CWMIN] || !tb[NL80211_TXQ_ATTR_CWMAX] ||
 	    !tb[NL80211_TXQ_ATTR_AIFS])
 		return -EINVAL;
 
-	txq_params->ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
+	ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
 	txq_params->txop = nla_get_u16(tb[NL80211_TXQ_ATTR_TXOP]);
 	txq_params->cwmin = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMIN]);
 	txq_params->cwmax = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMAX]);
 	txq_params->aifs = nla_get_u8(tb[NL80211_TXQ_ATTR_AIFS]);
 
-	if (txq_params->ac >= NL80211_NUM_ACS)
+	idx = array_idx(ac, NL80211_NUM_ACS);
+	if (!idx)
 		return -EINVAL;
-
+	txq_params->ac = *idx;
 	return 0;
 }
 

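A user-space sketch of how the new 'array_idx' helper behaves in the
parse_txq_params() flow above. The functions are monomorphic stand-ins
for the kernel's type-generic macros, and 'check_ac' / NUM_ACS are
illustrative names modeling the tail of parse_txq_params (with -EINVAL
modeled as -1).

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ACS 4	/* stands in for NL80211_NUM_ACS */

/* all-ones when 0 <= idx < sz, else zero; assumes arithmetic shift */
static unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
{
	return ~(long)(idx | (sz - 1 - idx)) >> (sizeof(long) * 8 - 1);
}

/*
 * Stand-in for array_idx(): returns &idx when *idx is in [0, sz),
 * NULL otherwise, so a later array reference under speculation sees
 * either a validated index or a NULL de-reference.
 */
static unsigned char *array_idx_u8(unsigned char *idx, unsigned long sz)
{
	union { unsigned char *ptr; unsigned long bit; } u;

	u.ptr = idx;
	u.bit &= array_ptr_mask(*idx, sz);
	return u.ptr;
}

/* mirrors the tail of parse_txq_params() */
static int check_ac(unsigned char ac, unsigned char *out)
{
	unsigned char *idx = array_idx_u8(&ac, NUM_ACS);

	if (!idx)
		return -1;
	*out = *idx;
	return 0;
}
```

The point of the helper, per the changelog, is that the sanitized value
read back through the masked pointer is what later indexes the array,
even though the bounds check and the array reference live in different
code blocks.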
^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 09/10] kvm, x86: fix spectre-v1 mitigation
  2018-01-19  0:02   ` [kernel-hardening] " Dan Williams
@ 2018-01-19  8:42     ` Paolo Bonzini
  -1 siblings, 0 replies; 70+ messages in thread
From: Paolo Bonzini @ 2018-01-19  8:42 UTC (permalink / raw)
  To: Dan Williams, linux-kernel
  Cc: linux-arch, kernel-hardening, Andrew Honig, stable, gregkh, tglx,
	alan, torvalds, akpm, Jim Mattson

On 19/01/2018 01:02, Dan Williams wrote:
> Commit 75f139aaf896 "KVM: x86: Add memory barrier on vmcs field lookup"
> added a raw 'asm("lfence");' to prevent a bounds check bypass of
> 'vmcs_field_to_offset_table'. This does not work for some AMD cpus, see
> the 'ifence' helper,

The code never runs on AMD cpus (it's for Intel virtualization
extensions), so it'd be nice if you could fix up the commit message.

Apart from this, obviously

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Thanks!

Paolo

> and it otherwise does not use the common
> 'array_ptr' helper designed for these types of fixes. Convert this to
> use 'array_ptr'.
> 
> Cc: Andrew Honig <ahonig@google.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/x86/kvm/vmx.c |   19 +++++++------------
>  1 file changed, 7 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c829d89e2e63..20b9b0b5e336 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -34,6 +34,7 @@
>  #include <linux/tboot.h>
>  #include <linux/hrtimer.h>
>  #include <linux/frame.h>
> +#include <linux/nospec.h>
>  #include "kvm_cache_regs.h"
>  #include "x86.h"
>  
> @@ -898,21 +899,15 @@ static const unsigned short vmcs_field_to_offset_table[] = {
>  
>  static inline short vmcs_field_to_offset(unsigned long field)
>  {
> -	BUILD_BUG_ON(ARRAY_SIZE(vmcs_field_to_offset_table) > SHRT_MAX);
> -
> -	if (field >= ARRAY_SIZE(vmcs_field_to_offset_table))
> -		return -ENOENT;
> +	const unsigned short *offset;
>  
> -	/*
> -	 * FIXME: Mitigation for CVE-2017-5753.  To be replaced with a
> -	 * generic mechanism.
> -	 */
> -	asm("lfence");
> +	BUILD_BUG_ON(ARRAY_SIZE(vmcs_field_to_offset_table) > SHRT_MAX);
>  
> -	if (vmcs_field_to_offset_table[field] == 0)
> +	offset = array_ptr(vmcs_field_to_offset_table, field,
> +			ARRAY_SIZE(vmcs_field_to_offset_table));
> +	if (!offset || *offset == 0)
>  		return -ENOENT;
> -
> -	return vmcs_field_to_offset_table[field];
> +	return *offset;
>  }
>  
>  static inline struct vmcs12 *get_vmcs12(struct kvm_vcpu *vcpu)
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [kernel-hardening] [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
  2018-01-19  0:01   ` [kernel-hardening] " Dan Williams
  (?)
@ 2018-01-19 10:20   ` Jann Horn
  2018-01-19 17:48       ` Adam Sampson
  2018-01-19 18:18       ` Linus Torvalds
  -1 siblings, 2 replies; 70+ messages in thread
From: Jann Horn @ 2018-01-19 10:20 UTC (permalink / raw)
  To: Dan Williams
  Cc: kernel list, linux-arch, Kernel Hardening, Catalin Marinas,
	the arch/x86 maintainers, Will Deacon, Russell King, Ingo Molnar,
	Greg Kroah-Hartman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds, Andrew Morton, alan

On Fri, Jan 19, 2018 at 1:01 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> 'array_ptr' is proposed as a generic mechanism to mitigate against
> Spectre-variant-1 attacks (i.e. an attack that bypasses boundary checks
> via speculative execution). The 'array_ptr' implementation is expected
> to be safe for current generation cpus across multiple architectures
> (ARM, x86).
>
> Based on an original implementation by Linus Torvalds, tweaked to remove
> speculative flows by Alexei Starovoitov, and tweaked again by Linus to
> introduce an x86 assembly implementation for the mask generation.
[...]
> +/*
> + * If idx is negative or if idx > size then bit 63 is set in the mask,
> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
> + * failed, array_ptr will return NULL.
> + */
> +#ifndef array_ptr_mask
> +static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
> +{
> +       return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
> +}
> +#endif

Nit: Maybe add a comment saying that this is equivalent to
"return ((long)idx >= 0 && idx < sz) ? ULONG_MAX : 0"?

> +/**
> + * array_ptr - Generate a pointer to an array element, ensuring
> + * the pointer is bounded under speculation to NULL.
> + *
> + * @base: the base of the array
> + * @idx: the index of the element, must be less than LONG_MAX
> + * @sz: the number of elements in the array, must be less than LONG_MAX
> + *
> + * If @idx falls in the interval [0, @sz), returns the pointer to
> + * @arr[@idx], otherwise returns NULL.
> + */
> +#define array_ptr(base, idx, sz)                                       \
> +({                                                                     \
> +       union { typeof(*(base)) *_ptr; unsigned long _bit; } __u;       \
> +       typeof(*(base)) *_arr = (base);                                 \
> +       unsigned long _i = (idx);                                       \
> +       unsigned long _mask = array_ptr_mask(_i, (sz));                 \
> +                                                                       \
> +       __u._ptr = _arr + (_i & _mask);                                 \
> +       __u._bit &= _mask;                                              \

AFAICS, if `idx` is out of bounds, you first zero out the index
(`_i & _mask`) and then immediately afterwards zero out
the whole pointer (`_u._bit &= _mask`).
Is there a reason for the `_i & _mask`, and if so, can you
add a comment explaining that?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [kernel-hardening] [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
  2018-01-19 10:20   ` Jann Horn
  2018-01-19 17:48       ` Adam Sampson
  2018-01-19 18:18       ` Linus Torvalds
@ 2018-01-19 17:48       ` Adam Sampson
  1 sibling, 0 replies; 70+ messages in thread
From: Adam Sampson @ 2018-01-19 17:48 UTC (permalink / raw)
  To: Jann Horn
  Cc: Dan Williams, kernel list, linux-arch, Kernel Hardening,
	Catalin Marinas, the arch/x86 maintainers, Will Deacon,
	Russell King, Ingo Molnar, Greg Kroah-Hartman, H. Peter Anvin,
	Thomas Gleixner, Linus Torvalds, Andrew Morton, alan

Jann Horn <jannh@google.com> writes:

>> +/*
>> + * If idx is negative or if idx > size then bit 63 is set in the mask,
>> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
>> + * failed, array_ptr will return NULL.
>> + */
>> +#ifndef array_ptr_mask
>> +static inline unsigned long array_ptr_mask(unsigned long idx,
>> unsigned long sz)
>> +{
>> +       return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
>> +}
>> +#endif
>
> Nit: Maybe add a comment saying that this is equivalent to
> "return ((long)idx >= 0 && idx < sz) ? ULONG_MAX : 0"?

That's only true when sz < LONG_MAX, which is documented below but not
here; it's also different from the asm version, which doesn't do the idx
<= LONG_MAX check. So making the constraint explicit would be a good idea.

From a bit of experimentation, when the top bit of sz is set, this
expression, the C version and the assembler version all have different
behaviour. For example, with 32-bit unsigned long:

index=00000000 size=80000001: expr=ffffffff c=00000000 asm=ffffffff
index=80000000 size=80000001: expr=00000000 c=00000000 asm=ffffffff
index=00000000 size=a0000000: expr=ffffffff c=00000000 asm=ffffffff
index=00000001 size=a0000000: expr=ffffffff c=00000000 asm=ffffffff
index=fffffffe size=ffffffff: expr=00000000 c=00000000 asm=ffffffff

It may be worth noting that:

     return 0 - ((long) (idx < sz));

causes GCC, on ia32 and amd64, to generate exactly the same cmp/sbb
sequence as in Linus's asm. Are there architectures where this form
would allow speculation?

Thanks,

-- 
Adam Sampson <ats@offog.org>                         <http://offog.org/>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [kernel-hardening] [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
  2018-01-19 17:48       ` Adam Sampson
                         ` (2 preceding siblings ...)
  (?)
@ 2018-01-19 18:12       ` Dan Williams
  2018-01-19 18:18           ` Will Deacon
  -1 siblings, 1 reply; 70+ messages in thread
From: Dan Williams @ 2018-01-19 18:12 UTC (permalink / raw)
  To: Adam Sampson
  Cc: Jann Horn, kernel list, linux-arch, Kernel Hardening,
	Catalin Marinas, the arch/x86 maintainers, Will Deacon,
	Russell King, Ingo Molnar, Greg Kroah-Hartman, H. Peter Anvin,
	Thomas Gleixner, Linus Torvalds, Andrew Morton, Alan Cox,
	Alexei Starovoitov

[ adding Alexei back to the cc ]

On Fri, Jan 19, 2018 at 9:48 AM, Adam Sampson <ats@offog.org> wrote:
> Jann Horn <jannh@google.com> writes:
>
>>> +/*
>>> + * If idx is negative or if idx > size then bit 63 is set in the mask,
>>> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
>>> + * failed, array_ptr will return NULL.
>>> + */
>>> +#ifndef array_ptr_mask
>>> +static inline unsigned long array_ptr_mask(unsigned long idx,
>>> unsigned long sz)
>>> +{
>>> +       return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
>>> +}
>>> +#endif
>>
>> Nit: Maybe add a comment saying that this is equivalent to
>> "return ((long)idx >= 0 && idx < sz) ? ULONG_MAX : 0"?
>
> That's only true when sz < LONG_MAX, which is documented below but not
> here; it's also different from the asm version, which doesn't do the idx
> <= LONG_MAX check. So making the constraint explicit would be a good idea.
>
> From a bit of experimentation, when the top bit of sz is set, this
> expression, the C version and the assembler version all have different
> behaviour. For example, with 32-bit unsigned long:
>
> index=00000000 size=80000001: expr=ffffffff c=00000000 asm=ffffffff
> index=80000000 size=80000001: expr=00000000 c=00000000 asm=ffffffff
> index=00000000 size=a0000000: expr=ffffffff c=00000000 asm=ffffffff
> index=00000001 size=a0000000: expr=ffffffff c=00000000 asm=ffffffff
> index=fffffffe size=ffffffff: expr=00000000 c=00000000 asm=ffffffff
>
> It may be worth noting that:
>
>      return 0 - ((long) (idx < sz));
>
> causes GCC, on ia32 and amd64, to generate exactly the same cmp/sbb
> sequence as in Linus's asm. Are there architectures where this form
> would allow speculation?

We're operating on the assumption that compilers will not try to
introduce branches where they don't exist in the code, so if this is
producing identical assembly I think we should go with it and drop the
x86 array_ptr_mask.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [kernel-hardening] [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
  2018-01-19 18:12       ` Dan Williams
@ 2018-01-19 18:18           ` Will Deacon
  0 siblings, 0 replies; 70+ messages in thread
From: Will Deacon @ 2018-01-19 18:18 UTC (permalink / raw)
  To: Dan Williams
  Cc: Adam Sampson, Jann Horn, kernel list, linux-arch,
	Kernel Hardening, Catalin Marinas, the arch/x86 maintainers,
	Russell King, Ingo Molnar, Greg Kroah-Hartman, H. Peter Anvin,
	Thomas Gleixner, Linus Torvalds, Andrew Morton, Alan Cox,
	Alexei Starovoitov

On Fri, Jan 19, 2018 at 10:12:47AM -0800, Dan Williams wrote:
> [ adding Alexei back to the cc ]
> 
> On Fri, Jan 19, 2018 at 9:48 AM, Adam Sampson <ats@offog.org> wrote:
> > Jann Horn <jannh@google.com> writes:
> >
> >>> +/*
> >>> + * If idx is negative or if idx > size then bit 63 is set in the mask,
> >>> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
> >>> + * failed, array_ptr will return NULL.
> >>> + */
> >>> +#ifndef array_ptr_mask
> >>> +static inline unsigned long array_ptr_mask(unsigned long idx,
> >>> unsigned long sz)
> >>> +{
> >>> +       return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
> >>> +}
> >>> +#endif
> >>
> >> Nit: Maybe add a comment saying that this is equivalent to
> >> "return ((long)idx >= 0 && idx < sz) ? ULONG_MAX : 0"?
> >
> > That's only true when sz < LONG_MAX, which is documented below but not
> > here; it's also different from the asm version, which doesn't do the idx
> > <= LONG_MAX check. So making the constraint explicit would be a good idea.
> >
> > From a bit of experimentation, when the top bit of sz is set, this
> > expression, the C version and the assembler version all have different
> > behaviour. For example, with 32-bit unsigned long:
> >
> > index=00000000 size=80000001: expr=ffffffff c=00000000 asm=ffffffff
> > index=80000000 size=80000001: expr=00000000 c=00000000 asm=ffffffff
> > index=00000000 size=a0000000: expr=ffffffff c=00000000 asm=ffffffff
> > index=00000001 size=a0000000: expr=ffffffff c=00000000 asm=ffffffff
> > index=fffffffe size=ffffffff: expr=00000000 c=00000000 asm=ffffffff
> >
> > It may be worth noting that:
> >
> >      return 0 - ((long) (idx < sz));
> >
> > causes GCC, on ia32 and amd64, to generate exactly the same cmp/sbb
> > sequence as in Linus's asm. Are there architectures where this form
> > would allow speculation?
> 
> We're operating on the assumption that compilers will not try to
> introduce branches where they don't exist in the code, so if this is
> producing identical assembly I think we should go with it and drop the
> x86 array_ptr_mask.

Branches, perhaps, but this could easily be compiled to a conditional
select (CSEL) instruction on arm64 and that wouldn't be safe without a
CSDB. Of course, we can do our own thing in assembly to prevent that, but
it would mean that the generic C implementation would not be robust for us.

Will

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [kernel-hardening] [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
  2018-01-19 10:20   ` Jann Horn
@ 2018-01-19 18:18       ` Linus Torvalds
  2018-01-19 18:18       ` Linus Torvalds
  1 sibling, 0 replies; 70+ messages in thread
From: Linus Torvalds @ 2018-01-19 18:18 UTC (permalink / raw)
  To: Jann Horn
  Cc: Dan Williams, kernel list, linux-arch, Kernel Hardening,
	Catalin Marinas, the arch/x86 maintainers, Will Deacon,
	Russell King, Ingo Molnar, Greg Kroah-Hartman, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton, Alan Cox

On Fri, Jan 19, 2018 at 2:20 AM, Jann Horn <jannh@google.com> wrote:
>> +                                                                       \
>> +       __u._ptr = _arr + (_i & _mask);                                 \
>> +       __u._bit &= _mask;                                              \
>
> AFAICS, if `idx` is out of bounds, you first zero out the index
> (`_i & _mask`) and then immediately afterwards zero out
> the whole pointer (`_u._bit &= _mask`).
> Is there a reason for the `_i & _mask`, and if so, can you
> add a comment explaining that?

I think that's just leftovers from my original (untested) thing that
also did the access itself. So that __u._bit masking wasn't masking
the pointer, it was masking the value that was *read* from the
pointer, so that you could know that an invalid access returned
0/NULL, not just the first value in the array.

                  Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [kernel-hardening] [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
  2018-01-19 18:18           ` Will Deacon
  (?)
@ 2018-01-19 18:26           ` Dan Williams
  -1 siblings, 0 replies; 70+ messages in thread
From: Dan Williams @ 2018-01-19 18:26 UTC (permalink / raw)
  To: Will Deacon
  Cc: Adam Sampson, Jann Horn, kernel list, linux-arch,
	Kernel Hardening, Catalin Marinas, the arch/x86 maintainers,
	Russell King, Ingo Molnar, Greg Kroah-Hartman, H. Peter Anvin,
	Thomas Gleixner, Linus Torvalds, Andrew Morton, Alan Cox,
	Alexei Starovoitov

On Fri, Jan 19, 2018 at 10:18 AM, Will Deacon <will.deacon@arm.com> wrote:
>
> On Fri, Jan 19, 2018 at 10:12:47AM -0800, Dan Williams wrote:
> > [ adding Alexei back to the cc ]
> >
> > On Fri, Jan 19, 2018 at 9:48 AM, Adam Sampson <ats@offog.org> wrote:
> > > Jann Horn <jannh@google.com> writes:
> > >
> > >>> +/*
> > >>> + * If idx is negative or if idx > size then bit 63 is set in the mask,
> > >>> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
> > >>> + * failed, array_ptr will return NULL.
> > >>> + */
> > >>> +#ifndef array_ptr_mask
> > >>> +static inline unsigned long array_ptr_mask(unsigned long idx,
> > >>> unsigned long sz)
> > >>> +{
> > >>> +       return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
> > >>> +}
> > >>> +#endif
> > >>
> > >> Nit: Maybe add a comment saying that this is equivalent to
> > >> "return ((long)idx >= 0 && idx < sz) ? ULONG_MAX : 0"?
> > >
> > > That's only true when sz < LONG_MAX, which is documented below but not
> > > here; it's also different from the asm version, which doesn't do the idx
> > > <= LONG_MAX check. So making the constraint explicit would be a good idea.
> > >
> > > From a bit of experimentation, when the top bit of sz is set, this
> > > expression, the C version and the assembler version all have different
> > > behaviour. For example, with 32-bit unsigned long:
> > >
> > > index=00000000 size=80000001: expr=ffffffff c=00000000 asm=ffffffff
> > > index=80000000 size=80000001: expr=00000000 c=00000000 asm=ffffffff
> > > index=00000000 size=a0000000: expr=ffffffff c=00000000 asm=ffffffff
> > > index=00000001 size=a0000000: expr=ffffffff c=00000000 asm=ffffffff
> > > index=fffffffe size=ffffffff: expr=00000000 c=00000000 asm=ffffffff
> > >
> > > It may be worth noting that:
> > >
> > >      return 0 - ((long) (idx < sz));
> > >
> > > causes GCC, on ia32 and amd64, to generate exactly the same cmp/sbb
> > > sequence as in Linus's asm. Are there architectures where this form
> > > would allow speculation?
> >
> > We're operating on the assumption that compilers will not try to
> > introduce branches where they don't exist in the code, so if this is
> > producing identical assembly I think we should go with it and drop the
> > x86 array_ptr_mask.
>
> Branches, perhaps, but this could easily be compiled to a conditional
> select (CSEL) instruction on arm64 and that wouldn't be safe without a
> CSDB. Of course, we can do our own thing in assembly to prevent that, but
> it would mean that the generic C implementation would not be robust for us.
>

Ah, ok, good to know. If the current version doesn't produce a
conditional instruction on arm64, it likely won't on other
architectures either, so it is the safer choice.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [kernel-hardening] [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
  2018-01-19 18:18       ` Linus Torvalds
  (?)
@ 2018-01-19 20:55       ` Dan Williams
  -1 siblings, 0 replies; 70+ messages in thread
From: Dan Williams @ 2018-01-19 20:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jann Horn, kernel list, linux-arch, Kernel Hardening,
	Catalin Marinas, the arch/x86 maintainers, Will Deacon,
	Russell King, Ingo Molnar, Greg Kroah-Hartman, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton, Alan Cox

On Fri, Jan 19, 2018 at 10:18 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Jan 19, 2018 at 2:20 AM, Jann Horn <jannh@google.com> wrote:
>>> +                                                                       \
>>> +       __u._ptr = _arr + (_i & _mask);                                 \
>>> +       __u._bit &= _mask;                                              \
>>
>> AFAICS, if `idx` is out of bounds, you first zero out the index
>> (`_i & _mask`) and then immediately afterwards zero out
>> the whole pointer (`_u._bit &= _mask`).
>> Is there a reason for the `_i & _mask`, and if so, can you
>> add a comment explaining that?
>
> I think that's just leftovers from my original (untested) thing that
> also did the access itself. So that __u._bit masking wasn't masking
> the pointer, it was masking the value that was *read* from the
> pointer, so that you could know that an invalid access returned
> 0/NULL, not just the first value in the array.

Yes, the index masking can be dropped since we're returning a
sanitized array element pointer now.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 00/10] prevent bounds-check bypass via speculative execution
  2018-01-19  0:01 ` Dan Williams
  (?)
@ 2018-01-20  6:58   ` Dan Williams
  -1 siblings, 0 replies; 70+ messages in thread
From: Dan Williams @ 2018-01-20  6:58 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Mark Rutland, Kernel Hardening, Peter Zijlstra, Catalin Marinas,
	Will Deacon, H. Peter Anvin, Elena Reshetova, linux-arch,
	Andi Kleen, Jonathan Corbet, X86 ML, Russell King, Ingo Molnar,
	Andrew Honig, Alan Cox, Tom Lendacky, Kees Cook, Al Viro,
	Andy Lutomirski, Thomas Gleixner, Andrew Morton, Jim Mattson,
	Christian Lamparter, Greg KH, Linux Wireless List, stable,
	Paolo Bonzini, Johannes Berg, Linus Torvalds, David S. Miller

On Thu, Jan 18, 2018 at 4:01 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> Changes since v3 [1]
> * Drop 'ifence_array_ptr' and associated compile-time + run-time
>   switching and just use the masking approach all the time.
>
> * Convert 'get_user' to use pointer sanitization via masking rather than
>   lfence. '__get_user' and associated paths still rely on
>   lfence. (Linus)
>
>       "Basically, the rule is trivial: find all 'stac' users, and use
>        address masking if those users already integrate the limit
>        check, and lfence they don't."
>
> * At syscall entry sanitize the syscall number under speculation
>   to remove a user controlled pointer de-reference in kernel
>   space.  (Linus)
>
> * Fix a raw lfence in the kvm code (added for v4.15-rc8) to use
>   'array_ptr'.
>
> * Propose 'array_idx' as a way to sanitize user input that is
>   later used as an array index, but where the validation is
>   happening in a different code block than the array reference.
>   (Christian).
>
> * Fix grammar in speculation.txt (Kees)
>
> ---
>
> Quoting Mark's original RFC:
>
> "Recently, Google Project Zero discovered several classes of attack
> against speculative execution. One of these, known as variant-1, allows
> explicit bounds checks to be bypassed under speculation, providing an
> arbitrary read gadget. Further details can be found on the GPZ blog [2]
> and the Documentation patch in this series."
>
> A precondition of using this attack on the kernel is to get a user
> controlled pointer de-referenced (under speculation) in privileged code.
> The primary source of user controlled pointers in the kernel is the
> arguments passed to 'get_user' and '__get_user'. An example of other
> user controlled pointers are user-controlled array / pointer offsets.
>
> Better tooling is needed to find more arrays / pointers with user
> controlled indices / offsets that can be converted to use 'array_ptr' or
> 'array_idx'. A few are included in this set, and these are not expected
> to be complete. That said, the 'get_user' protections raise the bar on
> finding a vulnerable gadget in the kernel.
>
> These patches are also available via the 'nospec-v4' git branch here:
>
>     git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4

I've pushed out a nospec-v4.1 with the below minor cleanup, a fixup of
the changelog for "kvm, x86: fix spectre-v1 mitigation", and added
Paolo's ack.

     git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4.1

diff --git a/include/linux/nospec.h b/include/linux/nospec.h
index 8af35be1869e..b8a9222e34d1 100644
--- a/include/linux/nospec.h
+++ b/include/linux/nospec.h
@@ -37,7 +37,7 @@ static inline unsigned long array_ptr_mask(unsigned
long idx, unsigned long sz)
        unsigned long _i = (idx);                                       \
        unsigned long _mask = array_ptr_mask(_i, (sz));                 \
                                                                        \
-       __u._ptr = _arr + (_i & _mask);                                 \
+       __u._ptr = _arr + _i;                                           \
        __u._bit &= _mask;                                              \
        __u._ptr;                                                       \
 })

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 00/10] prevent bounds-check bypass via speculative execution
@ 2018-01-20  6:58   ` Dan Williams
  0 siblings, 0 replies; 70+ messages in thread
From: Dan Williams @ 2018-01-20  6:58 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Mark Rutland, Kernel Hardening, Peter Zijlstra, Catalin Marinas,
	Will Deacon, H. Peter Anvin, Elena Reshetova, linux-arch,
	Andi Kleen, Jonathan Corbet, X86 ML, Russell King, Ingo Molnar,
	Andrew Honig, Alan Cox, Tom Lendacky, Kees Cook, Al Viro,
	Andy Lutomirski, Thomas Gleixner

On Thu, Jan 18, 2018 at 4:01 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> Changes since v3 [1]
> * Drop 'ifence_array_ptr' and associated compile-time + run-time
>   switching and just use the masking approach all the time.
>
> * Convert 'get_user' to use pointer sanitization via masking rather than
>   lfence. '__get_user' and associated paths still rely on
>   lfence. (Linus)
>
>       "Basically, the rule is trivial: find all 'stac' users, and use
>        address masking if those users already integrate the limit
>        check, and lfence they don't."
>
> * At syscall entry sanitize the syscall number under speculation
>   to remove a user controlled pointer de-reference in kernel
>   space.  (Linus)
>
> * Fix a raw lfence in the kvm code (added for v4.15-rc8) to use
>   'array_ptr'.
>
> * Propose 'array_idx' as a way to sanitize user input that is
>   later used as an array index, but where the validation is
>   happening in a different code block than the array reference.
>   (Christian).
>
> * Fix grammar in speculation.txt (Kees)
>
> ---
>
> Quoting Mark's original RFC:
>
> "Recently, Google Project Zero discovered several classes of attack
> against speculative execution. One of these, known as variant-1, allows
> explicit bounds checks to be bypassed under speculation, providing an
> arbitrary read gadget. Further details can be found on the GPZ blog [2]
> and the Documentation patch in this series."
>
> A precondition of using this attack on the kernel is to get a user
> controlled pointer de-referenced (under speculation) in privileged code.
> The primary source of user controlled pointers in the kernel is the
> arguments passed to 'get_user' and '__get_user'. An example of other
> user controlled pointers are user-controlled array / pointer offsets.
>
> Better tooling is needed to find more arrays / pointers with user
> controlled indices / offsets that can be converted to use 'array_ptr' or
> 'array_idx'. A few are included in this set, and these are not expected
> to be complete. That said, the 'get_user' protections raise the bar on
> finding a vulnerable gadget in the kernel.
>
> These patches are also available via the 'nospec-v4' git branch here:
>
>     git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4

I've pushed out a nospec-v4.1 with the below minor cleanup, a fixup of
the changelog for "kvm, x86: fix spectre-v1 mitigation", and added
Paolo's ack.

     git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4.1

diff --git a/include/linux/nospec.h b/include/linux/nospec.h
index 8af35be1869e..b8a9222e34d1 100644
--- a/include/linux/nospec.h
+++ b/include/linux/nospec.h
@@ -37,7 +37,7 @@ static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
        unsigned long _i = (idx);                                       \
        unsigned long _mask = array_ptr_mask(_i, (sz));                 \
                                                                        \
-       __u._ptr = _arr + (_i & _mask);                                 \
+       __u._ptr = _arr + _i;                                           \
        __u._bit &= _mask;                                              \
        __u._ptr;                                                       \
 })

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 00/10] prevent bounds-check bypass via speculative execution
  2018-01-20  6:58   ` Dan Williams
@ 2018-01-20 16:56     ` Alexei Starovoitov
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexei Starovoitov @ 2018-01-20 16:56 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux Kernel Mailing List, Mark Rutland, Kernel Hardening,
	Peter Zijlstra, Catalin Marinas, Will Deacon, H. Peter Anvin,
	Elena Reshetova, linux-arch, Andi Kleen, Jonathan Corbet, X86 ML,
	Russell King, Ingo Molnar, Andrew Honig, Alan Cox, Tom Lendacky,
	Kees Cook, Al Viro, Andy Lutomirski, Thomas Gleixner,
	Andrew Morton, Jim Mattson, Christian Lamparter, Greg KH,
	Linux Wireless List, stable, Paolo Bonzini, Johannes Berg,
	Linus Torvalds, David S. Miller

On Fri, Jan 19, 2018 at 10:58:44PM -0800, Dan Williams wrote:
> On Thu, Jan 18, 2018 at 4:01 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> > Changes since v3 [1]
> > * Drop 'ifence_array_ptr' and associated compile-time + run-time
> >   switching and just use the masking approach all the time.
> >
> > * Convert 'get_user' to use pointer sanitization via masking rather than
> >   lfence. '__get_user' and associated paths still rely on
> >   lfence. (Linus)
> >
> >       "Basically, the rule is trivial: find all 'stac' users, and use
> >        address masking if those users already integrate the limit
> >        check, and lfence they don't."
> >
> > * At syscall entry sanitize the syscall number under speculation
> >   to remove a user controlled pointer de-reference in kernel
> >   space.  (Linus)
> >
> > * Fix a raw lfence in the kvm code (added for v4.15-rc8) to use
> >   'array_ptr'.
> >
> > * Propose 'array_idx' as a way to sanitize user input that is
> >   later used as an array index, but where the validation is
> >   happening in a different code block than the array reference.
> >   (Christian).
> >
> > * Fix grammar in speculation.txt (Kees)
> >
> > ---
> >
> > Quoting Mark's original RFC:
> >
> > "Recently, Google Project Zero discovered several classes of attack
> > against speculative execution. One of these, known as variant-1, allows
> > explicit bounds checks to be bypassed under speculation, providing an
> > arbitrary read gadget. Further details can be found on the GPZ blog [2]
> > and the Documentation patch in this series."
> >
> > A precondition of using this attack on the kernel is to get a user
> > controlled pointer de-referenced (under speculation) in privileged code.
> > The primary source of user controlled pointers in the kernel is the
> > arguments passed to 'get_user' and '__get_user'. An example of other
> > user controlled pointers are user-controlled array / pointer offsets.
> >
> > Better tooling is needed to find more arrays / pointers with user
> > controlled indices / offsets that can be converted to use 'array_ptr' or
> > 'array_idx'. A few are included in this set, and these are not expected
> > to be complete. That said, the 'get_user' protections raise the bar on
> > finding a vulnerable gadget in the kernel.
> >
> > These patches are also available via the 'nospec-v4' git branch here:
> >
> >     git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4
> 
> I've pushed out a nospec-v4.1 with the below minor cleanup, a fixup of
> the changelog for "kvm, x86: fix spectre-v1 mitigation", and added
> Paolo's ack.
> 
>      git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4.1
> 
> diff --git a/include/linux/nospec.h b/include/linux/nospec.h
> index 8af35be1869e..b8a9222e34d1 100644
> --- a/include/linux/nospec.h
> +++ b/include/linux/nospec.h
> @@ -37,7 +37,7 @@ static inline unsigned long array_ptr_mask(unsigned
> long idx, unsigned long sz)
>         unsigned long _i = (idx);                                       \
>         unsigned long _mask = array_ptr_mask(_i, (sz));                 \
>                                                                         \
> -       __u._ptr = _arr + (_i & _mask);                                 \
> +       __u._ptr = _arr + _i;                                           \
>         __u._bit &= _mask;                                              \
>         __u._ptr;                                                       \

hmm. I'm not sure it's the right thing to do, since the macro
is forcing the cpu to speculate the subsequent load from null instead
of a valid pointer.
As Linus said: "
 So that __u._bit masking wasn't masking
 the pointer, it was masking the value that was *read* from the
 pointer, so that you could know that an invalid access returned
 0/NULL, not just the first value in the array.
"
imo just
  return _arr + (_i & _mask);
is enough. No need for union games.
The cpu will speculate the load from _arr[0] if _i is out of bounds
which is the same as if user passed _i == 0 which would have passed
bounds check anyway, so I don't see any data leak from populating
cache with _arr[0] data. In-bounds access can do that just as well
without any speculation.
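As a userspace sketch of the clamping variant suggested above (an approximation: `array_access` is a stand-in name, and the mask helper mirrors the kernel's published definition):

```c
#include <assert.h>

/* ~0UL when idx < sz, 0UL otherwise, as in the kernel helper */
static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
{
	return ~(long)(idx | (sz - 1 - idx)) >> (sizeof(long) * 8 - 1);
}

/*
 * Clamping variant: mask the *index* rather than the resulting pointer,
 * so an out-of-bounds access speculatively loads arr[0], a value any
 * in-bounds caller could have loaded anyway, instead of returning NULL.
 */
#define array_access(base, idx, sz)				\
({								\
	unsigned long _i = (idx);				\
	(base) + (_i & array_ptr_mask(_i, (sz)));		\
})
```

The trade-off is the caller contract: this form always returns a dereferenceable pointer, so callers cannot distinguish a clamped out-of-bounds access from a genuine index-0 access the way `if (!ptr)` does with the union form.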

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 00/10] prevent bounds-check bypass via speculative execution
  2018-01-20 16:56     ` Alexei Starovoitov
@ 2018-01-20 17:07       ` Alexei Starovoitov
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexei Starovoitov @ 2018-01-20 17:07 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux Kernel Mailing List, Mark Rutland, Kernel Hardening,
	Peter Zijlstra, Catalin Marinas, Will Deacon, H. Peter Anvin,
	Elena Reshetova, linux-arch, Andi Kleen, Jonathan Corbet, X86 ML,
	Russell King, Ingo Molnar, Andrew Honig, Alan Cox, Tom Lendacky,
	Kees Cook, Al Viro, Andy Lutomirski, Thomas Gleixner,
	Andrew Morton, Jim Mattson, Christian Lamparter, Greg KH,
	Linux Wireless List, stable, Paolo Bonzini, Johannes Berg,
	Linus Torvalds, David S. Miller

On Sat, Jan 20, 2018 at 8:56 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Fri, Jan 19, 2018 at 10:58:44PM -0800, Dan Williams wrote:
>> On Thu, Jan 18, 2018 at 4:01 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> > Changes since v3 [1]
>> > * Drop 'ifence_array_ptr' and associated compile-time + run-time
>> >   switching and just use the masking approach all the time.
>> >
>> > * Convert 'get_user' to use pointer sanitization via masking rather than
>> >   lfence. '__get_user' and associated paths still rely on
>> >   lfence. (Linus)
>> >
>> >       "Basically, the rule is trivial: find all 'stac' users, and use
>> >        address masking if those users already integrate the limit
>> >        check, and lfence they don't."
>> >
>> > * At syscall entry sanitize the syscall number under speculation
>> >   to remove a user controlled pointer de-reference in kernel
>> >   space.  (Linus)
>> >
>> > * Fix a raw lfence in the kvm code (added for v4.15-rc8) to use
>> >   'array_ptr'.
>> >
>> > * Propose 'array_idx' as a way to sanitize user input that is
>> >   later used as an array index, but where the validation is
>> >   happening in a different code block than the array reference.
>> >   (Christian).
>> >
>> > * Fix grammar in speculation.txt (Kees)
>> >
>> > ---
>> >
>> > Quoting Mark's original RFC:
>> >
>> > "Recently, Google Project Zero discovered several classes of attack
>> > against speculative execution. One of these, known as variant-1, allows
>> > explicit bounds checks to be bypassed under speculation, providing an
>> > arbitrary read gadget. Further details can be found on the GPZ blog [2]
>> > and the Documentation patch in this series."
>> >
>> > A precondition of using this attack on the kernel is to get a user
>> > controlled pointer de-referenced (under speculation) in privileged code.
>> > The primary source of user controlled pointers in the kernel is the
>> > arguments passed to 'get_user' and '__get_user'. An example of other
>> > user controlled pointers are user-controlled array / pointer offsets.
>> >
>> > Better tooling is needed to find more arrays / pointers with user
>> > controlled indices / offsets that can be converted to use 'array_ptr' or
>> > 'array_idx'. A few are included in this set, and these are not expected
>> > to be complete. That said, the 'get_user' protections raise the bar on
>> > finding a vulnerable gadget in the kernel.
>> >
>> > These patches are also available via the 'nospec-v4' git branch here:
>> >
>> >     git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4
>>
>> I've pushed out a nospec-v4.1 with the below minor cleanup, a fixup of
>> the changelog for "kvm, x86: fix spectre-v1 mitigation", and added
>> Paolo's ack.
>>
>>      git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4.1
>>
>> diff --git a/include/linux/nospec.h b/include/linux/nospec.h
>> index 8af35be1869e..b8a9222e34d1 100644
>> --- a/include/linux/nospec.h
>> +++ b/include/linux/nospec.h
>> @@ -37,7 +37,7 @@ static inline unsigned long array_ptr_mask(unsigned
>> long idx, unsigned long sz)
>>         unsigned long _i = (idx);                                       \
>>         unsigned long _mask = array_ptr_mask(_i, (sz));                 \
>>                                                                         \
>> -       __u._ptr = _arr + (_i & _mask);                                 \
>> +       __u._ptr = _arr + _i;                                           \
>>         __u._bit &= _mask;                                              \
>>         __u._ptr;                                                       \
>
> hmm. I'm not sure it's the right thing to do, since the macro
> is forcing cpu to speculate subsequent load from null instead
> of valid pointer.
> As Linus said: "
>  So that __u._bit masking wasn't masking
>  the pointer, it was masking the value that was *read* from the
>  pointer, so that you could know that an invalid access returned
>  0/NULL, not just the first value in the array.
> "
> imo just
>   return _arr + (_i & _mask);
> is enough. No need for union games.
> The cpu will speculate the load from _arr[0] if _i is out of bounds
> which is the same as if user passed _i == 0 which would have passed
> bounds check anyway, so I don't see any data leak from populating
> cache with _arr[0] data. In-bounds access can do that just as well
> without any speculation.

scratch that. It's array_ptr, not array_access.
The code will do if (!ptr) later, so yeah this api is fine.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 00/10] prevent bounds-check bypass via speculative execution
@ 2018-01-20 17:07       ` Alexei Starovoitov
  0 siblings, 0 replies; 70+ messages in thread
From: Alexei Starovoitov @ 2018-01-20 17:07 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux Kernel Mailing List, Mark Rutland, Kernel Hardening,
	Peter Zijlstra, Catalin Marinas, Will Deacon, H. Peter Anvin,
	Elena Reshetova, linux-arch, Andi Kleen, Jonathan Corbet, X86 ML,
	Russell King, Ingo Molnar, Andrew Honig, Alan Cox, Tom Lendacky,
	Kees Cook, Al Viro, Andy Lutomirski

On Sat, Jan 20, 2018 at 8:56 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Fri, Jan 19, 2018 at 10:58:44PM -0800, Dan Williams wrote:
>> On Thu, Jan 18, 2018 at 4:01 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> > Changes since v3 [1]
>> > * Drop 'ifence_array_ptr' and associated compile-time + run-time
>> >   switching and just use the masking approach all the time.
>> >
>> > * Convert 'get_user' to use pointer sanitization via masking rather than
>> >   lfence. '__get_user' and associated paths still rely on
>> >   lfence. (Linus)
>> >
>> >       "Basically, the rule is trivial: find all 'stac' users, and use
>> >        address masking if those users already integrate the limit
>> >        check, and lfence they don't."
>> >
>> > * At syscall entry sanitize the syscall number under speculation
>> >   to remove a user controlled pointer de-reference in kernel
>> >   space.  (Linus)
>> >
>> > * Fix a raw lfence in the kvm code (added for v4.15-rc8) to use
>> >   'array_ptr'.
>> >
>> > * Propose 'array_idx' as a way to sanitize user input that is
>> >   later used as an array index, but where the validation is
>> >   happening in a different code block than the array reference.
>> >   (Christian).
>> >
>> > * Fix grammar in speculation.txt (Kees)
>> >
>> > ---
>> >
>> > Quoting Mark's original RFC:
>> >
>> > "Recently, Google Project Zero discovered several classes of attack
>> > against speculative execution. One of these, known as variant-1, allows
>> > explicit bounds checks to be bypassed under speculation, providing an
>> > arbitrary read gadget. Further details can be found on the GPZ blog [2]
>> > and the Documentation patch in this series."
>> >
>> > A precondition of using this attack on the kernel is to get a user
>> > controlled pointer de-referenced (under speculation) in privileged code.
>> > The primary source of user controlled pointers in the kernel is the
>> > arguments passed to 'get_user' and '__get_user'. An example of other
>> > user controlled pointers are user-controlled array / pointer offsets.
>> >
>> > Better tooling is needed to find more arrays / pointers with user
>> > controlled indices / offsets that can be converted to use 'array_ptr' or
>> > 'array_idx'. A few are included in this set, and these are not expected
>> > to be complete. That said, the 'get_user' protections raise the bar on
>> > finding a vulnerable gadget in the kernel.
>> >
>> > These patches are also available via the 'nospec-v4' git branch here:
>> >
>> >     git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4
>>
>> I've pushed out a nospec-v4.1 with the below minor cleanup, a fixup of
>> the changelog for "kvm, x86: fix spectre-v1 mitigation", and added
>> Paolo's ack.
>>
>>      git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4.1
>>
>> diff --git a/include/linux/nospec.h b/include/linux/nospec.h
>> index 8af35be1869e..b8a9222e34d1 100644
>> --- a/include/linux/nospec.h
>> +++ b/include/linux/nospec.h
>> @@ -37,7 +37,7 @@ static inline unsigned long array_ptr_mask(unsigned
>> long idx, unsigned long sz)
>>         unsigned long _i = (idx);                                       \
>>         unsigned long _mask = array_ptr_mask(_i, (sz));                 \
>>                                                                         \
>> -       __u._ptr = _arr + (_i & _mask);                                 \
>> +       __u._ptr = _arr + _i;                                           \
>>         __u._bit &= _mask;                                              \
>>         __u._ptr;                                                       \
>
> hmm. I'm not sure it's the right thing to do, since the macro
> is forcing cpu to speculate subsequent load from null instead
> of valid pointer.
> As Linus said: "
>  So that __u._bit masking wasn't masking
>  the pointer, it was masking the value that was *read* from the
>  pointer, so that you could know that an invalid access returned
>  0/NULL, not just the first value in the array.
> "
> imo just
>   return _arr + (_i & _mask);
> is enough. No need for union games.
> The cpu will speculate the load from _arr[0] if _i is out of bounds
> which is the same as if user passed _i == 0 which would have passed
> bounds check anyway, so I don't see any data leak from populating
> cache with _arr[0] data. In-bounds access can do that just as well
> without any speculation.

scratch that. It's array_ptr, not array_access.
The code will do if (!ptr) later, so yeah this api is fine.

^ permalink raw reply	[flat|nested] 70+ messages in thread


* Re: [PATCH v4 10/10] nl80211: sanitize array index in parse_txq_params
  2018-01-19  0:02   ` Dan Williams
  (?)
@ 2018-01-21 10:37     ` Johannes Berg
  -1 siblings, 0 replies; 70+ messages in thread
From: Johannes Berg @ 2018-01-21 10:37 UTC (permalink / raw)
  To: Dan Williams, linux-kernel
  Cc: linux-arch, akpm, kernel-hardening, gregkh, Christian Lamparter,
	linux-wireless, tglx, torvalds, David S. Miller, Elena Reshetova,
	alan

On Thu, 2018-01-18 at 16:02 -0800, Dan Williams wrote:
> Wireless drivers rely on parse_txq_params to validate that
> txq_params->ac is less than NL80211_NUM_ACS by the time the low-level
> driver's ->conf_tx() handler is called. Use a new helper, 'array_idx',
> to sanitize txq_params->ac with respect to speculation. I.e. ensure that
> any speculation into ->conf_tx() handlers is done with a value of
> txq_params->ac that is within the bounds of [0, NL80211_NUM_ACS).

Sorry, I didn't realize you were waiting for me to review.

LGTM.

Reviewed-by: Johannes Berg <johannes@sipsolutions.net>


> Reported-by: Christian Lamparter <chunkeey@gmail.com>
> Reported-by: Elena Reshetova <elena.reshetova@intel.com>
> Cc: Johannes Berg <johannes@sipsolutions.net>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: linux-wireless@vger.kernel.org
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  include/linux/nospec.h |   21 +++++++++++++++++++++
>  net/wireless/nl80211.c |   10 +++++++---
>  2 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/nospec.h b/include/linux/nospec.h
> index f841c11f3f1f..8af35be1869e 100644
> --- a/include/linux/nospec.h
> +++ b/include/linux/nospec.h
> @@ -41,4 +41,25 @@ static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
>  	__u._bit &= _mask;						\
>  	__u._ptr;							\
>  })
> +
> +/**
> + * array_idx - Generate a pointer to an array index, ensuring the
> + * pointer is bounded under speculation to NULL.
> + *
> + * @idx: the index of the element, must be less than LONG_MAX
> + * @sz: the number of elements in the array, must be less than LONG_MAX
> + *
> + * If @idx falls in the interval [0, @sz), returns &@idx otherwise
> + * returns NULL.
> + */
> +#define array_idx(idx, sz)						\
> +({									\
> +	union { typeof((idx)) *_ptr; unsigned long _bit; } __u;		\
> +	typeof(idx) *_i = &(idx);					\
> +	unsigned long _mask = array_ptr_mask(*_i, (sz));		\
> +									\
> +	__u._ptr = _i;							\
> +	__u._bit &= _mask;						\
> +	__u._ptr;							\
> +})
>  #endif /* __NOSPEC_H__ */
> diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
> index 2b3dbcd40e46..202cb1dc03ee 100644
> --- a/net/wireless/nl80211.c
> +++ b/net/wireless/nl80211.c
> @@ -16,6 +16,7 @@
>  #include <linux/nl80211.h>
>  #include <linux/rtnetlink.h>
>  #include <linux/netlink.h>
> +#include <linux/nospec.h>
>  #include <linux/etherdevice.h>
>  #include <net/net_namespace.h>
>  #include <net/genetlink.h>
> @@ -2056,20 +2057,23 @@ static const struct nla_policy txq_params_policy[NL80211_TXQ_ATTR_MAX + 1] = {
>  static int parse_txq_params(struct nlattr *tb[],
>  			    struct ieee80211_txq_params *txq_params)
>  {
> +	u8 ac, *idx;
> +
>  	if (!tb[NL80211_TXQ_ATTR_AC] || !tb[NL80211_TXQ_ATTR_TXOP] ||
>  	    !tb[NL80211_TXQ_ATTR_CWMIN] || !tb[NL80211_TXQ_ATTR_CWMAX] ||
>  	    !tb[NL80211_TXQ_ATTR_AIFS])
>  		return -EINVAL;
>  
> -	txq_params->ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
> +	ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
>  	txq_params->txop = nla_get_u16(tb[NL80211_TXQ_ATTR_TXOP]);
>  	txq_params->cwmin = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMIN]);
>  	txq_params->cwmax = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMAX]);
>  	txq_params->aifs = nla_get_u8(tb[NL80211_TXQ_ATTR_AIFS]);
>  
> -	if (txq_params->ac >= NL80211_NUM_ACS)
> +	idx = array_idx(ac, NL80211_NUM_ACS);
> +	if (!idx)
>  		return -EINVAL;
> -
> +	txq_params->ac = *idx;
>  	return 0;
>  }
>  
> 
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread


* Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-01-19  0:02   ` [kernel-hardening] " Dan Williams
@ 2018-01-24 14:40     ` Jiri Slaby
  -1 siblings, 0 replies; 70+ messages in thread
From: Jiri Slaby @ 2018-01-24 14:40 UTC (permalink / raw)
  To: Dan Williams, linux-kernel
  Cc: linux-arch, kernel-hardening, gregkh, x86, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, tglx, torvalds, akpm, alan

On 01/19/2018, 01:02 AM, Dan Williams wrote:
> The syscall table base is a user controlled function pointer in kernel
> space. Like 'get_user', use 'MASK_NOSPEC' to prevent any out of bounds
> speculation. While retpoline prevents speculating into the user
> controlled target it does not stop the pointer de-reference, the concern
> is leaking memory relative to the syscall table base.
> 
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/x86/entry/entry_64.S   |    2 ++
>  arch/x86/include/asm/smap.h |    9 ++++++++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 4f8e1d35a97c..2320017077d4 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -35,6 +35,7 @@
>  #include <asm/asm.h>
>  #include <asm/smap.h>
>  #include <asm/pgtable_types.h>
> +#include <asm/smap.h>

This is already included 2 lines above

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 70+ messages in thread


* Re: [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
  2018-01-19  0:01   ` [kernel-hardening] " Dan Williams
@ 2018-01-25  7:09     ` Cyril Novikov
  -1 siblings, 0 replies; 70+ messages in thread
From: Cyril Novikov @ 2018-01-25  7:09 UTC (permalink / raw)
  To: Dan Williams, linux-kernel
  Cc: linux-arch, kernel-hardening, Catalin Marinas, x86, Will Deacon,
	Russell King, Ingo Molnar, gregkh, H. Peter Anvin, tglx,
	torvalds, akpm, alan

On 1/18/2018 4:01 PM, Dan Williams wrote:
> 'array_ptr' is proposed as a generic mechanism to mitigate against
> Spectre-variant-1 attacks (i.e. an attack that bypasses boundary checks
> via speculative execution). The 'array_ptr' implementation is expected
> to be safe for current generation cpus across multiple architectures
> (ARM, x86).

I'm an outside reviewer, not subscribed to the list, so forgive me if I 
do something not according to protocol. I have the following comments on 
this change:

After discarding the speculation barrier variant, is array_ptr() needed 
at all? You could have a simpler sanitizing macro, say

#define array_sanitize_idx(idx, sz) ((idx) & array_ptr_mask((idx), (sz)))

(adjusted to not evaluate idx twice). And use it as follows:

if (idx < array_size) {
     idx = array_sanitize_idx(idx, array_size);
     do_something(array[idx]);
}

If I understand the speculation stuff correctly, unlike array_ptr(), 
this "leaks" array[0] rather than nothing (*NULL) when executed 
speculatively. However, it's still much better than leaking an arbitrary 
location in memory. The attacker can likely get array[0] "leaked" by 
passing 0 as idx anyway.

> +/*
> + * If idx is negative or if idx > size then bit 63 is set in the mask,
> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
> + * failed, array_ptr will return NULL.
> + */
> +#ifndef array_ptr_mask
> +static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
> +{
> +	return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
> +}
> +#endif

Why does this have to resort to the undefined behavior of shifting a 
negative number to the right? You can do without it:

return ((idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1)) - 1;

Of course, you could argue that subtracting 1 from 0 to get all ones is 
also an undefined behavior, but it's still much better than the shift, 
isn't it?

> +#define array_ptr(base, idx, sz)					\
> +({									\
> +	union { typeof(*(base)) *_ptr; unsigned long _bit; } __u;	\
> +	typeof(*(base)) *_arr = (base);					\
> +	unsigned long _i = (idx);					\
> +	unsigned long _mask = array_ptr_mask(_i, (sz));			\
> +									\
> +	__u._ptr = _arr + (_i & _mask);					\
> +	__u._bit &= _mask;						\
> +	__u._ptr;							\
> +})

Call me paranoid, but I think this may actually create an exploitable 
bug on 32-bit systems due to casting the index to an unsigned long, if 
the index as it comes from userland is a 64-bit value. You have 
*replaced* the "if (idx < array_size)" check with checking if 
array_ptr() returns NULL. Well, it doesn't return NULL if the low 32 
bits of the index are in-bounds, but the high 32 bits are not zero. 
Apart from the return value pointing to the wrong place, the subsequent 
code may then assume that the 64-bit idx is actually valid and trip on 
it badly.

--
Cyril

^ permalink raw reply	[flat|nested] 70+ messages in thread


* Re: [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
  2018-01-25  7:09     ` [kernel-hardening] " Cyril Novikov
@ 2018-01-25 22:37       ` Dan Williams
  -1 siblings, 0 replies; 70+ messages in thread
From: Dan Williams @ 2018-01-25 22:37 UTC (permalink / raw)
  To: Cyril Novikov
  Cc: Linux Kernel Mailing List, linux-arch, Kernel Hardening,
	Catalin Marinas, X86 ML, Will Deacon, Russell King, Ingo Molnar,
	Greg KH, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Andrew Morton, Alan Cox, Adam Sampson

On Wed, Jan 24, 2018 at 11:09 PM, Cyril Novikov <cnovikov@lynx.com> wrote:
> On 1/18/2018 4:01 PM, Dan Williams wrote:
>>
>> 'array_ptr' is proposed as a generic mechanism to mitigate against
>> Spectre-variant-1 attacks, i.e. an attack that bypasses boundary checks
>> via speculative execution). The 'array_ptr' implementation is expected
>> to be safe for current generation cpus across multiple architectures
>> (ARM, x86).
>
>
> I'm an outside reviewer, not subscribed to the list, so forgive me if I do
> something not according to protocol. I have the following comments on this
> change:
>
> After discarding the speculation barrier variant, is array_ptr() needed at
> all? You could have a simpler sanitizing macro, say
>
> #define array_sanitize_idx(idx, sz) ((idx) & array_ptr_mask((idx), (sz)))
>
> (adjusted to not evaluate idx twice). And use it as follows:
>
> if (idx < array_size) {
>     idx = array_sanitize_idx(idx, array_size);
>     do_something(array[idx]);
> }
>
> If I understand the speculation stuff correctly, unlike array_ptr(), this
> "leaks" array[0] rather than nothing (*NULL) when executed speculatively.
> However, it's still much better than leaking an arbitrary location in
> memory. The attacker can likely get array[0] "leaked" by passing 0 as idx
> anyway.

True, we could simplify it to just sanitize the index with the
expectation that speculating with index 0 is safe. Although we'd want
to document it just case there is some odd code paths where the valid
portion of the array is offset from 0.

I like this approach better because it handles cases where the bounds
check is far away from the array de-reference. I.e. instead of having
array_ptr() and array_idx() for different cases, just sanitize the
index.

>
>> +/*
>> + * If idx is negative or if idx > size then bit 63 is set in the mask,
>> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
>> + * failed, array_ptr will return NULL.
>> + */
>> +#ifndef array_ptr_mask
>> +static inline unsigned long array_ptr_mask(unsigned long idx, unsigned
>> long sz)
>> +{
>> +       return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
>> +}
>> +#endif
>
>
> Why does this have to resort to the undefined behavior of shifting a
> negative number to the right? You can do without it:
>
> return ((idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1)) - 1;
>
> Of course, you could argue that subtracting 1 from 0 to get all ones is also
> an undefined behavior, but it's still much better than the shift, isn't it?

The goal is to prevent the compiler from emitting conditional
instructions. For example, the simplest form of this sanitize
operation from Adam:

    return 0 - ((long) (idx < sz));

...produces the desired "cmp, sbb" sequence on x86, but leads to a
"csel" instruction being emitted for arm64.

Stepping back and realizing that all this churn is around the
un-optimized form of the comparison vs the inline asm that an arch can
provide, and we're effectively handling the carry bit in software, we
can have a WARN_ON_ONCE(idx < 0 || sz < 0) to catch places where the
expectations of the macro are violated.

Archs can remove the overhead of that extra branch with an instruction
sequence to handle the carry bit in hardware, otherwise we get runtime
coverage of places where array_idx() gets used incorrectly.

>
>> +#define array_ptr(base, idx, sz)                                       \
>> +({                                                                     \
>> +       union { typeof(*(base)) *_ptr; unsigned long _bit; } __u;       \
>> +       typeof(*(base)) *_arr = (base);                                 \
>> +       unsigned long _i = (idx);                                       \
>> +       unsigned long _mask = array_ptr_mask(_i, (sz));                 \
>> +                                                                       \
>> +       __u._ptr = _arr + (_i & _mask);                                 \
>> +       __u._bit &= _mask;                                              \
>> +       __u._ptr;                                                       \
>> +})
>
>
> Call me paranoid, but I think this may actually create an exploitable bug on
> 32-bit systems due to casting the index to an unsigned long, if the index as
> it comes from userland is a 64-bit value. You have *replaced* the "if (idx <
> array_size)" check with checking if array_ptr() returns NULL. Well, it
> doesn't return NULL if the low 32 bits of the index are in-bounds, but the
> high 32 bits are not zero. Apart from the return value pointing to the wrong
> place, the subsequent code may then assume that the 64-bit idx is actually
> valid and trip on it badly.

Yes, I'll add some BUILD_BUG_ON safety for cases where sizeof(idx) >
sizeof(long).

Thanks for taking a look.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [kernel-hardening] Re: [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references
@ 2018-01-25 22:37       ` Dan Williams
  0 siblings, 0 replies; 70+ messages in thread
From: Dan Williams @ 2018-01-25 22:37 UTC (permalink / raw)
  To: Cyril Novikov
  Cc: Linux Kernel Mailing List, linux-arch, Kernel Hardening,
	Catalin Marinas, X86 ML, Will Deacon, Russell King, Ingo Molnar,
	Greg KH, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Andrew Morton, Alan Cox, Adam Sampson

On Wed, Jan 24, 2018 at 11:09 PM, Cyril Novikov <cnovikov@lynx.com> wrote:
> On 1/18/2018 4:01 PM, Dan Williams wrote:
>>
>> 'array_ptr' is proposed as a generic mechanism to mitigate against
>> Spectre-variant-1 attacks (i.e. an attack that bypasses boundary checks
>> via speculative execution). The 'array_ptr' implementation is expected
>> to be safe for current generation cpus across multiple architectures
>> (ARM, x86).
>
>
> I'm an outside reviewer, not subscribed to the list, so forgive me if I do
> something not according to protocol. I have the following comments on this
> change:
>
> After discarding the speculation barrier variant, is array_ptr() needed at
> all? You could have a simpler sanitizing macro, say
>
> #define array_sanitize_idx(idx, sz) ((idx) & array_ptr_mask((idx), (sz)))
>
> (adjusted to not evaluate idx twice). And use it as follows:
>
> if (idx < array_size) {
>     idx = array_sanitize_idx(idx, array_size);
>     do_something(array[idx]);
> }
>
> If I understand the speculation stuff correctly, unlike array_ptr(), this
> "leaks" array[0] rather than nothing (*NULL) when executed speculatively.
> However, it's still much better than leaking an arbitrary location in
> memory. The attacker can likely get array[0] "leaked" by passing 0 as idx
> anyway.

True, we could simplify it to just sanitize the index with the
expectation that speculating with index 0 is safe. Although we'd want
to document it, just in case there are some odd code paths where the
valid portion of the array is offset from 0.

I like this approach better because it handles cases where the bounds
check is far away from the array de-reference. I.e. instead of having
array_ptr() and array_idx() for different cases, just sanitize the
index.
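
For illustration only, the simplified approach can be sketched in plain C. The helper names mirror the thread, not any merged kernel API, and functions are used instead of macros so idx is not evaluated twice:

```c
#include <assert.h>

#define BITS_PER_LONG (8 * (int)sizeof(long))

/* The mask helper quoted in this thread: all ones when 0 <= idx < sz,
 * zero otherwise. */
static unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
{
	return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
}

/*
 * The simplification discussed above: clamp the index itself, so an
 * out-of-bounds idx speculates as array[0] rather than through a NULL
 * pointer.
 */
static unsigned long array_sanitize_idx(unsigned long idx, unsigned long sz)
{
	return idx & array_ptr_mask(idx, sz);
}
```

With it, the bounds-checked access pattern from the mail reads: `if (idx < array_size) { idx = array_sanitize_idx(idx, array_size); do_something(array[idx]); }`.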

>
>> +/*
>> + * If idx is negative or if idx > size then bit 63 is set in the mask,
>> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
>> + * failed, array_ptr will return NULL.
>> + */
>> +#ifndef array_ptr_mask
>> +static inline unsigned long array_ptr_mask(unsigned long idx, unsigned
>> long sz)
>> +{
>> +       return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
>> +}
>> +#endif
>
>
> Why does this have to resort to the undefined behavior of shifting a
> negative number to the right? You can do without it:
>
> return ((idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1)) - 1;
>
> Of course, you could argue that subtracting 1 from 0 to get all ones is also
> an undefined behavior, but it's still much better than the shift, isn't it?

The goal is to prevent the compiler from emitting conditional
instructions. For example, the simplest form of this sanitize
operation from Adam:

    return 0 - ((long) (idx < sz));

...produces the desired "cmp, sbb" sequence on x86, but leads to a
"csel" instruction being emitted for arm64.

Stepping back, all this churn is around the un-optimized form of the
comparison vs the inline asm that an arch can provide; we're
effectively handling the carry bit in software. We can add a
WARN_ON_ONCE(idx < 0 || sz < 0) to catch places where the expectations
of the macro are violated.

Archs can remove the overhead of that extra branch with an instruction
sequence to handle the carry bit in hardware, otherwise we get runtime
coverage of places where array_idx() gets used incorrectly.
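
A side-by-side sketch of the two mask formulations under discussion (illustrative userspace code, not kernel source):

```c
#include <assert.h>

#define BITS_PER_LONG (8 * (int)sizeof(long))

/* Patch version: arithmetic right shift of a negative long. C leaves
 * that implementation-defined, though every compiler the kernel
 * supports sign-extends. */
static unsigned long mask_shift(unsigned long idx, unsigned long sz)
{
	return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
}

/* Cyril's alternative: same sign-bit test, but the all-ones value comes
 * from well-defined unsigned wraparound instead of the shift. */
static unsigned long mask_sub(unsigned long idx, unsigned long sz)
{
	return ((idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1)) - 1;
}
```

Both forms are straight-line arithmetic; the real concern raised above is whether a given formulation tempts the compiler into emitting a conditional instruction such as arm64's csel.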

>
>> +#define array_ptr(base, idx, sz)                                       \
>> +({                                                                     \
>> +       union { typeof(*(base)) *_ptr; unsigned long _bit; } __u;       \
>> +       typeof(*(base)) *_arr = (base);                                 \
>> +       unsigned long _i = (idx);                                       \
>> +       unsigned long _mask = array_ptr_mask(_i, (sz));                 \
>> +                                                                       \
>> +       __u._ptr = _arr + (_i & _mask);                                 \
>> +       __u._bit &= _mask;                                              \
>> +       __u._ptr;                                                       \
>> +})
>
>
> Call me paranoid, but I think this may actually create an exploitable bug on
> 32-bit systems due to casting the index to an unsigned long, if the index as
> it comes from userland is a 64-bit value. You have *replaced* the "if (idx <
> array_size)" check with checking if array_ptr() returns NULL. Well, it
> doesn't return NULL if the low 32 bits of the index are in-bounds, but the
> high 32 bits are not zero. Apart from the return value pointing to the wrong
> place, the subsequent code may then assume that the 64-bit idx is actually
> valid and trip on it badly.

Yes, I'll add some BUILD_BUG_ON safety for cases where sizeof(idx) >
sizeof(long).

Thanks for taking a look.
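
A rough sketch of that guard, with C11 _Static_assert standing in for the kernel's BUILD_BUG_ON (the macro and function names here are hypothetical):

```c
#include <assert.h>

/*
 * Reject, at compile time, any index type wider than unsigned long,
 * so a 64-bit index from userspace cannot be silently truncated by
 * the mask on a 32-bit build.
 */
#define check_idx_width(idx)						\
	_Static_assert(sizeof(idx) <= sizeof(unsigned long),		\
		       "index wider than long would be truncated by the mask")

static unsigned long sanitized_demo(unsigned int idx, unsigned long sz)
{
	check_idx_width(idx);	/* compiles: a 32-bit index is narrow enough */
	return (idx < sz) ? idx : 0;
}
```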

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-01-19  0:02   ` [kernel-hardening] " Dan Williams
@ 2018-02-06 19:29   ` Luis Henriques
  2018-02-06 19:48     ` Dan Williams
  -1 siblings, 1 reply; 70+ messages in thread
From: Luis Henriques @ 2018-02-06 19:29 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-kernel, linux-arch, kernel-hardening, gregkh, x86,
	Ingo Molnar, Andy Lutomirski, H. Peter Anvin, tglx, torvalds,
	akpm, alan

On Thu, Jan 18, 2018 at 04:02:21PM -0800, Dan Williams wrote:
> The syscall table base is a user controlled function pointer in kernel
> space. Like 'get_user', use 'MASK_NOSPEC' to prevent any out of bounds
> speculation. While retpoline prevents speculating into the user
> controlled target it does not stop the pointer de-reference, the concern
> is leaking memory relative to the syscall table base.

This patch seems to cause a regression.  An easy way to reproduce what
I'm seeing is to run the samples/statx/test-statx.  Here's what I see
when I have this patchset applied:

# ./test-statx /tmp
statx(/tmp) = -1
/tmp: Bad file descriptor

Reverting this single patch seems to fix it.

Cheers,
--
Luís

> 
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/x86/entry/entry_64.S   |    2 ++
>  arch/x86/include/asm/smap.h |    9 ++++++++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 4f8e1d35a97c..2320017077d4 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -35,6 +35,7 @@
>  #include <asm/asm.h>
>  #include <asm/smap.h>
>  #include <asm/pgtable_types.h>
> +#include <asm/smap.h>
>  #include <asm/export.h>
>  #include <asm/frame.h>
>  #include <asm/nospec-branch.h>
> @@ -264,6 +265,7 @@ entry_SYSCALL_64_fastpath:
>  	cmpl	$__NR_syscall_max, %eax
>  #endif
>  	ja	1f				/* return -ENOSYS (already in pt_regs->ax) */
> +	MASK_NOSPEC %r11 %rax			/* sanitize syscall_nr wrt speculation */
>  	movq	%r10, %rcx
>  
>  	/*
> diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
> index 2b4ad4c6a226..3b5b2cf58dc6 100644
> --- a/arch/x86/include/asm/smap.h
> +++ b/arch/x86/include/asm/smap.h
> @@ -35,7 +35,14 @@
>   * this directs the cpu to speculate with a NULL ptr rather than
>   * something targeting kernel memory.
>   *
> - * assumes CF is set from a previous 'cmp TASK_addr_limit, %ptr'
> + * In the syscall entry path it is possible to speculate past the
> + * validation of the system call number. Use MASK_NOSPEC to sanitize the
> + * syscall array index to zero (sys_read) rather than an arbitrary
> + * target.
> + *
> + * assumes CF is set from a previous 'cmp' i.e.:
> + *     cmp TASK_addr_limit, %ptr
> + *     cmp __NR_syscall_max, %idx
>   */
>  .macro MASK_NOSPEC mask val
>  	sbb \mask, \mask
> 
> 


* Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-02-06 19:29   ` Luis Henriques
@ 2018-02-06 19:48     ` Dan Williams
  2018-02-06 20:26       ` Linus Torvalds
  2018-02-06 22:51         ` Luis Henriques
  0 siblings, 2 replies; 70+ messages in thread
From: Dan Williams @ 2018-02-06 19:48 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Linux Kernel Mailing List, linux-arch, Kernel Hardening, Greg KH,
	X86 ML, Ingo Molnar, Andy Lutomirski, H. Peter Anvin,
	Thomas Gleixner, Linus Torvalds, Andrew Morton, Alan Cox

On Tue, Feb 6, 2018 at 11:29 AM, Luis Henriques <lhenriques@suse.com> wrote:
> On Thu, Jan 18, 2018 at 04:02:21PM -0800, Dan Williams wrote:
>> The syscall table base is a user controlled function pointer in kernel
>> space. Like, 'get_user, use 'MASK_NOSPEC' to prevent any out of bounds
>> speculation. While retpoline prevents speculating into the user
>> controlled target it does not stop the pointer de-reference, the concern
>> is leaking memory relative to the syscall table base.
>
> This patch seems to cause a regression.  An easy way to reproduce what
> I'm seeing is to run the samples/statx/test-statx.  Here's what I see
> when I have this patchset applied:
>
> # ./test-statx /tmp
> statx(/tmp) = -1
> /tmp: Bad file descriptor
>
> Reverting this single patch seems to fix it.

Just to clarify, when you say "this patch" you mean:

     2fbd7af5af86 x86/syscall: Sanitize syscall table de-references
under speculation

...not this early MASK_NOSPEC version of the patch, right?

>
> Cheers,
> --
> Luís
>
>>
>> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: x86@kernel.org
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  arch/x86/entry/entry_64.S   |    2 ++
>>  arch/x86/include/asm/smap.h |    9 ++++++++-
>>  2 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index 4f8e1d35a97c..2320017077d4 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -35,6 +35,7 @@
>>  #include <asm/asm.h>
>>  #include <asm/smap.h>
>>  #include <asm/pgtable_types.h>
>> +#include <asm/smap.h>
>>  #include <asm/export.h>
>>  #include <asm/frame.h>
>>  #include <asm/nospec-branch.h>
>> @@ -264,6 +265,7 @@ entry_SYSCALL_64_fastpath:
>>       cmpl    $__NR_syscall_max, %eax
>>  #endif
>>       ja      1f                              /* return -ENOSYS (already in pt_regs->ax) */
>> +     MASK_NOSPEC %r11 %rax                   /* sanitize syscall_nr wrt speculation */
>>       movq    %r10, %rcx
>>
>>       /*
>> diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
>> index 2b4ad4c6a226..3b5b2cf58dc6 100644
>> --- a/arch/x86/include/asm/smap.h
>> +++ b/arch/x86/include/asm/smap.h
>> @@ -35,7 +35,14 @@
>>   * this directs the cpu to speculate with a NULL ptr rather than
>>   * something targeting kernel memory.
>>   *
>> - * assumes CF is set from a previous 'cmp TASK_addr_limit, %ptr'
>> + * In the syscall entry path it is possible to speculate past the
>> + * validation of the system call number. Use MASK_NOSPEC to sanitize the
>> + * syscall array index to zero (sys_read) rather than an arbitrary
>> + * target.
>> + *
>> + * assumes CF is set from a previous 'cmp' i.e.:
>> + *     cmp TASK_addr_limit, %ptr
>> + *     cmp __NR_syscall_max, %idx
>>   */
>>  .macro MASK_NOSPEC mask val
>>       sbb \mask, \mask
>>
>>


* Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-02-06 19:48     ` Dan Williams
@ 2018-02-06 20:26       ` Linus Torvalds
  2018-02-06 20:37         ` Dan Williams
  2018-02-06 22:51         ` Luis Henriques
  1 sibling, 1 reply; 70+ messages in thread
From: Linus Torvalds @ 2018-02-06 20:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: Luis Henriques, Linux Kernel Mailing List, linux-arch,
	Kernel Hardening, Greg KH, X86 ML, Ingo Molnar, Andy Lutomirski,
	H. Peter Anvin, Thomas Gleixner, Andrew Morton, Alan Cox

On Tue, Feb 6, 2018 at 11:48 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>
> Just to clarify, when you say "this patch" you mean:
>
>      2fbd7af5af86 x86/syscall: Sanitize syscall table de-references
> under speculation
>
> ...not this early MASK_NOSPEC version of the patch, right?

I suspect not. If that patch is broken, the system wouldn't even boot.

That said, looking at 2fbd7af5af86, I do note that the code generation
is horribly stupid.

It's due to two different issues:

 (a) the x86 asm constraints for that inline asm are nasty, and
require a register for 'size', even though an immediate works just
fine.

 (b) the "cmp" is inside the asm, so gcc can't combine it with the
*other* cmp in the C code.

Fixing (a) is easy:

  +++ b/arch/x86/include/asm/barrier.h
  @@ -43 +43 @@ static inline unsigned long
array_index_mask_nospec(unsigned long index,
  -                       :"r"(size),"r" (index)
  +                       :"ir"(size),"r" (index)

but fixing (b) looks fundamentally hard. Gcc generates (for do_syscall()):

        cmpq    $332, %rbp      #, nr
        ja      .L295   #,
        cmp $333,%rbp
        sbb %rax,%rax;   #, nr, mask

note how it completely pointlessly does the comparison twice, even
though it could have just done

        cmp $333,%rbp
        jae      .L295   #,
        sbb %rax,%rax;   #, nr, mask

Ho humm. Sad.

              Linus


* Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-02-06 20:26       ` Linus Torvalds
@ 2018-02-06 20:37         ` Dan Williams
  2018-02-06 20:42           ` Linus Torvalds
  0 siblings, 1 reply; 70+ messages in thread
From: Dan Williams @ 2018-02-06 20:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Luis Henriques, Linux Kernel Mailing List, linux-arch,
	Kernel Hardening, Greg KH, X86 ML, Ingo Molnar, Andy Lutomirski,
	H. Peter Anvin, Thomas Gleixner, Andrew Morton, Alan Cox

On Tue, Feb 6, 2018 at 12:26 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Feb 6, 2018 at 11:48 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>>
>> Just to clarify, when you say "this patch" you mean:
>>
>>      2fbd7af5af86 x86/syscall: Sanitize syscall table de-references
>> under speculation
>>
>> ...not this early MASK_NOSPEC version of the patch, right?
>
> I suspect not. If that patch is broken, the system wouldn't even boot.
>
> That said, looking at 2fbd7af5af86, I do note that the code generation
> is horribly stupid.
>
> It's due to two different issues:
>
>  (a) the x86 asm constraints for that inline asm is nasty, and
> requires a register for 'size', even though an immediate works just
> fine.
>
>  (b) the "cmp" is inside the asm, so gcc can't combine it with the
> *other* cmp in the C code.
>
> Fixing (a) is easy:
>
>   +++ b/arch/x86/include/asm/barrier.h
>   @@ -43 +43 @@ static inline unsigned long
> array_index_mask_nospec(unsigned long index,
>   -                       :"r"(size),"r" (index)
>   +                       :"ir"(size),"r" (index)
>
> but fixing (b) looks fundamentally hard. Gcc generates (for do_syscall()):
>
>         cmpq    $332, %rbp      #, nr
>         ja      .L295   #,
>         cmp $333,%rbp
>         sbb %rax,%rax;   #, nr, mask
>
> note how it completely pointlessly does the comparison twice, even
> though it could have just done
>
>         cmp $333,%rbp
>         jae      .L295   #,
>         sbb %rax,%rax;   #, nr, mask
>
> Ho humm. Sad.

Are there any compilers that would miscompile:

    mask = 0 - (index < size);

That might be a way to improve the assembly.


* Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-02-06 20:37         ` Dan Williams
@ 2018-02-06 20:42           ` Linus Torvalds
  2018-02-06 20:43             ` Linus Torvalds
  2018-02-06 20:49             ` Andy Lutomirski
  0 siblings, 2 replies; 70+ messages in thread
From: Linus Torvalds @ 2018-02-06 20:42 UTC (permalink / raw)
  To: Dan Williams
  Cc: Luis Henriques, Linux Kernel Mailing List, linux-arch,
	Kernel Hardening, Greg KH, X86 ML, Ingo Molnar, Andy Lutomirski,
	H. Peter Anvin, Thomas Gleixner, Andrew Morton, Alan Cox

On Tue, Feb 6, 2018 at 12:37 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>
> Are there any compilers that would miscompile:
>
>     mask = 0 - (index < size);
>
> That might be a way to improve the assembly.

Sadly, that is *very* easy to miscompile. In fact, I'd be very
surprised indeed if any compiler worth its name wouldn't combine the
comparison with the conditional branch it accompanies, and just turn
that into a constant. IOW, you'd get

        mask = 0 - (index < size);
        if (index <= size) {
                 ... use mask ..

and the compiler would just turn that into

        if (index <= size) {
                mask = -1;

and be done with it.

               Linus


* Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-02-06 20:42           ` Linus Torvalds
@ 2018-02-06 20:43             ` Linus Torvalds
  2018-02-06 20:49             ` Andy Lutomirski
  1 sibling, 0 replies; 70+ messages in thread
From: Linus Torvalds @ 2018-02-06 20:43 UTC (permalink / raw)
  To: Dan Williams
  Cc: Luis Henriques, Linux Kernel Mailing List, linux-arch,
	Kernel Hardening, Greg KH, X86 ML, Ingo Molnar, Andy Lutomirski,
	H. Peter Anvin, Thomas Gleixner, Andrew Morton, Alan Cox

On Tue, Feb 6, 2018 at 12:42 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Sadly, that is *very* easy to miscompile.

Side note: don't read email, go watch the falcon heavy takeoff.

              Linus


* Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-02-06 20:42           ` Linus Torvalds
  2018-02-06 20:43             ` Linus Torvalds
@ 2018-02-06 20:49             ` Andy Lutomirski
  2018-02-06 20:58               ` Linus Torvalds
  1 sibling, 1 reply; 70+ messages in thread
From: Andy Lutomirski @ 2018-02-06 20:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dan Williams, Luis Henriques, Linux Kernel Mailing List,
	linux-arch, Kernel Hardening, Greg KH, X86 ML, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Thomas Gleixner, Andrew Morton,
	Alan Cox

On Tue, Feb 6, 2018 at 8:42 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Feb 6, 2018 at 12:37 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>
>> Are there any compilers that would miscompile:
>>
>>     mask = 0 - (index < size);
>>
>> That might be a way to improve the assembly.
>
> Sadly, that is *very* easy to miscompile. In fact, I'd be very
> surprised indeed if any compiler worth its name wouldn't combine the
> comparison with the conditional branch it accompanies, and just turn
> that into a constant. IOW, you'd get
>
>         mask = 0 - (index < size);
>         if (index <= size) {
>                  ... use mask ..
>
> and the compiler would just turn that into
>
>         if (index <= size) {
>                 mask = -1;
>
> and be done with it.
>
>                Linus

Can you use @cc to make an asm statement that outputs both the masked
array index and the "if" condition?  I can never remember the syntax,
but something like:

asm ("cmp %[limit], %[index]\n\tcmovae %[zero], %[index]" : [index]
"+" (index), "@ccb" (result));

Then you shove this into a statement expression macro so you can do:

if (index_mask_nospec(&nr, NR_syscalls)) {
  ... sys_call_table[nr] ..;
}

(Caveat emptor: I can also *never* remember which way the $*!& AT&T
syntax cmp instruction goes.)

A down side is that nr actually ends up containing zero outside the
if.  *That* could be avoided with jump labels.

--Andy


* Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-02-06 20:49             ` Andy Lutomirski
@ 2018-02-06 20:58               ` Linus Torvalds
  2018-02-06 21:37                 ` Dan Williams
  0 siblings, 1 reply; 70+ messages in thread
From: Linus Torvalds @ 2018-02-06 20:58 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dan Williams, Luis Henriques, Linux Kernel Mailing List,
	linux-arch, Kernel Hardening, Greg KH, X86 ML, Ingo Molnar,
	H. Peter Anvin, Thomas Gleixner, Andrew Morton, Alan Cox

On Tue, Feb 6, 2018 at 12:49 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
> Can you use @cc to make an asm statement that outputs both the masked
> array index and the "if" condition?  I can never remember the syntax,
> but something like:

Yes. Although I'd actually suggest just using an "asm goto" if we
really want to optimize this. Give the "index_mask_nospec()" a third
argument that is the label to jump to for overflow.

Then you can just decide how to implement it best for any particular
architecture (and compiler limitation).

              Linus


* Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-02-06 20:58               ` Linus Torvalds
@ 2018-02-06 21:37                 ` Dan Williams
  2018-02-06 22:52                   ` Linus Torvalds
  0 siblings, 1 reply; 70+ messages in thread
From: Dan Williams @ 2018-02-06 21:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Luis Henriques, Linux Kernel Mailing List,
	linux-arch, Kernel Hardening, Greg KH, X86 ML, Ingo Molnar,
	H. Peter Anvin, Thomas Gleixner, Andrew Morton, Alan Cox

On Tue, Feb 6, 2018 at 12:58 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Feb 6, 2018 at 12:49 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> Can you use @cc to make an asm statement that outputs both the masked
>> array index and the "if" condition?  I can never remember the syntax,
>> but something like:
>
> Yes. Although I'd actually suggest just using an "asm goto" if we
> really want to optimize this. Give the "index_mask_nospec()" a third
> argument that is the label to jump to for overflow.
>
> Then you can just decide how to implement it best for any particular
> architecture (and compiler limitation).

At that point we're basically just back to the array_ptr() version
that returned a sanitized pointer to an array element.

        call = array_ptr(sys_call_table, nr & __SYSCALL_MASK, NR_syscalls);
        if (likely(call))
                regs->ax = (*call)(
                        regs->di, regs->si, regs->dx,
                        regs->r10, regs->r8, regs->r9);


     e1e:       ba 4d 01 00 00          mov    $0x14d,%edx
     e23:       48 39 d5                cmp    %rdx,%rbp
     e26:       48 19 d2                sbb    %rdx,%rdx
        call = array_ptr(sys_call_table, nr & __SYSCALL_MASK, NR_syscalls);
     e29:       48 21 d5                and    %rdx,%rbp
     e2c:       48 8d 04 ed 00 00 00    lea    0x0(,%rbp,8),%rax
     e33:       00
        if (likely(call))
     e34:       48 21 d0                and    %rdx,%rax
     e37:       74 1e                   je     e57 <do_syscall_64+0x77>
                regs->ax = (*call)(
     e39:       48 8b 4b 38             mov    0x38(%rbx),%rcx
     e3d:       48 8b 53 60             mov    0x60(%rbx),%rdx
     e41:       48 8b 73 68             mov    0x68(%rbx),%rsi
     e45:       48 8b 7b 70             mov    0x70(%rbx),%rdi
     e49:       4c 8b 4b 40             mov    0x40(%rbx),%r9
     e4d:       4c 8b 43 48             mov    0x48(%rbx),%r8
     e51:       ff 10                   callq  *(%rax)
     e53:       48 89 43 50             mov    %rax,0x50(%rbx)
     e57:       65 48 8b 04 25 00 00    mov    %gs:0x0,%rax


* Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-02-06 19:48     ` Dan Williams
@ 2018-02-06 22:51         ` Luis Henriques
  2018-02-06 22:51         ` Luis Henriques
  1 sibling, 0 replies; 70+ messages in thread
From: Luis Henriques @ 2018-02-06 22:51 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux Kernel Mailing List, linux-arch, Kernel Hardening, Greg KH,
	X86 ML, Ingo Molnar, Andy Lutomirski, H. Peter Anvin,
	Thomas Gleixner, Linus Torvalds, Andrew Morton, Alan Cox

On Tue, Feb 06, 2018 at 11:48:45AM -0800, Dan Williams wrote:
> On Tue, Feb 6, 2018 at 11:29 AM, Luis Henriques <lhenriques@suse.com> wrote:
> > On Thu, Jan 18, 2018 at 04:02:21PM -0800, Dan Williams wrote:
> >> The syscall table base is a user controlled function pointer in kernel
> >> space. Like, 'get_user, use 'MASK_NOSPEC' to prevent any out of bounds
> >> speculation. While retpoline prevents speculating into the user
> >> controlled target it does not stop the pointer de-reference, the concern
> >> is leaking memory relative to the syscall table base.
> >
> > This patch seems to cause a regression.  An easy way to reproduce what
> > I'm seeing is to run the samples/statx/test-statx.  Here's what I see
> > when I have this patchset applied:
> >
> > # ./test-statx /tmp
> > statx(/tmp) = -1
> > /tmp: Bad file descriptor
> >
> > Reverting this single patch seems to fix it.
> 
> Just to clarify, when you say "this patch" you mean:
> 
>      2fbd7af5af86 x86/syscall: Sanitize syscall table de-references
> under speculation
> 
> ...not this early MASK_NOSPEC version of the patch, right?

*sigh*

Looks like I spent some good amount of time hunting a non-issue just
because I have enough old branches hanging around to confuse me :-(

Sorry for the noise.

Cheers,
--
Luís



* Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-02-06 21:37                 ` Dan Williams
@ 2018-02-06 22:52                   ` Linus Torvalds
  2018-02-07  0:33                     ` Dan Williams
  0 siblings, 1 reply; 70+ messages in thread
From: Linus Torvalds @ 2018-02-06 22:52 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andy Lutomirski, Luis Henriques, Linux Kernel Mailing List,
	linux-arch, Kernel Hardening, Greg KH, X86 ML, Ingo Molnar,
	H. Peter Anvin, Thomas Gleixner, Andrew Morton, Alan Cox

On Tue, Feb 6, 2018 at 1:37 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>
> At that point we're basically just back to the array_ptr() version
> that returned a sanitized pointer to an array element.

.. that one does an extra unnecessary 'andq' instead of the duplicated
cmp.  But at least it avoids comparing that 32-bit integer twice, so
it's probably slightly smaller.

(And your code generation is without the "r" -> "ir" fix for the size argument)

Probably doesn't matter. But a "asm goto" would give you at least
potentially optimal code.

            Linus


* Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-02-06 22:52                   ` Linus Torvalds
@ 2018-02-07  0:33                     ` Dan Williams
  2018-02-07  1:23                       ` Linus Torvalds
  0 siblings, 1 reply; 70+ messages in thread
From: Dan Williams @ 2018-02-07  0:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Luis Henriques, Linux Kernel Mailing List,
	linux-arch, Kernel Hardening, Greg KH, X86 ML, Ingo Molnar,
	H. Peter Anvin, Thomas Gleixner, Andrew Morton, Alan Cox

On Tue, Feb 6, 2018 at 2:52 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Feb 6, 2018 at 1:37 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>
>> At that point we're basically just back to the array_ptr() version
>> that returned a sanitized pointer to an array element.
>
> .. that one does an extra unnecessary 'andq' instead of the duplicated
> cmp.  But at least it avoids comparing that 32-bit integer twice, so
> it's probably slightly smaller.
>
> (And your code generation is without the "r" -> "ir" fix for the size argument)
>
> Probably doesn't matter. But a "asm goto" would give you at least
> potentially optimal code.
>

Should we go with array_element_nospec() in the meantime? So we're not
depending on jump labels? With the constraint fix and killing that
superfluous AND the assembly is now:

     e26:       48 81 fd 4d 01 00 00    cmp    $0x14d,%rbp
     e2d:       48 19 d2                sbb    %rdx,%rdx
                        NR_syscalls);
        if (likely(call))
     e30:       48 21 d0                and    %rdx,%rax
     e33:       74 1e                   je     e53 <do_syscall_64+0x73>
                regs->ax = (*call)(regs->di, regs->si, regs->dx,
     e35:       48 8b 4b 38             mov    0x38(%rbx),%rcx
     e39:       48 8b 53 60             mov    0x60(%rbx),%rdx
     e3d:       48 8b 73 68             mov    0x68(%rbx),%rsi
     e41:       48 8b 7b 70             mov    0x70(%rbx),%rdi
     e45:       4c 8b 4b 40             mov    0x40(%rbx),%r9
     e49:       4c 8b 43 48             mov    0x48(%rbx),%r8
     e4d:       ff 10                   callq  *(%rax)
     e4f:       48 89 43 50             mov    %rax,0x50(%rbx)
     e53:       65 48 8b 04 25 00 00    mov    %gs:0x0,%rax


* Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation
  2018-02-07  0:33                     ` Dan Williams
@ 2018-02-07  1:23                       ` Linus Torvalds
  0 siblings, 0 replies; 70+ messages in thread
From: Linus Torvalds @ 2018-02-07  1:23 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andy Lutomirski, Luis Henriques, Linux Kernel Mailing List,
	linux-arch, Kernel Hardening, Greg KH, X86 ML, Ingo Molnar,
	H. Peter Anvin, Thomas Gleixner, Andrew Morton, Alan Cox

On Tue, Feb 6, 2018 at 4:33 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>
> Should we go with array_element_nospec() in the meantime? So we're not
> depending on jump labels? With the constraint fix and killing that
> superfluous AND the assembly is now:
>
>      e26:       48 81 fd 4d 01 00 00    cmp    $0x14d,%rbp
>      e2d:       48 19 d2                sbb    %rdx,%rdx
>                         NR_syscalls);
>         if (likely(call))
>      e30:       48 21 d0                and    %rdx,%rax
>      e33:       74 1e                   je     e53 <do_syscall_64+0x73>
>                 regs->ax = (*call)(regs->di, regs->si, regs->dx,
>      e35:       48 8b 4b 38             mov    0x38(%rbx),%rcx
>      e39:       48 8b 53 60             mov    0x60(%rbx),%rdx
>      e3d:       48 8b 73 68             mov    0x68(%rbx),%rsi
>      e41:       48 8b 7b 70             mov    0x70(%rbx),%rdi
>      e45:       4c 8b 4b 40             mov    0x40(%rbx),%r9
>      e49:       4c 8b 43 48             mov    0x48(%rbx),%r8
>      e4d:       ff 10                   callq  *(%rax)

That looks fairly optimal, except for the fact that the callq is
through a register.

Of course, that register-indirect calling convention is forced on us
by retpoline anyway (which you don't have enabled, likely because of a
lack of compiler). But without retpoline that callq could be

         callq  sys_call_table(,%rax,8)

if the masking is done on the index (and if the conditional jump had
been done on the cmp rather than the later 'and').

Instead, you have a

        leaq    sys_call_table(,%rbp,8),%rax

hiding somewhere earlier that doesn't show in your asm snippet.

Oh well. We'll have an extra instruction however we do this. I guess
that's just something we'll have to live with. No more bikeshedding..

            Linus


end of thread, other threads:[~2018-02-07  1:23 UTC | newest]

Thread overview: 70+ messages
2018-01-19  0:01 [PATCH v4 00/10] prevent bounds-check bypass via speculative execution Dan Williams
2018-01-19  0:01 ` [kernel-hardening] " Dan Williams
2018-01-19  0:01 ` Dan Williams
2018-01-19  0:01 ` [PATCH v4 01/10] Documentation: document array_ptr Dan Williams
2018-01-19  0:01   ` [kernel-hardening] " Dan Williams
2018-01-19  0:01 ` [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references Dan Williams
2018-01-19  0:01   ` [kernel-hardening] " Dan Williams
2018-01-19 10:20   ` Jann Horn
2018-01-19 17:48     ` Adam Sampson
2018-01-19 17:48       ` Adam Sampson
2018-01-19 17:48       ` Adam Sampson
2018-01-19 17:48       ` Adam Sampson
2018-01-19 18:12       ` Dan Williams
2018-01-19 18:18         ` Will Deacon
2018-01-19 18:18           ` Will Deacon
2018-01-19 18:26           ` [kernel-hardening] " Dan Williams
2018-01-19 18:18     ` Linus Torvalds
2018-01-19 18:18       ` Linus Torvalds
2018-01-19 20:55       ` [kernel-hardening] " Dan Williams
2018-01-25  7:09   ` Cyril Novikov
2018-01-25  7:09     ` [kernel-hardening] " Cyril Novikov
2018-01-25 22:37     ` Dan Williams
2018-01-25 22:37       ` [kernel-hardening] " Dan Williams
2018-01-19  0:01 ` [PATCH v4 03/10] x86: implement array_ptr_mask() Dan Williams
2018-01-19  0:01   ` [kernel-hardening] " Dan Williams
2018-01-19  0:01 ` [PATCH v4 04/10] x86: introduce __uaccess_begin_nospec and ifence Dan Williams
2018-01-19  0:01   ` [kernel-hardening] " Dan Williams
2018-01-19  0:02 ` [PATCH v4 05/10] x86, __get_user: use __uaccess_begin_nospec Dan Williams
2018-01-19  0:02   ` [kernel-hardening] " Dan Williams
2018-01-19  0:02 ` [PATCH v4 06/10] x86, get_user: use pointer masking to limit speculation Dan Williams
2018-01-19  0:02   ` [kernel-hardening] " Dan Williams
2018-01-19  0:02 ` [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation Dan Williams
2018-01-19  0:02   ` [kernel-hardening] " Dan Williams
2018-01-24 14:40   ` Jiri Slaby
2018-01-24 14:40     ` [kernel-hardening] " Jiri Slaby
2018-02-06 19:29   ` Luis Henriques
2018-02-06 19:48     ` Dan Williams
2018-02-06 20:26       ` Linus Torvalds
2018-02-06 20:37         ` Dan Williams
2018-02-06 20:42           ` Linus Torvalds
2018-02-06 20:43             ` Linus Torvalds
2018-02-06 20:49             ` Andy Lutomirski
2018-02-06 20:58               ` Linus Torvalds
2018-02-06 21:37                 ` Dan Williams
2018-02-06 22:52                   ` Linus Torvalds
2018-02-07  0:33                     ` Dan Williams
2018-02-07  1:23                       ` Linus Torvalds
2018-02-06 22:51       ` Luis Henriques
2018-02-06 22:51         ` Luis Henriques
2018-01-19  0:02 ` [PATCH v4 08/10] vfs, fdtable: prevent bounds-check bypass via speculative execution Dan Williams
2018-01-19  0:02   ` [kernel-hardening] " Dan Williams
2018-01-19  0:02 ` [PATCH v4 09/10] kvm, x86: fix spectre-v1 mitigation Dan Williams
2018-01-19  0:02   ` [kernel-hardening] " Dan Williams
2018-01-19  8:42   ` Paolo Bonzini
2018-01-19  8:42     ` [kernel-hardening] " Paolo Bonzini
2018-01-19  0:02 ` [PATCH v4 10/10] nl80211: sanitize array index in parse_txq_params Dan Williams
2018-01-19  0:02   ` [kernel-hardening] " Dan Williams
2018-01-19  0:02   ` Dan Williams
2018-01-21 10:37   ` Johannes Berg
2018-01-21 10:37     ` [kernel-hardening] " Johannes Berg
2018-01-21 10:37     ` Johannes Berg
2018-01-20  6:58 ` [PATCH v4 00/10] prevent bounds-check bypass via speculative execution Dan Williams
2018-01-20  6:58   ` [kernel-hardening] " Dan Williams
2018-01-20  6:58   ` Dan Williams
2018-01-20 16:56   ` Alexei Starovoitov
2018-01-20 16:56     ` [kernel-hardening] " Alexei Starovoitov
2018-01-20 16:56     ` Alexei Starovoitov
2018-01-20 17:07     ` Alexei Starovoitov
2018-01-20 17:07       ` [kernel-hardening] " Alexei Starovoitov
2018-01-20 17:07       ` Alexei Starovoitov
