* [PATCH v5 00/12] spectre variant1 mitigations for tip/x86/pti
@ 2018-01-27  7:55 ` Dan Williams
  0 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-01-27  7:55 UTC (permalink / raw)
  To: tglx
  Cc: Mark Rutland, Cyril Novikov, kernel-hardening, Peter Zijlstra,
	Catalin Marinas, Will Deacon, H. Peter Anvin, Jiri Slaby,
	Elena Reshetova, linux-arch, Andi Kleen, Jonathan Corbet, x86,
	Russell King, Ingo Molnar, Andrew Honig, alan, Tom Lendacky,
	Kees Cook, Al Viro, Andy Lutomirski, Jim Mattson,
	Christian Lamparter, gregkh, linux-wireless, Paolo Bonzini,
	Johannes Berg, torvalds, David S. Miller

Hi Thomas,

Here's another spin of the spectre-v1 mitigations for 4.16.

Changes since v4.1: [1]
* Tweak the sanitization scheme yet again to make it even simpler. Now,
  instead of 'array_ptr' to get a sanitized pointer to an array element,
  just provide an array index sanitization helper 'array_idx' to be called
  after successfully validating the index is in bounds. I.e. in the
  exact same location one would otherwise put an lfence, place this
  sanitizer:

      if (idx < sz) {
          idx = array_idx(idx, sz);
          val = array[idx];
      }

  This lets the implementation include extra sanity checking that the
  compiler can usually compile out, and it appears to produce better
  assembly. It also resolves the concern that comparing the pointer
  returned from array_ptr could itself create another speculation
  point. (Russell, Linus, Cyril)

* Drop the syscall_64_fastpath.  This is the straightforward patch from
  Linus that might also be in flight from Andy, but I went ahead and
  included it since I did not see it on LKML yet.

* Kill the MASK_NOSPEC macro and just open code it. (Andy)

* Add system-call-number sanitization to the slow path syscall table
  lookups.

* Redo the array_ptr conversions with array_idx.

* Update /sys/devices/system/cpu/vulnerabilities/spectre_v1 to indicate
  the new protections. It now reports "Vulnerable: Minimal user pointer
  sanitization". (Jiri)

---

Dan Williams (11):
      array_idx: sanitize speculative array de-references
      x86: implement array_idx_mask
      x86: introduce __uaccess_begin_nospec and ifence
      x86, __get_user: use __uaccess_begin_nospec
      x86, get_user: use pointer masking to limit speculation
      x86: remove the syscall_64 fast-path
      x86: sanitize syscall table de-references under speculation
      vfs, fdtable: prevent bounds-check bypass via speculative execution
      kvm, x86: update spectre-v1 mitigation
      nl80211: sanitize array index in parse_txq_params
      x86/spectre: report get_user mitigation for spectre_v1

Mark Rutland (1):
      Documentation: document array_idx


 Documentation/speculation.txt     |   87 ++++++++++++++++++++++++++++
 arch/x86/entry/common.c           |    3 +
 arch/x86/entry/entry_64.S         |  116 -------------------------------------
 arch/x86/entry/syscall_64.c       |    7 +-
 arch/x86/include/asm/barrier.h    |   26 ++++++++
 arch/x86/include/asm/msr.h        |    3 -
 arch/x86/include/asm/uaccess.h    |   15 ++++-
 arch/x86/include/asm/uaccess_32.h |    6 +-
 arch/x86/include/asm/uaccess_64.h |   12 ++--
 arch/x86/kernel/cpu/bugs.c        |    2 -
 arch/x86/kvm/vmx.c                |   14 +++-
 arch/x86/lib/getuser.S            |   10 +++
 arch/x86/lib/usercopy_32.c        |    8 +--
 include/linux/fdtable.h           |    5 +-
 include/linux/nospec.h            |   64 ++++++++++++++++++++
 net/wireless/nl80211.c            |    9 ++-
 16 files changed, 239 insertions(+), 148 deletions(-)
 create mode 100644 Documentation/speculation.txt
 create mode 100644 include/linux/nospec.h

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v5 01/12] Documentation: document array_idx
  2018-01-27  7:55 ` Dan Williams
@ 2018-01-27  7:55   ` Dan Williams
  -1 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-01-27  7:55 UTC (permalink / raw)
  To: tglx
  Cc: Mark Rutland, linux-arch, Kees Cook, kernel-hardening,
	Peter Zijlstra, gregkh, Jonathan Corbet, Will Deacon, torvalds,
	alan

From: Mark Rutland <mark.rutland@arm.com>

Document the rationale and usage of the new array_idx() helper.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/speculation.txt |   87 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100644 Documentation/speculation.txt

diff --git a/Documentation/speculation.txt b/Documentation/speculation.txt
new file mode 100644
index 000000000000..308177cab617
--- /dev/null
+++ b/Documentation/speculation.txt
@@ -0,0 +1,87 @@
+This document explains potential effects of speculation, and how undesirable
+effects can be mitigated portably using common APIs.
+
+===========
+Speculation
+===========
+
+To improve performance and minimize average latencies, many contemporary CPUs
+employ speculative execution techniques such as branch prediction, performing
+work which may be discarded at a later stage.
+
+Typically speculative execution cannot be observed from architectural state,
+such as the contents of registers. However, in some cases it is possible to
+observe its impact on microarchitectural state, such as the presence or
+absence of data in caches. Such state may form side-channels which can be
+observed to extract secret information.
+
+For example, in the presence of branch prediction, it is possible for bounds
+checks to be ignored by code which is speculatively executed. Consider the
+following code:
+
+	int load_array(int *array, unsigned int idx)
+	{
+		if (idx >= MAX_ARRAY_ELEMS)
+			return 0;
+		else
+			return array[idx];
+	}
+
+Which, on arm64, may be compiled to an assembly sequence such as:
+
+	CMP	<idx>, #MAX_ARRAY_ELEMS
+	B.LT	less
+	MOV	<returnval>, #0
+	RET
+  less:
+	LDR	<returnval>, [<array>, <idx>]
+	RET
+
+It is possible that a CPU mis-predicts the conditional branch, and
+speculatively loads array[idx], even if idx >= MAX_ARRAY_ELEMS. This value
+will subsequently be discarded, but the speculated load may affect
+microarchitectural state which can be subsequently measured.
+
+More complex sequences involving multiple dependent memory accesses may result
+in sensitive information being leaked. Consider the following code, building
+on the prior example:
+
+	int load_dependent_arrays(int *arr1, int *arr2, int idx)
+	{
+		int val1, val2;
+
+		val1 = load_array(arr1, idx);
+		val2 = load_array(arr2, val1);
+
+		return val2;
+	}
+
+Under speculation, the first call to load_array() may return the value of an
+out-of-bounds address, while the second call will influence microarchitectural
+state dependent on this value. This may provide an arbitrary read primitive.
+
+====================================
+Mitigating speculation side-channels
+====================================
+
+The kernel provides a generic API to ensure that bounds checks are respected
+even under speculation. Architectures which are affected by speculation-based
+side-channels are expected to implement these primitives.
+
+The array_idx() helper in <linux/nospec.h> can be used to prevent
+information from being leaked via side-channels.
+
+A call to array_idx(idx, sz) returns a sanitized index value that is
+bounded to [0, sz) even under cpu speculation conditions.
+
+This can be used to protect the earlier load_array() example:
+
+	int load_array(int *array, unsigned int idx)
+	{
+		if (idx >= MAX_ARRAY_ELEMS)
+			return 0;
+		else {
+			idx = array_idx(idx, MAX_ARRAY_ELEMS);
+			return array[idx];
+		}
+	}


* [PATCH v5 02/12] array_idx: sanitize speculative array de-references
  2018-01-27  7:55 ` Dan Williams
@ 2018-01-27  7:55   ` Dan Williams
  -1 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-01-27  7:55 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, Cyril Novikov, kernel-hardening, Peter Zijlstra,
	Catalin Marinas, x86, Will Deacon, Russell King, Ingo Molnar,
	gregkh, H. Peter Anvin, torvalds, alan

'array_idx' is proposed as a generic mechanism to mitigate
Spectre-variant-1 attacks (i.e. attacks that bypass bounds checks via
speculative execution). The 'array_idx' implementation is expected to
be safe for current generation cpus across multiple architectures
(ARM, x86).

Based on an original implementation by Linus Torvalds, tweaked to remove
speculative flows by Alexei Starovoitov, and tweaked again by Linus to
introduce an x86 assembly implementation for the mask generation.

Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Co-developed-by: Alexei Starovoitov <ast@kernel.org>
Suggested-by: Cyril Novikov <cnovikov@lynx.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/nospec.h |   64 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 include/linux/nospec.h

diff --git a/include/linux/nospec.h b/include/linux/nospec.h
new file mode 100644
index 000000000000..f59f81889ba3
--- /dev/null
+++ b/include/linux/nospec.h
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright(c) 2018 Intel Corporation. All rights reserved.
+
+#ifndef __NOSPEC_H__
+#define __NOSPEC_H__
+
+/*
+ * When idx is out of bounds (idx >= sz), the sign bit will be set.
+ * Extend the sign bit to all bits and invert, giving a result of zero
+ * for an out of bounds idx, or ~0UL if within bounds [0, sz).
+ */
+#ifndef array_idx_mask
+static inline unsigned long array_idx_mask(unsigned long idx, unsigned long sz)
+{
+	/*
+	 * Warn developers about inappropriate array_idx usage.
+	 *
+	 * Even if the cpu speculates past the WARN_ONCE branch, the
+	 * sign bit of idx is taken into account when generating the
+	 * mask.
+	 *
+	 * This warning is compiled out when the compiler can infer that
+	 * idx and sz are less than LONG_MAX.
+	 */
+	if (WARN_ONCE(idx > LONG_MAX || sz > LONG_MAX,
+			"array_idx limited to range of [0, LONG_MAX]\n"))
+		return 0;
+
+	/*
+	 * Always calculate and emit the mask even if the compiler
+	 * thinks the mask is not needed. The compiler does not take
+	 * into account the value of idx under speculation.
+	 */
+	OPTIMIZER_HIDE_VAR(idx);
+	return ~(long)(idx | (sz - 1UL - idx)) >> (BITS_PER_LONG - 1);
+}
+#endif
+
+/*
+ * array_idx - sanitize an array index after a bounds check
+ *
+ * For a code sequence like:
+ *
+ *     if (idx < sz) {
+ *         idx = array_idx(idx, sz);
+ *         val = array[idx];
+ *     }
+ *
+ * ...if the cpu speculates past the bounds check then array_idx() will
+ * clamp the index within the range of [0, sz).
+ */
+#define array_idx(idx, sz)						\
+({									\
+	typeof(idx) _i = (idx);						\
+	typeof(sz) _s = (sz);						\
+	unsigned long _mask = array_idx_mask(_i, _s);			\
+									\
+	BUILD_BUG_ON(sizeof(_i) > sizeof(long));			\
+	BUILD_BUG_ON(sizeof(_s) > sizeof(long));			\
+									\
+	_i &= _mask;							\
+	_i;								\
+})
+#endif /* __NOSPEC_H__ */


* [PATCH v5 03/12] x86: implement array_idx_mask
  2018-01-27  7:55 ` Dan Williams
@ 2018-01-27  7:55   ` Dan Williams
  -1 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-01-27  7:55 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, kernel-hardening, gregkh, x86, Ingo Molnar,
	H. Peter Anvin, torvalds, alan

'array_idx' uses a mask to sanitize user-controllable array indexes,
i.e. it generates a 0 mask if idx >= sz, and a ~0 mask otherwise. The
default array_idx_mask handles the carry bit from the (index - size)
result in software. The x86 'array_idx_mask' computes the same mask,
but handles the carry bit via the processor's CF flag, avoiding
conditional instructions in the control flow.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/barrier.h |   22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 01727dbc294a..30419b674ebd 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -24,6 +24,28 @@
 #define wmb()	asm volatile("sfence" ::: "memory")
 #endif
 
+/**
+ * array_idx_mask - generate a mask for array_idx() that is ~0UL when
+ * the bounds check succeeds and 0 otherwise
+ *
+ *     mask = 0 - (idx < sz);
+ */
+#define array_idx_mask array_idx_mask
+static inline unsigned long array_idx_mask(unsigned long idx, unsigned long sz)
+{
+	unsigned long mask;
+
+#ifdef CONFIG_X86_32
+	asm ("cmpl %1,%2; sbbl %0,%0;"
+#else
+	asm ("cmpq %1,%2; sbbq %0,%0;"
+#endif
+			:"=r" (mask)
+			:"r"(sz),"r" (idx)
+			:"cc");
+	return mask;
+}
+
 #ifdef CONFIG_X86_PPRO_FENCE
 #define dma_rmb()	rmb()
 #else


* [PATCH v5 04/12] x86: introduce __uaccess_begin_nospec and ifence
  2018-01-27  7:55 ` Dan Williams
@ 2018-01-27  7:55   ` Dan Williams
  -1 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-01-27  7:55 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, Tom Lendacky, Andi Kleen, Kees Cook,
	kernel-hardening, gregkh, x86, Ingo Molnar, Al Viro,
	H. Peter Anvin, torvalds, alan

For '__get_user' paths, do not allow the kernel to speculate on the
value of a user-controlled pointer. In addition to the 'stac'
instruction for Supervisor Mode Access Protection, an 'ifence' causes
the 'access_ok' result to resolve in the pipeline before the cpu might
take any speculative action on the pointer value.

Since __get_user is a major kernel interface that deals with user
controlled pointers, the '__uaccess_begin_nospec' mechanism will prevent
speculative execution past an 'access_ok' permission check. While
speculative execution past 'access_ok' is not enough to lead to a kernel
memory leak, it is a necessary precondition.

To be clear, '__uaccess_begin_nospec' is addressing a class of potential
problems near '__get_user' usages.

Note that, while 'ifence' is used to protect '__get_user', pointer
masking will be used for 'get_user', since 'get_user' incorporates a
bounds check near the usage.

There are no functional changes in this patch.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/barrier.h |    4 ++++
 arch/x86/include/asm/msr.h     |    3 +--
 arch/x86/include/asm/uaccess.h |    9 +++++++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 30419b674ebd..5f11d4c5c862 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -46,6 +46,10 @@ static inline unsigned long array_idx_mask(unsigned long idx, unsigned long sz)
 	return mask;
 }
 
+/* prevent speculative execution past this barrier */
+#define ifence() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \
+				   "lfence", X86_FEATURE_LFENCE_RDTSC)
+
 #ifdef CONFIG_X86_PPRO_FENCE
 #define dma_rmb()	rmb()
 #else
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 07962f5f6fba..e426d2a33ff3 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -214,8 +214,7 @@ static __always_inline unsigned long long rdtsc_ordered(void)
 	 * that some other imaginary CPU is updating continuously with a
 	 * time stamp.
 	 */
-	alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
-			  "lfence", X86_FEATURE_LFENCE_RDTSC);
+	ifence();
 	return rdtsc();
 }
 
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 574dff4d2913..626caf58183a 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -124,6 +124,11 @@ extern int __get_user_bad(void);
 
 #define __uaccess_begin() stac()
 #define __uaccess_end()   clac()
+#define __uaccess_begin_nospec()	\
+({					\
+	stac();				\
+	ifence();			\
+})
 
 /*
  * This is a type: either unsigned long, if the argument fits into
@@ -487,6 +492,10 @@ struct __large_struct { unsigned long buf[100]; };
 	__uaccess_begin();						\
 	barrier();
 
+#define uaccess_try_nospec do {						\
+	current->thread.uaccess_err = 0;				\
+	__uaccess_begin_nospec();					\
+
 #define uaccess_catch(err)						\
 	__uaccess_end();						\
 	(err) |= (current->thread.uaccess_err ? -EFAULT : 0);		\


* [PATCH v5 05/12] x86, __get_user: use __uaccess_begin_nospec
  2018-01-27  7:55 ` Dan Williams
@ 2018-01-27  7:55   ` Dan Williams
  -1 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-01-27  7:55 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, Andi Kleen, Kees Cook, kernel-hardening, gregkh, x86,
	Ingo Molnar, Al Viro, H. Peter Anvin, torvalds, alan

Quoting Linus:

    I do think that it would be a good idea to very expressly document
    the fact that it's not that the user access itself is unsafe. I do
    agree that things like "get_user()" want to be protected, but not
    because of any direct bugs or problems with get_user() and friends,
    but simply because get_user() is an excellent source of a pointer
    that is obviously controlled from a potentially attacking user
    space. So it's a prime candidate for then finding _subsequent_
    accesses that can then be used to perturb the cache.

'__uaccess_begin_nospec' covers '__get_user' and 'copy_from_iter' where
the limit check is far away from the user pointer dereference. In those
cases, an 'lfence' prevents speculation with a potential pointer to
privileged memory.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/uaccess.h    |    6 +++---
 arch/x86/include/asm/uaccess_32.h |    6 +++---
 arch/x86/include/asm/uaccess_64.h |   12 ++++++------
 arch/x86/lib/usercopy_32.c        |    8 ++++----
 4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 626caf58183a..a930585fa3b5 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -450,7 +450,7 @@ do {									\
 ({									\
 	int __gu_err;							\
 	__inttype(*(ptr)) __gu_val;					\
-	__uaccess_begin();						\
+	__uaccess_begin_nospec();					\
 	__get_user_size(__gu_val, (ptr), (size), __gu_err, -EFAULT);	\
 	__uaccess_end();						\
 	(x) = (__force __typeof__(*(ptr)))__gu_val;			\
@@ -557,7 +557,7 @@ struct __large_struct { unsigned long buf[100]; };
  *	get_user_ex(...);
  * } get_user_catch(err)
  */
-#define get_user_try		uaccess_try
+#define get_user_try		uaccess_try_nospec
 #define get_user_catch(err)	uaccess_catch(err)
 
 #define get_user_ex(x, ptr)	do {					\
@@ -591,7 +591,7 @@ extern void __cmpxchg_wrong_size(void)
 	__typeof__(ptr) __uval = (uval);				\
 	__typeof__(*(ptr)) __old = (old);				\
 	__typeof__(*(ptr)) __new = (new);				\
-	__uaccess_begin();						\
+	__uaccess_begin_nospec();					\
 	switch (size) {							\
 	case 1:								\
 	{								\
diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 72950401b223..ba2dc1930630 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -29,21 +29,21 @@ raw_copy_from_user(void *to, const void __user *from, unsigned long n)
 		switch (n) {
 		case 1:
 			ret = 0;
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_asm_nozero(*(u8 *)to, from, ret,
 					      "b", "b", "=q", 1);
 			__uaccess_end();
 			return ret;
 		case 2:
 			ret = 0;
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_asm_nozero(*(u16 *)to, from, ret,
 					      "w", "w", "=r", 2);
 			__uaccess_end();
 			return ret;
 		case 4:
 			ret = 0;
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_asm_nozero(*(u32 *)to, from, ret,
 					      "l", "k", "=r", 4);
 			__uaccess_end();
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index f07ef3c575db..62546b3a398e 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -55,31 +55,31 @@ raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
 		return copy_user_generic(dst, (__force void *)src, size);
 	switch (size) {
 	case 1:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u8 *)dst, (u8 __user *)src,
 			      ret, "b", "b", "=q", 1);
 		__uaccess_end();
 		return ret;
 	case 2:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u16 *)dst, (u16 __user *)src,
 			      ret, "w", "w", "=r", 2);
 		__uaccess_end();
 		return ret;
 	case 4:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u32 *)dst, (u32 __user *)src,
 			      ret, "l", "k", "=r", 4);
 		__uaccess_end();
 		return ret;
 	case 8:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
 			      ret, "q", "", "=r", 8);
 		__uaccess_end();
 		return ret;
 	case 10:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
 			       ret, "q", "", "=r", 10);
 		if (likely(!ret))
@@ -89,7 +89,7 @@ raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
 		__uaccess_end();
 		return ret;
 	case 16:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
 			       ret, "q", "", "=r", 16);
 		if (likely(!ret))
diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index 1b377f734e64..7add8ba06887 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -331,12 +331,12 @@ do {									\
 
 unsigned long __copy_user_ll(void *to, const void *from, unsigned long n)
 {
-	stac();
+	__uaccess_begin_nospec();
 	if (movsl_is_ok(to, from, n))
 		__copy_user(to, from, n);
 	else
 		n = __copy_user_intel(to, from, n);
-	clac();
+	__uaccess_end();
 	return n;
 }
 EXPORT_SYMBOL(__copy_user_ll);
@@ -344,7 +344,7 @@ EXPORT_SYMBOL(__copy_user_ll);
 unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from,
 					unsigned long n)
 {
-	stac();
+	__uaccess_begin_nospec();
 #ifdef CONFIG_X86_INTEL_USERCOPY
 	if (n > 64 && static_cpu_has(X86_FEATURE_XMM2))
 		n = __copy_user_intel_nocache(to, from, n);
@@ -353,7 +353,7 @@ unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *fr
 #else
 	__copy_user(to, from, n);
 #endif
-	clac();
+	__uaccess_end();
 	return n;
 }
 EXPORT_SYMBOL(__copy_from_user_ll_nocache_nozero);


* [PATCH v5 06/12] x86, get_user: use pointer masking to limit speculation
  2018-01-27  7:55 ` Dan Williams
@ 2018-01-27  7:55   ` Dan Williams
  -1 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-01-27  7:55 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, Kees Cook, kernel-hardening, gregkh, x86,
	Ingo Molnar, Al Viro, Andy Lutomirski, H. Peter Anvin, torvalds,
	alan

Quoting Linus:

    I do think that it would be a good idea to very expressly document
    the fact that it's not that the user access itself is unsafe. I do
    agree that things like "get_user()" want to be protected, but not
    because of any direct bugs or problems with get_user() and friends,
    but simply because get_user() is an excellent source of a pointer
    that is obviously controlled from a potentially attacking user
    space. So it's a prime candidate for then finding _subsequent_
    accesses that can then be used to perturb the cache.

Unlike the '__get_user' case, 'get_user' includes the address limit
check near the pointer dereference. With that locality, the speculation
can be mitigated with pointer narrowing rather than a barrier. The
narrowing is performed by:

	cmp %limit, %ptr
	sbb %mask, %mask
	and %mask, %ptr

With respect to speculation, the value of %ptr is either less than
%limit or NULL.

Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/lib/getuser.S |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/lib/getuser.S b/arch/x86/lib/getuser.S
index c97d935a29e8..e7a1a9421998 100644
--- a/arch/x86/lib/getuser.S
+++ b/arch/x86/lib/getuser.S
@@ -40,6 +40,8 @@ ENTRY(__get_user_1)
 	mov PER_CPU_VAR(current_task), %_ASM_DX
 	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
 	jae bad_get_user
+	sbb %_ASM_DX, %_ASM_DX		/* 0 - (uptr < addr_limit) */
+	and %_ASM_DX, %_ASM_AX
 	ASM_STAC
 1:	movzbl (%_ASM_AX),%edx
 	xor %eax,%eax
@@ -54,6 +56,8 @@ ENTRY(__get_user_2)
 	mov PER_CPU_VAR(current_task), %_ASM_DX
 	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
 	jae bad_get_user
+	sbb %_ASM_DX, %_ASM_DX		/* 0 - (uptr < addr_limit) */
+	and %_ASM_DX, %_ASM_AX
 	ASM_STAC
 2:	movzwl -1(%_ASM_AX),%edx
 	xor %eax,%eax
@@ -68,6 +72,8 @@ ENTRY(__get_user_4)
 	mov PER_CPU_VAR(current_task), %_ASM_DX
 	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
 	jae bad_get_user
+	sbb %_ASM_DX, %_ASM_DX		/* 0 - (uptr < addr_limit) */
+	and %_ASM_DX, %_ASM_AX
 	ASM_STAC
 3:	movl -3(%_ASM_AX),%edx
 	xor %eax,%eax
@@ -83,6 +89,8 @@ ENTRY(__get_user_8)
 	mov PER_CPU_VAR(current_task), %_ASM_DX
 	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
 	jae bad_get_user
+	sbb %_ASM_DX, %_ASM_DX		/* 0 - (uptr < addr_limit) */
+	and %_ASM_DX, %_ASM_AX
 	ASM_STAC
 4:	movq -7(%_ASM_AX),%rdx
 	xor %eax,%eax
@@ -94,6 +102,8 @@ ENTRY(__get_user_8)
 	mov PER_CPU_VAR(current_task), %_ASM_DX
 	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
 	jae bad_get_user_8
+	sbb %_ASM_DX, %_ASM_DX		/* 0 - (uptr < addr_limit) */
+	and %_ASM_DX, %_ASM_AX
 	ASM_STAC
 4:	movl -7(%_ASM_AX),%edx
 5:	movl -3(%_ASM_AX),%ecx


* [PATCH v5 07/12] x86: remove the syscall_64 fast-path
  2018-01-27  7:55 ` Dan Williams
@ 2018-01-27  7:55   ` Dan Williams
  -1 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-01-27  7:55 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, kernel-hardening, gregkh, x86, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, torvalds, alan

Quoting Linus:

  "Honestly, I'd rather get rid of the fast-path entirely. Compared to
   all the PTI mess, it's not even noticeable.

   And if we ever get CPU's that have this all fixed, we can re-visit
   introducing the fastpath. But this is all very messy and it doesn't
   seem worth it right now.

   If we get rid of the fastpath, we can lay out the slow path slightly
   better, and get rid of some of those jump-overs. And we'd get rid of
   the ptregs hooks entirely.

   So we can try to make the "slow" path better while at it, but I
   really don't think it matters much now in the post-PTI era. Sadly."

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/entry/entry_64.S   |  116 -------------------------------------------
 arch/x86/entry/syscall_64.c |    7 +--
 2 files changed, 2 insertions(+), 121 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 63f4320602a3..0c18b127ec10 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -237,80 +237,6 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
 	sub	$(6*8), %rsp			/* pt_regs->bp, bx, r12-15 not saved */
 	UNWIND_HINT_REGS extra=0
 
-	/*
-	 * If we need to do entry work or if we guess we'll need to do
-	 * exit work, go straight to the slow path.
-	 */
-	movq	PER_CPU_VAR(current_task), %r11
-	testl	$_TIF_WORK_SYSCALL_ENTRY|_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
-	jnz	entry_SYSCALL64_slow_path
-
-entry_SYSCALL_64_fastpath:
-	/*
-	 * Easy case: enable interrupts and issue the syscall.  If the syscall
-	 * needs pt_regs, we'll call a stub that disables interrupts again
-	 * and jumps to the slow path.
-	 */
-	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_NONE)
-#if __SYSCALL_MASK == ~0
-	cmpq	$__NR_syscall_max, %rax
-#else
-	andl	$__SYSCALL_MASK, %eax
-	cmpl	$__NR_syscall_max, %eax
-#endif
-	ja	1f				/* return -ENOSYS (already in pt_regs->ax) */
-	movq	%r10, %rcx
-
-	/*
-	 * This call instruction is handled specially in stub_ptregs_64.
-	 * It might end up jumping to the slow path.  If it jumps, RAX
-	 * and all argument registers are clobbered.
-	 */
-#ifdef CONFIG_RETPOLINE
-	movq	sys_call_table(, %rax, 8), %rax
-	call	__x86_indirect_thunk_rax
-#else
-	call	*sys_call_table(, %rax, 8)
-#endif
-.Lentry_SYSCALL_64_after_fastpath_call:
-
-	movq	%rax, RAX(%rsp)
-1:
-
-	/*
-	 * If we get here, then we know that pt_regs is clean for SYSRET64.
-	 * If we see that no exit work is required (which we are required
-	 * to check with IRQs off), then we can go straight to SYSRET64.
-	 */
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	TRACE_IRQS_OFF
-	movq	PER_CPU_VAR(current_task), %r11
-	testl	$_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
-	jnz	1f
-
-	LOCKDEP_SYS_EXIT
-	TRACE_IRQS_ON		/* user mode is traced as IRQs on */
-	movq	RIP(%rsp), %rcx
-	movq	EFLAGS(%rsp), %r11
-	addq	$6*8, %rsp	/* skip extra regs -- they were preserved */
-	UNWIND_HINT_EMPTY
-	jmp	.Lpop_c_regs_except_rcx_r11_and_sysret
-
-1:
-	/*
* [kernel-hardening] [PATCH v5 07/12] x86: remove the syscall_64 fast-path
@ 2018-01-27  7:55   ` Dan Williams
  0 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-01-27  7:55 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, kernel-hardening, gregkh, x86, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, torvalds, alan

Quoting Linus:

  "Honestly, I'd rather get rid of the fast-path entirely. Compared to
   all the PTI mess, it's not even noticeable.

   And if we ever get CPU's that have this all fixed, we can re-visit
   introducing the fastpath. But this is all very messy and it doesn't
   seem worth it right now.

   If we get rid of the fastpath, we can lay out the slow path slightly
   better, and get rid of some of those jump-overs. And we'd get rid of
   the ptregs hooks entirely.

   So we can try to make the "slow" path better while at it, but I
   really don't think it matters much now in the post-PTI era. Sadly."

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/entry/entry_64.S   |  116 -------------------------------------------
 arch/x86/entry/syscall_64.c |    7 +--
 2 files changed, 2 insertions(+), 121 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 63f4320602a3..0c18b127ec10 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -237,80 +237,6 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
 	sub	$(6*8), %rsp			/* pt_regs->bp, bx, r12-15 not saved */
 	UNWIND_HINT_REGS extra=0
 
-	/*
-	 * If we need to do entry work or if we guess we'll need to do
-	 * exit work, go straight to the slow path.
-	 */
-	movq	PER_CPU_VAR(current_task), %r11
-	testl	$_TIF_WORK_SYSCALL_ENTRY|_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
-	jnz	entry_SYSCALL64_slow_path
-
-entry_SYSCALL_64_fastpath:
-	/*
-	 * Easy case: enable interrupts and issue the syscall.  If the syscall
-	 * needs pt_regs, we'll call a stub that disables interrupts again
-	 * and jumps to the slow path.
-	 */
-	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_NONE)
-#if __SYSCALL_MASK == ~0
-	cmpq	$__NR_syscall_max, %rax
-#else
-	andl	$__SYSCALL_MASK, %eax
-	cmpl	$__NR_syscall_max, %eax
-#endif
-	ja	1f				/* return -ENOSYS (already in pt_regs->ax) */
-	movq	%r10, %rcx
-
-	/*
-	 * This call instruction is handled specially in stub_ptregs_64.
-	 * It might end up jumping to the slow path.  If it jumps, RAX
-	 * and all argument registers are clobbered.
-	 */
-#ifdef CONFIG_RETPOLINE
-	movq	sys_call_table(, %rax, 8), %rax
-	call	__x86_indirect_thunk_rax
-#else
-	call	*sys_call_table(, %rax, 8)
-#endif
-.Lentry_SYSCALL_64_after_fastpath_call:
-
-	movq	%rax, RAX(%rsp)
-1:
-
-	/*
-	 * If we get here, then we know that pt_regs is clean for SYSRET64.
-	 * If we see that no exit work is required (which we are required
-	 * to check with IRQs off), then we can go straight to SYSRET64.
-	 */
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	TRACE_IRQS_OFF
-	movq	PER_CPU_VAR(current_task), %r11
-	testl	$_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
-	jnz	1f
-
-	LOCKDEP_SYS_EXIT
-	TRACE_IRQS_ON		/* user mode is traced as IRQs on */
-	movq	RIP(%rsp), %rcx
-	movq	EFLAGS(%rsp), %r11
-	addq	$6*8, %rsp	/* skip extra regs -- they were preserved */
-	UNWIND_HINT_EMPTY
-	jmp	.Lpop_c_regs_except_rcx_r11_and_sysret
-
-1:
-	/*
-	 * The fast path looked good when we started, but something changed
-	 * along the way and we need to switch to the slow path.  Calling
-	 * raise(3) will trigger this, for example.  IRQs are off.
-	 */
-	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_ANY)
-	SAVE_EXTRA_REGS
-	movq	%rsp, %rdi
-	call	syscall_return_slowpath	/* returns with IRQs disabled */
-	jmp	return_from_SYSCALL_64
-
-entry_SYSCALL64_slow_path:
 	/* IRQs are off. */
 	SAVE_EXTRA_REGS
 	movq	%rsp, %rdi
@@ -389,7 +315,6 @@ syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
 	UNWIND_HINT_EMPTY
 	POP_EXTRA_REGS
-.Lpop_c_regs_except_rcx_r11_and_sysret:
 	popq	%rsi	/* skip r11 */
 	popq	%r10
 	popq	%r9
@@ -420,47 +345,6 @@ syscall_return_via_sysret:
 	USERGS_SYSRET64
 END(entry_SYSCALL_64)
 
-ENTRY(stub_ptregs_64)
-	/*
-	 * Syscalls marked as needing ptregs land here.
-	 * If we are on the fast path, we need to save the extra regs,
-	 * which we achieve by trying again on the slow path.  If we are on
-	 * the slow path, the extra regs are already saved.
-	 *
-	 * RAX stores a pointer to the C function implementing the syscall.
-	 * IRQs are on.
-	 */
-	cmpq	$.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
-	jne	1f
-
-	/*
-	 * Called from fast path -- disable IRQs again, pop return address
-	 * and jump to slow path
-	 */
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	TRACE_IRQS_OFF
-	popq	%rax
-	UNWIND_HINT_REGS extra=0
-	jmp	entry_SYSCALL64_slow_path
-
-1:
-	JMP_NOSPEC %rax				/* Called from C */
-END(stub_ptregs_64)
-
-.macro ptregs_stub func
-ENTRY(ptregs_\func)
-	UNWIND_HINT_FUNC
-	leaq	\func(%rip), %rax
-	jmp	stub_ptregs_64
-END(ptregs_\func)
-.endm
-
-/* Instantiate ptregs_stub for each ptregs-using syscall */
-#define __SYSCALL_64_QUAL_(sym)
-#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_stub sym
-#define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(sym)
-#include <asm/syscalls_64.h>
-
 /*
  * %rdi: prev task
  * %rsi: next task
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 9c09775e589d..c176d2fab1da 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -7,14 +7,11 @@
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
 
-#define __SYSCALL_64_QUAL_(sym) sym
-#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_##sym
-
-#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long __SYSCALL_64_QUAL_##qual(sym)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 #include <asm/syscalls_64.h>
 #undef __SYSCALL_64
 
-#define __SYSCALL_64(nr, sym, qual) [nr] = __SYSCALL_64_QUAL_##qual(sym),
+#define __SYSCALL_64(nr, sym, qual) [nr] = sym,
 
 extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v5 08/12] x86: sanitize syscall table de-references under speculation
  2018-01-27  7:55 ` Dan Williams
@ 2018-01-27  7:55   ` Dan Williams
  -1 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-01-27  7:55 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, kernel-hardening, gregkh, x86, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, torvalds, alan

The syscall table is indexed by a user-controlled value, so the
looked-up function pointer is user-influenced in kernel space. Use
'array_idx' to prevent any out-of-bounds speculation. While retpoline
prevents speculating into a userspace-directed target, it does not stop
the pointer de-reference; the concern is leaking memory relative to the
syscall table base by observing instruction cache behavior.
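
For reference, the masking scheme that an 'array_idx'-style helper
relies on can be sketched in plain userspace C. The helper names and
this exact implementation are illustrative stand-ins, not the final
in-tree version:

```c
#include <assert.h>

/*
 * Illustrative sketch of an array_idx-style helper. The idea: compute,
 * without a conditional branch the CPU could mispredict, a mask that is
 * all ones when 0 <= idx < sz and all zeros otherwise, then AND it into
 * the index so that even a misspeculated path only ever reads element 0.
 */
static inline unsigned long nospec_mask(unsigned long idx, unsigned long sz)
{
	/*
	 * If idx < sz, both idx and (sz - 1 - idx) have a clear top bit,
	 * so ~(idx | (sz - 1 - idx)) is negative and the arithmetic right
	 * shift smears the sign bit into an all-ones mask. If idx >= sz,
	 * (sz - 1 - idx) underflows, the top bit is set, and the shift
	 * yields zero. (Right-shifting a negative value is
	 * implementation-defined in ISO C but behaves arithmetically on
	 * the compilers the kernel supports.)
	 */
	return ~(long)(idx | (sz - 1UL - idx)) >> (sizeof(long) * 8 - 1);
}

static inline unsigned long nospec_array_idx(unsigned long idx, unsigned long sz)
{
	return idx & nospec_mask(idx, sz);
}
```

With this, an in-bounds index passes through unchanged while an
out-of-bounds index collapses to zero, so a bounds check followed by the
clamp never lets speculation index past the array.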

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/entry/common.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 03505ffbe1b6..f78bf8bfdfae 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -21,6 +21,7 @@
 #include <linux/export.h>
 #include <linux/context_tracking.h>
 #include <linux/user-return-notifier.h>
+#include <linux/nospec.h>
 #include <linux/uprobes.h>
 #include <linux/livepatch.h>
 #include <linux/syscalls.h>
@@ -284,6 +285,7 @@ __visible void do_syscall_64(struct pt_regs *regs)
 	 * regs->orig_ax, which changes the behavior of some syscalls.
 	 */
 	if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
+		nr = array_idx(nr & __SYSCALL_MASK, NR_syscalls);
 		regs->ax = sys_call_table[nr & __SYSCALL_MASK](
 			regs->di, regs->si, regs->dx,
 			regs->r10, regs->r8, regs->r9);
@@ -320,6 +322,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
 	}
 
 	if (likely(nr < IA32_NR_syscalls)) {
+		nr = array_idx(nr, IA32_NR_syscalls);
 		/*
 		 * It's possible that a 32-bit syscall implementation
 		 * takes a 64-bit parameter but nonetheless assumes that

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v5 09/12] vfs, fdtable: prevent bounds-check bypass via speculative execution
  2018-01-27  7:55 ` Dan Williams
@ 2018-01-27  7:56   ` Dan Williams
  -1 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-01-27  7:56 UTC (permalink / raw)
  To: tglx; +Cc: linux-arch, kernel-hardening, gregkh, Al Viro, torvalds, alan

'fd' is a user-controlled value that is used as a data dependency to
read from the 'fdt->fd' array. To avoid potential leaks of kernel
memory values, block speculative execution of the instruction stream
that could issue reads based on an invalid 'file *' returned from
__fcheck_files.
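
The shape of the fix — validate first, then clamp the index inside the
taken branch before the dependent load — can be sketched as follows;
the 'array_idx' stand-in and the miniature fd table are hypothetical,
for illustration only:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for the kernel's array_idx helper: returns idx
 * when idx < sz, else 0, computed without a mispredictable branch. */
static inline size_t array_idx(size_t idx, size_t sz)
{
	size_t mask = ~(long)(idx | (sz - 1 - idx)) >> (sizeof(long) * 8 - 1);
	return idx & mask;
}

/* Hypothetical miniature fd table standing in for fdt->fd. */
static const char *fd_table[] = { "stdin", "stdout", "stderr" };
#define MAX_FDS (sizeof(fd_table) / sizeof(fd_table[0]))

/* Mirrors the __fcheck_files shape: the bounds check alone can be
 * speculatively bypassed, so the index is clamped inside the branch
 * before it feeds the dependent array load. */
static const char *lookup_fd(size_t fd)
{
	if (fd < MAX_FDS) {
		fd = array_idx(fd, MAX_FDS);
		return fd_table[fd];
	}
	return NULL;
}
```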

Cc: Al Viro <viro@zeniv.linux.org.uk>
Co-developed-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/fdtable.h |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 1c65817673db..c61f06c77fdf 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -10,6 +10,7 @@
 #include <linux/compiler.h>
 #include <linux/spinlock.h>
 #include <linux/rcupdate.h>
+#include <linux/nospec.h>
 #include <linux/types.h>
 #include <linux/init.h>
 #include <linux/fs.h>
@@ -82,8 +83,10 @@ static inline struct file *__fcheck_files(struct files_struct *files, unsigned i
 {
 	struct fdtable *fdt = rcu_dereference_raw(files->fdt);
 
-	if (fd < fdt->max_fds)
+	if (fd < fdt->max_fds) {
+		fd = array_idx(fd, fdt->max_fds);
 		return rcu_dereference_raw(fdt->fd[fd]);
+	}
 	return NULL;
 }
 

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v5 10/12] kvm, x86: update spectre-v1 mitigation
  2018-01-27  7:55 ` Dan Williams
@ 2018-01-27  7:56   ` Dan Williams
  -1 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-01-27  7:56 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, kernel-hardening, Andrew Honig, gregkh,
	Paolo Bonzini, alan, torvalds, Jim Mattson

Commit 75f139aaf896 "KVM: x86: Add memory barrier on vmcs field lookup"
added a raw 'asm("lfence");' to prevent a bounds check bypass of
'vmcs_field_to_offset_table'. We can save an lfence in this path and
just use the common 'array_idx' helper designed for these types of
fixes.
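
The substitution — replacing a serializing lfence with an index clamp —
can be approximated in a standalone sketch; the table contents and the
'array_idx' stand-in below are illustrative, not the kernel's actual
definitions:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative stand-in: idx when idx < sz, 0 otherwise, using a
 * branchless mask instead of an lfence speculation barrier. */
static inline size_t array_idx(size_t idx, size_t sz)
{
	size_t mask = ~(long)(idx | (sz - 1 - idx)) >> (sizeof(long) * 8 - 1);
	return idx & mask;
}

/* Hypothetical offset table: a 0 entry marks an unsupported field, as
 * in vmcs_field_to_offset_table. */
static const unsigned short offset_table[] = { 4, 0, 12, 0, 20 };
#define TABLE_SZ (sizeof(offset_table) / sizeof(offset_table[0]))

static short field_to_offset(size_t field)
{
	unsigned short offset;

	if (field >= TABLE_SZ)
		return -ENOENT;
	/* Clamp under speculation: even a mispredicted in-bounds guess
	 * can only load element 0, never memory past the table. */
	field = array_idx(field, TABLE_SZ);
	offset = offset_table[field];
	if (offset == 0)
		return -ENOENT;
	return (short)offset;
}
```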

Cc: Andrew Honig <ahonig@google.com>
Cc: Jim Mattson <jmattson@google.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/kvm/vmx.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 924589c53422..5345e68cc506 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -34,6 +34,7 @@
 #include <linux/tboot.h>
 #include <linux/hrtimer.h>
 #include <linux/frame.h>
+#include <linux/nospec.h>
 #include "kvm_cache_regs.h"
 #include "x86.h"
 
@@ -883,13 +884,18 @@ static const unsigned short vmcs_field_to_offset_table[] = {
 
 static inline short vmcs_field_to_offset(unsigned long field)
 {
-	BUILD_BUG_ON(ARRAY_SIZE(vmcs_field_to_offset_table) > SHRT_MAX);
+	const size_t sz = ARRAY_SIZE(vmcs_field_to_offset_table);
+	unsigned short offset;
 
-	if (field >= ARRAY_SIZE(vmcs_field_to_offset_table) ||
-	    vmcs_field_to_offset_table[field] == 0)
+	BUILD_BUG_ON(sz > SHRT_MAX);
+	if (field >= sz)
 		return -ENOENT;
 
-	return vmcs_field_to_offset_table[field];
+	field = array_idx(field, sz);
+	offset = vmcs_field_to_offset_table[field];
+	if (offset == 0)
+		return -ENOENT;
+	return offset;
 }
 
 static inline struct vmcs12 *get_vmcs12(struct kvm_vcpu *vcpu)

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v5 11/12] nl80211: sanitize array index in parse_txq_params
@ 2018-01-27  7:56   ` Dan Williams
  0 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-01-27  7:56 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, kernel-hardening, gregkh, Christian Lamparter,
	linux-wireless, Johannes Berg, torvalds, David S. Miller,
	Elena Reshetova, alan

Wireless drivers rely on parse_txq_params to validate that
txq_params->ac is less than NL80211_NUM_ACS by the time the low-level
driver's ->conf_tx() handler is called. Use a new helper, 'array_idx',
to sanitize txq_params->ac with respect to speculation. I.e. ensure that
any speculation into ->conf_tx() handlers is done with a value of
txq_params->ac that is within the bounds of [0, NL80211_NUM_ACS).
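
The validate-then-store pattern described above can be sketched as
follows; NUM_ACS, the struct, and the 'array_idx' stand-in are all
hypothetical simplifications of the nl80211 code:

```c
#include <assert.h>

#define NUM_ACS 4	/* stands in for NL80211_NUM_ACS */

/* Illustrative array_idx stand-in: idx if idx < sz, else 0. */
static inline unsigned int array_idx(unsigned int idx, unsigned int sz)
{
	unsigned long mask = ~(long)((unsigned long)idx |
				     (sz - 1UL - idx)) >> (sizeof(long) * 8 - 1);
	return idx & (unsigned int)mask;
}

struct txq_params_sketch {
	unsigned char ac;
};

/* Mirrors the parse_txq_params shape: validate the attribute value in
 * a local first, then store only the clamped result, so every later
 * consumer (e.g. a driver's ->conf_tx() handler) sees an in-range ac
 * even under speculation. */
static int parse_ac(unsigned int ac, struct txq_params_sketch *p)
{
	if (ac >= NUM_ACS)
		return -1;	/* -EINVAL in the kernel */
	p->ac = array_idx(ac, NUM_ACS);
	return 0;
}
```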

Reported-by: Christian Lamparter <chunkeey@gmail.com>
Reported-by: Elena Reshetova <elena.reshetova@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 net/wireless/nl80211.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index d396cb61a280..1479a1c7819c 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -16,6 +16,7 @@
 #include <linux/nl80211.h>
 #include <linux/rtnetlink.h>
 #include <linux/netlink.h>
+#include <linux/nospec.h>
 #include <linux/etherdevice.h>
 #include <net/net_namespace.h>
 #include <net/genetlink.h>
@@ -2056,20 +2057,22 @@ static const struct nla_policy txq_params_policy[NL80211_TXQ_ATTR_MAX + 1] = {
 static int parse_txq_params(struct nlattr *tb[],
 			    struct ieee80211_txq_params *txq_params)
 {
+	u8 ac;
+
 	if (!tb[NL80211_TXQ_ATTR_AC] || !tb[NL80211_TXQ_ATTR_TXOP] ||
 	    !tb[NL80211_TXQ_ATTR_CWMIN] || !tb[NL80211_TXQ_ATTR_CWMAX] ||
 	    !tb[NL80211_TXQ_ATTR_AIFS])
 		return -EINVAL;
 
-	txq_params->ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
+	ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
 	txq_params->txop = nla_get_u16(tb[NL80211_TXQ_ATTR_TXOP]);
 	txq_params->cwmin = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMIN]);
 	txq_params->cwmax = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMAX]);
 	txq_params->aifs = nla_get_u8(tb[NL80211_TXQ_ATTR_AIFS]);
 
-	if (txq_params->ac >= NL80211_NUM_ACS)
+	if (ac >= NL80211_NUM_ACS)
 		return -EINVAL;
-
+	txq_params->ac = array_idx(ac, NL80211_NUM_ACS);
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v5 12/12] x86/spectre: report get_user mitigation for spectre_v1
  2018-01-27  7:55 ` Dan Williams
@ 2018-01-27  7:56   ` Dan Williams
  -1 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-01-27  7:56 UTC (permalink / raw)
  To: tglx
  Cc: linux-arch, kernel-hardening, gregkh, x86, torvalds, Ingo Molnar,
	H. Peter Anvin, Jiri Slaby, alan

Reflect the presence of 'get_user', '__get_user', and 'syscall'
protections in sysfs. Keep the "Vulnerable" distinction, since the set
of places identified for 'array_idx' usage is likely incomplete.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/kernel/cpu/bugs.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 390b3dc3d438..01d5ba48f745 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -269,7 +269,7 @@ ssize_t cpu_show_spectre_v1(struct device *dev,
 {
 	if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1))
 		return sprintf(buf, "Not affected\n");
-	return sprintf(buf, "Vulnerable\n");
+	return sprintf(buf, "Vulnerable: Minimal user pointer sanitization\n");
 }
 
 ssize_t cpu_show_spectre_v2(struct device *dev,

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 00/12] spectre variant1 mitigations for tip/x86/pti
  2018-01-27  7:55 ` Dan Williams
  (?)
@ 2018-01-27 18:52   ` Linus Torvalds
  -1 siblings, 0 replies; 75+ messages in thread
From: Linus Torvalds @ 2018-01-27 18:52 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Gleixner, Mark Rutland, Cyril Novikov, Kernel Hardening,
	Peter Zijlstra, Catalin Marinas, Will Deacon, H. Peter Anvin,
	Jiri Slaby, Elena Reshetova, linux-arch, Andi Kleen,
	Jonathan Corbet, the arch/x86 maintainers, Russell King,
	Ingo Molnar, Andrew Honig, Alan Cox, Tom Lendacky, Kees Cook,
	Al Viro, Andy Lutomirski, Jim Mattson, Christian Lamparter,
	Greg Kroah-Hartman, Linux Wireless List, Paolo Bonzini,
	Johannes Berg, David S. Miller

On Fri, Jan 26, 2018 at 11:55 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>
> Here's another spin of the spectre-v1 mitigations for 4.16.

I see nothing really objectionable here.

And unlike Spectre-v2 and Meltdown, I expect Spectre-v1 to be with us
for a long time. It's not a "CPU did a bad job with checking the
cached information it had" (whether it be from the TLB, BTB or RSB),
it's pretty fundamental to just regular conditional branch prediction.

So ack from me, and I don't expect this to be behind any config options.

I still haven't really seen any numbers for this, but I _assume_ it's
basically not measurable.

                 Linus

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 00/12] spectre variant1 mitigations for tip/x86/pti
  2018-01-27  7:55 ` Dan Williams
  (?)
@ 2018-01-27 19:26   ` Dan Williams
  -1 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-01-27 19:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mark Rutland, Cyril Novikov, Kernel Hardening, Peter Zijlstra,
	Catalin Marinas, Will Deacon, H. Peter Anvin, Jiri Slaby,
	Elena Reshetova, linux-arch, Andi Kleen, Jonathan Corbet, X86 ML,
	Russell King, Ingo Molnar, Andrew Honig, Alan Cox, Tom Lendacky,
	Kees Cook, Al Viro, Andy Lutomirski, Jim Mattson,
	Christian Lamparter, Greg KH, Linux Wireless List, Paolo Bonzini,
	Johannes Berg, Linus Torvalds, David S. Miller,
	Linux Kernel Mailing List

[ adding lkml ]

I had inadvertently dropped lkml when sending this to Thomas. Archive here:

https://marc.info/?l=linux-wireless&m=151704026325010&w=2
https://marc.info/?l=linux-arch&m=151704027225013&w=2
https://marc.info/?l=linux-arch&m=151704027225014&w=2
https://marc.info/?l=linux-arch&m=151704027625015&w=2
https://marc.info/?l=linux-arch&m=151704028225016&w=2
https://marc.info/?l=linux-arch&m=151704028725019&w=2
https://marc.info/?l=linux-arch&m=151704086725186&w=2
https://marc.info/?l=linux-arch&m=151704030025025&w=2
https://marc.info/?l=linux-arch&m=151704030525028&w=2
https://marc.info/?l=linux-arch&m=151704031125029&w=2
https://marc.info/?l=linux-arch&m=151704032225034&w=2
https://marc.info/?l=linux-arch&m=151704032625035&w=2
https://marc.info/?l=linux-arch&m=151704032725037&w=2


On Fri, Jan 26, 2018 at 11:55 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> Hi Thomas,
>
> Here's another spin of the spectre-v1 mitigations for 4.16.
>
> Changes since v4.1: [1]
> * Tweak the sanitization scheme yet again to make it even simpler. Now,
>   instead of 'array_ptr' to get a sanitized pointer to an array element,
>   just provide an array index sanitization helper 'array_idx' to be called
>   after successfully validating the index is in bounds. I.e. in the
>   exact same location one would otherwise put an lfence, place this
>   sanitizer:
>
>       if (idx < sz) {
>           idx = array_idx(idx, sz);
>           val = array[idx];
>       }
>
>   This lets the implementation include more sanity checking that the
>   compiler can usually compile out. It otherwise appears to produce
>   better assembly. This also cleans up the concern about comparing the
>   value returned from array_ptr to create another speculation point.
>   (Russell, Linus, Cyril)
>
> * Drop the syscall_64_fastpath.  This is the straightforward patch from
>   Linus that might also be in flight from Andy, but I went ahead and
>   included it since I did not see it on LKML yet.
>
> * Kill the MASK_NOSPEC macro and just open code it. (Andy)
>
> * Add system-call-number sanitization to the slow path syscall table
>   lookups.
>
> * Redo the array_ptr conversions with array_idx.
>
> * Update /sys/devices/system/cpu/vulnerabilities/spectre_v1 to indicate
>   the new protections. It now reports "Vulnerable: Minimal user pointer
>   sanitization". (Jiri)
>
> ---
>
> Dan Williams (11):
>       array_idx: sanitize speculative array de-references
>       x86: implement array_idx_mask
>       x86: introduce __uaccess_begin_nospec and ifence
>       x86, __get_user: use __uaccess_begin_nospec
>       x86, get_user: use pointer masking to limit speculation
>       x86: remove the syscall_64 fast-path
>       x86: sanitize syscall table de-references under speculation
>       vfs, fdtable: prevent bounds-check bypass via speculative execution
>       kvm, x86: update spectre-v1 mitigation
>       nl80211: sanitize array index in parse_txq_params
>       x86/spectre: report get_user mitigation for spectre_v1
>
> Mark Rutland (1):
>       Documentation: document array_idx
>
>
>  Documentation/speculation.txt     |   87 ++++++++++++++++++++++++++++
>  arch/x86/entry/common.c           |    3 +
>  arch/x86/entry/entry_64.S         |  116 -------------------------------------
>  arch/x86/entry/syscall_64.c       |    7 +-
>  arch/x86/include/asm/barrier.h    |   26 ++++++++
>  arch/x86/include/asm/msr.h        |    3 -
>  arch/x86/include/asm/uaccess.h    |   15 ++++-
>  arch/x86/include/asm/uaccess_32.h |    6 +-
>  arch/x86/include/asm/uaccess_64.h |   12 ++--
>  arch/x86/kernel/cpu/bugs.c        |    2 -
>  arch/x86/kvm/vmx.c                |   14 +++-
>  arch/x86/lib/getuser.S            |   10 +++
>  arch/x86/lib/usercopy_32.c        |    8 +--
>  include/linux/fdtable.h           |    5 +-
>  include/linux/nospec.h            |   64 ++++++++++++++++++++
>  net/wireless/nl80211.c            |    9 ++-
>  16 files changed, 239 insertions(+), 148 deletions(-)
>  create mode 100644 Documentation/speculation.txt
>  create mode 100644 include/linux/nospec.h

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 02/12] array_idx: sanitize speculative array de-references
  2018-01-27  7:55   ` [kernel-hardening] " Dan Williams
@ 2018-01-28  8:55     ` Ingo Molnar
  -1 siblings, 0 replies; 75+ messages in thread
From: Ingo Molnar @ 2018-01-28  8:55 UTC (permalink / raw)
  To: Dan Williams
  Cc: tglx, linux-arch, Cyril Novikov, kernel-hardening,
	Peter Zijlstra, Catalin Marinas, x86, Will Deacon, Russell King,
	Ingo Molnar, gregkh, H. Peter Anvin, torvalds, alan,
	linux-kernel


Firstly, I only got a few patches of this series so I couldn't review all of them 
- please Cc: me to all future Meltdown and Spectre related patches!

* Dan Williams <dan.j.williams@intel.com> wrote:

> 'array_idx' is proposed as a generic mechanism to mitigate against
> Spectre-variant-1 attacks, i.e. an attack that bypasses boundary checks
> via speculative execution). The 'array_idx' implementation is expected
> to be safe for current generation cpus across multiple architectures
> (ARM, x86).

nit: Stray closing parenthesis

s/cpus/CPUs

> Based on an original implementation by Linus Torvalds, tweaked to remove
> speculative flows by Alexei Starovoitov, and tweaked again by Linus to
> introduce an x86 assembly implementation for the mask generation.
> 
> Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
> Co-developed-by: Alexei Starovoitov <ast@kernel.org>
> Suggested-by: Cyril Novikov <cnovikov@lynx.com>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  include/linux/nospec.h |   64 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
>  create mode 100644 include/linux/nospec.h
> 
> diff --git a/include/linux/nospec.h b/include/linux/nospec.h
> new file mode 100644
> index 000000000000..f59f81889ba3
> --- /dev/null
> +++ b/include/linux/nospec.h
> @@ -0,0 +1,64 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright(c) 2018 Intel Corporation. All rights reserved.

Given the close similarity to Linus's array_access() prototype pseudocode, there 
should probably also be:

    Copyright (C) 2018 Linus Torvalds

in that file?

> +
> +#ifndef __NOSPEC_H__
> +#define __NOSPEC_H__
> +
> +/*
> + * When idx is out of bounds (idx >= sz), the sign bit will be set.
> + * Extend the sign bit to all bits and invert, giving a result of zero
> + * for an out of bounds idx, or ~0UL if within bounds [0, sz).
> + */
> +#ifndef array_idx_mask
> +static inline unsigned long array_idx_mask(unsigned long idx, unsigned long sz)
> +{
> +	/*
> +	 * Warn developers about inappropriate array_idx usage.
> +	 *
> +	 * Even if the cpu speculates past the WARN_ONCE branch, the

s/cpu/CPU

> +	 * sign bit of idx is taken into account when generating the
> +	 * mask.
> +	 *
> +	 * This warning is compiled out when the compiler can infer that
> +	 * idx and sz are less than LONG_MAX.

Please use 'idx' and 'sz' in quotes, to make sure they stand out more in free 
flowing comment text. Also please use '()' to denote functions/methods.

I.e. something like:

	 * Warn developers about inappropriate array_idx() usage.
	 *
	 * Even if the CPU speculates past the WARN_ONCE() branch, the
	 * sign bit of 'idx' is taken into account when generating the
	 * mask.
	 *
	 * This warning is compiled out when the compiler can infer that
	 * 'idx' and 'sz' are less than LONG_MAX.

That's just one example - please apply it to all comments consistently.

> +	 */
> +	if (WARN_ONCE(idx > LONG_MAX || sz > LONG_MAX,
> +			"array_idx limited to range of [0, LONG_MAX]\n"))

Same in user facing messages:

			"array_idx() limited to range of [0, LONG_MAX]\n"))

> + * For a code sequence like:
> + *
> + *     if (idx < sz) {
> + *         idx = array_idx(idx, sz);
> + *         val = array[idx];
> + *     }
> + *
> + * ...if the cpu speculates past the bounds check then array_idx() will
> + * clamp the index within the range of [0, sz).

s/cpu/CPU

> + */
> +#define array_idx(idx, sz)						\
> +({									\
> +	typeof(idx) _i = (idx);						\
> +	typeof(sz) _s = (sz);						\
> +	unsigned long _mask = array_idx_mask(_i, _s);			\
> +									\
> +	BUILD_BUG_ON(sizeof(_i) > sizeof(long));			\
> +	BUILD_BUG_ON(sizeof(_s) > sizeof(long));			\
> +									\
> +	_i &= _mask;							\
> +	_i;								\
> +})
> +#endif /* __NOSPEC_H__ */

For heaven's sake, please name a size variable as 'size', not 'sz'. We don't have 
a shortage of characters and can deobfuscate common primitives, can't we?

Also, beyond the nits, I also hate the namespace here. We have a new generic 
header providing two new methods:

	#include <linux/nospec.h>

	array_idx_mask()
	array_idx()

which is then optimized for x86 in asm/barrier.h. That's already a non sequitur.

Then we introduce uaccess API variants with a _nospec() postfix.

Then we add ifence() to x86.

There's no naming coherency to this.

A better approach would be to signal the 'no speculation' aspect of the 
array_idx() methods already: naming it array_idx_nospec() would be a solution,
as it clearly avoids speculation beyond the array boundaries.

Also, without seeing the full series it's hard to tell whether the introduction 
of linux/nospec.h is justified, but it feels somewhat suspect.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 03/12] x86: implement array_idx_mask
  2018-01-27  7:55   ` [kernel-hardening] " Dan Williams
@ 2018-01-28  9:02     ` Ingo Molnar
  -1 siblings, 0 replies; 75+ messages in thread
From: Ingo Molnar @ 2018-01-28  9:02 UTC (permalink / raw)
  To: Dan Williams
  Cc: tglx, linux-arch, kernel-hardening, gregkh, x86, Ingo Molnar,
	H. Peter Anvin, torvalds, alan


* Dan Williams <dan.j.williams@intel.com> wrote:

> 'array_idx' uses a mask to sanitize user-controllable array indexes,
> i.e. generate a 0 mask if idx >= sz, and a ~0 mask otherwise. The
> default array_idx_mask handles the carry-bit from the (index - size)
> result in software. The x86 'array_idx_mask' does the same, but the
> carry-bit is handled in the processor's CF flag without conditional
> instructions in the control flow.

Same style comments apply as for patch 02.

> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/x86/include/asm/barrier.h |   22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> index 01727dbc294a..30419b674ebd 100644
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -24,6 +24,28 @@
>  #define wmb()	asm volatile("sfence" ::: "memory")
>  #endif
>  
> +/**
> + * array_idx_mask - generate a mask for array_idx() that is ~0UL when
> + * the bounds check succeeds and 0 otherwise
> + *
> + *     mask = 0 - (idx < sz);
> + */
> +#define array_idx_mask array_idx_mask
> +static inline unsigned long array_idx_mask(unsigned long idx, unsigned long sz)

Please put an extra newline between definitions (even if they are closely related 
as these).

> +{
> +	unsigned long mask;
> +
> +#ifdef CONFIG_X86_32
> +	asm ("cmpl %1,%2; sbbl %0,%0;"
> +#else
> +	asm ("cmpq %1,%2; sbbq %0,%0;"
> +#endif

Wouldn't this suffice:

	asm ("cmp %1,%2; sbb %0,%0;"

... as the word width should automatically be 32 bits on 32-bit kernels and 64 
bits on 64-bit kernels?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 04/12] x86: introduce __uaccess_begin_nospec and ifence
  2018-01-27  7:55   ` [kernel-hardening] " Dan Williams
@ 2018-01-28  9:06     ` Ingo Molnar
  -1 siblings, 0 replies; 75+ messages in thread
From: Ingo Molnar @ 2018-01-28  9:06 UTC (permalink / raw)
  To: Dan Williams
  Cc: tglx, linux-arch, Tom Lendacky, Andi Kleen, Kees Cook,
	kernel-hardening, gregkh, x86, Ingo Molnar, Al Viro,
	H. Peter Anvin, torvalds, alan


* Dan Williams <dan.j.williams@intel.com> wrote:

> For '__get_user' paths, do not allow the kernel to speculate on the
> value of a user controlled pointer. In addition to the 'stac'
> instruction for Supervisor Mode Access Protection, an 'ifence' causes
> the 'access_ok' result to resolve in the pipeline before the CPU might
> take any speculative action on the pointer value.
> 
> Since __get_user is a major kernel interface that deals with user
> controlled pointers, the '__uaccess_begin_nospec' mechanism will prevent
> speculative execution past an 'access_ok' permission check. While
> speculative execution past 'access_ok' is not enough to lead to a kernel
> memory leak, it is a necessary precondition.
> 
> To be clear, '__uaccess_begin_nospec' is addressing a class of potential
> problems near '__get_user' usages.
> 
> Note that while ifence is used to protect '__get_user', pointer masking
> will be used for 'get_user' since it incorporates a bounds check near
> the usage.
> 
> There are no functional changes in this patch.

The style problems/inconsistencies of the #2 patch are repeated here too.

Also, please split this patch into two patches:

 - #1 introducing ifence() and using it where it was open coded before
 - #2 introducing the _nospec() uaccess variants

> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Suggested-by: Andi Kleen <ak@linux.intel.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: x86@kernel.org
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/x86/include/asm/barrier.h |    4 ++++
>  arch/x86/include/asm/msr.h     |    3 +--
>  arch/x86/include/asm/uaccess.h |    9 +++++++++
>  3 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> index 30419b674ebd..5f11d4c5c862 100644
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -46,6 +46,10 @@ static inline unsigned long array_idx_mask(unsigned long idx, unsigned long sz)
>  	return mask;
>  }
>  
> +/* prevent speculative execution past this barrier */

Please use consistent capitalization in comments.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 04/12] x86: introduce __uaccess_begin_nospec and ifence
  2018-01-27  7:55   ` [kernel-hardening] " Dan Williams
@ 2018-01-28  9:14     ` Ingo Molnar
  -1 siblings, 0 replies; 75+ messages in thread
From: Ingo Molnar @ 2018-01-28  9:14 UTC (permalink / raw)
  To: Dan Williams
  Cc: tglx, linux-arch, Tom Lendacky, Andi Kleen, Kees Cook,
	kernel-hardening, gregkh, x86, Ingo Molnar, Al Viro,
	H. Peter Anvin, torvalds, alan, linux-kernel


* Dan Williams <dan.j.williams@intel.com> wrote:

> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -124,6 +124,11 @@ extern int __get_user_bad(void);
>  
>  #define __uaccess_begin() stac()
>  #define __uaccess_end()   clac()
> +#define __uaccess_begin_nospec()	\
> +({					\
> +	stac();				\
> +	ifence();			\
> +})

BTW., wouldn't it be better to switch the barrier order here, i.e. to do:

	ifence();			\
	stac();				\

?

The reason is that stac()/clac() is usually paired, so there's a chance with short 
sequences that it would resolve with 'no externally visible changes to flags'.

Also, there are many cases where flags are modified _inside_ the STAC/CLAC section, 
so grouping them together inside a speculation atom could be beneficial.

The flip side is that if the MFENCE stalls, the STAC ahead of it could be 
processed for 'free', while with my suggestion it always comes after the barrier.

But in any case it would be nice to see a discussion of this aspect in the 
changelog, even if the patch does not change.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 05/12] x86, __get_user: use __uaccess_begin_nospec
  2018-01-27  7:55   ` [kernel-hardening] " Dan Williams
@ 2018-01-28  9:19     ` Ingo Molnar
  -1 siblings, 0 replies; 75+ messages in thread
From: Ingo Molnar @ 2018-01-28  9:19 UTC (permalink / raw)
  To: Dan Williams
  Cc: tglx, linux-arch, Andi Kleen, Kees Cook, kernel-hardening,
	gregkh, x86, Ingo Molnar, Al Viro, H. Peter Anvin, torvalds,
	alan, linux-kernel


* Dan Williams <dan.j.williams@intel.com> wrote:

> Quoting Linus:
> 
>     I do think that it would be a good idea to very expressly document
>     the fact that it's not that the user access itself is unsafe. I do
>     agree that things like "get_user()" want to be protected, but not
>     because of any direct bugs or problems with get_user() and friends,
>     but simply because get_user() is an excellent source of a pointer
>     that is obviously controlled from a potentially attacking user
>     space. So it's a prime candidate for then finding _subsequent_
>     accesses that can then be used to perturb the cache.
> 
> '__uaccess_begin_nospec' covers '__get_user' and 'copy_from_iter' where
> the limit check is far away from the user pointer de-reference. In those
> cases an 'lfence' prevents speculation with a potential pointer to
> privileged memory.

(Same style comments as for the previous patches)

> 
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Suggested-by: Andi Kleen <ak@linux.intel.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: x86@kernel.org
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/x86/include/asm/uaccess.h    |    6 +++---
>  arch/x86/include/asm/uaccess_32.h |    6 +++---
>  arch/x86/include/asm/uaccess_64.h |   12 ++++++------
>  arch/x86/lib/usercopy_32.c        |    8 ++++----
>  4 files changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> index 626caf58183a..a930585fa3b5 100644
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -450,7 +450,7 @@ do {									\
>  ({									\
>  	int __gu_err;							\
>  	__inttype(*(ptr)) __gu_val;					\
> -	__uaccess_begin();						\
> +	__uaccess_begin_nospec();					\
>  	__get_user_size(__gu_val, (ptr), (size), __gu_err, -EFAULT);	\
>  	__uaccess_end();						\
>  	(x) = (__force __typeof__(*(ptr)))__gu_val;			\
> @@ -557,7 +557,7 @@ struct __large_struct { unsigned long buf[100]; };
>   *	get_user_ex(...);
>   * } get_user_catch(err)
>   */
> -#define get_user_try		uaccess_try
> +#define get_user_try		uaccess_try_nospec
>  #define get_user_catch(err)	uaccess_catch(err)
>  
>  #define get_user_ex(x, ptr)	do {					\
> @@ -591,7 +591,7 @@ extern void __cmpxchg_wrong_size(void)
>  	__typeof__(ptr) __uval = (uval);				\
>  	__typeof__(*(ptr)) __old = (old);				\
>  	__typeof__(*(ptr)) __new = (new);				\
> -	__uaccess_begin();						\
> +	__uaccess_begin_nospec();					\

>  	else
>  		n = __copy_user_intel(to, from, n);
> -	clac();
> +	__uaccess_end();

>  	return n;
>  }
>  EXPORT_SYMBOL(__copy_user_ll);
> @@ -344,7 +344,7 @@ EXPORT_SYMBOL(__copy_user_ll);
>  unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from,
>  					unsigned long n)
>  {
> -	stac();
> +	__uaccess_begin_nospec();
>  #ifdef CONFIG_X86_INTEL_USERCOPY
>  	if (n > 64 && static_cpu_has(X86_FEATURE_XMM2))
>  		n = __copy_user_intel_nocache(to, from, n);
> @@ -353,7 +353,7 @@ unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *fr
>  #else
>  	__copy_user(to, from, n);
>  #endif
> -	clac();
> +	__uaccess_end();
>  	return n;
>  }
>  EXPORT_SYMBOL(__copy_from_user_ll_nocache_nozero);
> 

These three chunks appear to be unrelated changes that correctly convert 
open-coded clac()s to __uaccess_end() calls: please split them out into a separate patch.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 06/12] x86, get_user: use pointer masking to limit speculation
  2018-01-27  7:55   ` [kernel-hardening] " Dan Williams
@ 2018-01-28  9:25     ` Ingo Molnar
  -1 siblings, 0 replies; 75+ messages in thread
From: Ingo Molnar @ 2018-01-28  9:25 UTC (permalink / raw)
  To: Dan Williams
  Cc: tglx, linux-arch, Kees Cook, kernel-hardening, gregkh, x86,
	Ingo Molnar, Al Viro, Andy Lutomirski, H. Peter Anvin, torvalds,
	alan, linux-kernel


* Dan Williams <dan.j.williams@intel.com> wrote:

> Quoting Linus:
> 
>     I do think that it would be a good idea to very expressly document
>     the fact that it's not that the user access itself is unsafe. I do
>     agree that things like "get_user()" want to be protected, but not
>     because of any direct bugs or problems with get_user() and friends,
>     but simply because get_user() is an excellent source of a pointer
>     that is obviously controlled from a potentially attacking user
>     space. So it's a prime candidate for then finding _subsequent_
>     accesses that can then be used to perturb the cache.
> 
> Unlike the '__get_user' case 'get_user' includes the address limit check
> near the pointer de-reference. With that locality, the speculation can be
> mitigated with pointer narrowing rather than a barrier. The narrowing is
> performed by:
> 
> 	cmp %limit, %ptr
> 	sbb %mask, %mask
> 	and %mask, %ptr
> 
> With respect to speculation the value of %ptr is either less than %limit
> or NULL.

(The style problems/inconsistencies of the #2 patch are repeated here too.)

> --- a/arch/x86/lib/getuser.S
> +++ b/arch/x86/lib/getuser.S
> @@ -40,6 +40,8 @@ ENTRY(__get_user_1)
>  	mov PER_CPU_VAR(current_task), %_ASM_DX
>  	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
>  	jae bad_get_user
> +	sbb %_ASM_DX, %_ASM_DX		/* 0 - (uptr < addr_limit) */
> +	and %_ASM_DX, %_ASM_AX
>  	ASM_STAC
>  1:	movzbl (%_ASM_AX),%edx
>  	xor %eax,%eax
> @@ -54,6 +56,8 @@ ENTRY(__get_user_2)
>  	mov PER_CPU_VAR(current_task), %_ASM_DX
>  	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
>  	jae bad_get_user
> +	sbb %_ASM_DX, %_ASM_DX		/* 0 - (uptr < addr_limit) */
> +	and %_ASM_DX, %_ASM_AX
>  	ASM_STAC
>  2:	movzwl -1(%_ASM_AX),%edx
>  	xor %eax,%eax
> @@ -68,6 +72,8 @@ ENTRY(__get_user_4)
>  	mov PER_CPU_VAR(current_task), %_ASM_DX
>  	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
>  	jae bad_get_user
> +	sbb %_ASM_DX, %_ASM_DX		/* 0 - (uptr < addr_limit) */
> +	and %_ASM_DX, %_ASM_AX
>  	ASM_STAC
>  3:	movl -3(%_ASM_AX),%edx
>  	xor %eax,%eax
> @@ -83,6 +89,8 @@ ENTRY(__get_user_8)
>  	mov PER_CPU_VAR(current_task), %_ASM_DX
>  	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
>  	jae bad_get_user
> +	sbb %_ASM_DX, %_ASM_DX		/* 0 - (uptr < addr_limit) */
> +	and %_ASM_DX, %_ASM_AX
>  	ASM_STAC
>  4:	movq -7(%_ASM_AX),%rdx
>  	xor %eax,%eax
> @@ -94,6 +102,8 @@ ENTRY(__get_user_8)
>  	mov PER_CPU_VAR(current_task), %_ASM_DX
>  	cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
>  	jae bad_get_user_8
> +	sbb %_ASM_DX, %_ASM_DX		/* 0 - (uptr < addr_limit) */
> +	and %_ASM_DX, %_ASM_AX

Please make it clear in the comments that these are essentially open-coded 
assembly versions of array_idx_mask_nospec(), and that the purpose here is to 
act as a partial speculation barrier against the value range of user-provided 
pointers.

In a couple of years this sequence will be too obscure to understand at first 
glance.

It would also make it easier to find these spots if someone wants to tweak (or 
backport) array_idx_mask_nospec() related changes.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 07/12] x86: remove the syscall_64 fast-path
  2018-01-27  7:55   ` [kernel-hardening] " Dan Williams
@ 2018-01-28  9:29     ` Ingo Molnar
  -1 siblings, 0 replies; 75+ messages in thread
From: Ingo Molnar @ 2018-01-28  9:29 UTC (permalink / raw)
  To: Dan Williams
  Cc: tglx, linux-arch, kernel-hardening, gregkh, x86, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, torvalds, alan, linux-kernel


* Dan Williams <dan.j.williams@intel.com> wrote:

> Quoting Linus:
> 
>   "Honestly, I'd rather get rid of the fast-path entirely. Compared to
>    all the PTI mess, it's not even noticeable.
> 
>    And if we ever get CPU's that have this all fixed, we can re-visit
>    introducing the fastpath. But this is all very messy and it doesn't
>    seem worth it right now.
> 
>    If we get rid of the fastpath, we can lay out the slow path slightly
>    better, and get rid of some of those jump-overs. And we'd get rid of
>    the ptregs hooks entirely.
> 
>    So we can try to make the "slow" path better while at it, but I
>    really don't think it matters much now in the post-PTI era. Sadly."

Please fix the title to have the proper prefix and to reference the function that 
is actually modified by the patch, i.e. something like:

s/ x86: remove the syscall_64 fast-path
 / x86/entry/64: Remove the entry_SYSCALL_64() fast-path

With the title fixed:

Reviewed-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 08/12] x86: sanitize syscall table de-references under speculation
  2018-01-27  7:55   ` [kernel-hardening] " Dan Williams
@ 2018-01-28  9:36     ` Ingo Molnar
  -1 siblings, 0 replies; 75+ messages in thread
From: Ingo Molnar @ 2018-01-28  9:36 UTC (permalink / raw)
  To: Dan Williams
  Cc: tglx, linux-arch, kernel-hardening, gregkh, x86, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, torvalds, alan


* Dan Williams <dan.j.williams@intel.com> wrote:

> The syscall table base is a user controlled function pointer in kernel
> space. Use 'array_idx' to prevent any out of bounds speculation. While
> retpoline prevents speculating into a userspace directed target, it does
> not stop the pointer de-reference; the concern is leaking memory
> relative to the syscall table base, by observing instruction cache
> behavior.

(The style problems/inconsistencies of the previous patches are repeated here too, 
please fix.)
> 
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/x86/entry/common.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 03505ffbe1b6..f78bf8bfdfae 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -21,6 +21,7 @@
>  #include <linux/export.h>
>  #include <linux/context_tracking.h>
>  #include <linux/user-return-notifier.h>
> +#include <linux/nospec.h>
>  #include <linux/uprobes.h>
>  #include <linux/livepatch.h>
>  #include <linux/syscalls.h>
> @@ -284,6 +285,7 @@ __visible void do_syscall_64(struct pt_regs *regs)
>  	 * regs->orig_ax, which changes the behavior of some syscalls.
>  	 */
>  	if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
> +		nr = array_idx(nr & __SYSCALL_MASK, NR_syscalls);
>  		regs->ax = sys_call_table[nr & __SYSCALL_MASK](
>  			regs->di, regs->si, regs->dx,
>  			regs->r10, regs->r8, regs->r9);

Btw., in the future we could optimize the 64-bit fastpath here, by doing something 
like:

	if (unlikely(nr >= NR_syscalls)) {
		nr = array_idx(nr, NR_syscalls);
		...
	} else {
		if ((nr & __SYSCALL_MASK) < NR_syscalls) {
			... X32 ABI ...
		} else {
			... error ...
		}
	}

I believe this would remove 2-3 instructions from the 64-bit syscall fast-path by 
pushing the x32 details to a slow-path.

But obviously that should not be part of the Spectre series.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 12/12] x86/spectre: report get_user mitigation for spectre_v1
  2018-01-27  7:56   ` [kernel-hardening] " Dan Williams
@ 2018-01-28  9:50     ` Ingo Molnar
  -1 siblings, 0 replies; 75+ messages in thread
From: Ingo Molnar @ 2018-01-28  9:50 UTC (permalink / raw)
  To: Dan Williams
  Cc: tglx, linux-arch, kernel-hardening, gregkh, x86, torvalds,
	Ingo Molnar, H. Peter Anvin, Jiri Slaby, alan


* Dan Williams <dan.j.williams@intel.com> wrote:

> Reflect the presence of 'get_user', '__get_user', and 'syscall'
> protections in sysfs. Keep the "Vulnerable" distinction given the
> expectation that the places that have been identified for 'array_idx'
> usage are likely incomplete.

(The style problems/inconsistencies of the previous patches are repeated here too, 
please fix.)

> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Reported-by: Jiri Slaby <jslaby@suse.cz>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/x86/kernel/cpu/bugs.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 390b3dc3d438..01d5ba48f745 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -269,7 +269,7 @@ ssize_t cpu_show_spectre_v1(struct device *dev,
>  {
>  	if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1))
>  		return sprintf(buf, "Not affected\n");
> -	return sprintf(buf, "Vulnerable\n");
> +	return sprintf(buf, "Vulnerable: Minimal user pointer sanitization\n");

Btw., I think this string is still somewhat passive-aggressive towards users, as 
it doesn't really give them any idea about what is missing from their system so 
that they can turn it into not vulnerable.

What else is missing that would turn this into a "Mitigated" entry?

Thanks,

	Ingo


* Re: [PATCH v5 02/12] array_idx: sanitize speculative array de-references
  2018-01-28  8:55     ` [kernel-hardening] " Ingo Molnar
@ 2018-01-28 11:36       ` Thomas Gleixner
  -1 siblings, 0 replies; 75+ messages in thread
From: Thomas Gleixner @ 2018-01-28 11:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dan Williams, linux-arch, Cyril Novikov, kernel-hardening,
	Peter Zijlstra, Catalin Marinas, x86, Will Deacon, Russell King,
	Ingo Molnar, gregkh, H. Peter Anvin, torvalds, alan,
	linux-kernel

On Sun, 28 Jan 2018, Ingo Molnar wrote:
> * Dan Williams <dan.j.williams@intel.com> wrote:
> > +
> > +#ifndef __NOSPEC_H__
> > +#define __NOSPEC_H__

_LINUX_NOSPEC_H

We have an established practice for those header guards...

> > +/*
> > + * When idx is out of bounds (idx >= sz), the sign bit will be set.
> > + * Extend the sign bit to all bits and invert, giving a result of zero
> > + * for an out of bounds idx, or ~0UL if within bounds [0, sz).
> > + */
> > +#ifndef array_idx_mask
> > +static inline unsigned long array_idx_mask(unsigned long idx, unsigned long sz)
> > +{
> > +	/*
> > +	 * Warn developers about inappropriate array_idx usage.
> > +	 *
> > +	 * Even if the cpu speculates past the WARN_ONCE branch, the
> 
> s/cpu/CPU
> 
> > +	 * sign bit of idx is taken into account when generating the
> > +	 * mask.
> > +	 *
> > +	 * This warning is compiled out when the compiler can infer that
> > +	 * idx and sz are less than LONG_MAX.
> 
> Please use 'idx' and 'sz' in quotes, to make sure they stand out more in free 
> flowing comment text. Also please use '()' to denote functions/methods.
> 
> I.e. something like:
> 
> 	 * Warn developers about inappropriate array_idx() usage.
> 	 *
> 	 * Even if the CPU speculates past the WARN_ONCE() branch, the
> 	 * sign bit of 'idx' is taken into account when generating the
> 	 * mask.
> 	 *
> 	 * This warning is compiled out when the compiler can infer that
> 	 * 'idx' and 'sz' are less than LONG_MAX.

I rather prefer to have a proper kernel doc instead of the comment above
the function and then use @idx and @sz in the code comments.

Thanks,

	tglx


* Re: [PATCH v5 07/12] x86: remove the syscall_64 fast-path
  2018-01-28  9:29     ` [kernel-hardening] " Ingo Molnar
@ 2018-01-28 15:22       ` Andy Lutomirski
  -1 siblings, 0 replies; 75+ messages in thread
From: Andy Lutomirski @ 2018-01-28 15:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dan Williams, tglx, linux-arch, kernel-hardening, gregkh, x86,
	Ingo Molnar, Andy Lutomirski, H. Peter Anvin, torvalds, alan,
	linux-kernel

> On Jan 28, 2018, at 1:29 AM, Ingo Molnar <mingo@kernel.org> wrote:
> 
> 
> * Dan Williams <dan.j.williams@intel.com> wrote:
> 
>> Quoting Linus:
>> 
>>  "Honestly, I'd rather get rid of the fast-path entirely. Compared to
>>   all the PTI mess, it's not even noticeable.
>> 
>>   And if we ever get CPU's that have this all fixed, we can re-visit
>>   introducing the fastpath. But this is all very messy and it doesn't
>>   seem worth it right now.
>> 
>>   If we get rid of the fastpath, we can lay out the slow path slightly
>>   better, and get rid of some of those jump-overs. And we'd get rid of
>>   the ptregs hooks entirely.
>> 
>>   So we can try to make the "slow" path better while at it, but I
>>   really don't think it matters much now in the post-PTI era. Sadly."
> 
> Please fix the title to have the proper prefix and to reference the function that 
> is actually modified by the patch, i.e. something like:
> 
> s/ x86: remove the syscall_64 fast-path
> / x86/entry/64: Remove the entry_SYSCALL_64() fast-path
> 
> With the title fixed:
> 
> Reviewed-by: Ingo Molnar <mingo@kernel.org>

I have a very similar but not quite identical version I'll send out shortly.  The difference is that I fixed the silly prologue.

> 
> Thanks,
> 
>    Ingo


* Re: [PATCH v5 02/12] array_idx: sanitize speculative array de-references
  2018-01-28  8:55     ` [kernel-hardening] " Ingo Molnar
@ 2018-01-28 16:28       ` Dan Williams
  -1 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-01-28 16:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, linux-arch, Cyril Novikov, Kernel Hardening,
	Peter Zijlstra, Catalin Marinas, X86 ML, Will Deacon,
	Russell King, Ingo Molnar, Greg KH, H. Peter Anvin,
	Linus Torvalds, Alan Cox, Linux Kernel Mailing List

On Sun, Jan 28, 2018 at 12:55 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> Firstly, I only got a few patches of this series so I couldn't review all of them
> - please Cc: me to all future Meltdown and Spectre related patches!
>
> * Dan Williams <dan.j.williams@intel.com> wrote:
>
>> 'array_idx' is proposed as a generic mechanism to mitigate against
>> Spectre-variant-1 attacks, i.e. an attack that bypasses boundary checks
>> via speculative execution). The 'array_idx' implementation is expected
>> to be safe for current generation cpus across multiple architectures
>> (ARM, x86).
>
> nit: Stray closing parenthesis
>
> s/cpus/CPUs
>
>> Based on an original implementation by Linus Torvalds, tweaked to remove
>> speculative flows by Alexei Starovoitov, and tweaked again by Linus to
>> introduce an x86 assembly implementation for the mask generation.
>>
>> Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Co-developed-by: Alexei Starovoitov <ast@kernel.org>
>> Suggested-by: Cyril Novikov <cnovikov@lynx.com>
>> Cc: Russell King <linux@armlinux.org.uk>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: x86@kernel.org
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  include/linux/nospec.h |   64 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 64 insertions(+)
>>  create mode 100644 include/linux/nospec.h
>>
>> diff --git a/include/linux/nospec.h b/include/linux/nospec.h
>> new file mode 100644
>> index 000000000000..f59f81889ba3
>> --- /dev/null
>> +++ b/include/linux/nospec.h
>> @@ -0,0 +1,64 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright(c) 2018 Intel Corporation. All rights reserved.
>
> Given the close similarity of Linus's array_access() prototype pseudocode there
> should probably also be:
>
>     Copyright (C) 2018 Linus Torvalds
>
> in that file?

Yes, and Alexei as well.

>
>> +
>> +#ifndef __NOSPEC_H__
>> +#define __NOSPEC_H__
>> +
>> +/*
>> + * When idx is out of bounds (idx >= sz), the sign bit will be set.
>> + * Extend the sign bit to all bits and invert, giving a result of zero
>> + * for an out of bounds idx, or ~0UL if within bounds [0, sz).
>> + */
>> +#ifndef array_idx_mask
>> +static inline unsigned long array_idx_mask(unsigned long idx, unsigned long sz)
>> +{
>> +     /*
>> +      * Warn developers about inappropriate array_idx usage.
>> +      *
>> +      * Even if the cpu speculates past the WARN_ONCE branch, the
>
> s/cpu/CPU
>
>> +      * sign bit of idx is taken into account when generating the
>> +      * mask.
>> +      *
>> +      * This warning is compiled out when the compiler can infer that
>> +      * idx and sz are less than LONG_MAX.
>
> Please use 'idx' and 'sz' in quotes, to make sure they stand out more in free
> flowing comment text. Also please use '()' to denote functions/methods.
>
> I.e. something like:
>
>          * Warn developers about inappropriate array_idx() usage.
>          *
>          * Even if the CPU speculates past the WARN_ONCE() branch, the
>          * sign bit of 'idx' is taken into account when generating the
>          * mask.
>          *
>          * This warning is compiled out when the compiler can infer that
>          * 'idx' and 'sz' are less than LONG_MAX.
>
> That's just one example - please apply it to all comments consistently.
>
>> +      */
>> +     if (WARN_ONCE(idx > LONG_MAX || sz > LONG_MAX,
>> +                     "array_idx limited to range of [0, LONG_MAX]\n"))
>
> Same in user facing messages:
>
>                         "array_idx() limited to range of [0, LONG_MAX]\n"))
>
>> + * For a code sequence like:
>> + *
>> + *     if (idx < sz) {
>> + *         idx = array_idx(idx, sz);
>> + *         val = array[idx];
>> + *     }
>> + *
>> + * ...if the cpu speculates past the bounds check then array_idx() will
>> + * clamp the index within the range of [0, sz).
>
> s/cpu/CPU
>
>> + */
>> +#define array_idx(idx, sz)                                           \
>> +({                                                                   \
>> +     typeof(idx) _i = (idx);                                         \
>> +     typeof(sz) _s = (sz);                                           \
>> +     unsigned long _mask = array_idx_mask(_i, _s);                   \
>> +                                                                     \
>> +     BUILD_BUG_ON(sizeof(_i) > sizeof(long));                        \
>> +     BUILD_BUG_ON(sizeof(_s) > sizeof(long));                        \
>> +                                                                     \
>> +     _i &= _mask;                                                    \
>> +     _i;                                                             \
>> +})
>> +#endif /* __NOSPEC_H__ */
>
> For heaven's sake, please name a size variable as 'size', not 'sz'. We don't have
> a shortage of characters and can deobfuscate common primitives, can we?
>
> Also, beyond the nits, I also hate the namespace here. We have a new generic
> header providing two new methods:
>
>         #include <linux/nospec.h>
>
>         array_idx_mask()
>         array_idx()
>
> which is then optimized for x86 in asm/barrier.h. That's already a non sequitur.
>
> Then we introduce uaccess API variants with a _nospec() postfix.
>
> Then we add ifence() to x86.
>
> There's no naming coherency to this.

Ingo, I love you, but please take the incredulity down a bit,
especially when I had 'nospec' in all the names in v1. Thomas, Peter,
and Alexei wanted s/nospec_barrier/ifence/ and
s/array_idx_nospec/array_idx/. You can always follow on with a patch
to fix up the names and placements to your liking. While they'll pick
on my name choices, they won't pick on yours, because I simply can't
be bothered to care about a bikeshed color at this point after being
bounced around for 5 revisions of this patch set.

> A better approach would be to signal the 'no speculation' aspect of the
> array_idx() methods already: naming it array_idx_nospec() would be a solution,
> as it clearly avoids speculation beyond the array boundaries.
>
> Also, without seeing the full series it's hard to tell, whether the introduction
> of linux/nospec.h is justified, but it feels somewhat suspect.
>


* Re: [PATCH v5 02/12] array_idx: sanitize speculative array de-references
  2018-01-28 16:28       ` [kernel-hardening] " Dan Williams
@ 2018-01-28 18:33         ` Ingo Molnar
  -1 siblings, 0 replies; 75+ messages in thread
From: Ingo Molnar @ 2018-01-28 18:33 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Gleixner, linux-arch, Cyril Novikov, Kernel Hardening,
	Peter Zijlstra, Catalin Marinas, X86 ML, Will Deacon,
	Russell King, Ingo Molnar, Greg KH, H. Peter Anvin,
	Linus Torvalds, Alan Cox, Linux Kernel Mailing List


* Dan Williams <dan.j.williams@intel.com> wrote:

> Thomas, Peter, and Alexei wanted s/nospec_barrier/ifence/ and 

I just checked past discussions, and I cannot find that part, got any links or 
message-IDs?

PeterZ's feedback on Jan 8 was:

> On Sun, Jan 07, 2018 at 06:24:11PM -0800, Alexei Starovoitov wrote:
> > How about:
> > CONFIG_SPECTRE1_WORKAROUND_INDEX_MASK
> > CONFIG_SPECTRE1_WORKAROUND_LOAD_FENCE
>
> INSTRUCTION_FENCE if anything. LFENCE for Intel (and now also for AMD as
> per 0592b0bce169) is a misnomer, IFENCE would be a better name for it.

Which in that context clearly talked about the config space and how to name the 
instruction semantics in light of the confusion of LFENCE and MFENCE opcodes on 
Intel and AMD CPUs...

Also, those early reviews were fundamentally low level feedback related to the 
actual functionality of the barriers and their mapping to the hardware.

But the fact is, the current series introduces an inconsistent barrier namespace 
extension of:

	barrier()
	barrier_data()
	mb()
	rmb()
	wmb()
	store_mb()
	read_barrier_depends()
	...
+	ifence()
+	array_idx()
+	array_idx_mask()

This isn't bikeshed painting: _ALL_ existing barrier API names have 'barrier' or 
its abbreviation 'mb' (memory barrier) somewhere in their names, which makes it 
pretty easy to recognize them at a glance.

I'm giving you high level API naming feedback because we are now growing the size 
of the barrier API.

array_idx() on the other hand is pretty much close to a 'worst possible' name:

 - it's named in an overly generic, opaque fashion
 - doesn't indicate at all that it's a barrier for something

... and since we want all kernel developers to use these facilities correctly, we 
want the names to be good and suggestive as well.

I'd accept pretty much anything else that adds 'barrier' or 'nospec' to the name: 
array_idx_barrier() or array_idx_nospec(). (I'm slightly leaning towards 'nospec' 
because it's more indicative of what is being done, and it also is what we do for 
the uaccess APIs.)

ifence() is a similar departure from existing barrier naming nomenclature, and I'd 
accept pretty much any other variant:

	barrier_nospec()
	ifence_nospec()

The kernel namespace cleanliness rules are clear and consistent, and there's 
nothing new about them:

 - the name of the API should unambiguously refer back to the API category. For
   barriers this common reference is 'barrier' or 'mb'.

 - use postfixes or prefixes consistently: pick one and don't mix them. If we go 
   with a _nospec() variant for the uaccess API names then we should use a similar
   naming for array_idx() and for the new barrier as well - no mixing.

> You can always follow on with a patch to fix up the names and placements to your 
> liking. While they'll pick on my name choices, they won't pick on yours, because 
> I simply can't be bothered to care about a bikeshed color at this point after 
> being bounced around for 5 revisions of this patch set.

Sorry, this kind of dismissive and condescending attitude won't cut it.

Thanks,

	Ingo


* [kernel-hardening] Re: [PATCH v5 02/12] array_idx: sanitize speculative array de-references
@ 2018-01-28 18:33         ` Ingo Molnar
  0 siblings, 0 replies; 75+ messages in thread
From: Ingo Molnar @ 2018-01-28 18:33 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Gleixner, linux-arch, Cyril Novikov, Kernel Hardening,
	Peter Zijlstra, Catalin Marinas, X86 ML, Will Deacon,
	Russell King, Ingo Molnar, Greg KH, H. Peter Anvin,
	Linus Torvalds, Alan Cox, Linux Kernel Mailing List


* Dan Williams <dan.j.williams@intel.com> wrote:

> Thomas, Peter, and Alexei wanted s/nospec_barrier/ifence/ and 

I just checked past discussions, and I cannot find that part, got any links or 
message-IDs?

PeterZ's feedback on Jan 8 was:

> On Sun, Jan 07, 2018 at 06:24:11PM -0800, Alexei Starovoitov wrote:
> > How about:
> > CONFIG_SPECTRE1_WORKAROUND_INDEX_MASK
> > CONFIG_SPECTRE1_WORKAROUND_LOAD_FENCE
>
> INSTRUCTION_FENCE if anything. LFENCE for Intel (and now also for AMD as
> per 0592b0bce169) is a misnomer, IFENCE would be a better name for it.

Which in that context clearly talked about the config space and how to name the 
instruction semantics in light of the confusion of LFENCE and MFENCE opcodes on 
Intel and AMD CPUs...

Also, those early reviews were fundamentally low level feedback related to the 
actual functionality of the barriers and their mapping to the hardware.

But the fact is, the current series introduces an inconsistent barrier namespace 
extension of:

	barrier()
	barrier_data()
	mb()
	rmb()
	wmb()
	store_mb()
	read_barrier_depends()
	...
+	ifence()
+	array_idx()
+	array_idx_mask()

This isn't bikeshed painting: _ALL_ existing barrier API names have 'barrier' or 
its abbreviation 'mb' (memory barrier) somewhere in their names, which makes it 
pretty easy to recognize them at a glance.

I'm giving you high level API naming feedback because we are now growing the size 
of the barrier API.

array_idx() on the other hand is pretty much close to a 'worst possible' name:

 - it's named in an overly generic, opaque fashion
 - doesn't indicate at all that it's a barrier for something

... and since we want all kernel developers to use these facilities correctly, we 
want the names to be good and suggestive as well.

I'd accept pretty much anything else that adds 'barrier' or 'nospec' to the name: 
array_idx_barrier() or array_idx_nospec(). (I'm slightly leaning towards 'nospec' 
because it's more indicative of what is being done, and it also is what we do for 
the uaccess APIs.)

ifence() is a similar departure from existing barrier naming nomenclature, and I'd 
accept pretty much any other variant:

	barrier_nospec()
	ifence_nospec()

The kernel namespace cleanliness rules are clear and consistent, and there's 
nothing new about them:

 - the name of the API should unambiguously refer back to the API category. For
   barriers this common reference is 'barrier' or 'mb'.

 - use postfixes or prefixes consistently: pick one and don't mix them. If we go 
   with a _nospec() variant for the uaccess API names then we should use a similar
   naming for array_idx() and for the new barrier as well - no mixing.

> You can always follow on with a patch to fix up the names and placements to your 
> liking. While they'll pick on my name choices, they won't pick on yours, because 
> I simply can't be bothered to care about a bikeshed color at this point after 
> being bounced around for 5 revisions of this patch set.

Sorry, this kind of dismissive and condescending attitude won't cut it.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [kernel-hardening] Re: [PATCH v5 02/12] array_idx: sanitize speculative array de-references
  2018-01-28 16:28       ` [kernel-hardening] " Dan Williams
@ 2018-01-28 18:36         ` Thomas Gleixner
  -1 siblings, 0 replies; 75+ messages in thread
From: Thomas Gleixner @ 2018-01-28 18:36 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ingo Molnar, linux-arch, Cyril Novikov, Kernel Hardening,
	Peter Zijlstra, Catalin Marinas, X86 ML, Will Deacon,
	Russell King, Ingo Molnar, Greg KH, H. Peter Anvin,
	Linus Torvalds, Alan Cox, Linux Kernel Mailing List

On Sun, 28 Jan 2018, Dan Williams wrote:
> On Sun, Jan 28, 2018 at 12:55 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >> + */
> >> +#define array_idx(idx, sz)                                           \
> >> +({                                                                   \
> >> +     typeof(idx) _i = (idx);                                         \
> >> +     typeof(sz) _s = (sz);                                           \
> >> +     unsigned long _mask = array_idx_mask(_i, _s);                   \
> >> +                                                                     \
> >> +     BUILD_BUG_ON(sizeof(_i) > sizeof(long));                        \
> >> +     BUILD_BUG_ON(sizeof(_s) > sizeof(long));                        \
> >> +                                                                     \
> >> +     _i &= _mask;                                                    \
> >> +     _i;                                                             \
> >> +})
> >> +#endif /* __NOSPEC_H__ */
> >
> > For heaven's sake, please name a size variable as 'size', not 'sz'. We don't have
> > a shortage of characters and can deobfuscate common primitives, can we?
> >
> > Also, beyond the nits, I also hate the namespace here. We have a new generic
> > header providing two new methods:
> >
> >         #include <linux/nospec.h>
> >
> >         array_idx_mask()
> >         array_idx()
> >
> > which is then optimized for x86 in asm/barrier.h. That's already a non sequitur.
> >
> > Then we introduce uaccess API variants with a _nospec() postfix.
> >
> > Then we add ifence() to x86.
> >
> > There's no naming coherency to this.
> 
> Ingo, I love you, but please take the incredulity down a bit,
> especially when I had 'nospec' in all the names in v1. Thomas, Peter,
> and Alexei wanted s/nospec_barrier/ifence/ and

Sorry, I never was involved in that discussion.

> s/array_idx_nospec/array_idx/. You can always follow on with a patch
> to fix up the names and placements to your liking. While they'll pick
> on my name choices, they won't pick on yours, because I simply can't
> be bothered to care about a bikeshed color at this point after being
> bounced around for 5 revisions of this patch set.

Oh well, we really need this kind of attitude right now. We are all fed up
with that mess, but Ingo and I care about the details, consistency and
general code quality beyond the current rush to get stuff solved. It's our
damned job as maintainers.

If you decide you don't care anymore, please let me know, so I can try to
free up some cycles to pick up the stuff from where you decided to dump it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 02/12] array_idx: sanitize speculative array de-references
  2018-01-28 18:33         ` [kernel-hardening] " Ingo Molnar
  (?)
@ 2018-01-29 16:45         ` Dan Williams
  -1 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-01-29 16:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, linux-arch, Cyril Novikov, Kernel Hardening,
	Peter Zijlstra, Catalin Marinas, X86 ML, Will Deacon,
	Russell King, Ingo Molnar, Greg KH, H. Peter Anvin,
	Linus Torvalds, Alan Cox, Linux Kernel Mailing List

On Sun, Jan 28, 2018 at 10:33 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Dan Williams <dan.j.williams@intel.com> wrote:
>
>> Thomas, Peter, and Alexei wanted s/nospec_barrier/ifence/ and
>
> I just checked past discussions, and I cannot find that part, got any links or
> message-IDs?
>
> PeterZ's feedback on Jan 8 was:
>
>> On Sun, Jan 07, 2018 at 06:24:11PM -0800, Alexei Starovoitov wrote:
>> > How about:
>> > CONFIG_SPECTRE1_WORKAROUND_INDEX_MASK
>> > CONFIG_SPECTRE1_WORKAROUND_LOAD_FENCE
>>
>> INSTRUCTION_FENCE if anything. LFENCE for Intel (and now also for AMD as
>> per 0592b0bce169) is a misnomer, IFENCE would be a better name for it.
>
> Which in that context clearly talked about the config space and how to name the
> instruction semantics in light of the confusion of LFENCE and MFENCE opcodes on
> Intel and AMD CPUs...
>
> Also, those early reviews were fundamentally low level feedback related to the
> actual functionality of the barriers and their mapping to the hardware.
>
> But the fact is, the current series introduces an inconsistent barrier namespace
> extension of:
>
>         barrier()
>         barrier_data()
>         mb()
>         rmb()
>         wmb()
>         store_mb()
>         read_barrier_depends()
>         ...
> +       ifence()
> +       array_idx()
> +       array_idx_mask()
>
> This isn't bikeshed painting: _ALL_ existing barrier API names have 'barrier' or
> its abbreviation 'mb' (memory barrier) somewhere in their names, which makes it
> pretty easy to recognize them at a glance.
>
> I'm giving you high level API naming feedback because we are now growing the size
> of the barrier API.
>
> array_idx() on the other hand is pretty much close to a 'worst possible' name:
>
>  - it's named in an overly generic, opaque fashion
>  - doesn't indicate at all that it's a barrier for something
>
> ... and since we want all kernel developers to use these facilities correctly, we
> want the names to be good and suggestive as well.
>
> I'd accept pretty much anything else that adds 'barrier' or 'nospec' to the name:
> array_idx_barrier() or array_idx_nospec(). (I'm slightly leaning towards 'nospec'
> because it's more indicative of what is being done, and it also is what we do for
> the uaccess APIs.)
>
> ifence() is a similar departure from existing barrier naming nomenclature, and I'd
> accept pretty much any other variant:
>
>         barrier_nospec()
>         ifence_nospec()
>
> The kernel namespace cleanliness rules are clear and consistent, and there's
> nothing new about them:
>
>  - the name of the API should unambiguously refer back to the API category. For
>    barriers this common reference is 'barrier' or 'mb'.
>
>  - use postfixes or prefixes consistently: pick one and don't mix them. If we go
>    with a _nospec() variant for the uaccess API names then we should use a similar
>    naming for array_idx() and for the new barrier as well - no mixing.

This is the feedback I can take action with, thank you.

>
>> You can always follow on with a patch to fix up the names and placements to your
>> liking. While they'll pick on my name choices, they won't pick on yours, because
>> I simply can't be bothered to care about a bikeshed color at this point after
>> being bounced around for 5 revisions of this patch set.
>
> Sorry, this kind of dismissive and condescending attitude won't cut it.

I reacted to your "for heaven's sake", I'll send a v6.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 04/12] x86: introduce __uaccess_begin_nospec and ifence
  2018-01-28  9:14     ` [kernel-hardening] " Ingo Molnar
  (?)
@ 2018-01-29 20:41     ` Dan Williams
  2018-01-30  6:56       ` Ingo Molnar
  -1 siblings, 1 reply; 75+ messages in thread
From: Dan Williams @ 2018-01-29 20:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, linux-arch, Tom Lendacky, Andi Kleen, Kees Cook,
	Kernel Hardening, Greg KH, X86 ML, Ingo Molnar, Al Viro,
	H. Peter Anvin, Linus Torvalds, Alan Cox,
	Linux Kernel Mailing List

On Sun, Jan 28, 2018 at 1:14 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Dan Williams <dan.j.williams@intel.com> wrote:
>
>> --- a/arch/x86/include/asm/uaccess.h
>> +++ b/arch/x86/include/asm/uaccess.h
>> @@ -124,6 +124,11 @@ extern int __get_user_bad(void);
>>
>>  #define __uaccess_begin() stac()
>>  #define __uaccess_end()   clac()
>> +#define __uaccess_begin_nospec()     \
>> +({                                   \
>> +     stac();                         \
>> +     ifence();                       \
>> +})
>
> BTW., wouldn't it be better to switch the barrier order here, i.e. to do:
>
>         ifence();                       \
>         stac();                         \
>
> ?
>
> The reason is that stac()/clac() is usually paired, so there's a chance with short
> sequences that it would resolve with 'no externally visible changes to flags'.
>
> Also, there's many cases where flags are modified _inside_ the STAC/CLAC section,
> so grouping them together inside a speculation atom could be beneficial.

I'm struggling a bit to grok this. Are you referring to the state of
the flags from the address limit comparison? That's the result that
needs fencing before speculative execution leaks to the user
pointer de-reference.

> The flip side is that if the MFENCE stalls the STAC that is ahead of it could be
> processed for 'free' - while it's always post barrier with my suggestion.

This 'for free' aspect is what I was aiming for.

>
> But in any case it would be nice to see a discussion of this aspect in the
> changelog, even if the patch does not change.

I'll add a note to the changelog that having the fence after the
'stac' hopefully allows some overlap of the cost of 'stac' and the
flushing of the instruction pipeline.

^ permalink raw reply	[flat|nested] 75+ messages in thread
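
To make the ordering question above concrete, the two variants under discussion
are (kernel-style sketch, not compilable on its own; stac() and ifence() are the
x86 primitives from the patch):

```
/* Patch as posted: fence after stac, so the cost of stac() can
 * hopefully overlap with the pipeline flush. */
#define __uaccess_begin_nospec()	\
({					\
	stac();				\
	ifence();			\
})

/* Ingo's suggested alternative: fence first, keeping the stac()/clac()
 * pair grouped with the user access it guards. */
#define __uaccess_begin_nospec()	\
({					\
	ifence();			\
	stac();				\
})
```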

* Re: [PATCH v5 12/12] x86/spectre: report get_user mitigation for spectre_v1
  2018-01-28  9:50     ` [kernel-hardening] " Ingo Molnar
  (?)
@ 2018-01-29 22:05     ` Dan Williams
  2018-01-31  8:07       ` Ingo Molnar
  -1 siblings, 1 reply; 75+ messages in thread
From: Dan Williams @ 2018-01-29 22:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, linux-arch, Kernel Hardening, Greg KH, X86 ML,
	Linus Torvalds, Ingo Molnar, H. Peter Anvin, Jiri Slaby,
	Alan Cox

On Sun, Jan 28, 2018 at 1:50 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Dan Williams <dan.j.williams@intel.com> wrote:
>
>> Reflect the presence of 'get_user', '__get_user', and 'syscall'
>> protections in sysfs. Keep the "Vulnerable" distinction given the
>> expectation that the places that have been identified for 'array_idx'
>> usage are likely incomplete.
>
> (The style problems/inconsistencies of the previous patches are repeated here too,
> please fix.)
>
>>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: x86@kernel.org
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Reported-by: Jiri Slaby <jslaby@suse.cz>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  arch/x86/kernel/cpu/bugs.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
>> index 390b3dc3d438..01d5ba48f745 100644
>> --- a/arch/x86/kernel/cpu/bugs.c
>> +++ b/arch/x86/kernel/cpu/bugs.c
>> @@ -269,7 +269,7 @@ ssize_t cpu_show_spectre_v1(struct device *dev,
>>  {
>>       if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1))
>>               return sprintf(buf, "Not affected\n");
>> -     return sprintf(buf, "Vulnerable\n");
>> +     return sprintf(buf, "Vulnerable: Minimal user pointer sanitization\n");
>
> Btw., I think this string is still somewhat passive-aggressive towards users, as
> it doesn't really give them any idea about what is missing from their system so
> that they can turn it into not vulnerable.
>
> What else is missing that would turn this into a "Mitigated" entry?

Part of the problem is that there are different sub-classes of Spectre
variant1 vulnerabilities. For example, speculating on the value of a
user pointer returned from get_user() is mitigated by these kernel
changes. However, cleaning up occasions where the CPU might speculate
on the validity of a user-controlled pointer offset, or
user-controlled array index is only covered by manual inspection of
some noisy / incomplete tooling results. I.e. the handful of
array_index_nospec() usages in this series is likely incomplete.

The usage of barrier_nospec() in __get_user() and open coded
array_index_nospec() in get_user() does raise the bar and mitigates an
entire class of problems. Perhaps it would be reasonable to have
cpu_show_spectre_v1() emit:

    "Mitigation: __user pointer sanitization"

...with the expectation that the kernel community intends to use new
and better tooling to find more places to use array_index_nospec().
Once there is wider confidence in that tooling, or a compiler that
does it automatically, the kernel can emit:

    "Mitigation: automated user input sanitization"

...or something like that.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 02/12] array_idx: sanitize speculative array de-references
  2018-01-28 18:36         ` Thomas Gleixner
  (?)
@ 2018-01-30  6:29         ` Dan Williams
  2018-01-30 19:38           ` Linus Torvalds
  -1 siblings, 1 reply; 75+ messages in thread
From: Dan Williams @ 2018-01-30  6:29 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, linux-arch, Cyril Novikov, Kernel Hardening,
	Peter Zijlstra, Catalin Marinas, X86 ML, Will Deacon,
	Russell King, Ingo Molnar, Greg KH, H. Peter Anvin,
	Linus Torvalds, Alan Cox, Linux Kernel Mailing List

On Sun, Jan 28, 2018 at 10:36 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sun, 28 Jan 2018, Dan Williams wrote:
>> On Sun, Jan 28, 2018 at 12:55 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> >> + */
>> >> +#define array_idx(idx, sz)                                           \
>> >> +({                                                                   \
>> >> +     typeof(idx) _i = (idx);                                         \
>> >> +     typeof(sz) _s = (sz);                                           \
>> >> +     unsigned long _mask = array_idx_mask(_i, _s);                   \
>> >> +                                                                     \
>> >> +     BUILD_BUG_ON(sizeof(_i) > sizeof(long));                        \
>> >> +     BUILD_BUG_ON(sizeof(_s) > sizeof(long));                        \
>> >> +                                                                     \
>> >> +     _i &= _mask;                                                    \
>> >> +     _i;                                                             \
>> >> +})
>> >> +#endif /* __NOSPEC_H__ */
>> >
>> > For heaven's sake, please name a size variable as 'size', not 'sz'. We don't have
>> > a shortage of characters and can deobfuscate common primitives, can we?
>> >
>> > Also, beyond the nits, I also hate the namespace here. We have a new generic
>> > header providing two new methods:
>> >
>> >         #include <linux/nospec.h>
>> >
>> >         array_idx_mask()
>> >         array_idx()
>> >
>> > which is then optimized for x86 in asm/barrier.h. That's already a non sequitur.
>> >
>> > Then we introduce uaccess API variants with a _nospec() postfix.
>> >
>> > Then we add ifence() to x86.
>> >
>> > There's no naming coherency to this.
>>
>> Ingo, I love you, but please take the incredulity down a bit,
>> especially when I had 'nospec' in all the names in v1. Thomas, Peter,
>> and Alexei wanted s/nospec_barrier/ifence/ and
>
> Sorry, I never was involved in that discussion.
>
>> s/array_idx_nospec/array_idx/. You can always follow on with a patch
>> to fix up the names and placements to your liking. While they'll pick
>> on my name choices, they won't pick on yours, because I simply can't
>> be bothered to care about a bikeshed color at this point after being
>> bounced around for 5 revisions of this patch set.
>
> Oh well, we really need this kind of attitude right now. We are all fed up
> with that mess, but Ingo and I care about the details, consistency and
> general code quality beyond the current rush to get stuff solved. It's our
> damned job as maintainers.

Of course, and everything about the technical feedback and suggestions
was completely valid, on point, and made the patches that much better.
What wasn't appreciated and what I am tired of grinning and bearing is
the idea that it's only the maintainer that can show emotion, that
it's only the maintainer that can be exasperated and burnt out.

For example, I would have spun v6 the same day (same hour?) had the
mail started, "hey I'm late to the party, why aren't all these helpers
in a new _nospec namespace?". I might have pointed to the mails from
Peter about ifence being his preferred name for a speculation barrier
and Alexei's request to drop nospec [1] and moved on because naming is
a maintainer's prerogative.

> If you decide you don't care anymore, please let me know, so I can try to
> free up some cycles to pick up the stuff from where you decided to dump it.

Care is a two way street. I respect yours and Ingo's workload, please
respect mine.

[1]: https://lkml.org/lkml/2018/1/9/1232

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 04/12] x86: introduce __uaccess_begin_nospec and ifence
  2018-01-29 20:41     ` Dan Williams
@ 2018-01-30  6:56       ` Ingo Molnar
  0 siblings, 0 replies; 75+ messages in thread
From: Ingo Molnar @ 2018-01-30  6:56 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Gleixner, linux-arch, Tom Lendacky, Andi Kleen, Kees Cook,
	Kernel Hardening, Greg KH, X86 ML, Ingo Molnar, Al Viro,
	H. Peter Anvin, Linus Torvalds, Alan Cox,
	Linux Kernel Mailing List


* Dan Williams <dan.j.williams@intel.com> wrote:

> > The flip side is that if the MFENCE stalls the STAC that is ahead of it could be
> > processed for 'free' - while it's always post barrier with my suggestion.
> 
> This 'for free' aspect is what I was aiming for.

Ok.

> >
> > But in any case it would be nice to see a discussion of this aspect in the
> > changelog, even if the patch does not change.
> 
> I'll add a note to the changelog that having the fence after the
> 'stac' hopefully allows some overlap of the cost of 'stac' and the
> flushing of the instruction pipeline.

Perfect!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 02/12] array_idx: sanitize speculative array de-references
  2018-01-30  6:29         ` Dan Williams
@ 2018-01-30 19:38           ` Linus Torvalds
  2018-01-30 20:13             ` Dan Williams
  0 siblings, 1 reply; 75+ messages in thread
From: Linus Torvalds @ 2018-01-30 19:38 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Gleixner, Ingo Molnar, linux-arch, Cyril Novikov,
	Kernel Hardening, Peter Zijlstra, Catalin Marinas, X86 ML,
	Will Deacon, Russell King, Ingo Molnar, Greg KH, H. Peter Anvin,
	Alan Cox, Linux Kernel Mailing List

On Mon, Jan 29, 2018 at 10:29 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>
> Of course, and everything about the technical feedback and suggestions
> was completely valid, on point, and made the patches that much better.
> What wasn't appreciated and what I am tired of grinning and bearing is
> the idea that it's only the maintainer that can show emotion, that
> it's only the maintainer that can be exasperated and burnt out.

Yeah, I think everybody is a bit tired of - and burnt out by - these
patches, and they are subtler and somewhat more core than most are,
which makes the stakes a bit higher too, and the explanations can be a
bit more difficult.

I think everybody is entitled to being a bit snippy occasionally.
Definitely not just maintainers.

So by all means, push right back.

Anyway, I do think the patches I've seen so far are ok, and the real
reason I'm writing this email is actually more about future patches:
do we have a good handle on where these array index sanitations will
be needed?

Also, while array limit checking was obviously the official
"spectre-v1" issue, I do wonder if there are possible other issues
where mispredicted conditional branches can end up leaking
information?

IOW, is there some work on tooling/analysis/similar? Not asking for
near-term, but more of a "big picture" question..

            Linus

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 02/12] array_idx: sanitize speculative array de-references
  2018-01-30 19:38           ` Linus Torvalds
@ 2018-01-30 20:13             ` Dan Williams
  2018-01-30 20:27               ` Van De Ven, Arjan
  0 siblings, 1 reply; 75+ messages in thread
From: Dan Williams @ 2018-01-30 20:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Ingo Molnar, linux-arch, Cyril Novikov,
	Kernel Hardening, Peter Zijlstra, Catalin Marinas, X86 ML,
	Will Deacon, Russell King, Ingo Molnar, Greg KH, H. Peter Anvin,
	Alan Cox, Linux Kernel Mailing List, Arjan Van De Ven

[ adding Arjan ]

On Tue, Jan 30, 2018 at 11:38 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
[..]
> Anyway, I do think the patches I've seen so far are ok, and the real
> reason I'm writing this email is actually more about future patches:
> do we have a good handle on where these array index sanitations will
> be needed?
>
> Also, while array limit checking was obviously the official
> "spectre-v1" issue, I do wonder if there are possible other issues
> where mispredicted conditional branches can end up leaking
> information?
>
> IOW, is there some work on tooling/analysis/similar? Not asking for
> near-term, but more of a "big picture" question..
>
>             Linus

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v5 02/12] array_idx: sanitize speculative array de-references
  2018-01-30 20:13             ` Dan Williams
@ 2018-01-30 20:27               ` Van De Ven, Arjan
  2018-01-31  8:03                 ` Ingo Molnar
  0 siblings, 1 reply; 75+ messages in thread
From: Van De Ven, Arjan @ 2018-01-30 20:27 UTC (permalink / raw)
  To: Williams, Dan J, Linus Torvalds
  Cc: Thomas Gleixner, Ingo Molnar, linux-arch, Cyril Novikov,
	Kernel Hardening, Peter Zijlstra, Catalin Marinas, X86 ML,
	Will Deacon, Russell King, Ingo Molnar, Greg KH, H. Peter Anvin,
	Alan Cox, Linux Kernel Mailing List

> > Anyway, I do think the patches I've seen so far are ok, and the real
> > reason I'm writing this email is actually more about future patches:
> > do we have a good handle on where these array index sanitations will
> > be needed?

the obvious cases are currently obviously being discussed.

but to be realistic, the places people will find will go on for the next two+ years, and the distros will also 
end up needing to backport those for the next two years.

(v1 is like "buffer overflow", it's a class of issues. buffer overflows were not fixed in one patch or overnight)

 
> > Also, while array limit checking was obviously the official
> > "spectre-v1" issue, I do wonder if there are possible other issues
> > where mispredicted conditional branches can end up leaking
> > information?

there's obviously many things one can do to leak things, for example the v1 paper talks about using the cache
as a mechanism to leak, but you can trivially construct something that uses the TLBs as the channel to leak
(thankfully KPTI cuts that side off) but other variations are possible. I suspect many master's thesis projects
got redirected in the last few months ;-)


> > IOW, is there some work on tooling/analysis/similar? Not asking for
> > near-term, but more of a "big picture" question..

short term there was some extremely rudimentary static analysis done.
clearly that has extreme limitations, both an insane rate of false positives and missed cases.

the real case is to get things like compiler plugins and the like to identify suspect cases.

but we also need to consider changing the kernel a bit. Part of what makes fixing
V1 hard is that security checks on user data are spread in many places across drivers and subsystems.
If we do the security checks more concentrated we cut off many attacks.

For example, we are considering doing a

copy_from_user_struct(*dst, *src, type, validation_function)

where the validation function gets auto-generated from the uapi headers (with some form of annotation)
and if anything fails or the copy faults partway through, the dst is zeroed.

such a beast can do the basic security checks on the user data inside that function (wrapper), and that greatly
limits the number of places we need to take care of.

bonus is that this also will improve the situation of drivers being sloppy with input validation and where 
the current rootholes are happening


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 02/12] array_idx: sanitize speculative array de-references
  2018-01-30 20:27               ` Van De Ven, Arjan
@ 2018-01-31  8:03                 ` Ingo Molnar
  2018-01-31 14:13                   ` Van De Ven, Arjan
  0 siblings, 1 reply; 75+ messages in thread
From: Ingo Molnar @ 2018-01-31  8:03 UTC (permalink / raw)
  To: Van De Ven, Arjan
  Cc: Williams, Dan J, Linus Torvalds, Thomas Gleixner, linux-arch,
	Cyril Novikov, Kernel Hardening, Peter Zijlstra, Catalin Marinas,
	X86 ML, Will Deacon, Russell King, Ingo Molnar, Greg KH,
	H. Peter Anvin, Alan Cox, Linux Kernel Mailing List,
	Peter Zijlstra


* Van De Ven, Arjan <arjan.van.de.ven@intel.com> wrote:

> > > IOW, is there some work on tooling/analysis/similar? Not asking for
> > > near-term, but more of a "big picture" question..
> 
> short term there was some extremely rudimentary static analysis done. clearly 
> that has extreme limitations both in insane rate of false positives, and missing 
> cases.

What was the output roughly, how many suspect places that need array_idx_nospec() 
handling: a few, a few dozen, a few hundred or a few thousand?

I'm guessing 'static tool emitted hundreds of sites with many false positives 
included, but the real sites are probably a few dozen' - but that's really just a 
very, very wild guess.

Our whole mindset and approach with these generic APIs obviously very much depends 
on the order of magnitude:

- For array users up to a few dozen per 20 MLOC code base accumulated over 20
  years, and with no more than ~1 new site per kernel release we can probably
  do what we do for buffer overflows: static analyze and fix via 
  array_idx_nospec().

- If it's more than a few dozen then intuitively I'd also be very much in favor of
  compiler help: for example trickle down __user annotations that Sparse uses some
  more and let the compiler sanitize indices or warn about them - without hurting 
  performance of in-kernel array handling.

- If it's "hundreds" then probably both the correctness and the maintenance
  aspects won't scale well to a 20+ MLOC kernel code base - in that case we _have_
  to catch the out of range values at a much earlier stage, at the system call and
  other system ABI level, and probably need to do it via a self-maintaining 
  approach such as annotations/types that also warns about 'unsanitized' uses. But 
  filtering earlier has its own problems as well: mostly the layering violations 
  (high level interfaces often don't know the safe array index range) can create 
  worse bugs and more fragility than what is being fixed ...

- If it's "thousands" then it _clearly_ won't scale and we probably need compiler 
  help: i.e. a compiler that tracks ranges and automatically sanitizes indices. 
  This will have some performance effect, but should result in almost immediate 
  correctness.

Also, IMHO any tooling should very much be open source: this isn't the kind of bug 
that can be identified via code review, so there's no fall-back detection method 
like we have for buffer overflows.

Anyway, the approach taken very much depends on the order of magnitude of such 
affected array users we are targeting ...
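
[Editorial note: for readers unfamiliar with the primitive being counted in this
thread, here is a hedged userspace sketch of an array_index_nospec()-style
clamp. The helper names and the shift-based mask are assumptions of this sketch,
not the kernel's implementation — x86 open-codes the mask with cmp/sbb assembly,
and the sketch assumes an arithmetic right shift on negative values and
sz <= LONG_MAX.]

```c
#include <stddef.h>

/*
 * Illustrative mask: all ones when idx < sz, all zeroes otherwise,
 * computed without a conditional branch the CPU could mispredict.
 * (idx - sz) is negative exactly when idx < sz, assuming sz <= LONG_MAX.
 */
static size_t array_index_mask(size_t idx, size_t sz)
{
	return (size_t)((long)(idx - sz) >> (sizeof(long) * 8 - 1));
}

/* Clamp idx to 0 when out of bounds, leave it intact otherwise. */
static size_t array_index_clamp(size_t idx, size_t sz)
{
	return idx & array_index_mask(idx, sz);
}

/*
 * Intended usage, placed inside the bounds check — the same spot one
 * would otherwise put an lfence:
 *
 *	if (idx < sz) {
 *		idx = array_index_clamp(idx, sz);
 *		val = array[idx];
 *	}
 */
```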

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v5 12/12] x86/spectre: report get_user mitigation for spectre_v1
  2018-01-29 22:05     ` Dan Williams
@ 2018-01-31  8:07       ` Ingo Molnar
  2018-02-01 20:23         ` Dan Williams
  0 siblings, 1 reply; 75+ messages in thread
From: Ingo Molnar @ 2018-01-31  8:07 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Gleixner, linux-arch, Kernel Hardening, Greg KH, X86 ML,
	Linus Torvalds, Ingo Molnar, H. Peter Anvin, Jiri Slaby,
	Alan Cox


* Dan Williams <dan.j.williams@intel.com> wrote:

> On Sun, Jan 28, 2018 at 1:50 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Dan Williams <dan.j.williams@intel.com> wrote:
> >
> >> Reflect the presence of 'get_user', '__get_user', and 'syscall'
> >> protections in sysfs. Keep the "Vulnerable" distinction given the
> >> expectation that the places that have been identified for 'array_idx'
> >> usage are likely incomplete.
> >
> > (The style problems/inconsistencies of the previous patches are repeated here too,
> > please fix.)
> >
> >>
> >> Cc: Thomas Gleixner <tglx@linutronix.de>
> >> Cc: Ingo Molnar <mingo@redhat.com>
> >> Cc: "H. Peter Anvin" <hpa@zytor.com>
> >> Cc: x86@kernel.org
> >> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> >> Reported-by: Jiri Slaby <jslaby@suse.cz>
> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> >> ---
> >>  arch/x86/kernel/cpu/bugs.c |    2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> >> index 390b3dc3d438..01d5ba48f745 100644
> >> --- a/arch/x86/kernel/cpu/bugs.c
> >> +++ b/arch/x86/kernel/cpu/bugs.c
> >> @@ -269,7 +269,7 @@ ssize_t cpu_show_spectre_v1(struct device *dev,
> >>  {
> >>       if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1))
> >>               return sprintf(buf, "Not affected\n");
> >> -     return sprintf(buf, "Vulnerable\n");
> >> +     return sprintf(buf, "Vulnerable: Minimal user pointer sanitization\n");
> >
> > Btw., I think this string is still somewhat passive-aggressive towards users, as
> > it doesn't really give them any idea about what is missing from their system so
> > that they can turn it into not vulnerable.
> >
> > What else is missing that would turn this into a "Mitigated" entry?
> 
> Part of the problem is that there are different sub-classes of Spectre
> variant1 vulnerabilities. For example, speculating on the value of a
> user pointer returned from get_user() is mitigated by these kernel
> changes. However, cleaning up occasions where the CPU might speculate
> on the validity of a user-controlled pointer offset, or
> user-controlled array index is only covered by manual inspection of
> some noisy / incomplete tooling results. I.e. the handful of
> array_index_nospec() usages in this series is likely incomplete.
> 
> The usage of barrier_nospec() in __get_user() and open coded
> array_index_nospec() in get_user() does raise the bar and mitigates an
> entire class of problems. Perhaps it would be reasonable to have
> cpu_show_spectre_v1() emit:
> 
>     "Mitigation: __user pointer sanitization"

Yeah, so I think the ideal approach would be if we emitted _both_ pieces of 
information:

     Mitigation: __user pointer sanitization
     Vulnerable: incomplete array index sanitization

I.e. we inform the user about the mitigation measures that are active, but we also 
fully inform that we think it's incomplete at this stage.

> ...with the expectation that the kernel community intends to use new
> and better tooling to find more places to use array_index_nospec().
> Once there is wider confidence in that tooling, or a compiler that
> does it automatically, the kernel can emit:
> 
>     "Mitigation: automated user input sanitization"
> 
> ...or something like that.

Yeah.

Thanks,

	Ingo


* RE: [PATCH v5 02/12] array_idx: sanitize speculative array de-references
  2018-01-31  8:03                 ` Ingo Molnar
@ 2018-01-31 14:13                   ` Van De Ven, Arjan
  2018-01-31 14:21                     ` Greg KH
  0 siblings, 1 reply; 75+ messages in thread
From: Van De Ven, Arjan @ 2018-01-31 14:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Williams, Dan J, Linus Torvalds, Thomas Gleixner, linux-arch,
	Cyril Novikov, Kernel Hardening, Peter Zijlstra, Catalin Marinas,
	X86 ML, Will Deacon, Russell King, Ingo Molnar, Greg KH,
	H. Peter Anvin, Alan Cox, Linux Kernel Mailing List,
	Peter Zijlstra

> > short term there was some extremely rudimentary static analysis done. clearly
> > that has extreme limitations both in insane rate of false positives, and missing
> > cases.
> 
> What was the output roughly, how many suspect places that need
> array_idx_nospec() handling: a few, a few dozen, a few hundred or a few
> thousand?
> 
> I'm guessing 'static tool emitted hundreds of sites with many false positives
> included, but the real sites are probably a few dozen' - but that's really just
> a very, very wild guess.

your guess is pretty accurate; we ended up with some 15 or so places (the first
patch kit Dan mailed out)


> 
> - If it's more than a few dozen then intuitively I'd also be very much in favor
>   of compiler help: for example trickle down __user annotations that Sparse
>   uses some more and let the compiler sanitize indices or warn about them -
>   without hurting performance of in-kernel array handling.

we need this kind of help even if it's only for the static analysis tool


 
> Also, IMHO any tooling should very much be open source: this isn't the kind of
> bug that can be identified via code review, so there's no fall-back detection
> method like we have for buffer overflows.

we absolutely need some good open source tooling; my personal preference will be
a compiler plugin; that way you can use all the fancy logic inside the compilers
for analysis, and you can make a "I don't care just fix it" option in addition
to flagging for human inspection as the kernel would.




* Re: [PATCH v5 02/12] array_idx: sanitize speculative array de-references
  2018-01-31 14:13                   ` Van De Ven, Arjan
@ 2018-01-31 14:21                     ` Greg KH
  0 siblings, 0 replies; 75+ messages in thread
From: Greg KH @ 2018-01-31 14:21 UTC (permalink / raw)
  To: Van De Ven, Arjan
  Cc: Ingo Molnar, Williams, Dan J, Linus Torvalds, Thomas Gleixner,
	linux-arch, Cyril Novikov, Kernel Hardening, Peter Zijlstra,
	Catalin Marinas, X86 ML, Will Deacon, Russell King, Ingo Molnar,
	H. Peter Anvin, Alan Cox, Linux Kernel Mailing List,
	Peter Zijlstra

On Wed, Jan 31, 2018 at 02:13:45PM +0000, Van De Ven, Arjan wrote:
> > > short term there was some extremely rudimentary static analysis done. clearly
> > > that has extreme limitations both in insane rate of false positives, and missing
> > > cases.
> > 
> > What was the output roughly, how many suspect places that need
> > array_idx_nospec() handling: a few, a few dozen, a few hundred or a few
> > thousand?
> > 
> > I'm guessing 'static tool emitted hundreds of sites with many false positives
> > included, but the real sites are probably a few dozen' - but that's really just
> > a very, very wild guess.
> 
> your guess is pretty accurate; we ended up with some 15 or so places
> (the first patch kit Dan mailed out)

But in looking at that first patch set, I don't see many, if any, that
could be solved with your proposal of copy_from_user_struct().  The two
networking patches couldn't, the scsi one was just bizarre (seriously,
you had to trust the input from the hardware, if not, there's worse
things happening), and the networking drivers were dealing with other
data streams I think.

So while I love the idea of copy_from_user_struct(), I don't see it as
any type of "this will solve the issues we are trying to address here"
solution :(

I've been emailing the Coverity people recently about this, and they
say they are close to having a ruleset/test that can identify this data
pattern better than the original tool that Intel and others came up
with.  But as I haven't seen the output of it yet, I have no idea if
that's true or not.  Hopefully they will release it in a few days so we
can get an idea of if this is even going to be possible to automatically
scan for at all with their tool.

Other than Coverity, I don't know of any other tool that is trying to
even identify this pattern :(

> > Also, IMHO any tooling should very much be open source: this isn't the kind of
> > bug that can be identified via code review, so there's no fall-back detection
> > method like we have for buffer overflows.
> 
> we absolutely need some good open source tooling; my personal
> preference will be a compiler plugin; that way you can use all the
> fancy logic inside the compilers for analysis, and you can make a "I
> don't care just fix it" option in addition to flagging for human
> inspection as the kernel would.

I thought gcc plugins were going to enable this type of analysis, or
maybe clang plugins, but I have yet to hear of anyone working on this.

thanks,

greg k-h


* Re: [PATCH v5 12/12] x86/spectre: report get_user mitigation for spectre_v1
  2018-01-31  8:07       ` Ingo Molnar
@ 2018-02-01 20:23         ` Dan Williams
  0 siblings, 0 replies; 75+ messages in thread
From: Dan Williams @ 2018-02-01 20:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, linux-arch, Kernel Hardening, Greg KH, X86 ML,
	Linus Torvalds, Ingo Molnar, H. Peter Anvin, Jiri Slaby,
	Alan Cox

On Wed, Jan 31, 2018 at 12:07 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Dan Williams <dan.j.williams@intel.com> wrote:
>
>> On Sun, Jan 28, 2018 at 1:50 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> >
>> > * Dan Williams <dan.j.williams@intel.com> wrote:
>> >
>> >> Reflect the presence of 'get_user', '__get_user', and 'syscall'
>> >> protections in sysfs. Keep the "Vulnerable" distinction given the
>> >> expectation that the places that have been identified for 'array_idx'
>> >> usage are likely incomplete.
>> >
>> > (The style problems/inconsistencies of the previous patches are repeated here too,
>> > please fix.)
>> >
>> >>
>> >> Cc: Thomas Gleixner <tglx@linutronix.de>
>> >> Cc: Ingo Molnar <mingo@redhat.com>
>> >> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> >> Cc: x86@kernel.org
>> >> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> >> Reported-by: Jiri Slaby <jslaby@suse.cz>
>> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> >> ---
>> >>  arch/x86/kernel/cpu/bugs.c |    2 +-
>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >>
>> >> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
>> >> index 390b3dc3d438..01d5ba48f745 100644
>> >> --- a/arch/x86/kernel/cpu/bugs.c
>> >> +++ b/arch/x86/kernel/cpu/bugs.c
>> >> @@ -269,7 +269,7 @@ ssize_t cpu_show_spectre_v1(struct device *dev,
>> >>  {
>> >>       if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1))
>> >>               return sprintf(buf, "Not affected\n");
>> >> -     return sprintf(buf, "Vulnerable\n");
>> >> +     return sprintf(buf, "Vulnerable: Minimal user pointer sanitization\n");
>> >
>> > Btw., I think this string is still somewhat passive-aggressive towards users, as
>> > it doesn't really give them any idea about what is missing from their system so
>> > that they can turn it into not vulnerable.
>> >
>> > What else is missing that would turn this into a "Mitigated" entry?
>>
>> Part of the problem is that there are different sub-classes of Spectre
>> variant1 vulnerabilities. For example, speculating on the value of a
>> user pointer returned from get_user() is mitigated by these kernel
>> changes. However, cleaning up occasions where the CPU might speculate
>> on the validity of a user-controlled pointer offset, or
>> user-controlled array index is only covered by manual inspection of
>> some noisy / incomplete tooling results. I.e. the handful of
>> array_index_nospec() usages in this series is likely incomplete.
>>
>> The usage of barrier_nospec() in __get_user() and open coded
>> array_index_nospec() in get_user() does raise the bar and mitigates an
>> entire class of problems. Perhaps it would be reasonable to have
>> cpu_show_spectre_v1() emit:
>>
>>     "Mitigation: __user pointer sanitization"
>
> Yeah, so I think the ideal approach would be if we emitted _both_ pieces of
> information:
>
>      Mitigation: __user pointer sanitization
>      Vulnerable: incomplete array index sanitization
>
> I.e. we inform the user about the mitigation measures that are active, but we also
> fully inform that we think it's incomplete at this stage.

But that would assume that it's only array index sanitization that we
need to worry about. The get_user() protections were interesting
because they showed a class of potential leaks near pointer
de-references not necessarily arrays. So I think it's more likely the
case that we'll add more "Mitigation:" lines over time.
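
[Editorial note: a hedged userspace sketch of the pointer-masking idea referred
to above. The function name and the comparison-based mask are illustrative
assumptions of this sketch; the actual x86 get_user() open-codes the mask with
cmp/sbb in assembly against the address limit.]

```c
#include <stdint.h>

/*
 * Illustrative sketch: AND the user-supplied pointer with an
 * all-ones/all-zeroes mask derived from the limit check, so that a
 * mispredicted bounds check can only speculatively dereference
 * address 0 rather than an attacker-chosen kernel address.
 */
static uintptr_t mask_user_ptr(uintptr_t ptr, uintptr_t limit)
{
	/* mask is ~0 when ptr <= limit, 0 otherwise */
	uintptr_t mask = (uintptr_t)0 - (uintptr_t)(ptr <= limit);
	return ptr & mask;
}
```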


end of thread, other threads:[~2018-02-01 20:23 UTC | newest]

Thread overview: 75+ messages
2018-01-27  7:55 [PATCH v5 00/12] spectre variant1 mitigations for tip/x86/pti Dan Williams
2018-01-27  7:55 ` [kernel-hardening] " Dan Williams
2018-01-27  7:55 ` Dan Williams
2018-01-27  7:55 ` [PATCH v5 01/12] Documentation: document array_idx Dan Williams
2018-01-27  7:55   ` [kernel-hardening] " Dan Williams
2018-01-27  7:55 ` [PATCH v5 02/12] array_idx: sanitize speculative array de-references Dan Williams
2018-01-27  7:55   ` [kernel-hardening] " Dan Williams
2018-01-28  8:55   ` Ingo Molnar
2018-01-28  8:55     ` [kernel-hardening] " Ingo Molnar
2018-01-28 11:36     ` Thomas Gleixner
2018-01-28 11:36       ` [kernel-hardening] " Thomas Gleixner
2018-01-28 16:28     ` Dan Williams
2018-01-28 16:28       ` [kernel-hardening] " Dan Williams
2018-01-28 18:33       ` Ingo Molnar
2018-01-28 18:33         ` [kernel-hardening] " Ingo Molnar
2018-01-29 16:45         ` Dan Williams
2018-01-28 18:36       ` [kernel-hardening] " Thomas Gleixner
2018-01-28 18:36         ` Thomas Gleixner
2018-01-30  6:29         ` Dan Williams
2018-01-30 19:38           ` Linus Torvalds
2018-01-30 20:13             ` Dan Williams
2018-01-30 20:27               ` Van De Ven, Arjan
2018-01-31  8:03                 ` Ingo Molnar
2018-01-31 14:13                   ` Van De Ven, Arjan
2018-01-31 14:21                     ` Greg KH
2018-01-27  7:55 ` [PATCH v5 03/12] x86: implement array_idx_mask Dan Williams
2018-01-27  7:55   ` [kernel-hardening] " Dan Williams
2018-01-28  9:02   ` Ingo Molnar
2018-01-28  9:02     ` [kernel-hardening] " Ingo Molnar
2018-01-27  7:55 ` [PATCH v5 04/12] x86: introduce __uaccess_begin_nospec and ifence Dan Williams
2018-01-27  7:55   ` [kernel-hardening] " Dan Williams
2018-01-28  9:06   ` Ingo Molnar
2018-01-28  9:06     ` [kernel-hardening] " Ingo Molnar
2018-01-28  9:14   ` Ingo Molnar
2018-01-28  9:14     ` [kernel-hardening] " Ingo Molnar
2018-01-29 20:41     ` Dan Williams
2018-01-30  6:56       ` Ingo Molnar
2018-01-27  7:55 ` [PATCH v5 05/12] x86, __get_user: use __uaccess_begin_nospec Dan Williams
2018-01-27  7:55   ` [kernel-hardening] " Dan Williams
2018-01-28  9:19   ` Ingo Molnar
2018-01-28  9:19     ` [kernel-hardening] " Ingo Molnar
2018-01-27  7:55 ` [PATCH v5 06/12] x86, get_user: use pointer masking to limit speculation Dan Williams
2018-01-27  7:55   ` [kernel-hardening] " Dan Williams
2018-01-28  9:25   ` Ingo Molnar
2018-01-28  9:25     ` [kernel-hardening] " Ingo Molnar
2018-01-27  7:55 ` [PATCH v5 07/12] x86: remove the syscall_64 fast-path Dan Williams
2018-01-27  7:55   ` [kernel-hardening] " Dan Williams
2018-01-28  9:29   ` Ingo Molnar
2018-01-28  9:29     ` [kernel-hardening] " Ingo Molnar
2018-01-28 15:22     ` Andy Lutomirski
2018-01-28 15:22       ` [kernel-hardening] " Andy Lutomirski
2018-01-27  7:55 ` [PATCH v5 08/12] x86: sanitize sycall table de-references under speculation Dan Williams
2018-01-27  7:55   ` [kernel-hardening] " Dan Williams
2018-01-28  9:36   ` Ingo Molnar
2018-01-28  9:36     ` [kernel-hardening] " Ingo Molnar
2018-01-27  7:56 ` [PATCH v5 09/12] vfs, fdtable: prevent bounds-check bypass via speculative execution Dan Williams
2018-01-27  7:56   ` [kernel-hardening] " Dan Williams
2018-01-27  7:56 ` [PATCH v5 10/12] kvm, x86: update spectre-v1 mitigation Dan Williams
2018-01-27  7:56   ` [kernel-hardening] " Dan Williams
2018-01-27  7:56 ` [PATCH v5 11/12] nl80211: sanitize array index in parse_txq_params Dan Williams
2018-01-27  7:56   ` [kernel-hardening] " Dan Williams
2018-01-27  7:56   ` Dan Williams
2018-01-27  7:56 ` [PATCH v5 12/12] x86/spectre: report get_user mitigation for spectre_v1 Dan Williams
2018-01-27  7:56   ` [kernel-hardening] " Dan Williams
2018-01-28  9:50   ` Ingo Molnar
2018-01-28  9:50     ` [kernel-hardening] " Ingo Molnar
2018-01-29 22:05     ` Dan Williams
2018-01-31  8:07       ` Ingo Molnar
2018-02-01 20:23         ` Dan Williams
2018-01-27 18:52 ` [PATCH v5 00/12] spectre variant1 mitigations for tip/x86/pti Linus Torvalds
2018-01-27 18:52   ` [kernel-hardening] " Linus Torvalds
2018-01-27 18:52   ` Linus Torvalds
2018-01-27 19:26 ` Dan Williams
2018-01-27 19:26   ` [kernel-hardening] " Dan Williams
2018-01-27 19:26   ` Dan Williams
