* [PATCH v6 0/2] x86: Implement fast refcount overflow protection
@ 2017-07-19  0:03 Kees Cook
  2017-07-19  0:03 ` [PATCH v6 1/2] x86/asm: Add suffix macro for GEN_*_RMWcc() Kees Cook
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Kees Cook @ 2017-07-19  0:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Peter Zijlstra, Josh Poimboeuf, Christoph Hellwig,
	Eric W. Biederman, Andrew Morton, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
	Serge E. Hallyn, arozansk, Davidlohr Bueso, Manfred Spraul,
	axboe, James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-kernel, linux-arch, kernel-hardening

This series implements fast refcount overflow protection for x86, which is
needed to provide coverage for the several refcount-overflow use-after-free
flaws the kernel has seen over the years. For example, here are
two from 2016:

http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/

http://cyseclabs.com/page?n=02012016

Patch 1 provides support for adding additional assembly to the GEN_*_RMWcc
macros, and patch 2 does the real work. (I left Josh's Reviewed-by since
the exception handler code has remained the same.)

Here is patch 2's full commit log:


This implements refcount_t overflow protection on x86 without a noticeable
performance impact, though without the fuller checking of REFCOUNT_FULL.
This is done by duplicating the existing atomic_t refcount implementation
but with normally a single instruction added to detect if the refcount
has gone negative (i.e. wrapped past INT_MAX or below zero). When
detected, the handler saturates the refcount_t to INT_MIN / 2. With this
overflow protection, the erroneous reference release that would follow
a wrap back to zero is blocked from happening, avoiding the class of
refcount-over-increment use-after-free vulnerabilities entirely.

Only the overflow case of refcounting can be perfectly protected, since it
can be detected and stopped before the reference is freed and left to be
abused by an attacker. This implementation also notices some of the "dec
to 0 without test", and "below 0" cases. However, these only indicate that
a use-after-free may have already happened. Such notifications are likely
avoidable by an attacker that has already exploited a use-after-free
vulnerability, but it's better to have them than allow such conditions to
remain universally silent.

On first overflow detection, the refcount value is reset to INT_MIN / 2
(which serves as a saturation value), the offending process is killed,
and a report and stack trace are produced. When operations detect only
negative value results (such as changing an already saturated value),
saturation still happens but no notification is performed (since the
value was already saturated).

On the matter of races, since the entire range beyond INT_MAX but before
0 is negative, every operation at INT_MIN / 2 will trap, leaving no
overflow-only race condition.
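
To make the saturation policy concrete, here is a rough, non-atomic C
model of the check (illustration only, with made-up names; the real
implementation does the negative-value test with a "js" instruction and
performs the reset from an exception handler, as described below):

#include <limits.h>
#include <stdbool.h>

#define REFCOUNT_SATURATED	(INT_MIN / 2)

/* Conceptual model of what one protected operation ends up doing. */
static bool refcount_check_model(int *counter)
{
	if (*counter >= 0)
		return false;	/* normal case: no trap taken */

	/*
	 * Negative result: the value wrapped past INT_MAX or dropped
	 * below zero.  Pin it at the saturation value; since INT_MIN / 2
	 * plus or minus any realistic delta is still negative, every
	 * later operation lands back here, so racing increments cannot
	 * walk the value toward zero again.
	 */
	*counter = REFCOUNT_SATURATED;
	return true;		/* trapped: report/kill only on first hit */
}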

As for performance, this implementation adds a single "js" instruction
to the regular execution flow of a copy of the standard atomic_t refcount
operations. (The non-"and_test" refcount_dec() function, which is uncommon
in regular refcount design patterns, has an additional "jz" instruction
to detect reaching exactly zero.) Since this is a forward jump, it is by
default the non-predicted path, which will be reinforced by dynamic branch
prediction. The result is this protection having virtually no measurable
change in performance over standard atomic_t operations. The error path,
located in .text.unlikely, saves the refcount location and then uses UD0
to fire a refcount exception handler, which resets the refcount, handles
reporting, and returns to regular execution. This keeps the changes to
.text size minimal, avoiding return jumps and open-coded calls to the
error reporting routine.

Example assembly comparison:

refcount_inc before
.text:
ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)

refcount_inc after
.text:
ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
ffffffff8154614d:       0f 88 80 d5 17 00       js     ffffffff816c36d3
...
.text.unlikely:
ffffffff816c36d3:       48 8d 4d f4             lea    -0xc(%rbp),%rcx
ffffffff816c36d7:       0f ff                   (bad)

This protection is a modified version of the x86 PAX_REFCOUNT defense
from the last public patch of PaX/grsecurity, based on my understanding of
the code. Changes or omissions from the original code are mine and don't
reflect the original grsecurity/PaX code. Thanks to PaX Team for various
suggestions for improvement.


Thanks!

-Kees

v6:
- use single saturation value (INT_MIN / 2)
- detect refcount_dec() to zero and saturate

v5:
- add unchecked atomic_t implementation when !CONFIG_REFCOUNT_FULL
- use "leal" again, as in v3 for more flexible reset handling
- provide better underflow detection, with saturation

v4:
- switch to js from jns to gain static branch prediction benefits
- use .text.unlikely for js target, effectively making handler __cold
- use UD0 with refcount exception handler instead of int 0x81
- Kconfig defaults on when arch has support

v3:
- drop named text sections until we need to distinguish sizes/directions
- reset value immediately instead of passing back to handler
- drop needless export; josh

v2:
- fix instruction pointer decrement bug; thejh
- switch to js; pax-team
- improve commit log

* [PATCH v6 1/2] x86/asm: Add suffix macro for GEN_*_RMWcc()
  2017-07-19  0:03 [PATCH v6 0/2] x86: Implement fast refcount overflow protection Kees Cook
@ 2017-07-19  0:03 ` Kees Cook
  2017-07-19  0:03 ` [PATCH v6 2/2] x86/refcount: Implement fast refcount overflow protection Kees Cook
  2017-07-20  9:11 ` [PATCH v6 0/2] x86: " Ingo Molnar
  2 siblings, 0 replies; 20+ messages in thread
From: Kees Cook @ 2017-07-19  0:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Peter Zijlstra, Josh Poimboeuf, Christoph Hellwig,
	Eric W. Biederman, Andrew Morton, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
	Serge E. Hallyn, arozansk, Davidlohr Bueso, Manfred Spraul,
	axboe, James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-kernel, linux-arch, kernel-hardening

The coming x86 refcount protection needs to be able to add trailing
instructions to the GEN_*_RMWcc() operations. This extracts the
difference between the goto/non-goto cases so the helper macros
can be defined outside the #ifdef cases. Additionally, this adds argument
naming to the resulting asm for referencing from suffixed
instructions, and adds clobbers for "cc" and "cx" so that suffixes can
use _ASM_CX and retain any set flags.
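
As a usage illustration (not part of this patch), a caller of the new
suffixed variant might look like the sketch below; the "nop" suffix is
just a placeholder for whatever trailing instructions a caller wants
appended (the refcount patch passes its overflow-check sequence there):

/* Hypothetical example, assuming the usual <linux/atomic.h> context. */
static __always_inline bool example_dec_and_test(atomic_t *v)
{
	GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl", "nop",
				 v->counter, "%0", e);
}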

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/include/asm/rmwcc.h | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/rmwcc.h b/arch/x86/include/asm/rmwcc.h
index 661dd305694a..045f99211a99 100644
--- a/arch/x86/include/asm/rmwcc.h
+++ b/arch/x86/include/asm/rmwcc.h
@@ -1,45 +1,56 @@
 #ifndef _ASM_X86_RMWcc
 #define _ASM_X86_RMWcc
 
+#define __CLOBBERS_MEM		"memory"
+#define __CLOBBERS_MEM_CC_CX	"memory", "cc", "cx"
+
 #if !defined(__GCC_ASM_FLAG_OUTPUTS__) && defined(CC_HAVE_ASM_GOTO)
 
 /* Use asm goto */
 
-#define __GEN_RMWcc(fullop, var, cc, ...)				\
+#define __GEN_RMWcc(fullop, var, cc, clobbers, ...)			\
 do {									\
 	asm_volatile_goto (fullop "; j" #cc " %l[cc_label]"		\
-			: : "m" (var), ## __VA_ARGS__ 			\
-			: "memory" : cc_label);				\
+			: : [counter] "m" (var), ## __VA_ARGS__		\
+			: clobbers : cc_label);				\
 	return 0;							\
 cc_label:								\
 	return 1;							\
 } while (0)
 
-#define GEN_UNARY_RMWcc(op, var, arg0, cc) 				\
-	__GEN_RMWcc(op " " arg0, var, cc)
+#define __BINARY_RMWcc_ARG	" %1, "
 
-#define GEN_BINARY_RMWcc(op, var, vcon, val, arg0, cc)			\
-	__GEN_RMWcc(op " %1, " arg0, var, cc, vcon (val))
 
 #else /* defined(__GCC_ASM_FLAG_OUTPUTS__) || !defined(CC_HAVE_ASM_GOTO) */
 
 /* Use flags output or a set instruction */
 
-#define __GEN_RMWcc(fullop, var, cc, ...)				\
+#define __GEN_RMWcc(fullop, var, cc, clobbers, ...)			\
 do {									\
 	bool c;								\
 	asm volatile (fullop ";" CC_SET(cc)				\
-			: "+m" (var), CC_OUT(cc) (c)			\
-			: __VA_ARGS__ : "memory");			\
+			: [counter] "+m" (var), CC_OUT(cc) (c)		\
+			: __VA_ARGS__ : clobbers);			\
 	return c;							\
 } while (0)
 
+#define __BINARY_RMWcc_ARG	" %2, "
+
+#endif /* defined(__GCC_ASM_FLAG_OUTPUTS__) || !defined(CC_HAVE_ASM_GOTO) */
+
 #define GEN_UNARY_RMWcc(op, var, arg0, cc)				\
-	__GEN_RMWcc(op " " arg0, var, cc)
+	__GEN_RMWcc(op " " arg0, var, cc, __CLOBBERS_MEM)
+
+#define GEN_UNARY_SUFFIXED_RMWcc(op, suffix, var, arg0, cc)		\
+	__GEN_RMWcc(op " " arg0 "\n\t" suffix, var, cc,			\
+		    __CLOBBERS_MEM_CC_CX)
 
 #define GEN_BINARY_RMWcc(op, var, vcon, val, arg0, cc)			\
-	__GEN_RMWcc(op " %2, " arg0, var, cc, vcon (val))
+	__GEN_RMWcc(op __BINARY_RMWcc_ARG arg0, var, cc,		\
+		    __CLOBBERS_MEM, vcon (val))
 
-#endif /* defined(__GCC_ASM_FLAG_OUTPUTS__) || !defined(CC_HAVE_ASM_GOTO) */
+#define GEN_BINARY_SUFFIXED_RMWcc(op, suffix, var, vcon, val, arg0, cc)	\
+	__GEN_RMWcc(op __BINARY_RMWcc_ARG arg0 "\n\t" suffix, var, cc,	\
+		    __CLOBBERS_MEM_CC_CX, vcon (val))
 
 #endif /* _ASM_X86_RMWcc */
-- 
2.7.4

* [PATCH v6 2/2] x86/refcount: Implement fast refcount overflow protection
  2017-07-19  0:03 [PATCH v6 0/2] x86: Implement fast refcount overflow protection Kees Cook
  2017-07-19  0:03 ` [PATCH v6 1/2] x86/asm: Add suffix macro for GEN_*_RMWcc() Kees Cook
@ 2017-07-19  0:03 ` Kees Cook
  2017-07-19 19:37   ` Josh Poimboeuf
  2017-07-20  9:11 ` [PATCH v6 0/2] x86: " Ingo Molnar
  2 siblings, 1 reply; 20+ messages in thread
From: Kees Cook @ 2017-07-19  0:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Peter Zijlstra, Josh Poimboeuf, Christoph Hellwig,
	Eric W. Biederman, Andrew Morton, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
	Serge E. Hallyn, arozansk, Davidlohr Bueso, Manfred Spraul,
	axboe, James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-kernel, linux-arch, kernel-hardening

This implements refcount_t overflow protection on x86 without a noticeable
performance impact, though without the fuller checking of REFCOUNT_FULL.
This is done by duplicating the existing atomic_t refcount implementation
but with normally a single instruction added to detect if the refcount
has gone negative (i.e. wrapped past INT_MAX or below zero). When
detected, the handler saturates the refcount_t to INT_MIN / 2. With this
overflow protection, the erroneous reference release that would follow
a wrap back to zero is blocked from happening, avoiding the class of
refcount-over-increment use-after-free vulnerabilities entirely.

Only the overflow case of refcounting can be perfectly protected, since it
can be detected and stopped before the reference is freed and left to be
abused by an attacker. This implementation also notices some of the "dec
to 0 without test", and "below 0" cases. However, these only indicate that
a use-after-free may have already happened. Such notifications are likely
avoidable by an attacker that has already exploited a use-after-free
vulnerability, but it's better to have them than allow such conditions to
remain universally silent.

On first overflow detection, the refcount value is reset to INT_MIN / 2
(which serves as a saturation value), the offending process is killed,
and a report and stack trace are produced. When operations detect only
negative value results (such as changing an already saturated value),
saturation still happens but no notification is performed (since the
value was already saturated).

On the matter of races, since the entire range beyond INT_MAX but before
0 is negative, every operation at INT_MIN / 2 will trap, leaving no
overflow-only race condition.

As for performance, this implementation adds a single "js" instruction
to the regular execution flow of a copy of the standard atomic_t refcount
operations. (The non-"and_test" refcount_dec() function, which is uncommon
in regular refcount design patterns, has an additional "jz" instruction
to detect reaching exactly zero.) Since this is a forward jump, it is by
default the non-predicted path, which will be reinforced by dynamic branch
prediction. The result is this protection having virtually no measurable
change in performance over standard atomic_t operations. The error path,
located in .text.unlikely, saves the refcount location and then uses UD0
to fire a refcount exception handler, which resets the refcount, handles
reporting, and returns to regular execution. This keeps the changes to
.text size minimal, avoiding return jumps and open-coded calls to the
error reporting routine.

Example assembly comparison:

refcount_inc before
.text:
ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)

refcount_inc after
.text:
ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
ffffffff8154614d:       0f 88 80 d5 17 00       js     ffffffff816c36d3
...
.text.unlikely:
ffffffff816c36d3:       48 8d 4d f4             lea    -0xc(%rbp),%rcx
ffffffff816c36d7:       0f ff                   (bad)

This protection is a modified version of the x86 PAX_REFCOUNT defense
from the last public patch of PaX/grsecurity, based on my understanding of
the code. Changes or omissions from the original code are mine and don't
reflect the original grsecurity/PaX code. Thanks to PaX Team for various
suggestions for improvement.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/Kconfig                    |   9 ++++
 arch/x86/Kconfig                |   1 +
 arch/x86/include/asm/asm.h      |   6 +++
 arch/x86/include/asm/refcount.h | 105 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/extable.c           |  37 ++++++++++++++
 include/linux/kernel.h          |   6 +++
 include/linux/refcount.h        |   4 ++
 kernel/panic.c                  |  22 +++++++++
 8 files changed, 190 insertions(+)
 create mode 100644 arch/x86/include/asm/refcount.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 21d0089117fe..349185f951cd 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -931,6 +931,15 @@ config STRICT_MODULE_RWX
 config ARCH_WANT_RELAX_ORDER
 	bool
 
+config ARCH_HAS_REFCOUNT
+	bool
+	help
+	  An architecture selects this when it has implemented refcount_t
+	  using primitives that provide a faster runtime at the expense
+	  of some full refcount state checks. The refcount overflow check,
+	  however, must be retained. Catching overflows is the primary
+	  security concern for protecting against bugs in reference counts.
+
 config REFCOUNT_FULL
 	bool "Perform full reference count validation at the expense of speed"
 	help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 781521b7cf9e..73574c91e857 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -55,6 +55,7 @@ config X86
 	select ARCH_HAS_KCOV			if X86_64
 	select ARCH_HAS_MMIO_FLUSH
 	select ARCH_HAS_PMEM_API		if X86_64
+	select ARCH_HAS_REFCOUNT
 	select ARCH_HAS_UACCESS_FLUSHCACHE	if X86_64
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_SG_CHAIN
diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 7a9df3beb89b..676ee5807d86 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -74,6 +74,9 @@
 # define _ASM_EXTABLE_EX(from, to)				\
 	_ASM_EXTABLE_HANDLE(from, to, ex_handler_ext)
 
+# define _ASM_EXTABLE_REFCOUNT(from, to)			\
+	_ASM_EXTABLE_HANDLE(from, to, ex_handler_refcount)
+
 # define _ASM_NOKPROBE(entry)					\
 	.pushsection "_kprobe_blacklist","aw" ;			\
 	_ASM_ALIGN ;						\
@@ -123,6 +126,9 @@
 # define _ASM_EXTABLE_EX(from, to)				\
 	_ASM_EXTABLE_HANDLE(from, to, ex_handler_ext)
 
+# define _ASM_EXTABLE_REFCOUNT(from, to)			\
+	_ASM_EXTABLE_HANDLE(from, to, ex_handler_refcount)
+
 /* For C file, we already have NOKPROBE_SYMBOL macro */
 #endif
 
diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
new file mode 100644
index 000000000000..13b91e850a02
--- /dev/null
+++ b/arch/x86/include/asm/refcount.h
@@ -0,0 +1,105 @@
+#ifndef __ASM_X86_REFCOUNT_H
+#define __ASM_X86_REFCOUNT_H
+/*
+ * x86-specific implementation of refcount_t. Based on PAX_REFCOUNT from
+ * PaX/grsecurity.
+ */
+#include <linux/refcount.h>
+
+/*
+ * Body of refcount error handling: in .text.unlikely, save into CX the
+ * address of the refcount that has entered a bad state, and trigger an
+ * exception. The fixup address is back in regular execution flow in .text.
+ */
+#define _REFCOUNT_EXCEPTION				\
+	".pushsection .text.unlikely\n"			\
+	"111:\tlea %[counter], %%" _ASM_CX "\n"		\
+	"112:\t" ASM_UD0 "\n"				\
+	".popsection\n"					\
+	"113:\n"					\
+	_ASM_EXTABLE_REFCOUNT(112b, 113b)
+
+/* Trigger refcount exception if refcount result is negative. */
+#define REFCOUNT_CHECK_LT_ZERO				\
+	"js 111f\n\t"					\
+	_REFCOUNT_EXCEPTION
+
+/* Trigger refcount exception if refcount result is zero or negative. */
+#define REFCOUNT_CHECK_LE_ZERO				\
+	"jz 111f\n\t"					\
+	REFCOUNT_CHECK_LT_ZERO
+
+/* Trigger refcount exception unconditionally. */
+#define REFCOUNT_ERROR					\
+	"jmp 111f\n\t"					\
+	_REFCOUNT_EXCEPTION
+
+static __always_inline void refcount_add(unsigned int i, refcount_t *r)
+{
+	asm volatile(LOCK_PREFIX "addl %1,%0\n\t"
+		REFCOUNT_CHECK_LT_ZERO
+		: [counter] "+m" (r->refs.counter)
+		: "ir" (i)
+		: "cc", "cx");
+}
+
+static __always_inline void refcount_inc(refcount_t *r)
+{
+	asm volatile(LOCK_PREFIX "incl %0\n\t"
+		REFCOUNT_CHECK_LT_ZERO
+		: [counter] "+m" (r->refs.counter)
+		: : "cc", "cx");
+}
+
+static __always_inline void refcount_dec(refcount_t *r)
+{
+	asm volatile(LOCK_PREFIX "decl %0\n\t"
+		REFCOUNT_CHECK_LE_ZERO
+		: [counter] "+m" (r->refs.counter)
+		: : "cc", "cx");
+}
+
+static __always_inline __must_check
+bool refcount_sub_and_test(unsigned int i, refcount_t *r)
+{
+	GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl", REFCOUNT_CHECK_LT_ZERO,
+				  r->refs.counter, "er", i, "%0", e);
+}
+
+static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
+{
+	GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl", REFCOUNT_CHECK_LT_ZERO,
+				 r->refs.counter, "%0", e);
+}
+
+static __always_inline __must_check
+bool refcount_add_not_zero(unsigned int i, refcount_t *r)
+{
+	int c, result;
+
+	c = atomic_read(&(r->refs));
+	do {
+		if (unlikely(c == 0))
+			return false;
+
+		result = c + i;
+
+		/* Did we try to increment from/to an undesirable state? */
+		if (unlikely(c < 0 || c == INT_MAX || result < c)) {
+			asm volatile(REFCOUNT_ERROR
+				     : : [counter] "m" (r->refs.counter)
+				     : "cc", "cx");
+			break;
+		}
+
+	} while (!atomic_try_cmpxchg(&(r->refs), &c, result));
+
+	return c != 0;
+}
+
+static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r)
+{
+	return refcount_add_not_zero(1, r);
+}
+
+#endif
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 0ea8afcb929c..62f71aad0403 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -36,6 +36,43 @@ bool ex_handler_fault(const struct exception_table_entry *fixup,
 }
 EXPORT_SYMBOL_GPL(ex_handler_fault);
 
+/*
+ * Handler for UD0 exception following a failed test against the
+ * result of a refcount inc/dec/add/sub.
+ */
+bool ex_handler_refcount(const struct exception_table_entry *fixup,
+			 struct pt_regs *regs, int trapnr)
+{
+	bool report = false;
+
+	/*
+	 * If we crossed from INT_MAX to INT_MIN or from -1 to 0, the OF
+	 * flag (result wrapped around) will be set. Additionally, going
+	 * from 1 to 0 will set the ZF flag. In each of these cases we
+	 * will kill the process, since it is the first to trigger the
+	 * condition from an unsaturated state.
+	 */
+	if (regs->flags & (X86_EFLAGS_OF | X86_EFLAGS_ZF))
+		report = true;
+	*(int *)regs->cx = INT_MIN / 2;
+
+	/*
+	 * Strictly speaking, this reports the fixup destination, not
+	 * the fault location, and not the actually overflowing
+	 * instruction, which is the instruction before the "js", but
+	 * since that instruction could be a variety of lengths, just
+	 * report the location after the overflow, which should be close
+	 * enough for finding the overflow, as it's at least back in
+	 * the function, having returned from .text.unlikely.
+	 */
+	regs->ip = ex_fixup_addr(fixup);
+	if (report)
+		refcount_error_report(regs);
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(ex_handler_refcount);
+
 bool ex_handler_ext(const struct exception_table_entry *fixup,
 		   struct pt_regs *regs, int trapnr)
 {
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index bd6d96cf80b1..02ac674e5ce4 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -277,6 +277,12 @@ extern int oops_may_print(void);
 void do_exit(long error_code) __noreturn;
 void complete_and_exit(struct completion *, long) __noreturn;
 
+#ifdef CONFIG_ARCH_HAS_REFCOUNT
+void refcount_error_report(struct pt_regs *regs);
+#else
+static inline void refcount_error_report(struct pt_regs *regs) { }
+#endif
+
 /* Internal, do not use. */
 int __must_check _kstrtoul(const char *s, unsigned int base, unsigned long *res);
 int __must_check _kstrtol(const char *s, unsigned int base, long *res);
diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 591792c8e5b0..48b7c9c68c4d 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -53,6 +53,9 @@ extern __must_check bool refcount_sub_and_test(unsigned int i, refcount_t *r);
 extern __must_check bool refcount_dec_and_test(refcount_t *r);
 extern void refcount_dec(refcount_t *r);
 #else
+# ifdef CONFIG_ARCH_HAS_REFCOUNT
+#  include <asm/refcount.h>
+# else
 static inline __must_check bool refcount_add_not_zero(unsigned int i, refcount_t *r)
 {
 	return atomic_add_unless(&r->refs, i, 0);
@@ -87,6 +90,7 @@ static inline void refcount_dec(refcount_t *r)
 {
 	atomic_dec(&r->refs);
 }
+# endif /* !CONFIG_ARCH_HAS_REFCOUNT */
 #endif /* CONFIG_REFCOUNT_FULL */
 
 extern __must_check bool refcount_dec_if_one(refcount_t *r);
diff --git a/kernel/panic.c b/kernel/panic.c
index a58932b41700..fb8576ce1638 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -26,6 +26,7 @@
 #include <linux/nmi.h>
 #include <linux/console.h>
 #include <linux/bug.h>
+#include <linux/ratelimit.h>
 
 #define PANIC_TIMER_STEP 100
 #define PANIC_BLINK_SPD 18
@@ -601,6 +602,27 @@ EXPORT_SYMBOL(__stack_chk_fail);
 
 #endif
 
+#ifdef CONFIG_ARCH_HAS_REFCOUNT
+static DEFINE_RATELIMIT_STATE(refcount_ratelimit, 15 * HZ, 3);
+
+void refcount_error_report(struct pt_regs *regs)
+{
+	/* Always make sure triggering process will be terminated. */
+	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, current, true);
+
+	if (!__ratelimit(&refcount_ratelimit))
+		return;
+
+	pr_emerg("refcount overflow detected in: %s:%d, uid/euid: %u/%u\n",
+		current->comm, task_pid_nr(current),
+		from_kuid_munged(&init_user_ns, current_uid()),
+		from_kuid_munged(&init_user_ns, current_euid()));
+	print_symbol(KERN_EMERG "refcount error occurred at: %s\n",
+		instruction_pointer(regs));
+	show_regs(regs);
+}
+#endif
+
 core_param(panic, panic_timeout, int, 0644);
 core_param(pause_on_oops, pause_on_oops, int, 0644);
 core_param(panic_on_warn, panic_on_warn, int, 0644);
-- 
2.7.4

* Re: [PATCH v6 2/2] x86/refcount: Implement fast refcount overflow protection
  2017-07-19  0:03 ` [PATCH v6 2/2] x86/refcount: Implement fast refcount overflow protection Kees Cook
@ 2017-07-19 19:37   ` Josh Poimboeuf
  2017-07-19 19:45     ` Kees Cook
  2017-07-19 23:12     ` Kees Cook
  0 siblings, 2 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2017-07-19 19:37 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ingo Molnar, Peter Zijlstra, Christoph Hellwig,
	Eric W. Biederman, Andrew Morton, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
	Serge E. Hallyn, arozansk, Davidlohr Bueso, Manfred Spraul,
	axboe, James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-kernel, linux-arch, kernel-hardening

On Tue, Jul 18, 2017 at 05:03:34PM -0700, Kees Cook wrote:
> +/*
> + * Body of refcount error handling: in .text.unlikely, saved into CX the
> + * address of the refcount that has entered a bad state, and trigger an
> + * exception. Fixup address is back in regular execution flow in .text.
> + */
> +#define _REFCOUNT_EXCEPTION				\
> +	".pushsection .text.unlikely\n"			\
> +	"111:\tlea %[counter], %%" _ASM_CX "\n"		\
> +	"112:\t" ASM_UD0 "\n"				\
> +	".popsection\n"					\
> +	"113:\n"					\
> +	_ASM_EXTABLE_REFCOUNT(112b, 113b)

This confuses the freshly merged objtool 2.0, which is now too smart for
its own good.  It's reporting some errors like:

  >> kernel/sched/autogroup.o: warning: objtool: sched_autogroup_exit()+0x48: return with modified stack frame
  >> kernel/sched/autogroup.o: warning: objtool: .text.unlikely+0x27: stack state mismatch: reg1[3]=-2-40 reg2[3]=-2-24
  >> kernel/sched/autogroup.o: warning: objtool: sched_autogroup_exit()+0x14: stack state mismatch: reg1[3]=-2-40 reg2[3]=-2-24

Because the UD instructions are used for both WARN and BUG, objtool
doesn't know whether control flow continues past the instruction.  So in
cases like this, it needs an "unreachable" annotation.

Here's a patch to fix it, feel free to squash it into yours:


diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
index 13b91e850a02..e7587db3487c 100644
--- a/arch/x86/include/asm/refcount.h
+++ b/arch/x86/include/asm/refcount.h
@@ -15,6 +15,7 @@
 	".pushsection .text.unlikely\n"			\
 	"111:\tlea %[counter], %%" _ASM_CX "\n"		\
 	"112:\t" ASM_UD0 "\n"				\
+	ASM_UNREACHABLE					\
 	".popsection\n"					\
 	"113:\n"					\
 	_ASM_EXTABLE_REFCOUNT(112b, 113b)
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index cd4bbe8242bd..85e0b8f42ca0 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -202,15 +202,25 @@
 #endif
 
 #ifdef CONFIG_STACK_VALIDATION
+
 #define annotate_unreachable() ({					\
 	asm("%c0:\t\n"							\
-	    ".pushsection .discard.unreachable\t\n"			\
-	    ".long %c0b - .\t\n"					\
-	    ".popsection\t\n" : : "i" (__LINE__));			\
+	    ".pushsection .discard.unreachable\n\t"			\
+	    ".long %c0b - .\n\t"					\
+	    ".popsection\n\t" : : "i" (__LINE__));			\
 })
+
+#define ASM_UNREACHABLE							\
+	"999: .pushsection .discard.unreachable\n\t"			\
+	".long 999b - .\n\t"						\
+	".popsection\n\t"
+
 #else
+
 #define annotate_unreachable()
-#endif
+#define ASM_UNREACHABLE
+
+#endif /* CONFIG_STACK_VALIDATION */
 
 /*
  * Mark a position in code as unreachable.  This can be used to

* Re: [PATCH v6 2/2] x86/refcount: Implement fast refcount overflow protection
  2017-07-19 19:37   ` Josh Poimboeuf
@ 2017-07-19 19:45     ` Kees Cook
  2017-07-19 19:52       ` Josh Poimboeuf
  2017-07-19 23:12     ` Kees Cook
  1 sibling, 1 reply; 20+ messages in thread
From: Kees Cook @ 2017-07-19 19:45 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, Peter Zijlstra, Christoph Hellwig,
	Eric W. Biederman, Andrew Morton, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
	Serge E. Hallyn, arozansk, Davidlohr Bueso, Manfred Spraul,
	axboe, James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, LKML, linux-arch, kernel-hardening

On Wed, Jul 19, 2017 at 12:37 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Tue, Jul 18, 2017 at 05:03:34PM -0700, Kees Cook wrote:
>> +/*
>> + * Body of refcount error handling: in .text.unlikely, saved into CX the
>> + * address of the refcount that has entered a bad state, and trigger an
>> + * exception. Fixup address is back in regular execution flow in .text.
>> + */
>> +#define _REFCOUNT_EXCEPTION                          \
>> +     ".pushsection .text.unlikely\n"                 \
>> +     "111:\tlea %[counter], %%" _ASM_CX "\n"         \
>> +     "112:\t" ASM_UD0 "\n"                           \
>> +     ".popsection\n"                                 \
>> +     "113:\n"                                        \
>> +     _ASM_EXTABLE_REFCOUNT(112b, 113b)
>
> This confuses the freshly merged objtool 2.0, which is now too smart for
> its own good.  It's reporting some errors like:
>
>   >> kernel/sched/autogroup.o: warning: objtool: sched_autogroup_exit()+0x48: return with modified stack frame
>   >> kernel/sched/autogroup.o: warning: objtool: .text.unlikely+0x27: stack state mismatch: reg1[3]=-2-40 reg2[3]=-2-24
>   >> kernel/sched/autogroup.o: warning: objtool: sched_autogroup_exit()+0x14: stack state mismatch: reg1[3]=-2-40 reg2[3]=-2-24
>
> Because the UD instructions are used for both WARN and BUG, objtool
> doesn't know whether control flow continues past the instruction.  So in
> cases like this, it needs an "unreachable" annotation.
>
> Here's a patch to fix it, feel free to squash it into yours:

Thanks! I'll add it for v7.

> diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
> index 13b91e850a02..e7587db3487c 100644
> --- a/arch/x86/include/asm/refcount.h
> +++ b/arch/x86/include/asm/refcount.h
> @@ -15,6 +15,7 @@
>         ".pushsection .text.unlikely\n"                 \
>         "111:\tlea %[counter], %%" _ASM_CX "\n"         \
>         "112:\t" ASM_UD0 "\n"                           \
> +       ASM_UNREACHABLE                                 \
>         ".popsection\n"                                 \
>         "113:\n"                                        \
>         _ASM_EXTABLE_REFCOUNT(112b, 113b)
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index cd4bbe8242bd..85e0b8f42ca0 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -202,15 +202,25 @@
>  #endif
>
>  #ifdef CONFIG_STACK_VALIDATION
> +
>  #define annotate_unreachable() ({                                      \
>         asm("%c0:\t\n"                                                  \
> -           ".pushsection .discard.unreachable\t\n"                     \
> -           ".long %c0b - .\t\n"                                        \
> -           ".popsection\t\n" : : "i" (__LINE__));                      \
> +           ".pushsection .discard.unreachable\n\t"                     \
> +           ".long %c0b - .\n\t"                                        \
> +           ".popsection\n\t" : : "i" (__LINE__));                      \

Is this just an indentation change?

>  })
> +
> +#define ASM_UNREACHABLE                                                        \
> +       "999: .pushsection .discard.unreachable\n\t"                    \
> +       ".long 999b - .\n\t"                                            \
> +       ".popsection\n\t"

Just so I understand, we'll get a single byte added for each exception
case, but it'll get discarded during final link?

> +
>  #else
> +
>  #define annotate_unreachable()
> -#endif
> +#define ASM_UNREACHABLE
> +
> +#endif /* CONFIG_STACK_VALIDATION */
>
>  /*
>   * Mark a position in code as unreachable.  This can be used to

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

* Re: [PATCH v6 2/2] x86/refcount: Implement fast refcount overflow protection
  2017-07-19 19:45     ` Kees Cook
@ 2017-07-19 19:52       ` Josh Poimboeuf
  2017-07-19 22:50         ` Kees Cook
  0 siblings, 1 reply; 20+ messages in thread
From: Josh Poimboeuf @ 2017-07-19 19:52 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ingo Molnar, Peter Zijlstra, Christoph Hellwig,
	Eric W. Biederman, Andrew Morton, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
	Serge E. Hallyn, arozansk, Davidlohr Bueso, Manfred Spraul,
	axboe, James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, LKML, linux-arch, kernel-hardening

On Wed, Jul 19, 2017 at 12:45:19PM -0700, Kees Cook wrote:
> > diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
> > index 13b91e850a02..e7587db3487c 100644
> > --- a/arch/x86/include/asm/refcount.h
> > +++ b/arch/x86/include/asm/refcount.h
> > @@ -15,6 +15,7 @@
> >         ".pushsection .text.unlikely\n"                 \
> >         "111:\tlea %[counter], %%" _ASM_CX "\n"         \
> >         "112:\t" ASM_UD0 "\n"                           \
> > +       ASM_UNREACHABLE                                 \
> >         ".popsection\n"                                 \
> >         "113:\n"                                        \
> >         _ASM_EXTABLE_REFCOUNT(112b, 113b)
> > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > index cd4bbe8242bd..85e0b8f42ca0 100644
> > --- a/include/linux/compiler-gcc.h
> > +++ b/include/linux/compiler-gcc.h
> > @@ -202,15 +202,25 @@
> >  #endif
> >
> >  #ifdef CONFIG_STACK_VALIDATION
> > +
> >  #define annotate_unreachable() ({                                      \
> >         asm("%c0:\t\n"                                                  \
> > -           ".pushsection .discard.unreachable\t\n"                     \
> > -           ".long %c0b - .\t\n"                                        \
> > -           ".popsection\t\n" : : "i" (__LINE__));                      \
> > +           ".pushsection .discard.unreachable\n\t"                     \
> > +           ".long %c0b - .\n\t"                                        \
> > +           ".popsection\n\t" : : "i" (__LINE__));                      \
> 
> Is this just an indentation change?

This was sneaking in a fix to put the tab after the newline instead of
before it.  I figured it's not worth its own commit.

> >  })
> > +
> > +#define ASM_UNREACHABLE                                                        \
> > +       "999: .pushsection .discard.unreachable\n\t"                    \
> > +       ".long 999b - .\n\t"                                            \
> > +       ".popsection\n\t"
> 
> Just so I understand, we'll get a single byte added for each exception
> case, but it'll get discarded during final link?

I think it's four bytes actually, but yeah, the section gets stripped at
vmlinux link time.

-- 
Josh

* Re: [PATCH v6 2/2] x86/refcount: Implement fast refcount overflow protection
  2017-07-19 19:52       ` Josh Poimboeuf
@ 2017-07-19 22:50         ` Kees Cook
  2017-07-19 23:01           ` Josh Poimboeuf
  0 siblings, 1 reply; 20+ messages in thread
From: Kees Cook @ 2017-07-19 22:50 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, Peter Zijlstra, Christoph Hellwig,
	Eric W. Biederman, Andrew Morton, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
	Serge E. Hallyn, arozansk, Davidlohr Bueso, Manfred Spraul,
	axboe, James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, LKML, linux-arch, kernel-hardening

On Wed, Jul 19, 2017 at 12:52 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Wed, Jul 19, 2017 at 12:45:19PM -0700, Kees Cook wrote:
>> > diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
>> > index 13b91e850a02..e7587db3487c 100644
>> > --- a/arch/x86/include/asm/refcount.h
>> > +++ b/arch/x86/include/asm/refcount.h
>> > @@ -15,6 +15,7 @@
>> >         ".pushsection .text.unlikely\n"                 \
>> >         "111:\tlea %[counter], %%" _ASM_CX "\n"         \
>> >         "112:\t" ASM_UD0 "\n"                           \
>> > +       ASM_UNREACHABLE                                 \
>> >         ".popsection\n"                                 \
>> >         "113:\n"                                        \
>> >         _ASM_EXTABLE_REFCOUNT(112b, 113b)
>> > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
>> > index cd4bbe8242bd..85e0b8f42ca0 100644
>> > --- a/include/linux/compiler-gcc.h
>> > +++ b/include/linux/compiler-gcc.h
>> > @@ -202,15 +202,25 @@
>> >  #endif
>> >
>> >  #ifdef CONFIG_STACK_VALIDATION
>> > +
>> >  #define annotate_unreachable() ({                                      \
>> >         asm("%c0:\t\n"                                                  \
>> > -           ".pushsection .discard.unreachable\t\n"                     \
>> > -           ".long %c0b - .\t\n"                                        \
>> > -           ".popsection\t\n" : : "i" (__LINE__));                      \
>> > +           ".pushsection .discard.unreachable\n\t"                     \
>> > +           ".long %c0b - .\n\t"                                        \
>> > +           ".popsection\n\t" : : "i" (__LINE__));                      \
>>
>> Is this just an indentation change?
>
> This was sneaking in a fix to put the tab after the newline instead of
> before it.  I figured it's not worth its own commit.

Ah! Now I see it. Gotcha.

>> >  })
>> > +
>> > +#define ASM_UNREACHABLE                                                        \
>> > +       "999: .pushsection .discard.unreachable\n\t"                    \
>> > +       ".long 999b - .\n\t"                                            \
>> > +       ".popsection\n\t"
>>
>> Just so I understand, we'll get a single byte added for each exception
>> case, but it'll get discarded during final link?
>
> I think it's four bytes actually, but yeah, the section gets stripped at
> vmlinux link time.

Right, yes.

BTW, I think this needs compiler.h coverage instead of the #else in
compiler-gcc.h (since, unlike annotate_unreachable, it isn't used only
from compiler-gcc.h). I'll adjust.

Also, in looking at CONFIG_STACK_VALIDATION, do you want it to just
warn and skip, or do you want to error out the build if validation
isn't available but it's in the .config?

-Kees

-- 
Kees Cook
Pixel Security

* Re: [PATCH v6 2/2] x86/refcount: Implement fast refcount overflow protection
  2017-07-19 22:50         ` Kees Cook
@ 2017-07-19 23:01           ` Josh Poimboeuf
  0 siblings, 0 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2017-07-19 23:01 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ingo Molnar, Peter Zijlstra, Christoph Hellwig,
	Eric W. Biederman, Andrew Morton, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
	Serge E. Hallyn, arozansk, Davidlohr Bueso, Manfred Spraul,
	axboe, James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, LKML, linux-arch, kernel-hardening

On Wed, Jul 19, 2017 at 03:50:14PM -0700, Kees Cook wrote:
> >> >  })
> >> > +
> >> > +#define ASM_UNREACHABLE                                                        \
> >> > +       "999: .pushsection .discard.unreachable\n\t"                    \
> >> > +       ".long 999b - .\n\t"                                            \
> >> > +       ".popsection\n\t"
> >>
> >> Just so I understand, we'll get a single byte added for each exception
> >> case, but it'll get discarded during final link?
> >
> > I think it's four bytes actually, but yeah, the section gets stripped at
> > vmlinux link time.
> 
> Right, yes.
> 
> BTW, I think this needs compiler.h coverage instead of the #else in
> compiler-gcc.h (since it's different from how annotate_unreachable is
> used only in compiler-gcc.h. I'll adjust.

Ah, right.  Sounds good.

> Also, in looking at CONFIG_STACK_VALIDATION, do you want it to just
> warn and skip, or do you want to error out the build if validation
> isn't available but it's in the .config?

I think the current warn and skip behavior is fine.  It's usually not a
life-or-death matter, and if it is, you'll be checking the warnings
anyway.

-- 
Josh

* Re: [PATCH v6 2/2] x86/refcount: Implement fast refcount overflow protection
  2017-07-19 19:37   ` Josh Poimboeuf
  2017-07-19 19:45     ` Kees Cook
@ 2017-07-19 23:12     ` Kees Cook
  2017-07-19 23:30       ` Josh Poimboeuf
  1 sibling, 1 reply; 20+ messages in thread
From: Kees Cook @ 2017-07-19 23:12 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, Peter Zijlstra, Christoph Hellwig,
	Eric W. Biederman, Andrew Morton, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
	Serge E. Hallyn, arozansk, Davidlohr Bueso, Manfred Spraul,
	axboe, James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, LKML, linux-arch, kernel-hardening

On Wed, Jul 19, 2017 at 12:37 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> +#define ASM_UNREACHABLE                                                        \
> +       "999: .pushsection .discard.unreachable\n\t"                    \
> +       ".long 999b - .\n\t"                                            \
> +       ".popsection\n\t"

Will this end up running into the same bug fixed with
3d1e236022cc1426b0834565995ddee2ca231cee and the __LINE__ macro? I
couldn't tell under which conditions gcc would do this kind of merging...

-Kees

-- 
Kees Cook
Pixel Security

* Re: [PATCH v6 2/2] x86/refcount: Implement fast refcount overflow protection
  2017-07-19 23:12     ` Kees Cook
@ 2017-07-19 23:30       ` Josh Poimboeuf
  0 siblings, 0 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2017-07-19 23:30 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ingo Molnar, Peter Zijlstra, Christoph Hellwig,
	Eric W. Biederman, Andrew Morton, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
	Serge E. Hallyn, arozansk, Davidlohr Bueso, Manfred Spraul,
	axboe, James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, LKML, linux-arch, kernel-hardening

On Wed, Jul 19, 2017 at 04:12:37PM -0700, Kees Cook wrote:
> On Wed, Jul 19, 2017 at 12:37 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > +#define ASM_UNREACHABLE                                                        \
> > +       "999: .pushsection .discard.unreachable\n\t"                    \
> > +       ".long 999b - .\n\t"                                            \
> > +       ".popsection\n\t"
> 
> Will this end up running into the same bug fixed with
> 3d1e236022cc1426b0834565995ddee2ca231cee and the __LINE__ macro? I
> could tell under which conditions gcc would do this kind of merging...

I hope not, but objtool should complain about it if it does.

We could involve the __LINE__ macro, or some other way to make the
labels unique, but that might complicate the code a bit, so we can delay
doing that until it proves necessary.

-- 
Josh

* Re: [PATCH v6 0/2] x86: Implement fast refcount overflow protection
  2017-07-19  0:03 [PATCH v6 0/2] x86: Implement fast refcount overflow protection Kees Cook
  2017-07-19  0:03 ` [PATCH v6 1/2] x86/asm: Add suffix macro for GEN_*_RMWcc() Kees Cook
  2017-07-19  0:03 ` [PATCH v6 2/2] x86/refcount: Implement fast refcount overflow protection Kees Cook
@ 2017-07-20  9:11 ` Ingo Molnar
  2017-07-20 17:15   ` Kees Cook
  2017-07-21 21:22   ` Andrew Morton
  2 siblings, 2 replies; 20+ messages in thread
From: Ingo Molnar @ 2017-07-20  9:11 UTC (permalink / raw)
  To: Kees Cook
  Cc: Peter Zijlstra, Josh Poimboeuf, Christoph Hellwig,
	Eric W. Biederman, Andrew Morton, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
	Serge E. Hallyn, arozansk, Davidlohr Bueso, Manfred Spraul,
	axboe, James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-kernel, linux-arch, kernel-hardening


* Kees Cook <keescook@chromium.org> wrote:

> This implements refcount_t overflow protection on x86 without a noticeable
> performance impact, though without the fuller checking of REFCOUNT_FULL.
> This is done by duplicating the existing atomic_t refcount implementation
> but with normally a single instruction added to detect if the refcount
> has gone negative (i.e. wrapped past INT_MAX or below zero). When
> detected, the handler saturates the refcount_t to INT_MIN / 2. With this
> overflow protection, the erroneous reference release that would follow
> a wrap back to zero is blocked from happening, avoiding the class of
> refcount-over-increment use-after-free vulnerabilities entirely.
> 
> Only the overflow case of refcounting can be perfectly protected, since it
> can be detected and stopped before the reference is freed and left to be
> abused by an attacker. This implementation also notices some of the "dec
> to 0 without test", and "below 0" cases. However, these only indicate that
> a use-after-free may have already happened. Such notifications are likely
> avoidable by an attacker that has already exploited a use-after-free
> vulnerability, but it's better to have them than allow such conditions to
> remain universally silent.
> 
> On first overflow detection, the refcount value is reset to INT_MIN / 2
> (which serves as a saturation value), the offending process is killed,
> and a report and stack trace are produced. When operations detect only
> negative value results (such as changing an already saturated value),
> saturation still happens but no notification is performed (since the
> value was already saturated).
> 
> On the matter of races, since the entire range beyond INT_MAX but before
> 0 is negative, every operation at INT_MIN / 2 will trap, leaving no
> overflow-only race condition.
> 
> As for performance, this implementation adds a single "js" instruction
> to the regular execution flow of a copy of the standard atomic_t refcount
> operations. (The non-"and_test" refcount_dec() function, which is uncommon
> in regular refcount design patterns, has an additional "jz" instruction
> to detect reaching exactly zero.) Since this is a forward jump, it is by
> default the non-predicted path, which will be reinforced by dynamic branch
> prediction. The result is this protection having virtually no measurable
> change in performance over standard atomic_t operations. The error path,
> located in .text.unlikely, saves the refcount location and then uses UD0
> to fire a refcount exception handler, which resets the refcount, handles
> reporting, and returns to regular execution. This keeps the changes to
> .text size minimal, avoiding return jumps and open-coded calls to the
> error reporting routine.

Pretty nice!

Could you please also create a tabulated quick-comparison of the three variants, 
of all key properties, about behavior, feature and tradeoff differences?

Something like:

				!ARCH_HAS_REFCOUNT	ARCH_HAS_REFCOUNT=y	REFCOUNT_FULL=y

avg fast path instructions:	5			3			10
behavior on overflow:		unsafe, silent		safe,   verbose		safe,   verbose
behavior on underflow:		unsafe, silent		unsafe, verbose		unsafe, verbose
...

etc. - note that this table is just a quick mockup with wild guesses. (Please add 
more comparisons of other aspects as well.)

Such a comparison would make it easier for arch, subsystem and distribution 
maintainers to decide on which variant to use/enable.

Thanks,

	Ingo

* Re: [PATCH v6 0/2] x86: Implement fast refcount overflow protection
  2017-07-20  9:11 ` [PATCH v6 0/2] x86: " Ingo Molnar
@ 2017-07-20 17:15   ` Kees Cook
  2017-07-20 22:53     ` Kees Cook
  2017-07-21 21:22   ` Andrew Morton
  1 sibling, 1 reply; 20+ messages in thread
From: Kees Cook @ 2017-07-20 17:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Josh Poimboeuf, Christoph Hellwig,
	Eric W. Biederman, Andrew Morton, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
	Serge E. Hallyn, arozansk, Davidlohr Bueso, Manfred Spraul,
	axboe, James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, LKML, linux-arch, kernel-hardening

On Thu, Jul 20, 2017 at 2:11 AM, Ingo Molnar <mingo@kernel.org> wrote:
> Could you please also create a tabulated quick-comparison of the three variants,
> of all key properties, about behavior, feature and tradeoff differences?
>
> Something like:
>
>                                 !ARCH_HAS_REFCOUNT      ARCH_HAS_REFCOUNT=y     REFCOUNT_FULL=y
>
> avg fast path instructions:     5                       3                       10
> behavior on overflow:           unsafe, silent          safe,   verbose         safe,   verbose
> behavior on underflow:          unsafe, silent          unsafe, verbose         unsafe, verbose
> ...
>
> etc. - note that this table is just a quick mockup with wild guesses. (Please add
> more comparisons of other aspects as well.)
>
> Such a comparison would make it easier for arch, subsystem and distribution
> maintainers to decide on which variant to use/enable.

Sure, I can write this up. I'm not sure "safe"/"unsafe" is quite that
clean. The differences between -full and -fast are pretty subtle, but
I think I can describe it using the updated LKDTM tests I've written
to compare the two. There are conditions that -fast doesn't catch, but
those cases aren't actually useful for the overflow defense.

As for "avg fast path instructions", do you mean the resulting
assembly for each refcount API function? I think it's going to look
something like "1   2   45", but I'll write it up.

-Kees

-- 
Kees Cook
Pixel Security

* Re: [PATCH v6 0/2] x86: Implement fast refcount overflow protection
  2017-07-20 17:15   ` Kees Cook
@ 2017-07-20 22:53     ` Kees Cook
  2017-07-21  7:50       ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Kees Cook @ 2017-07-20 22:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Josh Poimboeuf, Christoph Hellwig,
	Eric W. Biederman, Andrew Morton, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
	Serge E. Hallyn, arozansk, Davidlohr Bueso, Manfred Spraul,
	axboe, James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, LKML, linux-arch, kernel-hardening

On Thu, Jul 20, 2017 at 10:15 AM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Jul 20, 2017 at 2:11 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> Could you please also create a tabulated quick-comparison of the three variants,
>> of all key properties, about behavior, feature and tradeoff differences?
>>
>> Something like:
>>
>>                                 !ARCH_HAS_REFCOUNT      ARCH_HAS_REFCOUNT=y     REFCOUNT_FULL=y
>>
>> avg fast path instructions:     5                       3                       10
>> behavior on overflow:           unsafe, silent          safe,   verbose         safe,   verbose
>> behavior on underflow:          unsafe, silent          unsafe, verbose         unsafe, verbose
>> ...
>>
>> etc. - note that this table is just a quick mockup with wild guesses. (Please add
>> more comparisons of other aspects as well.)
>>
>> Such a comparison would make it easier for arch, subsystem and distribution
>> maintainers to decide on which variant to use/enable.
>
> Sure, I can write this up. I'm not sure "safe"/"unsafe" is quite that
> clean. The differences between -full and -fast are pretty subtle, but
> I think I can describe it using the updated LKDTM tests I've written
> to compare the two. There are conditions that -fast doesn't catch, but
> those cases aren't actually useful for the overflow defense.
>
> As for "avg fast path instructions", do you mean the resulting
> assembly for each refcount API function? I think it's going to look
> something like "1   2   45", but I'll write it up.

So, doing a worst-case timing of a loop of inc() to INT_MAX and then
dec_and_test() back to zero, I see this out of perf:

atomic
25255.114805      task-clock (msec)
 82249267387      cycles
 11208720041      instructions

refcount-fast
25259.577583      task-clock (msec)
 82211446892      cycles
 15486246572      instructions

refcount-full
44625.923432      task-clock (msec)
144814735193      cycles
105937495952      instructions
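
(A loop of that flavor might look roughly like the sketch below; this is
only a minimal illustration assuming a simple in-kernel harness, with the
function name made up:)

#include <linux/refcount.h>

static void refcount_timing_loop(void)
{
	refcount_t r;
	int i;

	refcount_set(&r, 1);

	/* inc() until the counter sits at INT_MAX ... */
	for (i = 1; i < INT_MAX; i++)
		refcount_inc(&r);

	/* ... then dec_and_test() all the way back down to zero. */
	while (!refcount_dec_and_test(&r))
		;
}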

I'll still summarize all this in the v7 series, but I think that
really clarifies the differences: 1.5x more instructions in -fast, but
nearly identical cycles and clock. Using -full sees a large change (as
expected).

-Kees

-- 
Kees Cook
Pixel Security

* Re: [PATCH v6 0/2] x86: Implement fast refcount overflow protection
  2017-07-20 22:53     ` Kees Cook
@ 2017-07-21  7:50       ` Ingo Molnar
  0 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2017-07-21  7:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: Peter Zijlstra, Josh Poimboeuf, Christoph Hellwig,
	Eric W. Biederman, Andrew Morton, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
	Serge E. Hallyn, arozansk, Davidlohr Bueso, Manfred Spraul,
	axboe, James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, LKML, linux-arch, kernel-hardening


* Kees Cook <keescook@chromium.org> wrote:

> On Thu, Jul 20, 2017 at 10:15 AM, Kees Cook <keescook@chromium.org> wrote:
> > On Thu, Jul 20, 2017 at 2:11 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >> Could you please also create a tabulated quick-comparison of the three variants,
> >> of all key properties, about behavior, feature and tradeoff differences?
> >>
> >> Something like:
> >>
> >>                                 !ARCH_HAS_REFCOUNT      ARCH_HAS_REFCOUNT=y     REFCOUNT_FULL=y
> >>
> >> avg fast path instructions:     5                       3                       10
> >> behavior on overflow:           unsafe, silent          safe,   verbose         safe,   verbose
> >> behavior on underflow:          unsafe, silent          unsafe, verbose         unsafe, verbose
> >> ...
> >>
> >> etc. - note that this table is just a quick mockup with wild guesses. (Please add
> >> more comparisons of other aspects as well.)
> >>
> >> Such a comparison would make it easier for arch, subsystem and distribution
> >> maintainers to decide on which variant to use/enable.
> >
> > Sure, I can write this up. I'm not sure "safe"/"unsafe" is quite that
> > clean. The differences between -full and -fast are pretty subtle, but
> > I think I can describe it using the updated LKDTM tests I've written
> > to compare the two. There are conditions that -fast doesn't catch, but
> > those cases aren't actually useful for the overflow defense.
> >
> > As for "avg fast path instructions", do you mean the resulting
> > assembly for each refcount API function? I think it's going to look
> > something like "1   2   45", but I'll write it up.
> 
> So, doing a worst-case timing of a loop of inc() to INT_MAX and then
> dec_and_test() back to zero, I see this out of perf:
> 
> atomic
> 25255.114805      task-clock (msec)
>  82249267387      cycles
>  11208720041      instructions
> 
> refcount-fast
> 25259.577583      task-clock (msec)
>  82211446892      cycles
>  15486246572      instructions
> 
> refcount-full
> 44625.923432      task-clock (msec)
> 144814735193      cycles
> 105937495952      instructions
> 
> I'll still summarize all this in the v7 series, but I think that
> really clarifies the differences: 1.5x more instructions in -fast, but
> nearly identical cycles and clock. Using -full sees a large change (as
> expected).

Ok, that's pretty convincing - I'd suggest including a cycles row in the table
instead of an instructions row: the number of instructions is indeed slightly
misleading in this case.

Thanks,

	Ingo

* Re: [PATCH v6 0/2] x86: Implement fast refcount overflow protection
  2017-07-20  9:11 ` [PATCH v6 0/2] x86: " Ingo Molnar
  2017-07-20 17:15   ` Kees Cook
@ 2017-07-21 21:22   ` Andrew Morton
  2017-07-22  3:33     ` Kees Cook
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2017-07-21 21:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Peter Zijlstra, Josh Poimboeuf, Christoph Hellwig,
	Eric W. Biederman, Jann Horn, Eric Biggers, Elena Reshetova,
	Hans Liljestrand, Greg KH, Alexey Dobriyan, Serge E. Hallyn,
	arozansk, Davidlohr Bueso, Manfred Spraul, axboe,
	James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-kernel, linux-arch, kernel-hardening

On Thu, 20 Jul 2017 11:11:06 +0200 Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Kees Cook <keescook@chromium.org> wrote:
> 
> > This implements refcount_t overflow protection on x86 without a noticeable
> > performance impact, though without the fuller checking of REFCOUNT_FULL.
> > This is done by duplicating the existing atomic_t refcount implementation
> > but with normally a single instruction added to detect if the refcount
> > has gone negative (i.e. wrapped past INT_MAX or below zero). When
> > detected, the handler saturates the refcount_t to INT_MIN / 2. With this
> > overflow protection, the erroneous reference release that would follow
> > a wrap back to zero is blocked from happening, avoiding the class of
> > refcount-over-increment use-after-free vulnerabilities entirely.
> > 
> > Only the overflow case of refcounting can be perfectly protected, since it
> > can be detected and stopped before the reference is freed and left to be
> > abused by an attacker. This implementation also notices some of the "dec
> > to 0 without test", and "below 0" cases. However, these only indicate that
> > a use-after-free may have already happened. Such notifications are likely
> > avoidable by an attacker that has already exploited a use-after-free
> > vulnerability, but it's better to have them than allow such conditions to
> > remain universally silent.
> > 
> > On first overflow detection, the refcount value is reset to INT_MIN / 2
> > (which serves as a saturation value), the offending process is killed,
> > and a report and stack trace are produced. When operations detect only
> > negative value results (such as changing an already saturated value),
> > saturation still happens but no notification is performed (since the
> > value was already saturated).
> > 
> > On the matter of races, since the entire range beyond INT_MAX but before
> > 0 is negative, every operation at INT_MIN / 2 will trap, leaving no
> > overflow-only race condition.
> > 
> > As for performance, this implementation adds a single "js" instruction
> > to the regular execution flow of a copy of the standard atomic_t refcount
> > operations. (The non-"and_test" refcount_dec() function, which is uncommon
> > in regular refcount design patterns, has an additional "jz" instruction
> > to detect reaching exactly zero.) Since this is a forward jump, it is by
> > default the non-predicted path, which will be reinforced by dynamic branch
> > prediction. The result is this protection having virtually no measurable
> > change in performance over standard atomic_t operations. The error path,
> > located in .text.unlikely, saves the refcount location and then uses UD0
> > to fire a refcount exception handler, which resets the refcount, handles
> > reporting, and returns to regular execution. This keeps the changes to
> > .text size minimal, avoiding return jumps and open-coded calls to the
> > error reporting routine.
> 
> Pretty nice!
> 

Yes, this is a relief.

Do we have a feeling for how feasible/difficult it will be for other
architectures to implement such a thing?
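
[Editor's note: as a rough illustration of the "single added js instruction"
scheme described in the changelog quoted above, here is a simplified,
self-contained sketch using gcc's asm goto rather than the exception-table /
UD0 mechanism the series actually uses. ref_inc_checked() and
ref_overflow_handler() are placeholder names; the handler body only
saturates, standing in for the real reporting path.]

#include <limits.h>

/*
 * Hypothetical out-of-line handler: the series saturates to INT_MIN / 2,
 * produces a report, and kills the offending process. Here we only saturate.
 */
static void ref_overflow_handler(int *ref)
{
	__atomic_store_n(ref, INT_MIN / 2, __ATOMIC_RELAXED);
}

static inline void ref_inc_checked(int *ref)
{
	asm goto("lock incl %0\n\t"
		 "js %l[overflow]"	/* sign set: wrapped past INT_MAX or already saturated */
		 : : "m" (*ref) : "cc", "memory" : overflow);
	return;
overflow:
	ref_overflow_handler(ref);
}

The forward jump to the overflow label is the statically non-predicted path,
which is the property the changelog relies on for the near-zero fast-path
cost.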


* Re: [PATCH v6 0/2] x86: Implement fast refcount overflow protection
  2017-07-21 21:22   ` Andrew Morton
@ 2017-07-22  3:33     ` Kees Cook
  2017-07-24  6:38       ` Michael Ellerman
  0 siblings, 1 reply; 20+ messages in thread
From: Kees Cook @ 2017-07-22  3:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Peter Zijlstra, Josh Poimboeuf, Christoph Hellwig,
	Eric W. Biederman, Jann Horn, Eric Biggers, Elena Reshetova,
	Hans Liljestrand, Greg KH, Alexey Dobriyan, Serge E. Hallyn,
	arozansk, Davidlohr Bueso, Manfred Spraul, axboe,
	James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, LKML, linux-arch, kernel-hardening

On Fri, Jul 21, 2017 at 2:22 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Thu, 20 Jul 2017 11:11:06 +0200 Ingo Molnar <mingo@kernel.org> wrote:
>
>>
>> * Kees Cook <keescook@chromium.org> wrote:
>>
>> > This implements refcount_t overflow protection on x86 without a noticeable
>> > performance impact, though without the fuller checking of REFCOUNT_FULL.
>> > [...]
>>
>> Pretty nice!
>>
>
> Yes, this is a relief.
>
> Do we have a feeling for how feasible/difficult it will be for other
> architectures to implement such a thing?

The PaX atomic_t overflow protection this is heavily based on was
ported to a number of architectures (arm, powerpc, mips, sparc), so I
suspect it shouldn't be too hard to adapt those for the more narrow
refcount_t protection:
https://forums.grsecurity.net/viewtopic.php?f=7&t=4173

And an arm64 port of the fast refcount_t protection is already happening too.

-Kees

-- 
Kees Cook
Pixel Security


* Re: [PATCH v6 0/2] x86: Implement fast refcount overflow protection
  2017-07-22  3:33     ` Kees Cook
@ 2017-07-24  6:38       ` Michael Ellerman
  2017-07-24  8:44         ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Ellerman @ 2017-07-24  6:38 UTC (permalink / raw)
  To: Kees Cook, Andrew Morton
  Cc: Ingo Molnar, Peter Zijlstra, Josh Poimboeuf, Christoph Hellwig,
	Eric W. Biederman, Jann Horn, Eric Biggers, Elena Reshetova,
	Hans Liljestrand, Greg KH, Alexey Dobriyan, Serge E. Hallyn,
	arozansk, Davidlohr Bueso, Manfred Spraul, axboe,
	James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, LKML, linux-arch, kernel-hardening

Kees Cook <keescook@chromium.org> writes:
> On Fri, Jul 21, 2017 at 2:22 PM, Andrew Morton
>>
>> Do we have a feeling for how feasible/difficult it will be for other
>> architectures to implement such a thing?
>
> The PaX atomic_t overflow protection this is heavily based on was
> ported to a number of architectures (arm, powerpc, mips, sparc), so I
> suspect it shouldn't be too hard to adapt those for the more narrow
> refcount_t protection:
> https://forums.grsecurity.net/viewtopic.php?f=7&t=4173

The ppc code there appears to be 32-bit only and has other issues so
probably isn't something we'd use.

I don't think there should be any fundamental problem implementing it.

What I'm not entirely clear on is what the best trade-off is in terms of
overhead vs checks. The summary of behaviour between the fast and full
versions you promised Ingo will help there, I think.

cheers


* Re: [PATCH v6 0/2] x86: Implement fast refcount overflow protection
  2017-07-24  6:38       ` Michael Ellerman
@ 2017-07-24  8:44         ` Peter Zijlstra
  2017-07-24 12:09           ` Michael Ellerman
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2017-07-24  8:44 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Kees Cook, Andrew Morton, Ingo Molnar, Josh Poimboeuf,
	Christoph Hellwig, Eric W. Biederman, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
	Serge E. Hallyn, arozansk, Davidlohr Bueso, Manfred Spraul,
	axboe, James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, LKML, linux-arch, kernel-hardening

On Mon, Jul 24, 2017 at 04:38:06PM +1000, Michael Ellerman wrote:

> What I'm not entirely clear on is what the best trade off is in terms of
> overhead vs checks. The summary of behaviour between the fast and full
> versions you promised Ingo will help there I think.

That's something that's probably completely different for PPC than it is
for x86. Both because your primitive is LL/SC and thus the saturation
semantics we need a cmpxchg loop for are more natural in your case
anyway, and the fact that your LL/SC is horrendously slow in any case.

Also, I still haven't seen an actual benchmark where our cmpxchg loop
actually regresses anything, just a lot of yelling about potential
regressions :/
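
[Editor's note: for reference, a minimal sketch of the kind of cmpxchg loop
being discussed, i.e. a saturating increment implemented as a
compare-and-exchange retry loop. The function name and the use of
INT_MIN / 2 as the saturation value follow the thread; the code is
illustrative, not the kernel's generic implementation.]

#include <limits.h>

#define REF_SATURATED	(INT_MIN / 2)

/* Illustrative cmpxchg-loop variant of a saturating increment. */
static void ref_inc_saturating(int *ref)
{
	int old = __atomic_load_n(ref, __ATOMIC_RELAXED);
	int new;

	do {
		if (old == REF_SATURATED)
			return;			/* already saturated: nothing to do */
		new = old + 1;
		if (new < 0)
			new = REF_SATURATED;	/* wrapped past INT_MAX: saturate */
		/* on failure, 'old' is reloaded with the current value */
	} while (!__atomic_compare_exchange_n(ref, &old, new, false,
					      __ATOMIC_RELAXED, __ATOMIC_RELAXED));
}

On an LL/SC architecture the same test-and-saturate body can sit directly
between the load-linked and store-conditional, which is the point being made
about it coming more naturally there.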


* Re: [PATCH v6 0/2] x86: Implement fast refcount overflow protection
  2017-07-24  8:44         ` Peter Zijlstra
@ 2017-07-24 12:09           ` Michael Ellerman
  2017-07-24 12:23             ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Ellerman @ 2017-07-24 12:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, Andrew Morton, Ingo Molnar, Josh Poimboeuf,
	Christoph Hellwig, Eric W. Biederman, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
	Serge E. Hallyn, arozansk, Davidlohr Bueso, Manfred Spraul,
	axboe, James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, LKML, linux-arch, kernel-hardening

Peter Zijlstra <peterz@infradead.org> writes:

> On Mon, Jul 24, 2017 at 04:38:06PM +1000, Michael Ellerman wrote:
>
>> What I'm not entirely clear on is what the best trade off is in terms of
>> overhead vs checks. The summary of behaviour between the fast and full
>> versions you promised Ingo will help there I think.
>
> That's something that's probably completely different for PPC than it is
> for x86.

Yeah definitely. I guess I see the x86 version as a lower bound on the
semantics we'd need to implement and still claim to implement the
refcount stuff.

> Both because your primitive is LL/SC and thus the saturation
> semantics we need a cmpxchg loop for are more natural in your case

Yay!

> anyway, and the fact that your LL/SC is horrendously slow in any case.

Boo :/

Just kidding. I suspect you're right that we can probably pack a
reasonable amount of tests in the body of the LL/SC and not notice.

> Also, I still haven't seen an actual benchmark where our cmpxchg loop
> actually regresses anything, just a lot of yelling about potential
> regressions :/

Heh yeah. I have looked at the code it generates on PPC and it's not sleek,
though I guess that's not a benchmark, is it :)

cheers


* Re: [PATCH v6 0/2] x86: Implement fast refcount overflow protection
  2017-07-24 12:09           ` Michael Ellerman
@ 2017-07-24 12:23             ` Peter Zijlstra
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2017-07-24 12:23 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Kees Cook, Andrew Morton, Ingo Molnar, Josh Poimboeuf,
	Christoph Hellwig, Eric W. Biederman, Jann Horn, Eric Biggers,
	Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
	Serge E. Hallyn, arozansk, Davidlohr Bueso, Manfred Spraul,
	axboe, James Bottomley, x86, Arnd Bergmann, David S. Miller,
	Rik van Riel, LKML, linux-arch, kernel-hardening

On Mon, Jul 24, 2017 at 10:09:32PM +1000, Michael Ellerman wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> > anyway, and the fact that your LL/SC is horrendously slow in any case.
> 
> Boo :/

:-)

> Just kidding. I suspect you're right that we can probably pack a
> reasonable amount of tests in the body of the LL/SC and not notice.
> 
> > Also, I still haven't seen an actual benchmark where our cmpxchg loop
> > actually regresses anything, just a lot of yelling about potential
> > regressions :/
> 
> Heh yeah. Though I have looked at the code it generates on PPC and it's
> not sleek, though I guess that's not a benchmark is it :)

Oh for sure, GCC still can't sanely convert a cmpxchg loop (esp. if the
cmpxchg is implemented using asm) into a native LL/SC sequence, so the
generic code will end up looking pretty horrendous.

A native implementation of the same semantics should look loads better.


One thing that might help you is that refcount_dec_and_test() is weaker
than atomic_dec_and_test() wrt ordering, so that might help some
(RELEASE vs fully ordered).
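
[Editor's note: to make that last point concrete, here is a userspace C11
analogy (not the kernel API) contrasting a release-ordered decrement with a
fully ordered one. The acquire fence on the free path follows the classic
refcounting pattern and is an assumption of the sketch, not a statement
about refcount_dec_and_test() itself.]

#include <stdatomic.h>
#include <stdbool.h>

/* Release-ordered drop: cheaper on weakly ordered CPUs. */
static bool put_ref_release(atomic_int *ref)
{
	if (atomic_fetch_sub_explicit(ref, 1, memory_order_release) == 1) {
		/* Pair with the earlier releases before freeing the object. */
		atomic_thread_fence(memory_order_acquire);
		return true;	/* caller may free the object */
	}
	return false;
}

/* Fully ordered drop, analogous to a seq_cst decrement. */
static bool put_ref_full(atomic_int *ref)
{
	return atomic_fetch_sub_explicit(ref, 1, memory_order_seq_cst) == 1;
}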



Thread overview: 20+ messages
2017-07-19  0:03 [PATCH v6 0/2] x86: Implement fast refcount overflow protection Kees Cook
2017-07-19  0:03 ` [PATCH v6 1/2] x86/asm: Add suffix macro for GEN_*_RMWcc() Kees Cook
2017-07-19  0:03 ` [PATCH v6 2/2] x86/refcount: Implement fast refcount overflow protection Kees Cook
2017-07-19 19:37   ` Josh Poimboeuf
2017-07-19 19:45     ` Kees Cook
2017-07-19 19:52       ` Josh Poimboeuf
2017-07-19 22:50         ` Kees Cook
2017-07-19 23:01           ` Josh Poimboeuf
2017-07-19 23:12     ` Kees Cook
2017-07-19 23:30       ` Josh Poimboeuf
2017-07-20  9:11 ` [PATCH v6 0/2] x86: " Ingo Molnar
2017-07-20 17:15   ` Kees Cook
2017-07-20 22:53     ` Kees Cook
2017-07-21  7:50       ` Ingo Molnar
2017-07-21 21:22   ` Andrew Morton
2017-07-22  3:33     ` Kees Cook
2017-07-24  6:38       ` Michael Ellerman
2017-07-24  8:44         ` Peter Zijlstra
2017-07-24 12:09           ` Michael Ellerman
2017-07-24 12:23             ` Peter Zijlstra
