linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/5] refcount_t and various related bits
@ 2017-02-03 13:25 Peter Zijlstra
  2017-02-03 13:25 ` [PATCH 1/5] refcount_t: A special purpose refcount type Peter Zijlstra
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Peter Zijlstra @ 2017-02-03 13:25 UTC (permalink / raw)
  To: elena.reshetova, gregkh, keescook, arnd, tglx, mingo,
	h.peter.anvin, will.deacon, dwindsor, dhowells, peterz
  Cc: linux-kernel, kernel-hardening

Hi all,

Here is the refcount_t patch and some related bits that I hacked up over the
past few days while playing with it in order to make it generate less awful
code (on x86_64).

I think we can start by merging the refcount_t and kref patches; the rest can
come later if so desired. New here is the Kconfig knob to turn warnings
off -- it significantly reduces the generated code while still avoiding
the UAF.

I added David Howells because both he and hpa have suggested something like
try_cmpxchg() in the past.
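
For reference, the usage pattern the new type is aimed at is the usual
get/put scheme; a minimal sketch (struct foo, foo_get()/foo_put() and the
kfree() release are purely illustrative, not part of the series):

	struct foo {
		refcount_t ref;
		/* ... payload ... */
	};

	static struct foo *foo_get(struct foo *f)
	{
		refcount_inc(&f->ref);	/* caller must already hold a reference */
		return f;
	}

	static void foo_put(struct foo *f)
	{
		if (refcount_dec_and_test(&f->ref))	/* true on the 1 -> 0 transition */
			kfree(f);
	}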

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/5] refcount_t: A special purpose refcount type
  2017-02-03 13:25 [PATCH 0/5] refcount_t and various related bits Peter Zijlstra
@ 2017-02-03 13:25 ` Peter Zijlstra
  2017-02-03 18:09   ` Kees Cook
  2017-02-03 23:37   ` Kees Cook
  2017-02-03 13:26 ` [PATCH 2/5] kref: Implement using refcount_t Peter Zijlstra
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 12+ messages in thread
From: Peter Zijlstra @ 2017-02-03 13:25 UTC (permalink / raw)
  To: elena.reshetova, gregkh, keescook, arnd, tglx, mingo,
	h.peter.anvin, will.deacon, dwindsor, dhowells, peterz
  Cc: linux-kernel, kernel-hardening

[-- Attachment #1: peterz-ref-5.patch --]
[-- Type: text/plain, Size: 8910 bytes --]

Provide refcount_t, an atomic_t like primitive built just for
refcounting.

It provides saturation semantics such that overflow becomes impossible
and thereby 'spurious' use-after-free is avoided.
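
A minimal sketch of the resulting semantics (the demo function and the
hand-forced value are purely illustrative, not part of the patch):

	static void refcount_saturation_demo(void)
	{
		refcount_t r = REFCOUNT_INIT(1);

		refcount_set(&r, UINT_MAX);	/* pretend the count hit the limit */
		refcount_inc(&r);		/* saturates: stays at UINT_MAX, no wrap to 0 */
		refcount_dec(&r);		/* a saturated counter never moves again */

		/* refcount_read(&r) is still UINT_MAX; the object is leaked
		 * rather than freed while references may remain. */
	}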

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/refcount.h |  294 +++++++++++++++++++++++++++++++++++++++++++++++
 lib/Kconfig.debug        |   13 ++
 2 files changed, 307 insertions(+)

--- /dev/null
+++ b/include/linux/refcount.h
@@ -0,0 +1,294 @@
+#ifndef _LINUX_REFCOUNT_H
+#define _LINUX_REFCOUNT_H
+
+/*
+ * Variant of atomic_t specialized for reference counts.
+ *
+ * The interface matches the atomic_t interface (to aid in porting) but only
+ * provides the few functions one should use for reference counting.
+ *
+ * It differs in that the counter saturates at UINT_MAX and will not move once
+ * there. This avoids wrapping the counter and causing 'spurious'
+ * use-after-free issues.
+ *
+ * Memory ordering rules are slightly relaxed wrt regular atomic_t functions
+ * and provide only what is strictly required for refcounts.
+ *
+ * The increments are fully relaxed; these will not provide ordering. The
+ * rationale is that whatever is used to obtain the object we're increasing the
+ * reference count on will provide the ordering. For locked data structures,
+ * it's the lock acquire; for RCU/lockless data structures it's the dependent
+ * load.
+ *
+ * Do note that inc_not_zero() provides a control dependency which will order
+ * future stores against the inc, this ensures we'll never modify the object
+ * if we did not in fact acquire a reference.
+ *
+ * The decrements will provide release order, such that all the prior loads and
+ * stores will be issued before, it also provides a control dependency, which
+ * will order us against the subsequent free().
+ *
+ * The control dependency is against the load of the cmpxchg (ll/sc) that
+ * succeeded. This means the stores aren't fully ordered, but this is fine
+ * because the 1->0 transition indicates no concurrency.
+ *
+ * Note that the allocator is responsible for ordering things between free()
+ * and alloc().
+ *
+ */
+
+#include <linux/atomic.h>
+#include <linux/bug.h>
+#include <linux/mutex.h>
+#include <linux/spinlock.h>
+
+#if CONFIG_DEBUG_REFCOUNT
+#define REFCOUNT_WARN(cond, str) WARN_ON(cond)
+#define __refcount_check	__must_check
+#else
+#define REFCOUNT_WARN(cond, str) (void)(cond)
+#define __refcount_check
+#endif
+
+typedef struct refcount_struct {
+	atomic_t refs;
+} refcount_t;
+
+#define REFCOUNT_INIT(n)	{ .refs = ATOMIC_INIT(n), }
+
+static inline void refcount_set(refcount_t *r, unsigned int n)
+{
+	atomic_set(&r->refs, n);
+}
+
+static inline unsigned int refcount_read(const refcount_t *r)
+{
+	return atomic_read(&r->refs);
+}
+
+static inline __refcount_check
+bool refcount_add_not_zero(unsigned int i, refcount_t *r)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+
+	for (;;) {
+		if (!val)
+			return false;
+
+		if (unlikely(val == UINT_MAX))
+			return true;
+
+		new = val + i;
+		if (new < val)
+			new = UINT_MAX;
+		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	REFCOUNT_WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
+
+	return true;
+}
+
+static inline void refcount_add(unsigned int i, refcount_t *r)
+{
+	REFCOUNT_WARN(!refcount_add_not_zero(i, r), "refcount_t: addition on 0; use-after-free.\n");
+}
+
+/*
+ * Similar to atomic_inc_not_zero(), will saturate at UINT_MAX and WARN.
+ *
+ * Provides no memory ordering, it is assumed the caller has guaranteed the
+ * object memory to be stable (RCU, etc.). It does provide a control dependency
+ * and thereby orders future stores. See the comment on top.
+ */
+static inline __refcount_check
+bool refcount_inc_not_zero(refcount_t *r)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+
+	for (;;) {
+		new = val + 1;
+
+		if (!val)
+			return false;
+
+		if (unlikely(!new))
+			return true;
+
+		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	REFCOUNT_WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
+
+	return true;
+}
+
+/*
+ * Similar to atomic_inc(), will saturate at UINT_MAX and WARN.
+ *
+ * Provides no memory ordering, it is assumed the caller already has a
+ * reference on the object, will WARN when this is not so.
+ */
+static inline void refcount_inc(refcount_t *r)
+{
+	REFCOUNT_WARN(!refcount_inc_not_zero(r), "refcount_t: increment on 0; use-after-free.\n");
+}
+
+/*
+ * Similar to atomic_dec_and_test(), it will WARN on underflow and fail to
+ * decrement when saturated at UINT_MAX.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before, and provides a control dependency such that free() must come after.
+ * See the comment on top.
+ */
+static inline __refcount_check
+bool refcount_sub_and_test(unsigned int i, refcount_t *r)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+
+	for (;;) {
+		if (unlikely(val == UINT_MAX))
+			return false;
+
+		new = val - i;
+		if (new > val) {
+			REFCOUNT_WARN(new > val, "refcount_t: underflow; use-after-free.\n");
+			return false;
+		}
+
+		old = atomic_cmpxchg_release(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	return !new;
+}
+
+static inline __refcount_check
+bool refcount_dec_and_test(refcount_t *r)
+{
+	return refcount_sub_and_test(1, r);
+}
+
+/*
+ * Similar to atomic_dec(), it will WARN on underflow and fail to decrement
+ * when saturated at UINT_MAX.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before.
+ */
+static inline
+void refcount_dec(refcount_t *r)
+{
+	REFCOUNT_WARN(refcount_dec_and_test(r), "refcount_t: decrement hit 0; leaking memory.\n");
+}
+
+/*
+ * No atomic_t counterpart, it attempts a 1 -> 0 transition and returns the
+ * success thereof.
+ *
+ * Like all decrement operations, it provides release memory order and provides
+ * a control dependency.
+ *
+ * It can be used like a try-delete operator; this explicit case is provided
+ * and not cmpxchg in generic, because that would allow implementing unsafe
+ * operations.
+ */
+static inline __refcount_check
+bool refcount_dec_if_one(refcount_t *r)
+{
+	return atomic_cmpxchg_release(&r->refs, 1, 0) == 1;
+}
+
+/*
+ * No atomic_t counterpart, it decrements unless the value is 1, in which case
+ * it will return false.
+ *
+ * Was often done like: atomic_add_unless(&var, -1, 1)
+ */
+static inline __refcount_check
+bool refcount_dec_not_one(refcount_t *r)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+
+	for (;;) {
+		if (unlikely(val == UINT_MAX))
+			return true;
+
+		if (val == 1)
+			return false;
+
+		new = val - 1;
+		if (new > val) {
+			REFCOUNT_WARN(new > val, "refcount_t: underflow; use-after-free.\n");
+			return true;
+		}
+
+		old = atomic_cmpxchg_release(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	return true;
+}
+
+/*
+ * Similar to atomic_dec_and_mutex_lock(), it will WARN on underflow and fail
+ * to decrement when saturated at UINT_MAX.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before, and provides a control dependency such that free() must come after.
+ * See the comment on top.
+ */
+static inline __refcount_check
+bool refcount_dec_and_mutex_lock(refcount_t *r, struct mutex *lock)
+{
+	if (refcount_dec_not_one(r))
+		return false;
+
+	mutex_lock(lock);
+	if (!refcount_dec_and_test(r)) {
+		mutex_unlock(lock);
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * Similar to atomic_dec_and_lock(), it will WARN on underflow and fail to
+ * decrement when saturated at UINT_MAX.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before, and provides a control dependency such that free() must come after.
+ * See the comment on top.
+ */
+static inline __refcount_check
+bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock)
+{
+	if (refcount_dec_not_one(r))
+		return false;
+
+	spin_lock(lock);
+	if (!refcount_dec_and_test(r)) {
+		spin_unlock(lock);
+		return false;
+	}
+
+	return true;
+}
+
+#endif /* _LINUX_REFCOUNT_H */
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -716,6 +716,19 @@ source "lib/Kconfig.kmemcheck"
 
 source "lib/Kconfig.kasan"
 
+config DEBUG_REFCOUNT
+	bool "Verbose refcount checks"
+	--help--
+	  Say Y here if you want reference counters (refcount_t and kref) to
+	  generate WARNs on dubious usage. Without this refcount_t will still
+	  be a saturating counter and avoid Use-After-Free by turning it into
+	  a resource leak Denial-Of-Service.
+
+	  Use of this option will increase kernel text size but will alert the
+	  admin of potential abuse.
+
+	  If in doubt, say "N".
+
 endmenu # "Memory Debugging"
 
 config ARCH_HAS_KCOV

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/5] kref: Implement using refcount_t
  2017-02-03 13:25 [PATCH 0/5] refcount_t and various related bits Peter Zijlstra
  2017-02-03 13:25 ` [PATCH 1/5] refcount_t: A special purpose refcount type Peter Zijlstra
@ 2017-02-03 13:26 ` Peter Zijlstra
  2017-02-06 13:06   ` Greg KH
  2017-02-03 13:26 ` [PATCH 3/5] x86: Implement __WARN using UD2 Peter Zijlstra
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2017-02-03 13:26 UTC (permalink / raw)
  To: elena.reshetova, gregkh, keescook, arnd, tglx, mingo,
	h.peter.anvin, will.deacon, dwindsor, dhowells, peterz
  Cc: linux-kernel, kernel-hardening

[-- Attachment #1: peterz-ref-5a.patch --]
[-- Type: text/plain, Size: 2539 bytes --]

Use the refcount_t 'atomic' type to implement kref; this makes kref
more robust by bringing saturation semantics.
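
For context, the (unchanged) kref usage this sits underneath; struct bar
and bar_release() are only illustrative:

	struct bar {
		struct kref ref;
		/* ... */
	};

	static void bar_release(struct kref *kref)
	{
		struct bar *b = container_of(kref, struct bar, ref);

		kfree(b);
	}

	static void bar_example(struct bar *b)
	{
		kref_get(&b->ref);		/* now refcount_inc(): saturates instead of wrapping */
		kref_put(&b->ref, bar_release);	/* refcount_dec_and_test() underneath */
	}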

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/kref.h |   29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -15,17 +15,14 @@
 #ifndef _KREF_H_
 #define _KREF_H_
 
-#include <linux/bug.h>
-#include <linux/atomic.h>
-#include <linux/kernel.h>
-#include <linux/mutex.h>
 #include <linux/spinlock.h>
+#include <linux/refcount.h>
 
 struct kref {
-	atomic_t refcount;
+	refcount_t refcount;
 };
 
-#define KREF_INIT(n)	{ .refcount = ATOMIC_INIT(n), }
+#define KREF_INIT(n)	{ .refcount = REFCOUNT_INIT(n), }
 
 /**
  * kref_init - initialize object.
@@ -33,12 +30,12 @@ struct kref {
  */
 static inline void kref_init(struct kref *kref)
 {
-	atomic_set(&kref->refcount, 1);
+	refcount_set(&kref->refcount, 1);
 }
 
-static inline int kref_read(const struct kref *kref)
+static inline unsigned int kref_read(const struct kref *kref)
 {
-	return atomic_read(&kref->refcount);
+	return refcount_read(&kref->refcount);
 }
 
 /**
@@ -47,11 +44,7 @@ static inline int kref_read(const struct
  */
 static inline void kref_get(struct kref *kref)
 {
-	/* If refcount was 0 before incrementing then we have a race
-	 * condition when this kref is freeing by some other thread right now.
-	 * In this case one should use kref_get_unless_zero()
-	 */
-	WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
+	refcount_inc(&kref->refcount);
 }
 
 /**
@@ -75,7 +68,7 @@ static inline int kref_put(struct kref *
 {
 	WARN_ON(release == NULL);
 
-	if (atomic_dec_and_test(&kref->refcount)) {
+	if (refcount_dec_and_test(&kref->refcount)) {
 		release(kref);
 		return 1;
 	}
@@ -88,7 +81,7 @@ static inline int kref_put_mutex(struct
 {
 	WARN_ON(release == NULL);
 
-	if (atomic_dec_and_mutex_lock(&kref->refcount, lock)) {
+	if (refcount_dec_and_mutex_lock(&kref->refcount, lock)) {
 		release(kref);
 		return 1;
 	}
@@ -101,7 +94,7 @@ static inline int kref_put_lock(struct k
 {
 	WARN_ON(release == NULL);
 
-	if (atomic_dec_and_lock(&kref->refcount, lock)) {
+	if (refcount_dec_and_lock(&kref->refcount, lock)) {
 		release(kref);
 		return 1;
 	}
@@ -126,6 +119,6 @@ static inline int kref_put_lock(struct k
  */
 static inline int __must_check kref_get_unless_zero(struct kref *kref)
 {
-	return atomic_add_unless(&kref->refcount, 1, 0);
+	return refcount_inc_not_zero(&kref->refcount);
 }
 #endif /* _KREF_H_ */

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 3/5] x86: Implement __WARN using UD2
  2017-02-03 13:25 [PATCH 0/5] refcount_t and various related bits Peter Zijlstra
  2017-02-03 13:25 ` [PATCH 1/5] refcount_t: A special purpose refcount type Peter Zijlstra
  2017-02-03 13:26 ` [PATCH 2/5] kref: Implement using refcount_t Peter Zijlstra
@ 2017-02-03 13:26 ` Peter Zijlstra
  2017-02-03 13:26 ` [PATCH 4/5] atomic: Introduce atomic_try_cmpxchg() Peter Zijlstra
  2017-02-03 13:26 ` [PATCH 5/5] refcount: Use atomic_try_cmpxchg() Peter Zijlstra
  4 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2017-02-03 13:26 UTC (permalink / raw)
  To: elena.reshetova, gregkh, keescook, arnd, tglx, mingo,
	h.peter.anvin, will.deacon, dwindsor, dhowells, peterz
  Cc: linux-kernel, kernel-hardening

[-- Attachment #1: peterz-x86-warn-ud2.patch --]
[-- Type: text/plain, Size: 7022 bytes --]

By using "UD2" for WARNs we remove the function call and its possible
__FILE__ and __LINE__ immediate arguments from the instruction stream.

Total image size will not change much; what we win in the instruction
stream we lose to the __bug_table entries. Still, this saves on
I$ footprint and the total image size does go down a bit.

  text            data     bss    dec              hex    filename                     size
  10475740        4428992  974848 15879580         f24d9c defconfig-build/vmlinux.pre  25215824
  10451804        4428992  974848 15855644         f1f01c defconfig-build/vmlinux.post 25211288

In particular this makes:

0000000000001490 <ihold>:
    1490:       55                      push   %rbp
    1491:       8b 87 48 01 00 00       mov    0x148(%rdi),%eax
    1497:       48 89 e5                mov    %rsp,%rbp
    149a:       eb 0a                   jmp    14a6 <ihold+0x16>
    149c:       f0 0f b1 97 48 01 00    lock cmpxchg %edx,0x148(%rdi)
    14a3:       00 
    14a4:       74 20                   je     14c6 <ihold+0x36>
    14a6:       85 c0                   test   %eax,%eax
    14a8:       8d 50 01                lea    0x1(%rax),%edx
    14ab:       74 06                   je     14b3 <ihold+0x23>
    14ad:       85 d2                   test   %edx,%edx
    14af:       75 eb                   jne    149c <ihold+0xc>
    14b1:       5d                      pop    %rbp
    14b2:       c3                      retq   
    14b3:       be 8d 00 00 00          mov    $0x8d,%esi
    14b8:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
                        14bb: R_X86_64_32S      .rodata.str1.1+0x35
    14bf:       e8 00 00 00 00          callq  14c4 <ihold+0x34>
                        14c0: R_X86_64_PC32     warn_slowpath_null-0x4
    14c4:       5d                      pop    %rbp
    14c5:       c3                      retq   
    14c6:       83 fa ff                cmp    $0xffffffff,%edx
    14c9:       75 e6                   jne    14b1 <ihold+0x21>
    14cb:       be 80 00 00 00          mov    $0x80,%esi
    14d0:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
                        14d3: R_X86_64_32S      .rodata.str1.1+0x35
    14d7:       e8 00 00 00 00          callq  14dc <ihold+0x4c>
                        14d8: R_X86_64_PC32     warn_slowpath_null-0x4
    14dc:       5d                      pop    %rbp
    14dd:       c3                      retq   
    14de:       66 90                   xchg   %ax,%ax

Look like:

0000000000001400 <ihold>:
    1400:       55                      push   %rbp
    1401:       8b 87 48 01 00 00       mov    0x148(%rdi),%eax
    1407:       48 89 e5                mov    %rsp,%rbp
    140a:       eb 0a                   jmp    1416 <ihold+0x16>
    140c:       f0 0f b1 97 48 01 00    lock cmpxchg %edx,0x148(%rdi)
    1413:       00 
    1414:       74 11                   je     1427 <ihold+0x27>
    1416:       85 c0                   test   %eax,%eax
    1418:       8d 50 01                lea    0x1(%rax),%edx
    141b:       74 06                   je     1423 <ihold+0x23>
    141d:       85 d2                   test   %edx,%edx
    141f:       75 eb                   jne    140c <ihold+0xc>
    1421:       5d                      pop    %rbp
    1422:       c3                      retq   
    1423:       0f 0b                   ud2    
    1425:       5d                      pop    %rbp
    1426:       c3                      retq   
    1427:       83 fa ff                cmp    $0xffffffff,%edx
    142a:       75 f5                   jne    1421 <ihold+0x21>
    142c:       0f 0b                   ud2    
    142e:       5d                      pop    %rbp
    142f:       c3                      retq  

Note that custom x86 code could do better using the exception table
with a custom exception handler looking at the regs->ax value to
determine which of the failure cases was hit, removing a bunch of
compares from the actual code path.
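
A rough sketch of that idea (entirely hypothetical; the handler name and
the classification of regs->ax are assumptions, not part of this patch):

	static int fixup_refcount_warn(struct pt_regs *regs)
	{
		u32 val = (u32)regs->ax;

		if (val == UINT_MAX)
			pr_warn("refcount_t: saturated; leaking memory.\n");
		else if (val == 0)
			pr_warn("refcount_t: use-after-free.\n");

		regs->ip += 2;	/* skip the 2-byte ud2, as fixup_bug() does for WARNs */
		return 1;
	}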

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/bug.h  |   36 +++++++++++++++++++++++++++++-------
 arch/x86/kernel/dumpstack.c |    3 ---
 arch/x86/kernel/traps.c     |   33 +++++++++++++++++++++++++++------
 3 files changed, 56 insertions(+), 16 deletions(-)

--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -11,26 +11,48 @@
 # define __BUG_C0	"2:\t.long 1b - 2b, %c0 - 2b\n"
 #endif
 
-#define BUG()							\
+#define _BUG_FLAGS(flags)					\
 do {								\
 	asm volatile("1:\tud2\n"				\
 		     ".pushsection __bug_table,\"a\"\n"		\
 		     __BUG_C0					\
-		     "\t.word %c1, 0\n"				\
-		     "\t.org 2b+%c2\n"				\
+		     "\t.word %c1, %c2\n"			\
+		     "\t.org 2b+%c3\n"				\
 		     ".popsection"				\
 		     : : "i" (__FILE__), "i" (__LINE__),	\
-		     "i" (sizeof(struct bug_entry)));		\
-	unreachable();						\
+			 "i" (flags),				\
+			 "i" (sizeof(struct bug_entry)));	\
 } while (0)
 
 #else
+
+#ifdef CONFIG_X86_32
+# define __BUG_C0	"2:\t.long 1b\n"
+#else
+# define __BUG_C0	"2:\t.long 1b - 2b\n"
+#endif
+
+#define _BUG_FLAGS(flags)					\
+do {								\
+	asm volatile("1:\tud2\n"				\
+		     ".pushsection __bug_table,\"a\"\n"		\
+		     __BUG_C0					\
+		     "\t.word %c0\n"				\
+		     "\t.org 2b+%c1\n"				\
+		     ".popsection"				\
+		     : : "i" (flags),				\
+			 "i" (sizeof(struct bug_entry)));	\
+} while (0)
+
+#endif
+
 #define BUG()							\
 do {								\
-	asm volatile("ud2");					\
+	_BUG_FLAGS(0);						\
 	unreachable();						\
 } while (0)
-#endif
+
+#define __WARN_TAINT(taint)	_BUG_FLAGS(BUGFLAG_TAINT(taint))
 
 #include <asm-generic/bug.h>
 
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -287,9 +287,6 @@ void die(const char *str, struct pt_regs
 	unsigned long flags = oops_begin();
 	int sig = SIGSEGV;
 
-	if (!user_mode(regs))
-		report_bug(regs->ip, regs);
-
 	if (__die(str, regs, err))
 		sig = 0;
 	oops_end(flags, regs, sig);
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -168,6 +168,24 @@ void ist_end_non_atomic(void)
 	preempt_disable();
 }
 
+static int fixup_bug(struct pt_regs *regs, int trapnr)
+{
+	if (trapnr != X86_TRAP_UD)
+		return 0;
+
+	switch (report_bug(regs->ip, regs)) {
+	case BUG_TRAP_TYPE_NONE:
+	case BUG_TRAP_TYPE_BUG:
+		break;
+
+	case BUG_TRAP_TYPE_WARN:
+		regs->ip += 2;
+		return 1;
+	}
+
+	return 0;
+}
+
 static nokprobe_inline int
 do_trap_no_signal(struct task_struct *tsk, int trapnr, char *str,
 		  struct pt_regs *regs,	long error_code)
@@ -186,12 +204,15 @@ do_trap_no_signal(struct task_struct *ts
 	}
 
 	if (!user_mode(regs)) {
-		if (!fixup_exception(regs, trapnr)) {
-			tsk->thread.error_code = error_code;
-			tsk->thread.trap_nr = trapnr;
-			die(str, regs, error_code);
-		}
-		return 0;
+		if (fixup_exception(regs, trapnr))
+			return 0;
+
+		if (fixup_bug(regs, trapnr))
+			return 0;
+
+		tsk->thread.error_code = error_code;
+		tsk->thread.trap_nr = trapnr;
+		die(str, regs, error_code);
 	}
 
 	return -1;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 4/5] atomic: Introduce atomic_try_cmpxchg()
  2017-02-03 13:25 [PATCH 0/5] refcount_t and various related bits Peter Zijlstra
                   ` (2 preceding siblings ...)
  2017-02-03 13:26 ` [PATCH 3/5] x86: Implement __WARN using UD2 Peter Zijlstra
@ 2017-02-03 13:26 ` Peter Zijlstra
  2017-02-06  4:24   ` Boqun Feng
  2017-02-03 13:26 ` [PATCH 5/5] refcount: Use atomic_try_cmpxchg() Peter Zijlstra
  4 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2017-02-03 13:26 UTC (permalink / raw)
  To: elena.reshetova, gregkh, keescook, arnd, tglx, mingo,
	h.peter.anvin, will.deacon, dwindsor, dhowells, peterz
  Cc: linux-kernel, kernel-hardening

[-- Attachment #1: peterz-try_atomic.patch --]
[-- Type: text/plain, Size: 4531 bytes --]

Add a new cmpxchg interface:

  bool try_cmpxchg(u{8,16,32,64} *ptr, u{8,16,32,64} *val, u{8,16,32,64} new);

Where the boolean returns the result of the compare; and thus if the
exchange happened; and in case of failure, the new value of *ptr is
returned in *val.

This allows simplification/improvement of loops like:

	for (;;) {
		new = val $op $imm;
		old = cmpxchg(ptr, val, new);
		if (old == val)
			break;
		val = old;
	}

into:

	for (;;) {
		new = val $op $imm;
		if (try_cmpxchg(ptr, &val, new))
			break;
	}

while also generating better code (GCC6 and onwards).
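
As a further illustration, the same shape wrapped for atomic_t (the helper
name is made up for the example, using the atomic_try_cmpxchg() added below):

	/* hypothetical helper: increment v unless it is zero */
	static inline bool inc_not_zero_sketch(atomic_t *v)
	{
		int val = atomic_read(v);

		do {
			if (!val)
				return false;
		} while (!atomic_try_cmpxchg(v, &val, val + 1));

		return true;
	}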

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -186,6 +186,12 @@ static __always_inline int atomic_cmpxch
 	return cmpxchg(&v->counter, old, new);
 }
 
+#define atomic_try_cmpxchg atomic_try_cmpxchg
+static __always_inline bool atomic_try_cmpxchg(atomic_t *v, int *old, int new)
+{
+	return try_cmpxchg(&v->counter, old, new);
+}
+
 static inline int atomic_xchg(atomic_t *v, int new)
 {
 	return xchg(&v->counter, new);
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -153,6 +153,75 @@ extern void __add_wrong_size(void)
 #define cmpxchg_local(ptr, old, new)					\
 	__cmpxchg_local(ptr, old, new, sizeof(*(ptr)))
 
+
+#define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock)		\
+({									\
+	bool success;							\
+	__typeof__(_ptr) _old = (_pold);				\
+	__typeof__(*(_ptr)) __old = *_old;				\
+	__typeof__(*(_ptr)) __new = (_new);				\
+	switch (size) {							\
+	case __X86_CASE_B:						\
+	{								\
+		volatile u8 *__ptr = (volatile u8 *)(_ptr);		\
+		asm volatile(lock "cmpxchgb %[new], %[ptr]"		\
+			     CC_SET(z)					\
+			     : CC_OUT(z) (success),			\
+			       [ptr] "+m" (*__ptr),			\
+			       [old] "+a" (__old)			\
+			     : [new] "q" (__new)			\
+			     : "memory");				\
+		break;							\
+	}								\
+	case __X86_CASE_W:						\
+	{								\
+		volatile u16 *__ptr = (volatile u16 *)(_ptr);		\
+		asm volatile(lock "cmpxchgw %[new], %[ptr]"		\
+			     CC_SET(z)					\
+			     : CC_OUT(z) (success),			\
+			       [ptr] "+m" (*__ptr),			\
+			       [old] "+a" (__old)			\
+			     : [new] "r" (__new)			\
+			     : "memory");				\
+		break;							\
+	}								\
+	case __X86_CASE_L:						\
+	{								\
+		volatile u32 *__ptr = (volatile u32 *)(_ptr);		\
+		asm volatile(lock "cmpxchgl %[new], %[ptr]"		\
+			     CC_SET(z)					\
+			     : CC_OUT(z) (success),			\
+			       [ptr] "+m" (*__ptr),			\
+			       [old] "+a" (__old)			\
+			     : [new] "r" (__new)			\
+			     : "memory");				\
+		break;							\
+	}								\
+	case __X86_CASE_Q:						\
+	{								\
+		volatile u64 *__ptr = (volatile u64 *)(_ptr);		\
+		asm volatile(lock "cmpxchgq %[new], %[ptr]"		\
+			     CC_SET(z)					\
+			     : CC_OUT(z) (success),			\
+			       [ptr] "+m" (*__ptr),			\
+			       [old] "+a" (__old)			\
+			     : [new] "r" (__new)			\
+			     : "memory");				\
+		break;							\
+	}								\
+	default:							\
+		__cmpxchg_wrong_size();					\
+	}								\
+	*_old = __old;							\
+	success;							\
+})
+
+#define __try_cmpxchg(ptr, pold, new, size)				\
+	__raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)
+
+#define try_cmpxchg(ptr, pold, new)					\
+	__try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)))
+
 /*
  * xadd() adds "inc" to "*ptr" and atomically returns the previous
  * value of "*ptr".
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -423,6 +423,28 @@
 #endif
 #endif /* atomic_cmpxchg_relaxed */
 
+#ifndef atomic_try_cmpxchg
+
+#define __atomic_try_cmpxchg(type, _p, _po, _n)				\
+({									\
+	typeof(_po) __po = (_po);					\
+	typeof(*(_po)) __o = *__po;					\
+	bool success = (atomic_cmpxchg##type((_p), __o, (_n)) == __o);	\
+	*__po = __o;							\
+	success;							\
+})
+
+#define atomic_try_cmpxchg(_p, _po, _n)		__atomic_try_cmpxchg(, _p, _po, _n)
+#define atomic_try_cmpxchg_relaxed(_p, _po, _n)	__atomic_try_cmpxchg(_relaxed, _p, _po, _n)
+#define atomic_try_cmpxchg_acquire(_p, _po, _n)	__atomic_try_cmpxchg(_acquire, _p, _po, _n)
+#define atomic_try_cmpxchg_release(_p, _po, _n)	__atomic_try_cmpxchg(_release, _p, _po, _n)
+
+#else /* atomic_try_cmpxchg */
+#define atomic_try_cmpxchg_relaxed	atomic_try_cmpxchg
+#define atomic_try_cmpxchg_acquire	atomic_try_cmpxchg
+#define atomic_try_cmpxchg_release	atomic_try_cmpxchg
+#endif /* atomic_try_cmpxchg */
+
 /* cmpxchg_relaxed */
 #ifndef cmpxchg_relaxed
 #define  cmpxchg_relaxed		cmpxchg

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 5/5] refcount: Use atomic_try_cmpxchg()
  2017-02-03 13:25 [PATCH 0/5] refcount_t and various related bits Peter Zijlstra
                   ` (3 preceding siblings ...)
  2017-02-03 13:26 ` [PATCH 4/5] atomic: Introduce atomic_try_cmpxchg() Peter Zijlstra
@ 2017-02-03 13:26 ` Peter Zijlstra
  4 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2017-02-03 13:26 UTC (permalink / raw)
  To: elena.reshetova, gregkh, keescook, arnd, tglx, mingo,
	h.peter.anvin, will.deacon, dwindsor, dhowells, peterz
  Cc: linux-kernel, kernel-hardening

[-- Attachment #1: peterz-ref-6.patch --]
[-- Type: text/plain, Size: 4648 bytes --]

Generates better code (GCC-6.2.1):

0000000000000420 <ihold>:
     420:       55                      push   %rbp
     421:       8b 97 48 01 00 00       mov    0x148(%rdi),%edx
     427:       48 89 e5                mov    %rsp,%rbp
     42a:       eb 10                   jmp    43c <ihold+0x1c>
     42c:       89 d0                   mov    %edx,%eax
     42e:       f0 0f b1 8f 48 01 00    lock cmpxchg %ecx,0x148(%rdi)
     435:       00 
     436:       39 c2                   cmp    %eax,%edx
     438:       74 0d                   je     447 <ihold+0x27>
     43a:       89 c2                   mov    %eax,%edx
     43c:       8d 42 ff                lea    -0x1(%rdx),%eax
     43f:       8d 4a 01                lea    0x1(%rdx),%ecx
     442:       83 f8 fd                cmp    $0xfffffffd,%eax
     445:       76 e5                   jbe    42c <ihold+0xc>
     447:       5d                      pop    %rbp
     448:       c3                      retq   

0000000000001490 <ihold>:
    1490:       55                      push   %rbp
    1491:       8b 87 48 01 00 00       mov    0x148(%rdi),%eax
    1497:       48 89 e5                mov    %rsp,%rbp
    149a:       eb 0a                   jmp    14a6 <ihold+0x16>
    149c:       f0 0f b1 97 48 01 00    lock cmpxchg %edx,0x148(%rdi)
    14a3:       00 
    14a4:       74 0b                   je     14b1 <ihold+0x21>
    14a6:       8d 48 ff                lea    -0x1(%rax),%ecx
    14a9:       8d 50 01                lea    0x1(%rax),%edx
    14ac:       83 f9 fd                cmp    $0xfffffffd,%ecx
    14af:       76 eb                   jbe    149c <ihold+0xc>
    14b1:       5d                      pop    %rbp
    14b2:       c3                      retq   

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/refcount.h |   33 ++++++++++++---------------------
 1 file changed, 12 insertions(+), 21 deletions(-)

--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -69,7 +69,7 @@ static inline unsigned int refcount_read
 static inline __refcount_check
 bool refcount_add_not_zero(unsigned int i, refcount_t *r)
 {
-	unsigned int old, new, val = atomic_read(&r->refs);
+	unsigned int new, val = atomic_read(&r->refs);
 
 	for (;;) {
 		if (!val)
@@ -81,11 +81,9 @@ bool refcount_add_not_zero(unsigned int
 		new = val + i;
 		if (new < val)
 			new = UINT_MAX;
-		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
-		if (old == val)
-			break;
 
-		val = old;
+		if (atomic_try_cmpxchg_relaxed(&r->refs, &val, new))
+			break;
 	}
 
 	REFCOUNT_WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
@@ -108,7 +106,7 @@ static inline void refcount_add(unsigned
 static inline __refcount_check
 bool refcount_inc_not_zero(refcount_t *r)
 {
-	unsigned int old, new, val = atomic_read(&r->refs);
+	unsigned int new, val = atomic_read(&r->refs);
 
 	for (;;) {
 		new = val + 1;
@@ -119,11 +117,8 @@ bool refcount_inc_not_zero(refcount_t *r
 		if (unlikely(!new))
 			return true;
 
-		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
-		if (old == val)
+		if (atomic_try_cmpxchg_relaxed(&r->refs, &val, new))
 			break;
-
-		val = old;
 	}
 
 	REFCOUNT_WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
@@ -153,7 +148,7 @@ static inline void refcount_inc(refcount
 static inline __refcount_check
 bool refcount_sub_and_test(unsigned int i, refcount_t *r)
 {
-	unsigned int old, new, val = atomic_read(&r->refs);
+	unsigned int new, val = atomic_read(&r->refs);
 
 	for (;;) {
 		if (unlikely(val == UINT_MAX))
@@ -165,11 +160,8 @@ bool refcount_sub_and_test(unsigned int
 			return false;
 		}
 
-		old = atomic_cmpxchg_release(&r->refs, val, new);
-		if (old == val)
+		if (atomic_try_cmpxchg_release(&r->refs, &val, new))
 			break;
-
-		val = old;
 	}
 
 	return !new;
@@ -208,7 +200,9 @@ void refcount_dec(refcount_t *r)
 static inline __refcount_check
 bool refcount_dec_if_one(refcount_t *r)
 {
-	return atomic_cmpxchg_release(&r->refs, 1, 0) == 1;
+	int val = 1;
+
+	return atomic_try_cmpxchg_release(&r->refs, &val, 0);
 }
 
 /*
@@ -220,7 +214,7 @@ bool refcount_dec_if_one(refcount_t *r)
 static inline __refcount_check
 bool refcount_dec_not_one(refcount_t *r)
 {
-	unsigned int old, new, val = atomic_read(&r->refs);
+	unsigned int new, val = atomic_read(&r->refs);
 
 	for (;;) {
 		if (unlikely(val == UINT_MAX))
@@ -235,11 +229,8 @@ bool refcount_dec_not_one(refcount_t *r)
 			return true;
 		}
 
-		old = atomic_cmpxchg_release(&r->refs, val, new);
-		if (old == val)
+		if (atomic_try_cmpxchg_release(&r->refs, &val, new))
 			break;
-
-		val = old;
 	}
 
 	return true;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/5] refcount_t: A special purpose refcount type
  2017-02-03 13:25 ` [PATCH 1/5] refcount_t: A special purpose refcount type Peter Zijlstra
@ 2017-02-03 18:09   ` Kees Cook
  2017-02-03 23:37   ` Kees Cook
  1 sibling, 0 replies; 12+ messages in thread
From: Kees Cook @ 2017-02-03 18:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Reshetova, Elena, Greg KH, Arnd Bergmann, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Will Deacon, David Windsor,
	David Howells, LKML, kernel-hardening

On Fri, Feb 3, 2017 at 5:25 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> Provide refcount_t, an atomic_t like primitive built just for
> refcounting.
>
> It provides saturation semantics such that overflow becomes impossible
> and thereby 'spurious' use-after-free is avoided.

Wheee :) Thanks for working on this

>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> [...]
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -716,6 +716,19 @@ source "lib/Kconfig.kmemcheck"
>
>  source "lib/Kconfig.kasan"
>
> +config DEBUG_REFCOUNT
> +       bool "Verbose refcount checks"
> +       --help--

Quick feedback as I start playing with this: this isn't valid Kconfig
syntax (build breaks). It should either be "---help---" or just
"help", latter preferred.

> +         Say Y here if you want reference counters (refcount_t and kref) to
> +         generate WARNs on dubious usage. Without this refcount_t will still
> +         be a saturating counter and avoid Use-After-Free by turning it into
> +         a resource leak Denial-Of-Service.
> +
> +         Use of this option will increase kernel text size but will alert the
> +         admin of potential abuse.
> +
> +         If in doubt, say "N".
> +
>  endmenu # "Memory Debugging"
>
>  config ARCH_HAS_KCOV
>
>

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/5] refcount_t: A special purpose refcount type
  2017-02-03 13:25 ` [PATCH 1/5] refcount_t: A special purpose refcount type Peter Zijlstra
  2017-02-03 18:09   ` Kees Cook
@ 2017-02-03 23:37   ` Kees Cook
  1 sibling, 0 replies; 12+ messages in thread
From: Kees Cook @ 2017-02-03 23:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Reshetova, Elena, Greg KH, Arnd Bergmann, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Will Deacon, David Windsor,
	David Howells, LKML, kernel-hardening

On Fri, Feb 3, 2017 at 5:25 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> Provide refcount_t, an atomic_t like primitive built just for
> refcounting.
>
> It provides saturation semantics such that overflow becomes impossible
> and thereby 'spurious' use-after-free is avoided.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/refcount.h |  294 +++++++++++++++++++++++++++++++++++++++++++++++
>  lib/Kconfig.debug        |   13 ++
>  2 files changed, 307 insertions(+)
>
> --- /dev/null
> +++ b/include/linux/refcount.h
> @@ -0,0 +1,294 @@
> [...]
> +#if CONFIG_DEBUG_REFCOUNT

Oh, and I just hit this too, it should be "#ifdef" ... I didn't notice
until after I sent my improvement series. Whoops. :P Yay Friday.
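
i.e., a sketch of that hunk with the guard corrected:

	#ifdef CONFIG_DEBUG_REFCOUNT
	#define REFCOUNT_WARN(cond, str) WARN_ON(cond)
	#define __refcount_check	__must_check
	#else
	#define REFCOUNT_WARN(cond, str) (void)(cond)
	#define __refcount_check
	#endif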

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/5] atomic: Introduce atomic_try_cmpxchg()
  2017-02-03 13:26 ` [PATCH 4/5] atomic: Introduce atomic_try_cmpxchg() Peter Zijlstra
@ 2017-02-06  4:24   ` Boqun Feng
  2017-02-06  6:32     ` Boqun Feng
  2017-02-06  8:12     ` Peter Zijlstra
  0 siblings, 2 replies; 12+ messages in thread
From: Boqun Feng @ 2017-02-06  4:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: elena.reshetova, gregkh, keescook, arnd, tglx, mingo,
	h.peter.anvin, will.deacon, dwindsor, dhowells, linux-kernel,
	kernel-hardening

[-- Attachment #1: Type: text/plain, Size: 5609 bytes --]

On Fri, Feb 03, 2017 at 02:26:02PM +0100, Peter Zijlstra wrote:
> Add a new cmpxchg interface:
> 
>   bool try_cmpxchg(u{8,16,32,64} *ptr, u{8,16,32,64} *val, u{8,16,32,64} new);
> 
> Where the boolean returns the result of the compare; and thus if the
> exchange happened; and in case of failure, the new value of *ptr is
> returned in *val.
> 
> This allows simplification/improvement of loops like:
> 
> 	for (;;) {
> 		new = val $op $imm;
> 		old = cmpxchg(ptr, val, new);
> 		if (old == val)
> 			break;
> 		val = old;
> 	}
> 
> into:
> 
> 	for (;;) {
> 		new = val $op $imm;
> 		if (try_cmpxchg(ptr, &val, new))
> 			break;
> 	}
> 
> while also generating better code (GCC6 and onwards).
> 

But switching to try_cmpxchg() will make @val a memory location, which
cannot be put in a register. And this will generate unnecessary
memory accesses on archs having enough registers (PPC, e.g.).

> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> --- a/arch/x86/include/asm/atomic.h
> +++ b/arch/x86/include/asm/atomic.h
> @@ -186,6 +186,12 @@ static __always_inline int atomic_cmpxch
>  	return cmpxchg(&v->counter, old, new);
>  }
>  
> +#define atomic_try_cmpxchg atomic_try_cmpxchg
> +static __always_inline bool atomic_try_cmpxchg(atomic_t *v, int *old, int new)
> +{
> +	return try_cmpxchg(&v->counter, old, new);
> +}
> +
>  static inline int atomic_xchg(atomic_t *v, int new)
>  {
>  	return xchg(&v->counter, new);
> --- a/arch/x86/include/asm/cmpxchg.h
> +++ b/arch/x86/include/asm/cmpxchg.h
> @@ -153,6 +153,75 @@ extern void __add_wrong_size(void)
>  #define cmpxchg_local(ptr, old, new)					\
>  	__cmpxchg_local(ptr, old, new, sizeof(*(ptr)))
>  
> +
> +#define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock)		\
> +({									\
> +	bool success;							\
> +	__typeof__(_ptr) _old = (_pold);				\
> +	__typeof__(*(_ptr)) __old = *_old;				\
> +	__typeof__(*(_ptr)) __new = (_new);				\
> +	switch (size) {							\
> +	case __X86_CASE_B:						\
> +	{								\
> +		volatile u8 *__ptr = (volatile u8 *)(_ptr);		\
> +		asm volatile(lock "cmpxchgb %[new], %[ptr]"		\
> +			     CC_SET(z)					\
> +			     : CC_OUT(z) (success),			\
> +			       [ptr] "+m" (*__ptr),			\
> +			       [old] "+a" (__old)			\
> +			     : [new] "q" (__new)			\
> +			     : "memory");				\
> +		break;							\
> +	}								\
> +	case __X86_CASE_W:						\
> +	{								\
> +		volatile u16 *__ptr = (volatile u16 *)(_ptr);		\
> +		asm volatile(lock "cmpxchgw %[new], %[ptr]"		\
> +			     CC_SET(z)					\
> +			     : CC_OUT(z) (success),			\
> +			       [ptr] "+m" (*__ptr),			\
> +			       [old] "+a" (__old)			\
> +			     : [new] "r" (__new)			\
> +			     : "memory");				\
> +		break;							\
> +	}								\
> +	case __X86_CASE_L:						\
> +	{								\
> +		volatile u32 *__ptr = (volatile u32 *)(_ptr);		\
> +		asm volatile(lock "cmpxchgl %[new], %[ptr]"		\
> +			     CC_SET(z)					\
> +			     : CC_OUT(z) (success),			\
> +			       [ptr] "+m" (*__ptr),			\
> +			       [old] "+a" (__old)			\
> +			     : [new] "r" (__new)			\
> +			     : "memory");				\
> +		break;							\
> +	}								\
> +	case __X86_CASE_Q:						\
> +	{								\
> +		volatile u64 *__ptr = (volatile u64 *)(_ptr);		\
> +		asm volatile(lock "cmpxchgq %[new], %[ptr]"		\
> +			     CC_SET(z)					\
> +			     : CC_OUT(z) (success),			\
> +			       [ptr] "+m" (*__ptr),			\
> +			       [old] "+a" (__old)			\
> +			     : [new] "r" (__new)			\
> +			     : "memory");				\
> +		break;							\
> +	}								\
> +	default:							\
> +		__cmpxchg_wrong_size();					\
> +	}								\
> +	*_old = __old;							\
> +	success;							\
> +})
> +
> +#define __try_cmpxchg(ptr, pold, new, size)				\
> +	__raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)
> +
> +#define try_cmpxchg(ptr, pold, new)					\
> +	__try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)))
> +
>  /*
>   * xadd() adds "inc" to "*ptr" and atomically returns the previous
>   * value of "*ptr".
> --- a/include/linux/atomic.h
> +++ b/include/linux/atomic.h
> @@ -423,6 +423,28 @@
>  #endif
>  #endif /* atomic_cmpxchg_relaxed */
>  
> +#ifndef atomic_try_cmpxchg
> +
> +#define __atomic_try_cmpxchg(type, _p, _po, _n)				\
> +({									\
> +	typeof(_po) __po = (_po);					\
> +	typeof(*(_po)) __o = *__po;					\
> +	bool success = (atomic_cmpxchg##type((_p), __o, (_n)) == __o);	\
> +	*__po = __o;							\

Besides, is this part correct? atomic_cmpxchg_*() wouldn't change the
value of __o, so *__po wouldn't be changed.. IOW, in case of failure,
*ptr wouldn't be updated to a new value.

Maybe this should be:

	bool success;
	*__po = atomic_cmpxchg##type((_p), __o, (_n));
	sucess = (*__po == _o);

, right?

Regards,
Boqun

> +	success;							\
> +})
> +
> +#define atomic_try_cmpxchg(_p, _po, _n)		__atomic_try_cmpxchg(, _p, _po, _n)
> +#define atomic_try_cmpxchg_relaxed(_p, _po, _n)	__atomic_try_cmpxchg(_relaxed, _p, _po, _n)
> +#define atomic_try_cmpxchg_acquire(_p, _po, _n)	__atomic_try_cmpxchg(_acquire, _p, _po, _n)
> +#define atomic_try_cmpxchg_release(_p, _po, _n)	__atomic_try_cmpxchg(_release, _p, _po, _n)
> +
> +#else /* atomic_try_cmpxchg */
> +#define atomic_try_cmpxchg_relaxed	atomic_try_cmpxchg
> +#define atomic_try_cmpxchg_acquire	atomic_try_cmpxchg
> +#define atomic_try_cmpxchg_release	atomic_try_cmpxchg
> +#endif /* atomic_try_cmpxchg */
> +
>  /* cmpxchg_relaxed */
>  #ifndef cmpxchg_relaxed
>  #define  cmpxchg_relaxed		cmpxchg
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/5] atomic: Introduce atomic_try_cmpxchg()
  2017-02-06  4:24   ` Boqun Feng
@ 2017-02-06  6:32     ` Boqun Feng
  2017-02-06  8:12     ` Peter Zijlstra
  1 sibling, 0 replies; 12+ messages in thread
From: Boqun Feng @ 2017-02-06  6:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: elena.reshetova, gregkh, keescook, arnd, tglx, mingo,
	h.peter.anvin, will.deacon, dwindsor, dhowells, linux-kernel,
	kernel-hardening

[-- Attachment #1: Type: text/plain, Size: 6348 bytes --]

On Mon, Feb 06, 2017 at 12:24:28PM +0800, Boqun Feng wrote:
> On Fri, Feb 03, 2017 at 02:26:02PM +0100, Peter Zijlstra wrote:
> > Add a new cmpxchg interface:
> > 
> >   bool try_cmpxchg(u{8,16,32,64} *ptr, u{8,16,32,64} *val, u{8,16,32,64} new);
> > 
> > Where the boolean returns the result of the compare; and thus if the
> > exchange happened; and in case of failure, the new value of *ptr is
> > returned in *val.
> > 
> > This allows simplification/improvement of loops like:
> > 
> > 	for (;;) {
> > 		new = val $op $imm;
> > 		old = cmpxchg(ptr, val, new);
> > 		if (old == val)
> > 			break;
> > 		val = old;
> > 	}
> > 
> > into:
> > 
> > 	for (;;) {
> > 		new = val $op $imm;
> > 		if (try_cmpxchg(ptr, &val, new))
> > 			break;
> > 	}
> > 
> > while also generating better code (GCC6 and onwards).
> > 
> 
> But switching to try_cmpxchg() will make @val a memory location, which
> cannot be put in a register. And this will generate unnecessary
> memory accesses on archs having enough registers (PPC, e.g.).
> 

Hmm.. it turns out that compilers can figure out that @val fits in a
register (maybe in the escape analysis step), so they don't treat it as a
memory location. This is at least true for GCC 5.4.0 on PPC. So I think
we can rely on this?

> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> > --- a/arch/x86/include/asm/atomic.h
> > +++ b/arch/x86/include/asm/atomic.h
> > @@ -186,6 +186,12 @@ static __always_inline int atomic_cmpxch
> >  	return cmpxchg(&v->counter, old, new);
> >  }
> >  
> > +#define atomic_try_cmpxchg atomic_try_cmpxchg
> > +static __always_inline bool atomic_try_cmpxchg(atomic_t *v, int *old, int new)
> > +{
> > +	return try_cmpxchg(&v->counter, old, new);
> > +}
> > +
> >  static inline int atomic_xchg(atomic_t *v, int new)
> >  {
> >  	return xchg(&v->counter, new);
> > --- a/arch/x86/include/asm/cmpxchg.h
> > +++ b/arch/x86/include/asm/cmpxchg.h
> > @@ -153,6 +153,75 @@ extern void __add_wrong_size(void)
> >  #define cmpxchg_local(ptr, old, new)					\
> >  	__cmpxchg_local(ptr, old, new, sizeof(*(ptr)))
> >  
> > +
> > +#define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock)		\
> > +({									\
> > +	bool success;							\
> > +	__typeof__(_ptr) _old = (_pold);				\
> > +	__typeof__(*(_ptr)) __old = *_old;				\
> > +	__typeof__(*(_ptr)) __new = (_new);				\
> > +	switch (size) {							\
> > +	case __X86_CASE_B:						\
> > +	{								\
> > +		volatile u8 *__ptr = (volatile u8 *)(_ptr);		\
> > +		asm volatile(lock "cmpxchgb %[new], %[ptr]"		\
> > +			     CC_SET(z)					\
> > +			     : CC_OUT(z) (success),			\
> > +			       [ptr] "+m" (*__ptr),			\
> > +			       [old] "+a" (__old)			\
> > +			     : [new] "q" (__new)			\
> > +			     : "memory");				\
> > +		break;							\
> > +	}								\
> > +	case __X86_CASE_W:						\
> > +	{								\
> > +		volatile u16 *__ptr = (volatile u16 *)(_ptr);		\
> > +		asm volatile(lock "cmpxchgw %[new], %[ptr]"		\
> > +			     CC_SET(z)					\
> > +			     : CC_OUT(z) (success),			\
> > +			       [ptr] "+m" (*__ptr),			\
> > +			       [old] "+a" (__old)			\
> > +			     : [new] "r" (__new)			\
> > +			     : "memory");				\
> > +		break;							\
> > +	}								\
> > +	case __X86_CASE_L:						\
> > +	{								\
> > +		volatile u32 *__ptr = (volatile u32 *)(_ptr);		\
> > +		asm volatile(lock "cmpxchgl %[new], %[ptr]"		\
> > +			     CC_SET(z)					\
> > +			     : CC_OUT(z) (success),			\
> > +			       [ptr] "+m" (*__ptr),			\
> > +			       [old] "+a" (__old)			\
> > +			     : [new] "r" (__new)			\
> > +			     : "memory");				\
> > +		break;							\
> > +	}								\
> > +	case __X86_CASE_Q:						\
> > +	{								\
> > +		volatile u64 *__ptr = (volatile u64 *)(_ptr);		\
> > +		asm volatile(lock "cmpxchgq %[new], %[ptr]"		\
> > +			     CC_SET(z)					\
> > +			     : CC_OUT(z) (success),			\
> > +			       [ptr] "+m" (*__ptr),			\
> > +			       [old] "+a" (__old)			\
> > +			     : [new] "r" (__new)			\
> > +			     : "memory");				\
> > +		break;							\
> > +	}								\
> > +	default:							\
> > +		__cmpxchg_wrong_size();					\
> > +	}								\
> > +	*_old = __old;							\
> > +	success;							\
> > +})
> > +
> > +#define __try_cmpxchg(ptr, pold, new, size)				\
> > +	__raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)
> > +
> > +#define try_cmpxchg(ptr, pold, new)					\
> > +	__try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)))
> > +
> >  /*
> >   * xadd() adds "inc" to "*ptr" and atomically returns the previous
> >   * value of "*ptr".
> > --- a/include/linux/atomic.h
> > +++ b/include/linux/atomic.h
> > @@ -423,6 +423,28 @@
> >  #endif
> >  #endif /* atomic_cmpxchg_relaxed */
> >  
> > +#ifndef atomic_try_cmpxchg
> > +
> > +#define __atomic_try_cmpxchg(type, _p, _po, _n)				\
> > +({									\
> > +	typeof(_po) __po = (_po);					\
> > +	typeof(*(_po)) __o = *__po;					\
> > +	bool success = (atomic_cmpxchg##type((_p), __o, (_n)) == __o);	\
> > +	*__po = __o;							\
> 
> Besides, is this part correct? atomic_cmpxchg_*() wouldn't change the
> value of __o, so *__po wouldn't be changed.. IOW, in case of failure,
> *ptr wouldn't be updated to a new value.
> 
> Maybe this should be:
> 
> 	bool success;
> 	*__po = atomic_cmpxchg##type((_p), __o, (_n));
> 	sucess = (*__po == _o);

typo... should be

	success = (*__po == __o);

Regards,
Boqun

> 
> , right?
> 
> Regards,
> Boqun
> 
> > +	success;							\
> > +})
> > +
> > +#define atomic_try_cmpxchg(_p, _po, _n)		__atomic_try_cmpxchg(, _p, _po, _n)
> > +#define atomic_try_cmpxchg_relaxed(_p, _po, _n)	__atomic_try_cmpxchg(_relaxed, _p, _po, _n)
> > +#define atomic_try_cmpxchg_acquire(_p, _po, _n)	__atomic_try_cmpxchg(_acquire, _p, _po, _n)
> > +#define atomic_try_cmpxchg_release(_p, _po, _n)	__atomic_try_cmpxchg(_release, _p, _po, _n)
> > +
> > +#else /* atomic_try_cmpxchg */
> > +#define atomic_try_cmpxchg_relaxed	atomic_try_cmpxchg
> > +#define atomic_try_cmpxchg_acquire	atomic_try_cmpxchg
> > +#define atomic_try_cmpxchg_release	atomic_try_cmpxchg
> > +#endif /* atomic_try_cmpxchg */
> > +
> >  /* cmpxchg_relaxed */
> >  #ifndef cmpxchg_relaxed
> >  #define  cmpxchg_relaxed		cmpxchg
> > 
> > 



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/5] atomic: Introduce atomic_try_cmpxchg()
  2017-02-06  4:24   ` Boqun Feng
  2017-02-06  6:32     ` Boqun Feng
@ 2017-02-06  8:12     ` Peter Zijlstra
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2017-02-06  8:12 UTC (permalink / raw)
  To: Boqun Feng
  Cc: elena.reshetova, gregkh, keescook, arnd, tglx, mingo,
	h.peter.anvin, will.deacon, dwindsor, dhowells, linux-kernel,
	kernel-hardening

On Mon, Feb 06, 2017 at 12:24:28PM +0800, Boqun Feng wrote:
> On Fri, Feb 03, 2017 at 02:26:02PM +0100, Peter Zijlstra wrote:

> > 
> > 	for (;;) {
> > 		new = val $op $imm;
> > 		if (try_cmpxchg(ptr, &val, new))
> > 			break;
> > 	}
> > 
> > while also generating better code (GCC6 and onwards).
> > 
> 
> But switching to try_cmpxchg() will make @val a memory location, which
> cannot be put in a register. And this will generate unnecessary
> memory accesses on archs having enough registers (PPC, e.g.).

GCC was perfectly capable of making @val a register in the code I was
looking at.

> > +#ifndef atomic_try_cmpxchg
> > +
> > +#define __atomic_try_cmpxchg(type, _p, _po, _n)				\
> > +({									\
> > +	typeof(_po) __po = (_po);					\
> > +	typeof(*(_po)) __o = *__po;					\
> > +	bool success = (atomic_cmpxchg##type((_p), __o, (_n)) == __o);	\
> > +	*__po = __o;							\
> 
> Besides, is this part correct? atomic_cmpxchg_*() wouldn't change the
> value of __o, so *__po wouldn't be changed.. IOW, in case of failure,
> *ptr wouldn't be updated to a new value.
> 
> Maybe this should be:
> 
> 	bool success;
> 	*__po = atomic_cmpxchg##type((_p), __o, (_n));
> 	sucess = (*__po == _o);
> 
> , right?

Yes, botched that. Don't think I even compiled it to be honest :/
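
Something along these lines, then (untested sketch folding in Boqun's fix):

	#define __atomic_try_cmpxchg(type, _p, _po, _n)				\
	({									\
		typeof(_po) __po = (_po);					\
		typeof(*(_po)) __o = *__po;					\
		*__po = atomic_cmpxchg##type((_p), __o, (_n));			\
		(*__po == __o);							\
	})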

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/5] kref: Implement using refcount_t
  2017-02-03 13:26 ` [PATCH 2/5] kref: Implement using refcount_t Peter Zijlstra
@ 2017-02-06 13:06   ` Greg KH
  0 siblings, 0 replies; 12+ messages in thread
From: Greg KH @ 2017-02-06 13:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: elena.reshetova, keescook, arnd, tglx, mingo, h.peter.anvin,
	will.deacon, dwindsor, dhowells, linux-kernel, kernel-hardening

On Fri, Feb 03, 2017 at 02:26:00PM +0100, Peter Zijlstra wrote:
> Use the refcount_t 'atomic' type to implement kref; this makes kref
> more robust by bringing saturation semantics.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>


Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread (newest: 2017-02-06 13:06 UTC)

Thread overview: 12+ messages
2017-02-03 13:25 [PATCH 0/5] refcount_t and various related bits Peter Zijlstra
2017-02-03 13:25 ` [PATCH 1/5] refcount_t: A special purpose refcount type Peter Zijlstra
2017-02-03 18:09   ` Kees Cook
2017-02-03 23:37   ` Kees Cook
2017-02-03 13:26 ` [PATCH 2/5] kref: Implement using refcount_t Peter Zijlstra
2017-02-06 13:06   ` Greg KH
2017-02-03 13:26 ` [PATCH 3/5] x86: Implement __WARN using UD2 Peter Zijlstra
2017-02-03 13:26 ` [PATCH 4/5] atomic: Introduce atomic_try_cmpxchg() Peter Zijlstra
2017-02-06  4:24   ` Boqun Feng
2017-02-06  6:32     ` Boqun Feng
2017-02-06  8:12     ` Peter Zijlstra
2017-02-03 13:26 ` [PATCH 5/5] refcount: Use atomic_try_cmpxchg() Peter Zijlstra
