* [RFC][PATCH 0/5] refcount: Improve code-gen
From: Peter Zijlstra @ 2021-12-08 18:36 UTC
  To: will, boqun.feng
  Cc: linux-kernel, x86, peterz, mark.rutland, elver, keescook, hch, torvalds

Hi,

Improves the refcount_t code-gen; I've still got to go through the latest thing
Linus suggested, but figured I should get these patches out to see if there are
other concerns etc.





* [RFC][PATCH 1/5] atomic: Introduce atomic_{inc,dec,dec_and_test}_ofl()
From: Peter Zijlstra @ 2021-12-08 18:36 UTC
  To: will, boqun.feng
  Cc: linux-kernel, x86, peterz, mark.rutland, elver, keescook, hch, torvalds

In order to facilitate architecture support for refcount_t, introduce
a number of new atomic primitives that have a uaccess style exception
for overflow.

Notably:

  atomic_inc_ofl(v, Label) -- increment and goto Label when
			      v is zero or negative.

  atomic_dec_ofl(v, Label) -- decrement and goto Label when
			      the result is zero or negative

  atomic_dec_and_test_ofl(v, Label) -- decrement and return true when
				       the result is zero and goto Label
				       when the result is negative

Since the GCC 'Labels as Values' extension doesn't allow having the
goto in an inline function, these new 'functions' must in fact be
implemented as macro magic.
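
For illustration, a caller then looks something like this (hypothetical
sketch, not part of this patch; refcount_inc() in the next patch is the
real thing):

	static void get_obj(refcount_t *r)
	{
		atomic_inc_ofl(&r->refs, overflow);
		return;		/* fast path falls straight through */
	overflow:
		/* cold path; the goto inside the macro lands here */
		refcount_warn_saturate(r, REFCOUNT_ADD_OVF);
	}

An inline function can't express this, because the goto target must be
a label in the caller's own scope.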

This meant extending the atomic generation scripts to deal with
wrapping macros instead of inline functions. Since
xchg/cmpxchg/try_cmpxchg were already macro magic, there was existing
code for that. While extending/improving that, a few latent
'instrumentation' bugs were uncovered and 'accidentally' fixed.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/atomic/atomic-arch-fallback.h |   64 ++++++++++++
 include/linux/atomic/atomic-instrumented.h  |  139 ++++++++++++++++++++--------
 include/linux/atomic/atomic-long.h          |   32 ++++++
 scripts/atomic/atomic-tbl.sh                |    6 +
 scripts/atomic/atomics.tbl                  |    6 +
 scripts/atomic/fallbacks/dec_and_test_ofl   |   12 ++
 scripts/atomic/fallbacks/dec_ofl            |    8 +
 scripts/atomic/fallbacks/inc_ofl            |    8 +
 scripts/atomic/gen-atomic-fallback.sh       |    4 
 scripts/atomic/gen-atomic-instrumented.sh   |  117 +++++++++++++++++++----
 scripts/atomic/gen-atomic-long.sh           |   32 +++++-
 11 files changed, 359 insertions(+), 69 deletions(-)

--- a/include/linux/atomic/atomic-arch-fallback.h
+++ b/include/linux/atomic/atomic-arch-fallback.h
@@ -1250,6 +1250,37 @@ arch_atomic_dec_if_positive(atomic_t *v)
 #define arch_atomic_dec_if_positive arch_atomic_dec_if_positive
 #endif
 
+#ifndef arch_atomic_inc_ofl
+#define arch_atomic_inc_ofl(_v, _label)				\
+do {									\
+	int __old = arch_atomic_fetch_inc(_v);			\
+	if (unlikely(__old <= 0))					\
+		goto _label;						\
+} while (0)
+#endif
+
+#ifndef arch_atomic_dec_ofl
+#define arch_atomic_dec_ofl(_v, _label)				\
+do {									\
+	int __new = arch_atomic_dec_return(_v);			\
+	if (unlikely(__new <= 0))					\
+		goto _label;						\
+} while (0)
+#endif
+
+#ifndef arch_atomic_dec_and_test_ofl
+#define arch_atomic_dec_and_test_ofl(_v, _label)			\
+({									\
+	bool __ret = false;						\
+	int __new = arch_atomic_dec_return(_v);			\
+	if (unlikely(__new < 0))					\
+		goto _label;						\
+	if (unlikely(__new == 0))					\
+		__ret = true;						\
+	__ret;								\
+})
+#endif
+
 #ifdef CONFIG_GENERIC_ATOMIC64
 #include <asm-generic/atomic64.h>
 #endif
@@ -2357,5 +2388,36 @@ arch_atomic64_dec_if_positive(atomic64_t
 #define arch_atomic64_dec_if_positive arch_atomic64_dec_if_positive
 #endif
 
+#ifndef arch_atomic64_inc_ofl
+#define arch_atomic64_inc_ofl(_v, _label)				\
+do {									\
+	s64 __old = arch_atomic64_fetch_inc(_v);			\
+	if (unlikely(__old <= 0))					\
+		goto _label;						\
+} while (0)
+#endif
+
+#ifndef arch_atomic64_dec_ofl
+#define arch_atomic64_dec_ofl(_v, _label)				\
+do {									\
+	s64 __new = arch_atomic64_dec_return(_v);			\
+	if (unlikely(__new <= 0))					\
+		goto _label;						\
+} while (0)
+#endif
+
+#ifndef arch_atomic64_dec_and_test_ofl
+#define arch_atomic64_dec_and_test_ofl(_v, _label)			\
+({									\
+	bool __ret = false;						\
+	s64 __new = arch_atomic64_dec_return(_v);			\
+	if (unlikely(__new < 0))					\
+		goto _label;						\
+	if (unlikely(__new == 0))					\
+		__ret = true;						\
+	__ret;								\
+})
+#endif
+
 #endif /* _LINUX_ATOMIC_FALLBACK_H */
-// cca554917d7ea73d5e3e7397dd70c484cad9b2c4
+// a59904b14db62a38bbab8699edc4a785a97871fb
--- a/include/linux/atomic/atomic-instrumented.h
+++ b/include/linux/atomic/atomic-instrumented.h
@@ -501,7 +501,7 @@ static __always_inline bool
 atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_read_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg(v, old, new);
 }
 
@@ -509,7 +509,7 @@ static __always_inline bool
 atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_read_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg_acquire(v, old, new);
 }
 
@@ -517,7 +517,7 @@ static __always_inline bool
 atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_read_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg_release(v, old, new);
 }
 
@@ -525,7 +525,7 @@ static __always_inline bool
 atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_read_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg_relaxed(v, old, new);
 }
 
@@ -599,6 +599,27 @@ atomic_dec_if_positive(atomic_t *v)
 	return arch_atomic_dec_if_positive(v);
 }
 
+#define atomic_inc_ofl(v, L) \
+({ \
+	typeof(v) __ai_v = (v); \
+	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
+	arch_atomic_inc_ofl(__ai_v, L); \
+})
+
+#define atomic_dec_ofl(v, L) \
+({ \
+	typeof(v) __ai_v = (v); \
+	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
+	arch_atomic_dec_ofl(__ai_v, L); \
+})
+
+#define atomic_dec_and_test_ofl(v, L) \
+({ \
+	typeof(v) __ai_v = (v); \
+	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
+	arch_atomic_dec_and_test_ofl(__ai_v, L); \
+})
+
 static __always_inline s64
 atomic64_read(const atomic64_t *v)
 {
@@ -1079,7 +1100,7 @@ static __always_inline bool
 atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_read_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg(v, old, new);
 }
 
@@ -1087,7 +1108,7 @@ static __always_inline bool
 atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_read_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg_acquire(v, old, new);
 }
 
@@ -1095,7 +1116,7 @@ static __always_inline bool
 atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_read_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg_release(v, old, new);
 }
 
@@ -1103,7 +1124,7 @@ static __always_inline bool
 atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_read_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg_relaxed(v, old, new);
 }
 
@@ -1177,6 +1198,27 @@ atomic64_dec_if_positive(atomic64_t *v)
 	return arch_atomic64_dec_if_positive(v);
 }
 
+#define atomic64_inc_ofl(v, L) \
+({ \
+	typeof(v) __ai_v = (v); \
+	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
+	arch_atomic64_inc_ofl(__ai_v, L); \
+})
+
+#define atomic64_dec_ofl(v, L) \
+({ \
+	typeof(v) __ai_v = (v); \
+	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
+	arch_atomic64_dec_ofl(__ai_v, L); \
+})
+
+#define atomic64_dec_and_test_ofl(v, L) \
+({ \
+	typeof(v) __ai_v = (v); \
+	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
+	arch_atomic64_dec_and_test_ofl(__ai_v, L); \
+})
+
 static __always_inline long
 atomic_long_read(const atomic_long_t *v)
 {
@@ -1657,7 +1699,7 @@ static __always_inline bool
 atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_read_write(old, sizeof(*old));
 	return arch_atomic_long_try_cmpxchg(v, old, new);
 }
 
@@ -1665,7 +1707,7 @@ static __always_inline bool
 atomic_long_try_cmpxchg_acquire(atomic_long_t *v, long *old, long new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_read_write(old, sizeof(*old));
 	return arch_atomic_long_try_cmpxchg_acquire(v, old, new);
 }
 
@@ -1673,7 +1715,7 @@ static __always_inline bool
 atomic_long_try_cmpxchg_release(atomic_long_t *v, long *old, long new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_read_write(old, sizeof(*old));
 	return arch_atomic_long_try_cmpxchg_release(v, old, new);
 }
 
@@ -1681,7 +1723,7 @@ static __always_inline bool
 atomic_long_try_cmpxchg_relaxed(atomic_long_t *v, long *old, long new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_read_write(old, sizeof(*old));
 	return arch_atomic_long_try_cmpxchg_relaxed(v, old, new);
 }
 
@@ -1755,87 +1797,108 @@ atomic_long_dec_if_positive(atomic_long_
 	return arch_atomic_long_dec_if_positive(v);
 }
 
+#define atomic_long_inc_ofl(v, L) \
+({ \
+	typeof(v) __ai_v = (v); \
+	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
+	arch_atomic_long_inc_ofl(__ai_v, L); \
+})
+
+#define atomic_long_dec_ofl(v, L) \
+({ \
+	typeof(v) __ai_v = (v); \
+	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
+	arch_atomic_long_dec_ofl(__ai_v, L); \
+})
+
+#define atomic_long_dec_and_test_ofl(v, L) \
+({ \
+	typeof(v) __ai_v = (v); \
+	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
+	arch_atomic_long_dec_and_test_ofl(__ai_v, L); \
+})
+
 #define xchg(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_xchg(__ai_ptr, __VA_ARGS__); \
 })
 
 #define xchg_acquire(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_xchg_acquire(__ai_ptr, __VA_ARGS__); \
 })
 
 #define xchg_release(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_xchg_release(__ai_ptr, __VA_ARGS__); \
 })
 
 #define xchg_relaxed(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_xchg_relaxed(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_cmpxchg(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg_acquire(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_cmpxchg_acquire(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg_release(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_cmpxchg_release(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg_relaxed(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_cmpxchg_relaxed(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg64(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_cmpxchg64(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg64_acquire(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_cmpxchg64_acquire(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg64_release(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_cmpxchg64_release(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg64_relaxed(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_cmpxchg64_relaxed(__ai_ptr, __VA_ARGS__); \
 })
 
@@ -1843,8 +1906,8 @@ atomic_long_dec_if_positive(atomic_long_
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
 	typeof(oldp) __ai_oldp = (oldp); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
-	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
 	arch_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
 })
 
@@ -1852,8 +1915,8 @@ atomic_long_dec_if_positive(atomic_long_
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
 	typeof(oldp) __ai_oldp = (oldp); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
-	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
 	arch_try_cmpxchg_acquire(__ai_ptr, __ai_oldp, __VA_ARGS__); \
 })
 
@@ -1861,8 +1924,8 @@ atomic_long_dec_if_positive(atomic_long_
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
 	typeof(oldp) __ai_oldp = (oldp); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
-	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
 	arch_try_cmpxchg_release(__ai_ptr, __ai_oldp, __VA_ARGS__); \
 })
 
@@ -1870,36 +1933,36 @@ atomic_long_dec_if_positive(atomic_long_
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
 	typeof(oldp) __ai_oldp = (oldp); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
-	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
 	arch_try_cmpxchg_relaxed(__ai_ptr, __ai_oldp, __VA_ARGS__); \
 })
 
 #define cmpxchg_local(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_cmpxchg_local(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg64_local(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_cmpxchg64_local(__ai_ptr, __VA_ARGS__); \
 })
 
 #define sync_cmpxchg(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg_double(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
-	instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_cmpxchg_double(__ai_ptr, __VA_ARGS__); \
 })
 
@@ -1907,9 +1970,9 @@ atomic_long_dec_if_positive(atomic_long_
 #define cmpxchg_double_local(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
-	instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \
+	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_cmpxchg_double_local(__ai_ptr, __VA_ARGS__); \
 })
 
 #endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
-// 2a9553f0a9d5619f19151092df5cabbbf16ce835
+// 214f9a7e972966a9a8e28e1568665cfb75decf91
--- a/include/linux/atomic/atomic-long.h
+++ b/include/linux/atomic/atomic-long.h
@@ -515,6 +515,21 @@ arch_atomic_long_dec_if_positive(atomic_
 	return arch_atomic64_dec_if_positive(v);
 }
 
+#define arch_atomic_long_inc_ofl(v, L) \
+({ \
+	arch_atomic64_inc_ofl((v), L); \
+})
+
+#define arch_atomic_long_dec_ofl(v, L) \
+({ \
+	arch_atomic64_dec_ofl((v), L); \
+})
+
+#define arch_atomic_long_dec_and_test_ofl(v, L) \
+({ \
+	arch_atomic64_dec_and_test_ofl((v), L); \
+})
+
 #else /* CONFIG_64BIT */
 
 static __always_inline long
@@ -1009,6 +1024,21 @@ arch_atomic_long_dec_if_positive(atomic_
 	return arch_atomic_dec_if_positive(v);
 }
 
+#define arch_atomic_long_inc_ofl(v, L) \
+({ \
+	arch_atomic_inc_ofl((v), L); \
+})
+
+#define arch_atomic_long_dec_ofl(v, L) \
+({ \
+	arch_atomic_dec_ofl((v), L); \
+})
+
+#define arch_atomic_long_dec_and_test_ofl(v, L) \
+({ \
+	arch_atomic_dec_and_test_ofl((v), L); \
+})
+
 #endif /* CONFIG_64BIT */
 #endif /* _LINUX_ATOMIC_LONG_H */
-// e8f0e08ff072b74d180eabe2ad001282b38c2c88
+// 120f718985fa4c8f0e884cc4f23db8aa950255fb
--- a/scripts/atomic/atomic-tbl.sh
+++ b/scripts/atomic/atomic-tbl.sh
@@ -36,6 +36,12 @@ meta_has_relaxed()
 	meta_in "$1" "BFIR"
 }
 
+#meta_has_macro(meta)
+meta_has_macro()
+{
+	meta_in "$1" "m"
+}
+
 #find_fallback_template(pfx, name, sfx, order)
 find_fallback_template()
 {
--- a/scripts/atomic/atomics.tbl
+++ b/scripts/atomic/atomics.tbl
@@ -10,12 +10,15 @@
 # * F/f	- fetch: returns base type (has fetch_ variants)
 # * l	- load: returns base type (has _acquire order variant)
 # * s	- store: returns void (has _release order variant)
+# * m	- macro:
 #
 # Where args contains list of type[:name], where type is:
 # * cv	- const pointer to atomic base type (atomic_t/atomic64_t/atomic_long_t)
 # * v	- pointer to atomic base type (atomic_t/atomic64_t/atomic_long_t)
 # * i	- base type (int/s64/long)
 # * p	- pointer to base type (int/s64/long)
+# * L	- label for exception case
+# * V:... - vararg
 #
 read			l	cv
 set			s	v	i
@@ -39,3 +42,6 @@ inc_not_zero		b	v
 inc_unless_negative	b	v
 dec_unless_positive	b	v
 dec_if_positive		i	v
+inc_ofl			m	v	L
+dec_ofl			m	v	L
+dec_and_test_ofl	m	v	L
--- /dev/null
+++ b/scripts/atomic/fallbacks/dec_and_test_ofl
@@ -0,0 +1,12 @@
+cat << EOF
+#define arch_${atomic}_dec_and_test_ofl(_v, _label)			\\
+({									\\
+	bool __ret = false;						\\
+	${int} __new = arch_${atomic}_dec_return(_v);			\\
+	if (unlikely(__new < 0))					\\
+		goto _label;						\\
+	if (unlikely(__new == 0))					\\
+		__ret = true;						\\
+	__ret;								\\
+})
+EOF
--- /dev/null
+++ b/scripts/atomic/fallbacks/dec_ofl
@@ -0,0 +1,8 @@
+cat << EOF
+#define arch_${atomic}_dec_ofl(_v, _label)				\\
+do {									\\
+	${int} __new = arch_${atomic}_dec_return(_v);			\\
+	if (unlikely(__new <= 0))					\\
+		goto _label;						\\
+} while (0)
+EOF
--- /dev/null
+++ b/scripts/atomic/fallbacks/inc_ofl
@@ -0,0 +1,8 @@
+cat << EOF
+#define arch_${atomic}_inc_ofl(_v, _label)				\\
+do {									\\
+	${int} __old = arch_${atomic}_fetch_inc(_v);			\\
+	if (unlikely(__old <= 0))					\\
+		goto _label;						\\
+} while (0)
+EOF
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -27,7 +27,9 @@ gen_template_fallback()
 	if [ ! -z "${template}" ]; then
 		printf "#ifndef ${atomicname}\n"
 		. ${template}
-		printf "#define ${atomicname} ${atomicname}\n"
+		if ! meta_has_macro "${meta}"; then
+			printf "#define ${atomicname} ${atomicname}\n"
+		fi
 		printf "#endif\n\n"
 	fi
 }
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -13,9 +13,13 @@ gen_param_check()
 	local type="${arg%%:*}"
 	local name="$(gen_param_name "${arg}")"
 	local rw="write"
+	local pfx;
 
 	case "${type#c}" in
+	v) pfx="atomic_";;
 	i) return;;
+	L) return;;
+	V) return;;
 	esac
 
 	if [ ${type#c} != ${type} ]; then
@@ -27,7 +31,16 @@ gen_param_check()
 		rw="read_write"
 	fi
 
-	printf "\tinstrument_atomic_${rw}(${name}, sizeof(*${name}));\n"
+	if meta_has_macro "${meta}"; then
+		name="__ai_${name}"
+	fi
+
+	printf "\tinstrument_${pfx}${rw}(${name}, sizeof(*${name}));"
+	if meta_has_macro "${meta}"; then
+		printf " \\"
+	fi
+	printf "\n"
+
 }
 
 #gen_params_checks(meta, arg...)
@@ -41,6 +54,52 @@ gen_params_checks()
 	done
 }
 
+#gen_var(arg)
+gen_var()
+{
+	local type="${1%%:*}"
+	local name="$(gen_param_name "$1")"
+
+	case "${type#c}" in
+	L) return;;
+	V) return;;
+	esac
+
+	printf "\ttypeof(${name}) __ai_${name} = (${name}); \\\\\n";
+}
+
+#gen_vars(arg...)
+gen_vars()
+{
+	while [ "$#" -gt 0 ]; do
+		gen_var "$1"
+		shift
+	done
+}
+
+#gen_varg(arg)
+gen_varg()
+{
+	local type="${1%%:*}"
+	local name="$(gen_param_name "$1")"
+
+	case "${type#c}" in
+	L)	printf "${name}";;
+	V)	printf "__VA_ARGS__";;
+	*)	printf "__ai_${name}";;
+	esac
+}
+
+#gen_vargs(arg...)
+gen_vargs()
+{
+	while [ "$#" -gt 0 ]; do
+		printf "$(gen_varg "$1")"
+		[ "$#" -gt 1 ] && printf ", "
+		shift
+	done
+}
+
 #gen_proto_order_variant(meta, pfx, name, sfx, order, atomic, int, arg...)
 gen_proto_order_variant()
 {
@@ -54,11 +113,28 @@ gen_proto_order_variant()
 
 	local atomicname="${atomic}_${pfx}${name}${sfx}${order}"
 
-	local ret="$(gen_ret_type "${meta}" "${int}")"
-	local params="$(gen_params "${int}" "${atomic}" "$@")"
 	local checks="$(gen_params_checks "${meta}" "$@")"
 	local args="$(gen_args "$@")"
-	local retstmt="$(gen_ret_stmt "${meta}")"
+
+	if meta_has_macro "${meta}"; then
+
+		local vars="$(gen_vars "$@")"
+		local vargs="$(gen_vargs "$@")"
+
+cat <<EOF
+#define ${atomicname}(${args}) \\
+({ \\
+${vars}
+${checks}
+	arch_${atomicname}(${vargs}); \\
+})
+EOF
+
+	else
+
+		local ret="$(gen_ret_type "${meta}" "${int}")"
+		local params="$(gen_params "${int}" "${atomic}" "$@")"
+		local retstmt="$(gen_ret_stmt "${meta}")"
 
 cat <<EOF
 static __always_inline ${ret}
@@ -69,6 +145,8 @@ ${checks}
 }
 EOF
 
+	fi
+
 	printf "\n"
 }
 
@@ -76,32 +154,27 @@ gen_xchg()
 {
 	local xchg="$1"; shift
 	local mult="$1"; shift
+	local ARGS;
 
 	if [ "${xchg%${xchg#try_cmpxchg}}" = "try_cmpxchg" ] ; then
-
-cat <<EOF
-#define ${xchg}(ptr, oldp, ...) \\
-({ \\
-	typeof(ptr) __ai_ptr = (ptr); \\
-	typeof(oldp) __ai_oldp = (oldp); \\
-	instrument_atomic_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
-	instrument_atomic_write(__ai_oldp, ${mult}sizeof(*__ai_oldp)); \\
-	arch_${xchg}(__ai_ptr, __ai_oldp, __VA_ARGS__); \\
-})
-EOF
-
+		ARGS="v:ptr p:oldp V:..."
 	else
+		ARGS="v:ptr V:..."
+	fi
+
+	local args="$(gen_args ${ARGS})"
+	local vars="$(gen_vars ${ARGS})"
+	local checks="$(gen_params_checks "m" ${ARGS})"
+	local vargs="$(gen_vargs ${ARGS})"
 
 cat <<EOF
-#define ${xchg}(ptr, ...) \\
+#define ${xchg}(${args}) \\
 ({ \\
-	typeof(ptr) __ai_ptr = (ptr); \\
-	instrument_atomic_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
-	arch_${xchg}(__ai_ptr, __VA_ARGS__); \\
+${vars}
+${checks}
+	arch_${xchg}(${vargs}); \\
 })
 EOF
-
-	fi
 }
 
 cat << EOF
--- a/scripts/atomic/gen-atomic-long.sh
+++ b/scripts/atomic/gen-atomic-long.sh
@@ -17,16 +17,21 @@ gen_cast()
 	printf "($(gen_param_type "${arg}" "${int}" "${atomic}"))"
 }
 
-#gen_args_cast(int, atomic, arg...)
+#gen_args_cast(meta, int, atomic, arg...)
 gen_args_cast()
 {
+	local meta=$1; shift
 	local int="$1"; shift
 	local atomic="$1"; shift
 
 	while [ "$#" -gt 0 ]; do
 		local cast="$(gen_cast "$1" "${int}" "${atomic}")"
 		local arg="$(gen_param_name "$1")"
-		printf "${cast}${arg}"
+		if meta_has_macro "${meta}" && [ "${1%%:*}" != "L" ]; then
+			printf "${cast}(${arg})"
+		else
+			printf "${cast}${arg}"
+		fi
 		[ "$#" -gt 1 ] && printf ", "
 		shift;
 	done
@@ -40,10 +45,24 @@ gen_proto_order_variant()
 	local atomic="$1"; shift
 	local int="$1"; shift
 
-	local ret="$(gen_ret_type "${meta}" "long")"
-	local params="$(gen_params "long" "atomic_long" "$@")"
-	local argscast="$(gen_args_cast "${int}" "${atomic}" "$@")"
-	local retstmt="$(gen_ret_stmt "${meta}")"
+	local argscast="$(gen_args_cast "${meta}" "${int}" "${atomic}" "$@")"
+
+	if meta_has_macro "${meta}"; then
+
+		local args="$(gen_args "$@")"
+
+cat <<EOF
+#define arch_atomic_long_${name}(${args}) \\
+({ \\
+	arch_${atomic}_${name}(${argscast}); \\
+})
+
+EOF
+	else
+
+		local ret="$(gen_ret_type "${meta}" "long")"
+		local params="$(gen_params "long" "atomic_long" "$@")"
+		local retstmt="$(gen_ret_stmt "${meta}")"
 
 cat <<EOF
 static __always_inline ${ret}
@@ -53,6 +72,7 @@ arch_atomic_long_${name}(${params})
 }
 
 EOF
+	fi
 }
 
 cat << EOF




* [RFC][PATCH 2/5] refcount: Use atomic_*_ofl()
From: Peter Zijlstra @ 2021-12-08 18:36 UTC
  To: will, boqun.feng
  Cc: linux-kernel, x86, peterz, mark.rutland, elver, keescook, hch, torvalds

Use the shiny new atomic_*_ofl() functions in order to have better
code-gen.

Notably, the refcount_inc() case no longer distinguishes between
inc-from-zero and inc-negative in the fast path, which improves
code-gen:

    4b88:       b8 01 00 00 00          mov    $0x1,%eax
    4b8d:       f0 0f c1 43 28          lock xadd %eax,0x28(%rbx)
    4b92:       85 c0                   test   %eax,%eax
    4b94:       74 1b                   je     4bb1 <alloc_perf_context+0xf1>
    4b96:       8d 50 01                lea    0x1(%rax),%edx
    4b99:       09 c2                   or     %eax,%edx
    4b9b:       78 20                   js     4bbd <alloc_perf_context+0xfd>

to:

    4768:       b8 01 00 00 00          mov    $0x1,%eax
    476d:       f0 0f c1 43 28          lock xadd %eax,0x28(%rbx)
    4772:       85 c0                   test   %eax,%eax
    4774:       7e 14                   jle    478a <alloc_perf_context+0xea>


without sacrificing functionality; the only thing that suffers is
the reported error condition, which might now 'spuriously' report
'saturated' instead of 'inc-from-zero'.

refcount_dec_and_test() is also improved:

    aa40:       b8 ff ff ff ff          mov    $0xffffffff,%eax
    aa45:       f0 0f c1 07             lock xadd %eax,(%rdi)
    aa49:       83 f8 01                cmp    $0x1,%eax
    aa4c:       74 05                   je     aa53 <ring_buffer_put+0x13>
    aa4e:       85 c0                   test   %eax,%eax
    aa50:       7e 1e                   jle    aa70 <ring_buffer_put+0x30>
    aa52:       c3                      ret

to:

    a980:       b8 ff ff ff ff          mov    $0xffffffff,%eax
    a985:       f0 0f c1 07             lock xadd %eax,(%rdi)
    a989:       83 e8 01                sub    $0x1,%eax
    a98c:       78 20                   js     a9ae <ring_buffer_put+0x2e>
    a98e:       74 01                   je     a991 <ring_buffer_put+0x11>
    a990:       c3                      ret

XXX atomic_dec_and_test_ofl() is more strongly ordered than
refcount_dec_and_test() is required to be -- Power and Arrghh64 ?

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/refcount.h |   15 ++++++++++++---
 lib/refcount.c           |    5 +++++
 2 files changed, 17 insertions(+), 3 deletions(-)

--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -264,7 +264,10 @@ static inline void __refcount_inc(refcou
  */
 static inline void refcount_inc(refcount_t *r)
 {
-	__refcount_inc(r, NULL);
+	atomic_inc_ofl(&r->refs, Eoverflow);
+	return;
+Eoverflow:
+	refcount_warn_saturate(r, REFCOUNT_ADD_OVF);
 }
 
 static inline __must_check bool __refcount_sub_and_test(int i, refcount_t *r, int *oldp)
@@ -330,7 +333,10 @@ static inline __must_check bool __refcou
  */
 static inline __must_check bool refcount_dec_and_test(refcount_t *r)
 {
-	return __refcount_dec_and_test(r, NULL);
+	return atomic_dec_and_test_ofl(&r->refs, Eoverflow);
+Eoverflow:
+	refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
+	return false;
 }
 
 static inline void __refcount_dec(refcount_t *r, int *oldp)
@@ -356,7 +362,10 @@ static inline void __refcount_dec(refcou
  */
 static inline void refcount_dec(refcount_t *r)
 {
-	__refcount_dec(r, NULL);
+	atomic_dec_ofl(&r->refs, Eoverflow);
+	return;
+Eoverflow:
+	refcount_warn_saturate(r, REFCOUNT_DEC_LEAK);
 }
 
 extern __must_check bool refcount_dec_if_one(refcount_t *r);
--- a/lib/refcount.c
+++ b/lib/refcount.c
@@ -12,8 +12,13 @@
 
 void refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t)
 {
+	int old = refcount_read(r);
 	refcount_set(r, REFCOUNT_SATURATED);
 
+	/* racy; who cares */
+	if (old == 1 && t == REFCOUNT_ADD_OVF)
+		t = REFCOUNT_ADD_UAF;
+
 	switch (t) {
 	case REFCOUNT_ADD_NOT_ZERO_OVF:
 		REFCOUNT_WARN("saturated; leaking memory");




* [RFC][PATCH 3/5] refcount: Improve out-of-line code-gen
From: Peter Zijlstra @ 2021-12-08 18:36 UTC
  To: will, boqun.feng
  Cc: linux-kernel, x86, peterz, mark.rutland, elver, keescook, hch, torvalds

Allow a number of ops to tail-call refcount_warn_saturate() in order
to generate smaller out-of-line code.

   text    data     bss     dec     hex filename
  97341    4985    1116  103442   19412 defconfig-build/kernel/events/core.o
  97299    4985    1116  103400   193e8 defconfig-build/kernel/events/core.o
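
In short (a condensed view of the hunks below), the slow path changes
from

	Eoverflow:
		refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
		return false;

to

	Eoverflow:
		return refcount_warn_saturate(r, REFCOUNT_SUB_UAF);

where refcount_warn_saturate() now returns the (always false) result
itself, so the compiler can emit a single tail-call jump instead of
call + ret.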

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/refcount.h |    9 ++++-----
 lib/refcount.c           |    4 +++-
 2 files changed, 7 insertions(+), 6 deletions(-)

--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -124,7 +124,7 @@ enum refcount_saturation_type {
 	REFCOUNT_DEC_LEAK,
 };
 
-void refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t);
+bool refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t);
 
 /**
  * refcount_set - set a refcount's value
@@ -160,7 +160,7 @@ static inline __must_check bool __refcou
 		*oldp = old;
 
 	if (unlikely(old < 0 || old + i < 0))
-		refcount_warn_saturate(r, REFCOUNT_ADD_NOT_ZERO_OVF);
+		return refcount_warn_saturate(r, REFCOUNT_ADD_NOT_ZERO_OVF);
 
 	return old;
 }
@@ -283,7 +283,7 @@ static inline __must_check bool __refcou
 	}
 
 	if (unlikely(old < 0 || old - i < 0))
-		refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
+		return refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
 
 	return false;
 }
@@ -335,8 +335,7 @@ static inline __must_check bool refcount
 {
 	return atomic_dec_and_test_ofl(&r->refs, Eoverflow);
 Eoverflow:
-	refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
-	return false;
+	return refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
 }
 
 static inline void __refcount_dec(refcount_t *r, int *oldp)
--- a/lib/refcount.c
+++ b/lib/refcount.c
@@ -10,7 +10,7 @@
 
 #define REFCOUNT_WARN(str)	WARN_ONCE(1, "refcount_t: " str ".\n")
 
-void refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t)
+bool refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t)
 {
 	int old = refcount_read(r);
 	refcount_set(r, REFCOUNT_SATURATED);
@@ -38,6 +38,8 @@ void refcount_warn_saturate(refcount_t *
 	default:
 		REFCOUNT_WARN("unknown saturation event!?");
 	}
+
+	return false;
 }
 EXPORT_SYMBOL(refcount_warn_saturate);
 




* [RFC][PATCH 4/5] atomic,x86: Implement atomic_dec_and_test_ofl()
From: Peter Zijlstra @ 2021-12-08 18:36 UTC
  To: will, boqun.feng
  Cc: linux-kernel, x86, peterz, mark.rutland, elver, keescook, hch, torvalds

Provide a better implementation of atomic_{dec,dec_and_test}_ofl() by
making use of the atomic-op condition codes.

This further improves the fast path code:

    a980:       b8 ff ff ff ff          mov    $0xffffffff,%eax
    a985:       f0 0f c1 07             lock xadd %eax,(%rdi)
    a989:       83 e8 01                sub    $0x1,%eax
    a98c:       78 20                   js     a9ae <ring_buffer_put+0x2e>
    a98e:       74 01                   je     a991 <ring_buffer_put+0x11>
    a990:       c3                      ret

to:

    ab81:       48 89 fb                mov    %rdi,%rbx
    ab84:       f0 ff 0f                lock decl (%rdi)
    ab87:       7c 04                   jl     ab8d <ring_buffer_put+0xd>
    ab89:       74 10                   je     ab9b <ring_buffer_put+0x1b>
    ab8b:       c3                      ret

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/atomic.h |   23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -263,6 +263,29 @@ static __always_inline int arch_atomic_f
 }
 #define arch_atomic_fetch_xor arch_atomic_fetch_xor
 
+#define arch_atomic_dec_ofl(_v, _label)					\
+	asm_volatile_goto(LOCK_PREFIX "decl %[var]\n\t"			\
+			  "jle %l1"					\
+			  : : [var] "m" ((_v)->counter)			\
+			  : "memory"					\
+			  : _label)
+
+#define arch_atomic_dec_and_test_ofl(_v, _label)			\
+({									\
+	__label__ __zero;						\
+	__label__ __out;						\
+	bool __ret = false;						\
+	asm_volatile_goto(LOCK_PREFIX "decl %[var]\n\t"			\
+			  "jl %l2\n\t"					\
+			  "je %l[__zero]"				\
+			  : : [var] "m" ((_v)->counter)			\
+			  : "memory"					\
+			  : __zero, _label);				\
+	goto __out;							\
+__zero:	__ret = true;							\
+__out:	__ret;								\
+})
+
 #ifdef CONFIG_X86_32
 # include <asm/atomic64_32.h>
 #else




* [RFC][PATCH 5/5] atomic: Document the atomic_{}_ofl() functions
From: Peter Zijlstra @ 2021-12-08 18:37 UTC
  To: will, boqun.feng
  Cc: linux-kernel, x86, peterz, mark.rutland, elver, keescook, hch, torvalds

Them special, them cause confusion, them needing docs for reading.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 Documentation/atomic_t.txt |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

--- a/Documentation/atomic_t.txt
+++ b/Documentation/atomic_t.txt
@@ -44,6 +44,10 @@ The 'full' API consists of (atomic64_ an
   atomic_add_unless(), atomic_inc_not_zero()
   atomic_sub_and_test(), atomic_dec_and_test()
 
+Reference count with overflow (as used by refcount_t):
+
+  atomic_inc_ofl(), atomic_dec_ofl()
+  atomic_dec_and_test_ofl()
 
 Misc:
 
@@ -157,6 +161,22 @@ atomic variable) can be fully ordered an
 visible.
 
 
+Overflow ops:
+
+The atomic_{}_ofl() ops are similar to their !_ofl() brethren with the
+notable exception that they take a label as their final argument to jump to
+when the atomic op overflows.
+
+Overflow for inc is zero-or-negative on the value prior to increment.
+Overflow for dec is zero-or-negative on the value after the decrement.
+Overflow for dec_and_test is negative on the value after the decrement.
+
+These semantics match the reference count use-case (for which they were
+created). Specifically, incrementing from zero is a failure because 0 means
+the object is freed (IOW use-after-free). Decrementing to zero is a failure
+because it goes undetected (see dec_and_test) and the object would leak.
+
+
 ORDERING  (go read memory-barriers.txt first)
 --------
 




* Re: [RFC][PATCH 2/5] refcount: Use atomic_*_ofl()
From: Peter Zijlstra @ 2021-12-08 19:19 UTC
  To: will, boqun.feng
  Cc: linux-kernel, x86, mark.rutland, elver, keescook, hch, torvalds

On Wed, Dec 08, 2021 at 07:36:57PM +0100, Peter Zijlstra wrote:
> Use the shiny new atomic_*_ofl() functions in order to have better
> code-gen.
> 

*sigh*, so I forgot to adjust the other primitives to the slightly
tweaked range of these new functions..


* Re: [RFC][PATCH 2/5] refcount: Use atomic_*_ofl()
From: Peter Zijlstra @ 2021-12-08 20:56 UTC
  To: will, boqun.feng
  Cc: linux-kernel, x86, mark.rutland, elver, keescook, hch, torvalds

On Wed, Dec 08, 2021 at 08:19:35PM +0100, Peter Zijlstra wrote:
> On Wed, Dec 08, 2021 at 07:36:57PM +0100, Peter Zijlstra wrote:
> > Use the shiny new atomic_*_ofl() functions in order to have better
> > code-gen.
> > 
> 
> *sigh*, so I forgot to adjust the other primitives to the slightly
> tweaked range of these new functions..

Looks like it all just works. The add/sub stuff needs to check before
and after anyway, and it saturating slightly earlier isn't a problem.

The only difference I found was the following, so I'll just fold that.

--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -346,7 +346,7 @@ static inline void __refcount_dec(refcou
 	if (oldp)
 		*oldp = old;
 
-	if (unlikely(old <= 1))
+	if (unlikely(old - 1 <= 0))
 		refcount_warn_saturate(r, REFCOUNT_DEC_LEAK);
 }
 



* Re: [RFC][PATCH 0/5] refcount: Improve code-gen
From: Peter Zijlstra @ 2021-12-09  8:25 UTC
  To: will, boqun.feng
  Cc: linux-kernel, x86, mark.rutland, elver, keescook, hch, torvalds, axboe

On Wed, Dec 08, 2021 at 07:36:55PM +0100, Peter Zijlstra wrote:
> Hi,
> 
> Improves the refcount_t code-gen; I've still got to go through the latest thing
> Linus suggested, but figured I should get these patches out to see if there's
> other concerns etc..
> 

Bah, I forgot to Cc Jens, let me bounce him the lot.


* Re: [RFC][PATCH 3/5] refcount: Improve out-of-line code-gen
From: Peter Zijlstra @ 2021-12-09  8:33 UTC
  To: will, boqun.feng
  Cc: linux-kernel, x86, mark.rutland, elver, keescook, hch, torvalds, axboe

On Wed, Dec 08, 2021 at 07:36:58PM +0100, Peter Zijlstra wrote:
> Allow a number of ops to tail-call refcount_warn_saturate() in order
> to generate smaller out-of-line code.
> 
>    text    data     bss     dec     hex filename
>   97341    4985    1116  103442   19412 defconfig-build/kernel/events/core.o
>   97299    4985    1116  103400   193e8 defconfig-build/kernel/events/core.o
> 

This patch also makes GCC do worse code-gen on the fast path, so I'll
drop it.

For some obscure raisin it causes this:

ring_buffer_put:
    a950:       f0 ff 0f                lock decl (%rdi)
    a953:       7c 20                   jl     a975 <ring_buffer_put+0x25>
    a955:       74 01                   je     a958 <ring_buffer_put+0x8>
    a957:       c3                      ret


ring_buffer_put:
    a940:       53                      push   %rbx
    a941:       48 89 fb                mov    %rdi,%rbx
    a944:       f0 ff 0f                lock decl (%rdi)
    a947:       7c 04                   jl     a94d <ring_buffer_put+0xd>
    a949:       74 10                   je     a95b <ring_buffer_put+0x1b>
    a94b:       5b                      pop    %rbx
    a94c:       c3                      ret

Which is just unacceptable...


* Re: [RFC][PATCH 1/5] atomic: Introduce atomic_{inc,dec,dec_and_test}_ofl()
From: Mark Rutland @ 2021-12-09 12:42 UTC
  To: Peter Zijlstra
  Cc: will, boqun.feng, linux-kernel, x86, elver, keescook, hch,
	torvalds, axboe

On Wed, Dec 08, 2021 at 07:36:56PM +0100, Peter Zijlstra wrote:
> In order to facilitate architecture support for refcount_t, introduce
> a number of new atomic primitives that have a uaccess style exception
> for overflow.
> 
> Notably:
> 
>   atomic_inc_ofl(v, Label) -- increment and goto Label when
> 			      v is zero or negative.
> 
>   atomic_dec_ofl(v, Label) -- decrement and goto Label when
> 			      the result is zero or negative
> 
>   atomic_dec_and_test_ofl(v, Label) -- decrement and return true when
> 				       the result is zero and goto Label
> 				       when the result is negative

Just to check, atomic_inc_ofl() tests the *old* value of `v`, and the other
cases check the *new* value of `v`?

For clarity, in the descriptions it might be worth:

  s/v/the old value of v/
  s/the result/the new value of v/

... which I think makes that clearer.

> Since the GCC 'Labels as Values' extension doesn't allow having the
> goto in an inline function, these new 'functions' must in fact be
> implemented as macro magic.

Oh; fun... :(

GCC allows the label to be passed into a *nested* function, so I had a play
with that, hoping we might be able to get scoping and evaluation with something
like:

| #define arch_${atomic}_dec_and_test_ofl(_v, _label)					\\
| ({											\\
| 	bool __nested_arch_${atomic}_dec_and_test_ovf(${atomic}_t *v, void *label)	\\
| 	{										\\
| 		${int} new = arch_${atomic}_dec_return(v);				\\
| 		if (unlikely(new < 0))							\\
| 			goto label;							\\
| 		if (unlikely(new == 0))							\\
| 			return true;							\\
| 		return false;								\\
| 	}										\\
| 	__nested_arch_${atomic}_dec_and_test_ovf(_v, &&_label);				\\
| })

... but GCC refused to inline things, and Clang doesn't appear to support
nested functions, so it does appear we're stuck with using macros...

It's a shame to have to use macros, since we have to take great care to avoid
multiple-evaluation and/or shadowing of variables, but we're probably OK in
practice given these are simple enough.
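
For example, given the __ai_ snapshot in the instrumented wrappers, a
(contrived) call such as:

	atomic_inc_ofl(p++, err);

evaluates 'p++' exactly once into '__ai_v'; without the snapshot, both
the instrumentation call and the arch op would expand it and the side
effect would run twice.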

> This meant extending the atomic generation scripts to deal with
> wrapping macros instead of inline functions. Since
> xchg/cmpxchg/try_cmpxchg were already macro magic, there was existing
> code for that. While extending/improving that, a few latent
> 'instrumentation' bugs were uncovered and 'accidentally' fixed.

I assume for non-RFC we can split that out into a preparatory patch. :)

Having played about a bit, I don't have any suggestions for improving the
scripting, and I think that looks OK.

Thanks,
Mark.

> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/atomic/atomic-arch-fallback.h |   64 ++++++++++++
>  include/linux/atomic/atomic-instrumented.h  |  139 ++++++++++++++++++++--------
>  include/linux/atomic/atomic-long.h          |   32 ++++++
>  scripts/atomic/atomic-tbl.sh                |    6 +
>  scripts/atomic/atomics.tbl                  |    6 +
>  scripts/atomic/fallbacks/dec_and_test_ofl   |   12 ++
>  scripts/atomic/fallbacks/dec_ofl            |    8 +
>  scripts/atomic/fallbacks/inc_ofl            |    8 +
>  scripts/atomic/gen-atomic-fallback.sh       |    4 
>  scripts/atomic/gen-atomic-instrumented.sh   |  117 +++++++++++++++++++----
>  scripts/atomic/gen-atomic-long.sh           |   32 +++++-
>  11 files changed, 359 insertions(+), 69 deletions(-)
> 
> --- a/include/linux/atomic/atomic-arch-fallback.h
> +++ b/include/linux/atomic/atomic-arch-fallback.h
> @@ -1250,6 +1250,37 @@ arch_atomic_dec_if_positive(atomic_t *v)
>  #define arch_atomic_dec_if_positive arch_atomic_dec_if_positive
>  #endif
>  
> +#ifndef arch_atomic_inc_ofl
> +#define arch_atomic_inc_ofl(_v, _label)				\
> +do {									\
> +	int __old = arch_atomic_fetch_inc(_v);			\
> +	if (unlikely(__old <= 0))					\
> +		goto _label;						\
> +} while (0)
> +#endif
> +
> +#ifndef arch_atomic_dec_ofl
> +#define arch_atomic_dec_ofl(_v, _label)				\
> +do {									\
> +	int __new = arch_atomic_dec_return(_v);			\
> +	if (unlikely(__new <= 0))					\
> +		goto _label;						\
> +} while (0)
> +#endif
> +
> +#ifndef arch_atomic_dec_and_test_ofl
> +#define arch_atomic_dec_and_test_ofl(_v, _label)			\
> +({									\
> +	bool __ret = false;						\
> +	int __new = arch_atomic_dec_return(_v);			\
> +	if (unlikely(__new < 0))					\
> +		goto _label;						\
> +	if (unlikely(__new == 0))					\
> +		__ret = true;						\
> +	__ret;								\
> +})
> +#endif
> +
>  #ifdef CONFIG_GENERIC_ATOMIC64
>  #include <asm-generic/atomic64.h>
>  #endif
> @@ -2357,5 +2388,36 @@ arch_atomic64_dec_if_positive(atomic64_t
>  #define arch_atomic64_dec_if_positive arch_atomic64_dec_if_positive
>  #endif
>  
> +#ifndef arch_atomic64_inc_ofl
> +#define arch_atomic64_inc_ofl(_v, _label)				\
> +do {									\
> +	s64 __old = arch_atomic64_fetch_inc(_v);			\
> +	if (unlikely(__old <= 0))					\
> +		goto _label;						\
> +} while (0)
> +#endif
> +
> +#ifndef arch_atomic64_dec_ofl
> +#define arch_atomic64_dec_ofl(_v, _label)				\
> +do {									\
> +	s64 __new = arch_atomic64_dec_return(_v);			\
> +	if (unlikely(__new <= 0))					\
> +		goto _label;						\
> +} while (0)
> +#endif
> +
> +#ifndef arch_atomic64_dec_and_test_ofl
> +#define arch_atomic64_dec_and_test_ofl(_v, _label)			\
> +({									\
> +	bool __ret = false;						\
> +	s64 __new = arch_atomic64_dec_return(_v);			\
> +	if (unlikely(__new < 0))					\
> +		goto _label;						\
> +	if (unlikely(__new == 0))					\
> +		__ret = true;						\
> +	__ret;								\
> +})
> +#endif
> +
>  #endif /* _LINUX_ATOMIC_FALLBACK_H */
> -// cca554917d7ea73d5e3e7397dd70c484cad9b2c4
> +// a59904b14db62a38bbab8699edc4a785a97871fb
> --- a/include/linux/atomic/atomic-instrumented.h
> +++ b/include/linux/atomic/atomic-instrumented.h
> @@ -501,7 +501,7 @@ static __always_inline bool
>  atomic_try_cmpxchg(atomic_t *v, int *old, int new)
>  {
>  	instrument_atomic_read_write(v, sizeof(*v));
> -	instrument_atomic_read_write(old, sizeof(*old));
> +	instrument_read_write(old, sizeof(*old));
>  	return arch_atomic_try_cmpxchg(v, old, new);
>  }
>  
> @@ -509,7 +509,7 @@ static __always_inline bool
>  atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
>  {
>  	instrument_atomic_read_write(v, sizeof(*v));
> -	instrument_atomic_read_write(old, sizeof(*old));
> +	instrument_read_write(old, sizeof(*old));
>  	return arch_atomic_try_cmpxchg_acquire(v, old, new);
>  }
>  
> @@ -517,7 +517,7 @@ static __always_inline bool
>  atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
>  {
>  	instrument_atomic_read_write(v, sizeof(*v));
> -	instrument_atomic_read_write(old, sizeof(*old));
> +	instrument_read_write(old, sizeof(*old));
>  	return arch_atomic_try_cmpxchg_release(v, old, new);
>  }
>  
> @@ -525,7 +525,7 @@ static __always_inline bool
>  atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
>  {
>  	instrument_atomic_read_write(v, sizeof(*v));
> -	instrument_atomic_read_write(old, sizeof(*old));
> +	instrument_read_write(old, sizeof(*old));
>  	return arch_atomic_try_cmpxchg_relaxed(v, old, new);
>  }
>  
> @@ -599,6 +599,27 @@ atomic_dec_if_positive(atomic_t *v)
>  	return arch_atomic_dec_if_positive(v);
>  }
>  
> +#define atomic_inc_ofl(v, L) \
> +({ \
> +	typeof(v) __ai_v = (v); \
> +	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
> +	arch_atomic_inc_ofl(__ai_v, L); \
> +})
> +
> +#define atomic_dec_ofl(v, L) \
> +({ \
> +	typeof(v) __ai_v = (v); \
> +	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
> +	arch_atomic_dec_ofl(__ai_v, L); \
> +})
> +
> +#define atomic_dec_and_test_ofl(v, L) \
> +({ \
> +	typeof(v) __ai_v = (v); \
> +	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
> +	arch_atomic_dec_and_test_ofl(__ai_v, L); \
> +})
> +
>  static __always_inline s64
>  atomic64_read(const atomic64_t *v)
>  {
> @@ -1079,7 +1100,7 @@ static __always_inline bool
>  atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
>  {
>  	instrument_atomic_read_write(v, sizeof(*v));
> -	instrument_atomic_read_write(old, sizeof(*old));
> +	instrument_read_write(old, sizeof(*old));
>  	return arch_atomic64_try_cmpxchg(v, old, new);
>  }
>  
> @@ -1087,7 +1108,7 @@ static __always_inline bool
>  atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
>  {
>  	instrument_atomic_read_write(v, sizeof(*v));
> -	instrument_atomic_read_write(old, sizeof(*old));
> +	instrument_read_write(old, sizeof(*old));
>  	return arch_atomic64_try_cmpxchg_acquire(v, old, new);
>  }
>  
> @@ -1095,7 +1116,7 @@ static __always_inline bool
>  atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
>  {
>  	instrument_atomic_read_write(v, sizeof(*v));
> -	instrument_atomic_read_write(old, sizeof(*old));
> +	instrument_read_write(old, sizeof(*old));
>  	return arch_atomic64_try_cmpxchg_release(v, old, new);
>  }
>  
> @@ -1103,7 +1124,7 @@ static __always_inline bool
>  atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
>  {
>  	instrument_atomic_read_write(v, sizeof(*v));
> -	instrument_atomic_read_write(old, sizeof(*old));
> +	instrument_read_write(old, sizeof(*old));
>  	return arch_atomic64_try_cmpxchg_relaxed(v, old, new);
>  }
>  
> @@ -1177,6 +1198,27 @@ atomic64_dec_if_positive(atomic64_t *v)
>  	return arch_atomic64_dec_if_positive(v);
>  }
>  
> +#define atomic64_inc_ofl(v, L) \
> +({ \
> +	typeof(v) __ai_v = (v); \
> +	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
> +	arch_atomic64_inc_ofl(__ai_v, L); \
> +})
> +
> +#define atomic64_dec_ofl(v, L) \
> +({ \
> +	typeof(v) __ai_v = (v); \
> +	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
> +	arch_atomic64_dec_ofl(__ai_v, L); \
> +})
> +
> +#define atomic64_dec_and_test_ofl(v, L) \
> +({ \
> +	typeof(v) __ai_v = (v); \
> +	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
> +	arch_atomic64_dec_and_test_ofl(__ai_v, L); \
> +})
> +
>  static __always_inline long
>  atomic_long_read(const atomic_long_t *v)
>  {
> @@ -1657,7 +1699,7 @@ static __always_inline bool
>  atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
>  {
>  	instrument_atomic_read_write(v, sizeof(*v));
> -	instrument_atomic_read_write(old, sizeof(*old));
> +	instrument_read_write(old, sizeof(*old));
>  	return arch_atomic_long_try_cmpxchg(v, old, new);
>  }
>  
> @@ -1665,7 +1707,7 @@ static __always_inline bool
>  atomic_long_try_cmpxchg_acquire(atomic_long_t *v, long *old, long new)
>  {
>  	instrument_atomic_read_write(v, sizeof(*v));
> -	instrument_atomic_read_write(old, sizeof(*old));
> +	instrument_read_write(old, sizeof(*old));
>  	return arch_atomic_long_try_cmpxchg_acquire(v, old, new);
>  }
>  
> @@ -1673,7 +1715,7 @@ static __always_inline bool
>  atomic_long_try_cmpxchg_release(atomic_long_t *v, long *old, long new)
>  {
>  	instrument_atomic_read_write(v, sizeof(*v));
> -	instrument_atomic_read_write(old, sizeof(*old));
> +	instrument_read_write(old, sizeof(*old));
>  	return arch_atomic_long_try_cmpxchg_release(v, old, new);
>  }
>  
> @@ -1681,7 +1723,7 @@ static __always_inline bool
>  atomic_long_try_cmpxchg_relaxed(atomic_long_t *v, long *old, long new)
>  {
>  	instrument_atomic_read_write(v, sizeof(*v));
> -	instrument_atomic_read_write(old, sizeof(*old));
> +	instrument_read_write(old, sizeof(*old));
>  	return arch_atomic_long_try_cmpxchg_relaxed(v, old, new);
>  }
>  
> @@ -1755,87 +1797,108 @@ atomic_long_dec_if_positive(atomic_long_
>  	return arch_atomic_long_dec_if_positive(v);
>  }
>  

> +#define atomic_long_inc_ofl(v, L) \
> +({ \
> +	typeof(v) __ai_v = (v); \
> +	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
> +	arch_atomic_long_inc_ofl(__ai_v, L); \
> +})
> +
> +#define atomic_long_dec_ofl(v, L) \
> +({ \
> +	typeof(v) __ai_v = (v); \
> +	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
> +	arch_atomic_long_dec_ofl(__ai_v, L); \
> +})
> +
> +#define atomic_long_dec_and_test_ofl(v, L) \
> +({ \
> +	typeof(v) __ai_v = (v); \
> +	instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \
> +	arch_atomic_long_dec_and_test_ofl(__ai_v, L); \
> +})
> +
>  #define xchg(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
>  	arch_xchg(__ai_ptr, __VA_ARGS__); \
>  })
>  
>  #define xchg_acquire(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
>  	arch_xchg_acquire(__ai_ptr, __VA_ARGS__); \
>  })
>  
>  #define xchg_release(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
>  	arch_xchg_release(__ai_ptr, __VA_ARGS__); \
>  })
>  
>  #define xchg_relaxed(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
>  	arch_xchg_relaxed(__ai_ptr, __VA_ARGS__); \
>  })
>  
>  #define cmpxchg(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
>  	arch_cmpxchg(__ai_ptr, __VA_ARGS__); \
>  })
>  
>  #define cmpxchg_acquire(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
>  	arch_cmpxchg_acquire(__ai_ptr, __VA_ARGS__); \
>  })
>  
>  #define cmpxchg_release(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
>  	arch_cmpxchg_release(__ai_ptr, __VA_ARGS__); \
>  })
>  
>  #define cmpxchg_relaxed(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
>  	arch_cmpxchg_relaxed(__ai_ptr, __VA_ARGS__); \
>  })
>  
>  #define cmpxchg64(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
>  	arch_cmpxchg64(__ai_ptr, __VA_ARGS__); \
>  })
>  
>  #define cmpxchg64_acquire(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
>  	arch_cmpxchg64_acquire(__ai_ptr, __VA_ARGS__); \
>  })
>  
>  #define cmpxchg64_release(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
>  	arch_cmpxchg64_release(__ai_ptr, __VA_ARGS__); \
>  })
>  
>  #define cmpxchg64_relaxed(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
>  	arch_cmpxchg64_relaxed(__ai_ptr, __VA_ARGS__); \
>  })
>  
> @@ -1843,8 +1906,8 @@ atomic_long_dec_if_positive(atomic_long_
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
>  	typeof(oldp) __ai_oldp = (oldp); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> -	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
>  	arch_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
>  })
>  
> @@ -1852,8 +1915,8 @@ atomic_long_dec_if_positive(atomic_long_
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
>  	typeof(oldp) __ai_oldp = (oldp); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> -	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
>  	arch_try_cmpxchg_acquire(__ai_ptr, __ai_oldp, __VA_ARGS__); \
>  })
>  
> @@ -1861,8 +1924,8 @@ atomic_long_dec_if_positive(atomic_long_
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
>  	typeof(oldp) __ai_oldp = (oldp); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> -	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
>  	arch_try_cmpxchg_release(__ai_ptr, __ai_oldp, __VA_ARGS__); \
>  })
>  
> @@ -1870,36 +1933,36 @@ atomic_long_dec_if_positive(atomic_long_
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
>  	typeof(oldp) __ai_oldp = (oldp); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> -	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
>  	arch_try_cmpxchg_relaxed(__ai_ptr, __ai_oldp, __VA_ARGS__); \
>  })
>  
>  #define cmpxchg_local(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
>  	arch_cmpxchg_local(__ai_ptr, __VA_ARGS__); \
>  })
>  
>  #define cmpxchg64_local(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
>  	arch_cmpxchg64_local(__ai_ptr, __VA_ARGS__); \
>  })
>  
>  #define sync_cmpxchg(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> -	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
>  	arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
>  })
>  
>  #define cmpxchg_double(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> -	instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
>  	arch_cmpxchg_double(__ai_ptr, __VA_ARGS__); \
>  })
>  
> @@ -1907,9 +1970,9 @@ atomic_long_dec_if_positive(atomic_long_
>  #define cmpxchg_double_local(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> -	instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \
> +	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
>  	arch_cmpxchg_double_local(__ai_ptr, __VA_ARGS__); \
>  })
>  
>  #endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
> -// 2a9553f0a9d5619f19151092df5cabbbf16ce835
> +// 214f9a7e972966a9a8e28e1568665cfb75decf91
> --- a/include/linux/atomic/atomic-long.h
> +++ b/include/linux/atomic/atomic-long.h
> @@ -515,6 +515,21 @@ arch_atomic_long_dec_if_positive(atomic_
>  	return arch_atomic64_dec_if_positive(v);
>  }
>  
> +#define arch_atomic_long_inc_ofl(v, L) \
> +({ \
> +	arch_atomic64_inc_ofl((v), L) \
> +})
> +
> +#define arch_atomic_long_dec_ofl(v, L) \
> +({ \
> +	arch_atomic64_dec_ofl((v), L) \
> +})
> +
> +#define arch_atomic_long_dec_and_test_ofl(v, L) \
> +({ \
> +	arch_atomic64_dec_and_test_ofl((v), L) \
> +})
> +
>  #else /* CONFIG_64BIT */
>  
>  static __always_inline long
> @@ -1009,6 +1024,21 @@ arch_atomic_long_dec_if_positive(atomic_
>  	return arch_atomic_dec_if_positive(v);
>  }
>  
> +#define arch_atomic_long_inc_ofl(v, L) \
> +({ \
> +	arch_atomic_inc_ofl((v), L) \
> +})
> +
> +#define arch_atomic_long_dec_ofl(v, L) \
> +({ \
> +	arch_atomic_dec_ofl((v), L) \
> +})
> +
> +#define arch_atomic_long_dec_and_test_ofl(v, L) \
> +({ \
> +	arch_atomic_dec_and_test_ofl((v), L) \
> +})
> +
>  #endif /* CONFIG_64BIT */
>  #endif /* _LINUX_ATOMIC_LONG_H */


> -// e8f0e08ff072b74d180eabe2ad001282b38c2c88
> +// 120f718985fa4c8f0e884cc4f23db8aa950255fb
> --- a/scripts/atomic/atomic-tbl.sh
> +++ b/scripts/atomic/atomic-tbl.sh
> @@ -36,6 +36,12 @@ meta_has_relaxed()
>  	meta_in "$1" "BFIR"
>  }
>  
> +#meta_has_macro(meta)
> +meta_has_macro()
> +{
> +	meta_in "$1" "m"
> +}
> +
>  #find_fallback_template(pfx, name, sfx, order)
>  find_fallback_template()
>  {
> --- a/scripts/atomic/atomics.tbl
> +++ b/scripts/atomic/atomics.tbl
> @@ -10,12 +10,15 @@
>  # * F/f	- fetch: returns base type (has fetch_ variants)
>  # * l	- load: returns base type (has _acquire order variant)
>  # * s	- store: returns void (has _release order variant)
> +# * m	- macro:
>  #
>  # Where args contains list of type[:name], where type is:
>  # * cv	- const pointer to atomic base type (atomic_t/atomic64_t/atomic_long_t)
>  # * v	- pointer to atomic base type (atomic_t/atomic64_t/atomic_long_t)
>  # * i	- base type (int/s64/long)
>  # * p	- pointer to base type (int/s64/long)
> +# * L	- label for exception case
> +# * V:... - vararg
>  #
>  read			l	cv
>  set			s	v	i
> @@ -39,3 +42,6 @@ inc_not_zero		b	v
>  inc_unless_negative	b	v
>  dec_unless_positive	b	v
>  dec_if_positive		i	v
> +inc_ofl			m	v	L
> +dec_ofl			m	v	L
> +dec_and_test_ofl	m	v	L
> --- /dev/null
> +++ b/scripts/atomic/fallbacks/dec_and_test_ofl
> @@ -0,0 +1,12 @@
> +cat << EOF
> +#define arch_${atomic}_dec_and_test_ofl(_v, _label)			\\
> +({									\\
> +	bool __ret = false;						\\
> +	${int} __new = arch_${atomic}_dec_return(_v);			\\
> +	if (unlikely(__new < 0))					\\
> +		goto _label;						\\
> +	if (unlikely(__new == 0))					\\
> +		__ret = true;						\\
> +	__ret;								\\
> +})
> +EOF
> --- /dev/null
> +++ b/scripts/atomic/fallbacks/dec_ofl
> @@ -0,0 +1,8 @@
> +cat << EOF
> +#define arch_${atomic}_dec_ofl(_v, _label)				\\
> +do {									\\
> +	${int} __new = arch_${atomic}_dec_return(_v);			\\
> +	if (unlikely(__new <= 0))					\\
> +		goto _label;						\\
> +} while (0)
> +EOF
> --- /dev/null
> +++ b/scripts/atomic/fallbacks/inc_ofl
> @@ -0,0 +1,8 @@
> +cat << EOF
> +#define arch_${atomic}_inc_ofl(_v, _label)				\\
> +do {									\\
> +	${int} __old = arch_${atomic}_fetch_inc(_v);			\\
> +	if (unlikely(__old <= 0))					\\
> +		goto _label;						\\
> +} while (0)
> +EOF
> --- a/scripts/atomic/gen-atomic-fallback.sh
> +++ b/scripts/atomic/gen-atomic-fallback.sh
> @@ -27,7 +27,9 @@ gen_template_fallback()
>  	if [ ! -z "${template}" ]; then
>  		printf "#ifndef ${atomicname}\n"
>  		. ${template}
> -		printf "#define ${atomicname} ${atomicname}\n"
> +		if ! meta_has_macro "${meta}"; then
> +			printf "#define ${atomicname} ${atomicname}\n"
> +		fi
>  		printf "#endif\n\n"
>  	fi
>  }
> --- a/scripts/atomic/gen-atomic-instrumented.sh
> +++ b/scripts/atomic/gen-atomic-instrumented.sh
> @@ -13,9 +13,13 @@ gen_param_check()
>  	local type="${arg%%:*}"
>  	local name="$(gen_param_name "${arg}")"
>  	local rw="write"
> +	local pfx;
>  
>  	case "${type#c}" in
> +	v) pfx="atomic_";;
>  	i) return;;
> +	L) return;;
> +	V) return;;
>  	esac
>  
>  	if [ ${type#c} != ${type} ]; then
> @@ -27,7 +31,16 @@ gen_param_check()
>  		rw="read_write"
>  	fi
>  
> -	printf "\tinstrument_atomic_${rw}(${name}, sizeof(*${name}));\n"
> +	if meta_has_macro "${meta}"; then
> +		name="__ai_${name}"
> +	fi
> +
> +	printf "\tinstrument_${pfx}${rw}(${name}, sizeof(*${name}));"
> +	if meta_has_macro "${meta}"; then
> +		printf " \\"
> +	fi
> +	printf "\n"
> +
>  }
>  
>  #gen_params_checks(meta, arg...)
> @@ -41,6 +54,52 @@ gen_params_checks()
>  	done
>  }
>  
> +#gen_var(arg)
> +gen_var()
> +{
> +	local type="${1%%:*}"
> +	local name="$(gen_param_name "$1")"
> +
> +	case "${type#c}" in
> +	L) return;;
> +	V) return;;
> +	esac
> +
> +	printf "\ttypeof(${name}) __ai_${name} = (${name}); \\\\\n";
> +}
> +
> +#gen_vars(arg...)
> +gen_vars()
> +{
> +	while [ "$#" -gt 0 ]; do
> +		gen_var "$1"
> +		shift
> +	done
> +}
> +
> +#gen_varg(arg)
> +gen_varg()
> +{
> +	local type="${1%%:*}"
> +	local name="$(gen_param_name "$1")"
> +
> +	case "${type#c}" in
> +	L)	printf "${name}";;
> +	V)	printf "__VA_ARGS__";;
> +	*)	printf "__ai_${name}";;
> +	esac
> +}
> +
> +#gen_vargs(arg...)
> +gen_vargs()
> +{
> +	while [ "$#" -gt 0 ]; do
> +		printf "$(gen_varg "$1")"
> +		[ "$#" -gt 1 ] && printf ", "
> +		shift
> +	done
> +}
> +
>  #gen_proto_order_variant(meta, pfx, name, sfx, order, atomic, int, arg...)
>  gen_proto_order_variant()
>  {
> @@ -54,11 +113,28 @@ gen_proto_order_variant()
>  
>  	local atomicname="${atomic}_${pfx}${name}${sfx}${order}"
>  
> -	local ret="$(gen_ret_type "${meta}" "${int}")"
> -	local params="$(gen_params "${int}" "${atomic}" "$@")"
>  	local checks="$(gen_params_checks "${meta}" "$@")"
>  	local args="$(gen_args "$@")"
> -	local retstmt="$(gen_ret_stmt "${meta}")"
> +
> +	if meta_has_macro "${meta}"; then
> +
> +		local vars="$(gen_vars "$@")"
> +		local vargs="$(gen_vargs "$@")"
> +
> +cat <<EOF
> +#define ${atomicname}(${args}) \\
> +({ \\
> +${vars}
> +${checks}
> +	arch_${atomicname}(${vargs}); \\
> +})
> +EOF
> +
> +	else
> +
> +		local ret="$(gen_ret_type "${meta}" "${int}")"
> +		local params="$(gen_params "${int}" "${atomic}" "$@")"
> +		local retstmt="$(gen_ret_stmt "${meta}")"
>  
>  cat <<EOF
>  static __always_inline ${ret}
> @@ -69,6 +145,8 @@ ${checks}
>  }
>  EOF
>  
> +	fi
> +
>  	printf "\n"
>  }
>  
> @@ -76,32 +154,27 @@ gen_xchg()
>  {
>  	local xchg="$1"; shift
>  	local mult="$1"; shift
> +	local ARGS;
>  
>  	if [ "${xchg%${xchg#try_cmpxchg}}" = "try_cmpxchg" ] ; then
> -
> -cat <<EOF
> -#define ${xchg}(ptr, oldp, ...) \\
> -({ \\
> -	typeof(ptr) __ai_ptr = (ptr); \\
> -	typeof(oldp) __ai_oldp = (oldp); \\
> -	instrument_atomic_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
> -	instrument_atomic_write(__ai_oldp, ${mult}sizeof(*__ai_oldp)); \\
> -	arch_${xchg}(__ai_ptr, __ai_oldp, __VA_ARGS__); \\
> -})
> -EOF
> -
> +		ARGS="v:ptr p:oldp V:..."
>  	else
> +		ARGS="v:ptr V:..."
> +	fi
> +
> +	local args="$(gen_args ${ARGS})"
> +	local vars="$(gen_vars ${ARGS})"
> +	local checks="$(gen_params_checks "m" ${ARGS})"
> +	local vargs="$(gen_vargs ${ARGS})"
>  
>  cat <<EOF
> -#define ${xchg}(ptr, ...) \\
> +#define ${xchg}(${args}) \\
>  ({ \\
> -	typeof(ptr) __ai_ptr = (ptr); \\
> -	instrument_atomic_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
> -	arch_${xchg}(__ai_ptr, __VA_ARGS__); \\
> +${vars}
> +${checks}
> +	arch_${xchg}(${vargs}); \\
>  })
>  EOF
> -
> -	fi
>  }
>  
>  cat << EOF
> --- a/scripts/atomic/gen-atomic-long.sh
> +++ b/scripts/atomic/gen-atomic-long.sh
> @@ -17,16 +17,21 @@ gen_cast()
>  	printf "($(gen_param_type "${arg}" "${int}" "${atomic}"))"
>  }
>  
> -#gen_args_cast(int, atomic, arg...)
> +#gen_args_cast(meta, int, atomic, arg...)
>  gen_args_cast()
>  {
> +	local meta=$1; shift
>  	local int="$1"; shift
>  	local atomic="$1"; shift
>  
>  	while [ "$#" -gt 0 ]; do
>  		local cast="$(gen_cast "$1" "${int}" "${atomic}")"
>  		local arg="$(gen_param_name "$1")"
> -		printf "${cast}${arg}"
> +		if meta_has_macro "${meta}" && [ "${1%%:*}" != "L" ]; then
> +			printf "${cast}(${arg})"
> +		else
> +			printf "${cast}${arg}"
> +		fi
>  		[ "$#" -gt 1 ] && printf ", "
>  		shift;
>  	done
> @@ -40,10 +45,24 @@ gen_proto_order_variant()
>  	local atomic="$1"; shift
>  	local int="$1"; shift
>  
> -	local ret="$(gen_ret_type "${meta}" "long")"
> -	local params="$(gen_params "long" "atomic_long" "$@")"
> -	local argscast="$(gen_args_cast "${int}" "${atomic}" "$@")"
> -	local retstmt="$(gen_ret_stmt "${meta}")"
> +	local argscast="$(gen_args_cast "${meta}" "${int}" "${atomic}" "$@")"
> +
> +	if meta_has_macro "${meta}"; then
> +
> +		local args="$(gen_args "$@")"
> +
> +cat <<EOF
> +#define arch_atomic_long_${name}(${args}) \\
> +({ \\
> +	arch_${atomic}_${name}(${argscast}) \\
> +})
> +
> +EOF
> +	else
> +
> +		local ret="$(gen_ret_type "${meta}" "long")"
> +		local params="$(gen_params "long" "atomic_long" "$@")"
> +		local retstmt="$(gen_ret_stmt "${meta}")"
>  
>  cat <<EOF
>  static __always_inline ${ret}
> @@ -53,6 +72,7 @@ arch_atomic_long_${name}(${params})
>  }
>  
>  EOF
> +	fi
>  }
>  
>  cat << EOF
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 2/5] refcount: Use atomic_*_ofl()
  2021-12-08 18:36 ` [RFC][PATCH 2/5] refcount: Use atomic_*_ofl() Peter Zijlstra
  2021-12-08 19:19   ` Peter Zijlstra
@ 2021-12-09 13:17   ` Mark Rutland
  2021-12-09 17:00     ` Mark Rutland
  1 sibling, 1 reply; 17+ messages in thread
From: Mark Rutland @ 2021-12-09 13:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: will, boqun.feng, linux-kernel, x86, elver, keescook, hch, torvalds

On Wed, Dec 08, 2021 at 07:36:57PM +0100, Peter Zijlstra wrote:
> Use the shiny new atomic_*_ofl() functions in order to have better
> code-gen.
> 
> Notably refcount_inc() case no longer distinguishes between
> inc-from-zero and inc-negative in the fast path, this improves
> code-gen:
> 
>     4b88:       b8 01 00 00 00          mov    $0x1,%eax
>     4b8d:       f0 0f c1 43 28          lock xadd %eax,0x28(%rbx)
>     4b92:       85 c0                   test   %eax,%eax
>     4b94:       74 1b                   je     4bb1 <alloc_perf_context+0xf1>
>     4b96:       8d 50 01                lea    0x1(%rax),%edx
>     4b99:       09 c2                   or     %eax,%edx
>     4b9b:       78 20                   js     4bbd <alloc_perf_context+0xfd>
> 
> to:
> 
>     4768:       b8 01 00 00 00          mov    $0x1,%eax
>     476d:       f0 0f c1 43 28          lock xadd %eax,0x28(%rbx)
>     4772:       85 c0                   test   %eax,%eax
>     4774:       7e 14                   jle    478a <alloc_perf_context+0xea>

For comparison, I generated the same for arm64 below with kernel.org crosstool
GCC 11.1.0 and defconfig.

For arm64 there's an existing sub-optimality for inc/dec where the register
for `1` or `-1` gets generated with a `MOV;MOV` chain or `MOV;NEG` chain rather
than a single `MOV` in either case. I think that's due to the way we build
LSE/LL-SC variants of add() and build a common inc() atop, and the compiler
just loses the opportunity to constant-fold. I'll take a look at how to make
that neater.

Regardless, the code goes from:

    51f4:       52800024        mov     w4, #0x1                        // #1
    ...
    5224:       2a0403e1        mov     w1, w4
    5228:       b8210001        ldadd   w1, w1, [x0]
    522c:       34000261        cbz     w1, 5278 <alloc_perf_context+0xf8>
    5230:       11000422        add     w2, w1, #0x1
    5234:       2a010041        orr     w1, w2, w1
    5238:       37f80181        tbnz    w1, #31, 5268 <alloc_perf_context+0xe8>

to:

    40e8:       52800024        mov     w4, #0x1                        // #1
    ...
    4118:       2a0403e1        mov     w1, w4
    411c:       b8e10001        ldaddal w1, w1, [x0]
    4120:       7100003f        cmp     w1, #0x0
    4124:       5400018d        b.le    4154 <alloc_perf_context+0xe0>

> without sacrificing on functionality; the only thing that suffers is
> the reported error condition, which might now 'spuriously' report
> 'saturated' instead of 'inc-from-zero'.
> 
> refcount_dec_and_test() is also improved:
> 
>     aa40:       b8 ff ff ff ff          mov    $0xffffffff,%eax
>     aa45:       f0 0f c1 07             lock xadd %eax,(%rdi)
>     aa49:       83 f8 01                cmp    $0x1,%eax
>     aa4c:       74 05                   je     aa53 <ring_buffer_put+0x13>
>     aa4e:       85 c0                   test   %eax,%eax
>     aa50:       7e 1e                   jle    aa70 <ring_buffer_put+0x30>
>     aa52:       c3                      ret
> 
> to:
> 
>     a980:       b8 ff ff ff ff          mov    $0xffffffff,%eax
>     a985:       f0 0f c1 07             lock xadd %eax,(%rdi)
>     a989:       83 e8 01                sub    $0x1,%eax
>     a98c:       78 20                   js     a9ae <ring_buffer_put+0x2e>
>     a98e:       74 01                   je     a991 <ring_buffer_put+0x11>
>     a990:       c3                      ret

For arm64 (with your fixlet applied) that goes from:

    c42c:       52800021        mov     w1, #0x1                        // #1
    c430:       4b0103e1        neg     w1, w1
    c434:       b8610001        ldaddl  w1, w1, [x0]
    c438:       7100043f        cmp     w1, #0x1
    c43c:       54000140        b.eq    c464 <ring_buffer_put+0x50>  // b.none
    c440:       7100003f        cmp     w1, #0x0
    c444:       5400028d        b.le    c494 <ring_buffer_put+0x80>

to:

    c3dc:       52800021        mov     w1, #0x1                        // #1
    c3e0:       4b0103e1        neg     w1, w1
    c3e4:       b8e10002        ldaddal w1, w2, [x0]
    c3e8:       0b020021        add     w1, w1, w2
    c3ec:       7100003f        cmp     w1, #0x0
    c3f0:       5400012b        b.lt    c414 <ring_buffer_put+0x50>  // b.tstop
    c3f4:       540001a0        b.eq    c428 <ring_buffer_put+0x64>  // b.none

I think the add here is due to the change in your fixlet:

-       if (unlikely(old <= 1))
+       if (unlikely(old - 1 <= 0))

> XXX atomic_dec_and_test_ofl() is strictly stronger ordered than
> refcount_dec_and_test() is defined -- Power and Arrghh64 ?

I'll leave the ordering to Will.

Thanks,
Mark.

> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/refcount.h |   15 ++++++++++++---
>  lib/refcount.c           |    5 +++++
>  2 files changed, 17 insertions(+), 3 deletions(-)
> 
> --- a/include/linux/refcount.h
> +++ b/include/linux/refcount.h
> @@ -264,7 +264,10 @@ static inline void __refcount_inc(refcou
>   */
>  static inline void refcount_inc(refcount_t *r)
>  {
> -	__refcount_inc(r, NULL);
> +	atomic_inc_ofl(&r->refs, Eoverflow);
> +	return;
> +Eoverflow:
> +	refcount_warn_saturate(r, REFCOUNT_ADD_OVF);
>  }
>  
>  static inline __must_check bool __refcount_sub_and_test(int i, refcount_t *r, int *oldp)
> @@ -330,7 +333,10 @@ static inline __must_check bool __refcou
>   */
>  static inline __must_check bool refcount_dec_and_test(refcount_t *r)
>  {
> -	return __refcount_dec_and_test(r, NULL);
> +	return atomic_dec_and_test_ofl(&r->refs, Eoverflow);
> +Eoverflow:
> +	refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
> +	return false;
>  }
>  
>  static inline void __refcount_dec(refcount_t *r, int *oldp)
> @@ -356,7 +362,10 @@ static inline void __refcount_dec(refcou
>   */
>  static inline void refcount_dec(refcount_t *r)
>  {
> -	__refcount_dec(r, NULL);
> +	atomic_dec_ofl(&r->refs, Eoverflow);
> +	return;
> +Eoverflow:
> +	refcount_warn_saturate(r, REFCOUNT_DEC_LEAK);
>  }
>  
>  extern __must_check bool refcount_dec_if_one(refcount_t *r);
> --- a/lib/refcount.c
> +++ b/lib/refcount.c
> @@ -12,8 +12,13 @@
>  
>  void refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t)
>  {
> +	int old = refcount_read(r);
>  	refcount_set(r, REFCOUNT_SATURATED);
>  
> +	/* racy; who cares */
> +	if (old == 1 && t == REFCOUNT_ADD_OVF)
> +		t = REFCOUNT_ADD_UAF;
> +
>  	switch (t) {
>  	case REFCOUNT_ADD_NOT_ZERO_OVF:
>  		REFCOUNT_WARN("saturated; leaking memory");
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 1/5] atomic: Introduce atomic_{inc,dec,dec_and_test}_ofl()
  2021-12-09 12:42   ` Mark Rutland
@ 2021-12-09 13:34     ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-12-09 13:34 UTC (permalink / raw)
  To: Mark Rutland
  Cc: will, boqun.feng, linux-kernel, x86, elver, keescook, hch,
	torvalds, axboe

On Thu, Dec 09, 2021 at 12:42:47PM +0000, Mark Rutland wrote:
> On Wed, Dec 08, 2021 at 07:36:56PM +0100, Peter Zijlstra wrote:
> > In order to facilitate architecture support for refcount_t, introduce
> > a number of new atomic primitives that have a uaccess style exception
> > for overflow.
> > 
> > Notably:
> > 
> >   atomic_inc_ofl(v, Label) -- increment and goto Label when
> > 			      v is zero or negative.
> > 
> >   atomic_dec_ofl(v, Label) -- decrement and goto Label when
> > 			      the result is zero or negative
> > 
> >   atomic_dec_and_test_ofl(v, Label) -- decrement and return true when
> > 				       the result is zero and goto Label
> > 				       when the result is negative
> 
> Just to check, atomic_inc_ofl() tests the *old* value of `v`, and the other
> cases check the *new* value of `v`?
> 
> For clarity, in the descriptions it might be worth:
> 
>   s/v/the old value of v/
>   s/the result/the new value of v/
> 
> ... which I think makes that clearer.

Right, I'll clarify.
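
Something like so, perhaps (a restatement only; it matches what the
fallbacks already do):

  atomic_inc_ofl(v, L)          -- goto L when the *old* value of v
                                   was zero or negative
  atomic_dec_ofl(v, L)          -- goto L when the *new* value of v
                                   is zero or negative
  atomic_dec_and_test_ofl(v, L) -- return true when the *new* value of
                                   v is zero; goto L when it is negative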

> > Since the GCC 'Labels as Values' extention doesn't allow having the
> > goto in an inline function, these new 'functions' must in fact be
> > implemented as macro magic.
> 
> Oh; fun... :(

Yeah, I tried all sorta things, it's all >.< close to working but then
GCC refuses to do the sensible thing.
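
E.g. the obvious attempt below is a dead end; GCC's 'Labels as Values'
documentation is explicit that a computed goto may not jump to a label
in a different function, so the caller's label can't be smuggled into
an inline function as a value (sketch of the broken idea only):

	static __always_inline void atomic_dec_ofl(atomic_t *v, void *__ofl)
	{
		if (unlikely(atomic_dec_return(v) <= 0))
			goto *__ofl;	/* label lives in the caller; UB */
	}

	/* caller */
	atomic_dec_ofl(&r->refs, &&Eoverflow);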

> > This meant extending the atomic generation scripts to deal with
> > wrapping macros instead of inline functions. Since
> > xchg/cmpxchg/try_cmpxchg were already macro magic, there was existant
> > code for that. While extending/improving that a few latent
> > 'instrumentation' bugs were uncovered and 'accidentally' fixed.
> 
> I assume for non-RFC we can split that out into a preparatory patch. :)

Sure, I can split it in two; one to add the infra and fix the bugs, and
two to introduce the new ops.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 0/5] refcount: Improve code-gen
  2021-12-09  8:25 ` [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra
@ 2021-12-09 16:19   ` Jens Axboe
  2021-12-09 16:51     ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2021-12-09 16:19 UTC (permalink / raw)
  To: Peter Zijlstra, will, boqun.feng
  Cc: linux-kernel, x86, mark.rutland, elver, keescook, hch, torvalds

On 12/9/21 1:25 AM, Peter Zijlstra wrote:
> On Wed, Dec 08, 2021 at 07:36:55PM +0100, Peter Zijlstra wrote:
>> Hi,
>>
>> Improves the refcount_t code-gen; I've still got to go through the latest thing
>> Linus suggested, but figured I should get these patches out to see if there's
>> other concerns etc..
>>
> 
> Bah, I forgot to Cc Jens, let me bounce him the lot.

Traveling the next few days, just put me on v2 and I should have time to
start looking at this next week. Bouncing only helps for initial messages,
I'll have to dig for followups :-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 0/5] refcount: Improve code-gen
  2021-12-09 16:19   ` Jens Axboe
@ 2021-12-09 16:51     ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-12-09 16:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: will, boqun.feng, linux-kernel, x86, mark.rutland, elver,
	keescook, hch, torvalds

On Thu, Dec 09, 2021 at 09:19:58AM -0700, Jens Axboe wrote:
> On 12/9/21 1:25 AM, Peter Zijlstra wrote:
> > On Wed, Dec 08, 2021 at 07:36:55PM +0100, Peter Zijlstra wrote:
> >> Hi,
> >>
> >> Improves the refcount_t code-gen; I've still got to go through the latest thing
> >> Linus suggested, but figured I should get these patches out to see if there's
> >> other concerns etc..
> >>
> > 
> > Bah, I forgot to Cc Jens, let me bounce him the lot.
> 
> Traveling the next few days, just put me on v2 and I should have time to
> start looking at this next week. Bouncing only helps for initial messages,
> I'll have to dig for followups :-)

Hmm.. I tagged the whole thread and did tag-bounce. Anyway, I'll be sure
to copy you on v2. Thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 2/5] refcount: Use atomic_*_ofl()
  2021-12-09 13:17   ` Mark Rutland
@ 2021-12-09 17:00     ` Mark Rutland
  0 siblings, 0 replies; 17+ messages in thread
From: Mark Rutland @ 2021-12-09 17:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: will, boqun.feng, linux-kernel, x86, elver, keescook, hch, torvalds

On Thu, Dec 09, 2021 at 01:17:33PM +0000, Mark Rutland wrote:
> On Wed, Dec 08, 2021 at 07:36:57PM +0100, Peter Zijlstra wrote:
> > Use the shiny new atomic_*_ofl() functions in order to have better
> > code-gen.
> > 
> > Notably refcount_inc() case no longer distinguishes between
> > inc-from-zero and inc-negative in the fast path, this improves
> > code-gen:
> > 
> >     4b88:       b8 01 00 00 00          mov    $0x1,%eax
> >     4b8d:       f0 0f c1 43 28          lock xadd %eax,0x28(%rbx)
> >     4b92:       85 c0                   test   %eax,%eax
> >     4b94:       74 1b                   je     4bb1 <alloc_perf_context+0xf1>
> >     4b96:       8d 50 01                lea    0x1(%rax),%edx
> >     4b99:       09 c2                   or     %eax,%edx
> >     4b9b:       78 20                   js     4bbd <alloc_perf_context+0xfd>
> > 
> > to:
> > 
> >     4768:       b8 01 00 00 00          mov    $0x1,%eax
> >     476d:       f0 0f c1 43 28          lock xadd %eax,0x28(%rbx)
> >     4772:       85 c0                   test   %eax,%eax
> >     4774:       7e 14                   jle    478a <alloc_perf_context+0xea>
> 
> For comparison, I generated the same for arm64 below with kernel.org crosstool
> GCC 11.1.0 and defconfig.
> 
> For arm64 there's an existing sub-optimiality for inc/dec where the register
> for `1` or `-1` gets generated with a `MOV;MOV` chain or `MOV;NEG` chain rather
> than a single `MOV` in either case. I think taht's due to the way we build
> LSE/LL-SC variants of add() and build a common inc() atop, and the compiler
> just loses the opportunity to constant-fold. I'll take a look at how to make
> that neater.

With some improvements to arm64's LSE atomics, that becomes a comparable
sequence to x86's:

    2df8:       52800021        mov     w1, #0x1                        // #1
    ...
    2e20:       b8e10002        ldaddal w1, w2, [x0]
    2e24:       7100005f        cmp     w2, #0x0
    2e28:       5400012d        b.le    2e4c <alloc_perf_context+0xc8>

> > without sacrificing on functionality; the only thing that suffers is
> > the reported error condition, which might now 'spuriously' report
> > 'saturated' instead of 'inc-from-zero'.
> > 
> > refcount_dec_and_test() is also improved:
> > 
> >     aa40:       b8 ff ff ff ff          mov    $0xffffffff,%eax
> >     aa45:       f0 0f c1 07             lock xadd %eax,(%rdi)
> >     aa49:       83 f8 01                cmp    $0x1,%eax
> >     aa4c:       74 05                   je     aa53 <ring_buffer_put+0x13>
> >     aa4e:       85 c0                   test   %eax,%eax
> >     aa50:       7e 1e                   jle    aa70 <ring_buffer_put+0x30>
> >     aa52:       c3                      ret
> > 
> > to:
> > 
> >     a980:       b8 ff ff ff ff          mov    $0xffffffff,%eax
> >     a985:       f0 0f c1 07             lock xadd %eax,(%rdi)
> >     a989:       83 e8 01                sub    $0x1,%eax
> >     a98c:       78 20                   js     a9ae <ring_buffer_put+0x2e>
> >     a98e:       74 01                   je     a991 <ring_buffer_put+0x11>
> >     a990:       c3                      ret

Likewise I can get the arm64 equivalent down to:

    bebc:       12800001        mov     w1, #0xffffffff                 // #-1
    ...
    becc:       b8e10001        ldaddal w1, w1, [x0]
    bed0:       71000421        subs    w1, w1, #0x1
    bed4:       540000c4        b.mi    beec <ring_buffer_put+0x3c>  // b.first
    bed8:       54000120        b.eq    befc <ring_buffer_put+0x4c>  // b.none

I've pushed my WIP patches to:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/atomics/improvements
  git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/atomics/improvements

... and I'll try to get those cleaned up and posted soon.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 3/5] refcount: Improve out-of-line code-gen
  2021-12-09  8:33   ` Peter Zijlstra
@ 2021-12-09 17:51     ` Linus Torvalds
  0 siblings, 0 replies; 17+ messages in thread
From: Linus Torvalds @ 2021-12-09 17:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Boqun Feng, Linux Kernel Mailing List,
	the arch/x86 maintainers, Mark Rutland, Marco Elver, Kees Cook,
	Christoph Hellwig, Jens Axboe

On Thu, Dec 9, 2021 at 12:33 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> For some obscure raisin it causes this: [snip]

Looks like the code generation at one point thought that it needed to
get the return value from the slow-path function call, and by the time
it had optimized that away it had already ended up with enough
temporaries that it needed to spill things into %rbx.

That looks like a fairly "internal to gcc" thing, but I think dropping
that patch is indeed the right thing to do.

When you made it do

        return refcount_warn_saturate(r, ..);

it now means that those inline functions don't obviously always return
'false' any more, so the compiler did end up needing a variable for
the return value at one point, and it really was a much more complex
decision. After it has inlined that top-level function, the return
value is not visible to the compiler because it doesn't see that
refcount_warn_saturate() always returns false.

The fact that quite often that whole refcount_warn_saturate() call
ended up then being in a cold part of the code, and could then be
optimized to a tailcall if that was the last thing generated, doesn't
end up helping in the general case, and instead only hurts: the
compiler doesn't see what the return value is for the warning case, so
it might end up doing other things in the place that the function was
inlined into.
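
IOW, roughly the difference between these two shapes (a sketch from the
patches, details elided; the second assumes refcount_warn_saturate()
was made to return bool):

	/* patch 2: the inline visibly returns the constant 'false' */
	static inline __must_check bool refcount_dec_and_test(refcount_t *r)
	{
		return atomic_dec_and_test_ofl(&r->refs, Eoverflow);
	Eoverflow:
		refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
		return false;
	}

	/* patch 3: the return value is whatever the call returns */
	Eoverflow:
		return refcount_warn_saturate(r, REFCOUNT_SUB_UAF);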

I think the original patch was likely a mis-optimization triggered by
your test-case being a function that *only* did that
refcount_add_not_zero(), and returned the value. It didn't actually
have other code that then depended on its return value.

              Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-12-09 17:53 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-08 18:36 [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra
2021-12-08 18:36 ` [RFC][PATCH 1/5] atomic: Introduce atomic_{inc,dec,dec_and_test}_ofl() Peter Zijlstra
2021-12-09 12:42   ` Mark Rutland
2021-12-09 13:34     ` Peter Zijlstra
2021-12-08 18:36 ` [RFC][PATCH 2/5] refcount: Use atomic_*_ofl() Peter Zijlstra
2021-12-08 19:19   ` Peter Zijlstra
2021-12-08 20:56     ` Peter Zijlstra
2021-12-09 13:17   ` Mark Rutland
2021-12-09 17:00     ` Mark Rutland
2021-12-08 18:36 ` [RFC][PATCH 3/5] refcount: Improve out-of-line code-gen Peter Zijlstra
2021-12-09  8:33   ` Peter Zijlstra
2021-12-09 17:51     ` Linus Torvalds
2021-12-08 18:36 ` [RFC][PATCH 4/5] atomic,x86: Implement atomic_dec_and_test_ofl() Peter Zijlstra
2021-12-08 18:37 ` [RFC][PATCH 5/5] atomic: Document the atomic_{}_ofl() functions Peter Zijlstra
2021-12-09  8:25 ` [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra
2021-12-09 16:19   ` Jens Axboe
2021-12-09 16:51     ` Peter Zijlstra
