* [RFC][PATCH 0/5] refcount: Improve code-gen
@ 2021-12-08 18:36 Peter Zijlstra
  2021-12-08 18:36 ` [RFC][PATCH 1/5] atomic: Introduce atomic_{inc,dec,dec_and_test}_ofl() Peter Zijlstra
  ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-12-08 18:36 UTC (permalink / raw)
To: will, boqun.feng
Cc: linux-kernel, x86, peterz, mark.rutland, elver, keescook, hch, torvalds

Hi,

Improves the refcount_t code-gen; I've still got to go through the latest
thing Linus suggested, but figured I should get these patches out to see if
there are other concerns, etc.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* [RFC][PATCH 1/5] atomic: Introduce atomic_{inc,dec,dec_and_test}_ofl() 2021-12-08 18:36 [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra @ 2021-12-08 18:36 ` Peter Zijlstra 2021-12-09 12:42 ` Mark Rutland 2021-12-08 18:36 ` [RFC][PATCH 2/5] refcount: Use atomic_*_ofl() Peter Zijlstra ` (4 subsequent siblings) 5 siblings, 1 reply; 17+ messages in thread From: Peter Zijlstra @ 2021-12-08 18:36 UTC (permalink / raw) To: will, boqun.feng Cc: linux-kernel, x86, peterz, mark.rutland, elver, keescook, hch, torvalds In order to facilitate architecture support for refcount_t, introduce a number of new atomic primitives that have a uaccess style exception for overflow. Notably: atomic_inc_ofl(v, Label) -- increment and goto Label when v is zero or negative. atomic_dec_ofl(v, Label) -- decrement and goto Label when the result is zero or negative atomic_dec_and_test_ofl(v, Label) -- decrement and return true when the result is zero and goto Label when the result is negative Since the GCC 'Labels as Values' extention doesn't allow having the goto in an inline function, these new 'functions' must in fact be implemented as macro magic. This meant extending the atomic generation scripts to deal with wrapping macros instead of inline functions. Since xchg/cmpxchg/try_cmpxchg were already macro magic, there was existant code for that. While extending/improving that a few latent 'instrumentation' bugs were uncovered and 'accidentally' fixed. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- include/linux/atomic/atomic-arch-fallback.h | 64 ++++++++++++ include/linux/atomic/atomic-instrumented.h | 139 ++++++++++++++++++++-------- include/linux/atomic/atomic-long.h | 32 ++++++ scripts/atomic/atomic-tbl.sh | 6 + scripts/atomic/atomics.tbl | 6 + scripts/atomic/fallbacks/dec_and_test_ofl | 12 ++ scripts/atomic/fallbacks/dec_ofl | 8 + scripts/atomic/fallbacks/inc_ofl | 8 + scripts/atomic/gen-atomic-fallback.sh | 4 scripts/atomic/gen-atomic-instrumented.sh | 117 +++++++++++++++++++---- scripts/atomic/gen-atomic-long.sh | 32 +++++- 11 files changed, 359 insertions(+), 69 deletions(-) --- a/include/linux/atomic/atomic-arch-fallback.h +++ b/include/linux/atomic/atomic-arch-fallback.h @@ -1250,6 +1250,37 @@ arch_atomic_dec_if_positive(atomic_t *v) #define arch_atomic_dec_if_positive arch_atomic_dec_if_positive #endif +#ifndef arch_atomic_inc_ofl +#define arch_atomic_inc_ofl(_v, _label) \ +do { \ + int __old = arch_atomic_fetch_inc(_v); \ + if (unlikely(__old <= 0)) \ + goto _label; \ +} while (0) +#endif + +#ifndef arch_atomic_dec_ofl +#define arch_atomic_dec_ofl(_v, _label) \ +do { \ + int __new = arch_atomic_dec_return(_v); \ + if (unlikely(__new <= 0)) \ + goto _label; \ +} while (0) +#endif + +#ifndef arch_atomic_dec_and_test_ofl +#define arch_atomic_dec_and_test_ofl(_v, _label) \ +({ \ + bool __ret = false; \ + int __new = arch_atomic_dec_return(_v); \ + if (unlikely(__new < 0)) \ + goto _label; \ + if (unlikely(__new == 0)) \ + __ret = true; \ + __ret; \ +}) +#endif + #ifdef CONFIG_GENERIC_ATOMIC64 #include <asm-generic/atomic64.h> #endif @@ -2357,5 +2388,36 @@ arch_atomic64_dec_if_positive(atomic64_t #define arch_atomic64_dec_if_positive arch_atomic64_dec_if_positive #endif +#ifndef arch_atomic64_inc_ofl +#define arch_atomic64_inc_ofl(_v, _label) \ +do { \ + s64 __old = arch_atomic64_fetch_inc(_v); \ + if (unlikely(__old <= 0)) \ + goto _label; \ +} while (0) +#endif + +#ifndef arch_atomic64_dec_ofl +#define arch_atomic64_dec_ofl(_v, _label) \ +do { \ + s64 __new = 
arch_atomic64_dec_return(_v); \ + if (unlikely(__new <= 0)) \ + goto _label; \ +} while (0) +#endif + +#ifndef arch_atomic64_dec_and_test_ofl +#define arch_atomic64_dec_and_test_ofl(_v, _label) \ +({ \ + bool __ret = false; \ + s64 __new = arch_atomic64_dec_return(_v); \ + if (unlikely(__new < 0)) \ + goto _label; \ + if (unlikely(__new == 0)) \ + __ret = true; \ + __ret; \ +}) +#endif + #endif /* _LINUX_ATOMIC_FALLBACK_H */ -// cca554917d7ea73d5e3e7397dd70c484cad9b2c4 +// a59904b14db62a38bbab8699edc4a785a97871fb --- a/include/linux/atomic/atomic-instrumented.h +++ b/include/linux/atomic/atomic-instrumented.h @@ -501,7 +501,7 @@ static __always_inline bool atomic_try_cmpxchg(atomic_t *v, int *old, int new) { instrument_atomic_read_write(v, sizeof(*v)); - instrument_atomic_read_write(old, sizeof(*old)); + instrument_read_write(old, sizeof(*old)); return arch_atomic_try_cmpxchg(v, old, new); } @@ -509,7 +509,7 @@ static __always_inline bool atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new) { instrument_atomic_read_write(v, sizeof(*v)); - instrument_atomic_read_write(old, sizeof(*old)); + instrument_read_write(old, sizeof(*old)); return arch_atomic_try_cmpxchg_acquire(v, old, new); } @@ -517,7 +517,7 @@ static __always_inline bool atomic_try_cmpxchg_release(atomic_t *v, int *old, int new) { instrument_atomic_read_write(v, sizeof(*v)); - instrument_atomic_read_write(old, sizeof(*old)); + instrument_read_write(old, sizeof(*old)); return arch_atomic_try_cmpxchg_release(v, old, new); } @@ -525,7 +525,7 @@ static __always_inline bool atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new) { instrument_atomic_read_write(v, sizeof(*v)); - instrument_atomic_read_write(old, sizeof(*old)); + instrument_read_write(old, sizeof(*old)); return arch_atomic_try_cmpxchg_relaxed(v, old, new); } @@ -599,6 +599,27 @@ atomic_dec_if_positive(atomic_t *v) return arch_atomic_dec_if_positive(v); } +#define atomic_inc_ofl(v, L) \ +({ \ + typeof(v) __ai_v = (v); \ + instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \ + arch_atomic_inc_ofl(__ai_v, L); \ +}) + +#define atomic_dec_ofl(v, L) \ +({ \ + typeof(v) __ai_v = (v); \ + instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \ + arch_atomic_dec_ofl(__ai_v, L); \ +}) + +#define atomic_dec_and_test_ofl(v, L) \ +({ \ + typeof(v) __ai_v = (v); \ + instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \ + arch_atomic_dec_and_test_ofl(__ai_v, L); \ +}) + static __always_inline s64 atomic64_read(const atomic64_t *v) { @@ -1079,7 +1100,7 @@ static __always_inline bool atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new) { instrument_atomic_read_write(v, sizeof(*v)); - instrument_atomic_read_write(old, sizeof(*old)); + instrument_read_write(old, sizeof(*old)); return arch_atomic64_try_cmpxchg(v, old, new); } @@ -1087,7 +1108,7 @@ static __always_inline bool atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new) { instrument_atomic_read_write(v, sizeof(*v)); - instrument_atomic_read_write(old, sizeof(*old)); + instrument_read_write(old, sizeof(*old)); return arch_atomic64_try_cmpxchg_acquire(v, old, new); } @@ -1095,7 +1116,7 @@ static __always_inline bool atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new) { instrument_atomic_read_write(v, sizeof(*v)); - instrument_atomic_read_write(old, sizeof(*old)); + instrument_read_write(old, sizeof(*old)); return arch_atomic64_try_cmpxchg_release(v, old, new); } @@ -1103,7 +1124,7 @@ static __always_inline bool atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new) { 
instrument_atomic_read_write(v, sizeof(*v)); - instrument_atomic_read_write(old, sizeof(*old)); + instrument_read_write(old, sizeof(*old)); return arch_atomic64_try_cmpxchg_relaxed(v, old, new); } @@ -1177,6 +1198,27 @@ atomic64_dec_if_positive(atomic64_t *v) return arch_atomic64_dec_if_positive(v); } +#define atomic64_inc_ofl(v, L) \ +({ \ + typeof(v) __ai_v = (v); \ + instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \ + arch_atomic64_inc_ofl(__ai_v, L); \ +}) + +#define atomic64_dec_ofl(v, L) \ +({ \ + typeof(v) __ai_v = (v); \ + instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \ + arch_atomic64_dec_ofl(__ai_v, L); \ +}) + +#define atomic64_dec_and_test_ofl(v, L) \ +({ \ + typeof(v) __ai_v = (v); \ + instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \ + arch_atomic64_dec_and_test_ofl(__ai_v, L); \ +}) + static __always_inline long atomic_long_read(const atomic_long_t *v) { @@ -1657,7 +1699,7 @@ static __always_inline bool atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new) { instrument_atomic_read_write(v, sizeof(*v)); - instrument_atomic_read_write(old, sizeof(*old)); + instrument_read_write(old, sizeof(*old)); return arch_atomic_long_try_cmpxchg(v, old, new); } @@ -1665,7 +1707,7 @@ static __always_inline bool atomic_long_try_cmpxchg_acquire(atomic_long_t *v, long *old, long new) { instrument_atomic_read_write(v, sizeof(*v)); - instrument_atomic_read_write(old, sizeof(*old)); + instrument_read_write(old, sizeof(*old)); return arch_atomic_long_try_cmpxchg_acquire(v, old, new); } @@ -1673,7 +1715,7 @@ static __always_inline bool atomic_long_try_cmpxchg_release(atomic_long_t *v, long *old, long new) { instrument_atomic_read_write(v, sizeof(*v)); - instrument_atomic_read_write(old, sizeof(*old)); + instrument_read_write(old, sizeof(*old)); return arch_atomic_long_try_cmpxchg_release(v, old, new); } @@ -1681,7 +1723,7 @@ static __always_inline bool atomic_long_try_cmpxchg_relaxed(atomic_long_t *v, long *old, long new) { instrument_atomic_read_write(v, sizeof(*v)); - instrument_atomic_read_write(old, sizeof(*old)); + instrument_read_write(old, sizeof(*old)); return arch_atomic_long_try_cmpxchg_relaxed(v, old, new); } @@ -1755,87 +1797,108 @@ atomic_long_dec_if_positive(atomic_long_ return arch_atomic_long_dec_if_positive(v); } +#define atomic_long_inc_ofl(v, L) \ +({ \ + typeof(v) __ai_v = (v); \ + instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \ + arch_atomic_long_inc_ofl(__ai_v, L); \ +}) + +#define atomic_long_dec_ofl(v, L) \ +({ \ + typeof(v) __ai_v = (v); \ + instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \ + arch_atomic_long_dec_ofl(__ai_v, L); \ +}) + +#define atomic_long_dec_and_test_ofl(v, L) \ +({ \ + typeof(v) __ai_v = (v); \ + instrument_atomic_read_write(__ai_v, sizeof(*__ai_v)); \ + arch_atomic_long_dec_and_test_ofl(__ai_v, L); \ +}) + #define xchg(ptr, ...) \ ({ \ typeof(ptr) __ai_ptr = (ptr); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ arch_xchg(__ai_ptr, __VA_ARGS__); \ }) #define xchg_acquire(ptr, ...) \ ({ \ typeof(ptr) __ai_ptr = (ptr); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ arch_xchg_acquire(__ai_ptr, __VA_ARGS__); \ }) #define xchg_release(ptr, ...) 
\ ({ \ typeof(ptr) __ai_ptr = (ptr); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ arch_xchg_release(__ai_ptr, __VA_ARGS__); \ }) #define xchg_relaxed(ptr, ...) \ ({ \ typeof(ptr) __ai_ptr = (ptr); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ arch_xchg_relaxed(__ai_ptr, __VA_ARGS__); \ }) #define cmpxchg(ptr, ...) \ ({ \ typeof(ptr) __ai_ptr = (ptr); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ arch_cmpxchg(__ai_ptr, __VA_ARGS__); \ }) #define cmpxchg_acquire(ptr, ...) \ ({ \ typeof(ptr) __ai_ptr = (ptr); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ arch_cmpxchg_acquire(__ai_ptr, __VA_ARGS__); \ }) #define cmpxchg_release(ptr, ...) \ ({ \ typeof(ptr) __ai_ptr = (ptr); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ arch_cmpxchg_release(__ai_ptr, __VA_ARGS__); \ }) #define cmpxchg_relaxed(ptr, ...) \ ({ \ typeof(ptr) __ai_ptr = (ptr); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ arch_cmpxchg_relaxed(__ai_ptr, __VA_ARGS__); \ }) #define cmpxchg64(ptr, ...) \ ({ \ typeof(ptr) __ai_ptr = (ptr); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ arch_cmpxchg64(__ai_ptr, __VA_ARGS__); \ }) #define cmpxchg64_acquire(ptr, ...) \ ({ \ typeof(ptr) __ai_ptr = (ptr); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ arch_cmpxchg64_acquire(__ai_ptr, __VA_ARGS__); \ }) #define cmpxchg64_release(ptr, ...) \ ({ \ typeof(ptr) __ai_ptr = (ptr); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ arch_cmpxchg64_release(__ai_ptr, __VA_ARGS__); \ }) #define cmpxchg64_relaxed(ptr, ...) 
\ ({ \ typeof(ptr) __ai_ptr = (ptr); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ arch_cmpxchg64_relaxed(__ai_ptr, __VA_ARGS__); \ }) @@ -1843,8 +1906,8 @@ atomic_long_dec_if_positive(atomic_long_ ({ \ typeof(ptr) __ai_ptr = (ptr); \ typeof(oldp) __ai_oldp = (oldp); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ - instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \ arch_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \ }) @@ -1852,8 +1915,8 @@ atomic_long_dec_if_positive(atomic_long_ ({ \ typeof(ptr) __ai_ptr = (ptr); \ typeof(oldp) __ai_oldp = (oldp); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ - instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \ arch_try_cmpxchg_acquire(__ai_ptr, __ai_oldp, __VA_ARGS__); \ }) @@ -1861,8 +1924,8 @@ atomic_long_dec_if_positive(atomic_long_ ({ \ typeof(ptr) __ai_ptr = (ptr); \ typeof(oldp) __ai_oldp = (oldp); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ - instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \ arch_try_cmpxchg_release(__ai_ptr, __ai_oldp, __VA_ARGS__); \ }) @@ -1870,36 +1933,36 @@ atomic_long_dec_if_positive(atomic_long_ ({ \ typeof(ptr) __ai_ptr = (ptr); \ typeof(oldp) __ai_oldp = (oldp); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ - instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \ arch_try_cmpxchg_relaxed(__ai_ptr, __ai_oldp, __VA_ARGS__); \ }) #define cmpxchg_local(ptr, ...) \ ({ \ typeof(ptr) __ai_ptr = (ptr); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ arch_cmpxchg_local(__ai_ptr, __VA_ARGS__); \ }) #define cmpxchg64_local(ptr, ...) \ ({ \ typeof(ptr) __ai_ptr = (ptr); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ arch_cmpxchg64_local(__ai_ptr, __VA_ARGS__); \ }) #define sync_cmpxchg(ptr, ...) \ ({ \ typeof(ptr) __ai_ptr = (ptr); \ - instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \ }) #define cmpxchg_double(ptr, ...) \ ({ \ typeof(ptr) __ai_ptr = (ptr); \ - instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ arch_cmpxchg_double(__ai_ptr, __VA_ARGS__); \ }) @@ -1907,9 +1970,9 @@ atomic_long_dec_if_positive(atomic_long_ #define cmpxchg_double_local(ptr, ...) 
\ ({ \ typeof(ptr) __ai_ptr = (ptr); \ - instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \ + instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \ arch_cmpxchg_double_local(__ai_ptr, __VA_ARGS__); \ }) #endif /* _LINUX_ATOMIC_INSTRUMENTED_H */ -// 2a9553f0a9d5619f19151092df5cabbbf16ce835 +// 214f9a7e972966a9a8e28e1568665cfb75decf91 --- a/include/linux/atomic/atomic-long.h +++ b/include/linux/atomic/atomic-long.h @@ -515,6 +515,21 @@ arch_atomic_long_dec_if_positive(atomic_ return arch_atomic64_dec_if_positive(v); } +#define arch_atomic_long_inc_ofl(v, L) \ +({ \ + arch_atomic64_inc_ofl((v), L) \ +}) + +#define arch_atomic_long_dec_ofl(v, L) \ +({ \ + arch_atomic64_dec_ofl((v), L) \ +}) + +#define arch_atomic_long_dec_and_test_ofl(v, L) \ +({ \ + arch_atomic64_dec_and_test_ofl((v), L) \ +}) + #else /* CONFIG_64BIT */ static __always_inline long @@ -1009,6 +1024,21 @@ arch_atomic_long_dec_if_positive(atomic_ return arch_atomic_dec_if_positive(v); } +#define arch_atomic_long_inc_ofl(v, L) \ +({ \ + arch_atomic_inc_ofl((v), L) \ +}) + +#define arch_atomic_long_dec_ofl(v, L) \ +({ \ + arch_atomic_dec_ofl((v), L) \ +}) + +#define arch_atomic_long_dec_and_test_ofl(v, L) \ +({ \ + arch_atomic_dec_and_test_ofl((v), L) \ +}) + #endif /* CONFIG_64BIT */ #endif /* _LINUX_ATOMIC_LONG_H */ -// e8f0e08ff072b74d180eabe2ad001282b38c2c88 +// 120f718985fa4c8f0e884cc4f23db8aa950255fb --- a/scripts/atomic/atomic-tbl.sh +++ b/scripts/atomic/atomic-tbl.sh @@ -36,6 +36,12 @@ meta_has_relaxed() meta_in "$1" "BFIR" } +#meta_has_macro(meta) +meta_has_macro() +{ + meta_in "$1" "m" +} + #find_fallback_template(pfx, name, sfx, order) find_fallback_template() { --- a/scripts/atomic/atomics.tbl +++ b/scripts/atomic/atomics.tbl @@ -10,12 +10,15 @@ # * F/f - fetch: returns base type (has fetch_ variants) # * l - load: returns base type (has _acquire order variant) # * s - store: returns void (has _release order variant) +# * m - macro: # # Where args contains list of type[:name], where type is: # * cv - const pointer to atomic base type (atomic_t/atomic64_t/atomic_long_t) # * v - pointer to atomic base type (atomic_t/atomic64_t/atomic_long_t) # * i - base type (int/s64/long) # * p - pointer to base type (int/s64/long) +# * L - label for exception case +# * V:... - vararg # read l cv set s v i @@ -39,3 +42,6 @@ inc_not_zero b v inc_unless_negative b v dec_unless_positive b v dec_if_positive i v +inc_ofl m v L +dec_ofl m v L +dec_and_test_ofl m v L --- /dev/null +++ b/scripts/atomic/fallbacks/dec_and_test_ofl @@ -0,0 +1,12 @@ +cat << EOF +#define arch_${atomic}_dec_and_test_ofl(_v, _label) \\ +({ \\ + bool __ret = false; \\ + ${int} __new = arch_${atomic}_dec_return(_v); \\ + if (unlikely(__new < 0)) \\ + goto _label; \\ + if (unlikely(__new == 0)) \\ + __ret = true; \\ + __ret; \\ +}) +EOF --- /dev/null +++ b/scripts/atomic/fallbacks/dec_ofl @@ -0,0 +1,8 @@ +cat << EOF +#define arch_${atomic}_dec_ofl(_v, _label) \\ +do { \\ + ${int} __new = arch_${atomic}_dec_return(_v); \\ + if (unlikely(__new <= 0)) \\ + goto _label; \\ +} while (0) +EOF --- /dev/null +++ b/scripts/atomic/fallbacks/inc_ofl @@ -0,0 +1,8 @@ +cat << EOF +#define arch_${atomic}_inc_ofl(_v, _label) \\ +do { \\ + ${int} __old = arch_${atomic}_fetch_inc(_v); \\ + if (unlikely(__old <= 0)) \\ + goto _label; \\ +} while (0) +EOF --- a/scripts/atomic/gen-atomic-fallback.sh +++ b/scripts/atomic/gen-atomic-fallback.sh @@ -27,7 +27,9 @@ gen_template_fallback() if [ ! -z "${template}" ]; then printf "#ifndef ${atomicname}\n" . 
${template} - printf "#define ${atomicname} ${atomicname}\n" + if ! meta_has_macro "${meta}"; then + printf "#define ${atomicname} ${atomicname}\n" + fi printf "#endif\n\n" fi } --- a/scripts/atomic/gen-atomic-instrumented.sh +++ b/scripts/atomic/gen-atomic-instrumented.sh @@ -13,9 +13,13 @@ gen_param_check() local type="${arg%%:*}" local name="$(gen_param_name "${arg}")" local rw="write" + local pfx; case "${type#c}" in + v) pfx="atomic_";; i) return;; + L) return;; + V) return;; esac if [ ${type#c} != ${type} ]; then @@ -27,7 +31,16 @@ gen_param_check() rw="read_write" fi - printf "\tinstrument_atomic_${rw}(${name}, sizeof(*${name}));\n" + if meta_has_macro "${meta}"; then + name="__ai_${name}" + fi + + printf "\tinstrument_${pfx}${rw}(${name}, sizeof(*${name}));" + if meta_has_macro "${meta}"; then + printf " \\" + fi + printf "\n" + } #gen_params_checks(meta, arg...) @@ -41,6 +54,52 @@ gen_params_checks() done } +#gen_var(arg) +gen_var() +{ + local type="${1%%:*}" + local name="$(gen_param_name "$1")" + + case "${type#c}" in + L) return;; + V) return;; + esac + + printf "\ttypeof(${name}) __ai_${name} = (${name}); \\\\\n"; +} + +#gen_vars(arg...) +gen_vars() +{ + while [ "$#" -gt 0 ]; do + gen_var "$1" + shift + done +} + +#gen_varg(arg) +gen_varg() +{ + local type="${1%%:*}" + local name="$(gen_param_name "$1")" + + case "${type#c}" in + L) printf "${name}";; + V) printf "__VA_ARGS__";; + *) printf "__ai_${name}";; + esac +} + +#gen_vargs(arg...) +gen_vargs() +{ + while [ "$#" -gt 0 ]; do + printf "$(gen_varg "$1")" + [ "$#" -gt 1 ] && printf ", " + shift + done +} + #gen_proto_order_variant(meta, pfx, name, sfx, order, atomic, int, arg...) gen_proto_order_variant() { @@ -54,11 +113,28 @@ gen_proto_order_variant() local atomicname="${atomic}_${pfx}${name}${sfx}${order}" - local ret="$(gen_ret_type "${meta}" "${int}")" - local params="$(gen_params "${int}" "${atomic}" "$@")" local checks="$(gen_params_checks "${meta}" "$@")" local args="$(gen_args "$@")" - local retstmt="$(gen_ret_stmt "${meta}")" + + if meta_has_macro "${meta}"; then + + local vars="$(gen_vars "$@")" + local vargs="$(gen_vargs "$@")" + +cat <<EOF +#define ${atomicname}(${args}) \\ +({ \\ +${vars} +${checks} + arch_${atomicname}(${vargs}); \\ +}) +EOF + + else + + local ret="$(gen_ret_type "${meta}" "${int}")" + local params="$(gen_params "${int}" "${atomic}" "$@")" + local retstmt="$(gen_ret_stmt "${meta}")" cat <<EOF static __always_inline ${ret} @@ -69,6 +145,8 @@ ${checks} } EOF + fi + printf "\n" } @@ -76,32 +154,27 @@ gen_xchg() { local xchg="$1"; shift local mult="$1"; shift + local ARGS; if [ "${xchg%${xchg#try_cmpxchg}}" = "try_cmpxchg" ] ; then - -cat <<EOF -#define ${xchg}(ptr, oldp, ...) \\ -({ \\ - typeof(ptr) __ai_ptr = (ptr); \\ - typeof(oldp) __ai_oldp = (oldp); \\ - instrument_atomic_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\ - instrument_atomic_write(__ai_oldp, ${mult}sizeof(*__ai_oldp)); \\ - arch_${xchg}(__ai_ptr, __ai_oldp, __VA_ARGS__); \\ -}) -EOF - + ARGS="v:ptr p:oldp V:..." else + ARGS="v:ptr V:..." + fi + + local args="$(gen_args ${ARGS})" + local vars="$(gen_vars ${ARGS})" + local checks="$(gen_params_checks "m" ${ARGS})" + local vargs="$(gen_vargs ${ARGS})" cat <<EOF -#define ${xchg}(ptr, ...) 
\\ +#define ${xchg}(${args}) \\ ({ \\ - typeof(ptr) __ai_ptr = (ptr); \\ - instrument_atomic_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\ - arch_${xchg}(__ai_ptr, __VA_ARGS__); \\ +${vars} +${checks} + arch_${xchg}(${vargs}); \\ }) EOF - - fi } cat << EOF --- a/scripts/atomic/gen-atomic-long.sh +++ b/scripts/atomic/gen-atomic-long.sh @@ -17,16 +17,21 @@ gen_cast() printf "($(gen_param_type "${arg}" "${int}" "${atomic}"))" } -#gen_args_cast(int, atomic, arg...) +#gen_args_cast(meta, int, atomic, arg...) gen_args_cast() { + local meta=$1; shift local int="$1"; shift local atomic="$1"; shift while [ "$#" -gt 0 ]; do local cast="$(gen_cast "$1" "${int}" "${atomic}")" local arg="$(gen_param_name "$1")" - printf "${cast}${arg}" + if meta_has_macro "${meta}" && [ "${1%%:*}" != "L" ]; then + printf "${cast}(${arg})" + else + printf "${cast}${arg}" + fi [ "$#" -gt 1 ] && printf ", " shift; done @@ -40,10 +45,24 @@ gen_proto_order_variant() local atomic="$1"; shift local int="$1"; shift - local ret="$(gen_ret_type "${meta}" "long")" - local params="$(gen_params "long" "atomic_long" "$@")" - local argscast="$(gen_args_cast "${int}" "${atomic}" "$@")" - local retstmt="$(gen_ret_stmt "${meta}")" + local argscast="$(gen_args_cast "${meta}" "${int}" "${atomic}" "$@")" + + if meta_has_macro "${meta}"; then + + local args="$(gen_args "$@")" + +cat <<EOF +#define arch_atomic_long_${name}(${args}) \\ +({ \\ + arch_${atomic}_${name}(${argscast}) \\ +}) + +EOF + else + + local ret="$(gen_ret_type "${meta}" "long")" + local params="$(gen_params "long" "atomic_long" "$@")" + local retstmt="$(gen_ret_stmt "${meta}")" cat <<EOF static __always_inline ${ret} @@ -53,6 +72,7 @@ arch_atomic_long_${name}(${params}) } EOF + fi } cat << EOF ^ permalink raw reply [flat|nested] 17+ messages in thread
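To make the intended calling convention concrete, the following is a minimal
usage sketch (not part of the series; the struct, field and function names are
made up for illustration). It mirrors the way the later refcount patch uses
these primitives: the label is the uaccess-style exception target, the success
paths fall straight through, and only the overflow/underflow case jumps to the
cold error handling.

	/* Hypothetical object with an atomic_t reference count. */
	struct obj {
		atomic_t refs;
	};

	static inline void put_obj(struct obj *o)
	{
		/* Decrement; jumps to 'underflow' if the new value went negative. */
		if (atomic_dec_and_test_ofl(&o->refs, underflow))
			free_obj(o);		/* new value hit zero: last reference */
		return;				/* new value still positive */

	underflow:
		WARN_ONCE(1, "obj refcount underflow\n");
	}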
* Re: [RFC][PATCH 1/5] atomic: Introduce atomic_{inc,dec,dec_and_test}_ofl()
  2021-12-08 18:36 ` [RFC][PATCH 1/5] atomic: Introduce atomic_{inc,dec,dec_and_test}_ofl() Peter Zijlstra
@ 2021-12-09 12:42 ` Mark Rutland
  2021-12-09 13:34   ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Mark Rutland @ 2021-12-09 12:42 UTC (permalink / raw)
To: Peter Zijlstra
Cc: will, boqun.feng, linux-kernel, x86, elver, keescook, hch, torvalds, axboe

On Wed, Dec 08, 2021 at 07:36:56PM +0100, Peter Zijlstra wrote:
> In order to facilitate architecture support for refcount_t, introduce
> a number of new atomic primitives that have a uaccess style exception
> for overflow.
>
> Notably:
>
>   atomic_inc_ofl(v, Label) -- increment and goto Label when
>                               v is zero or negative.
>
>   atomic_dec_ofl(v, Label) -- decrement and goto Label when
>                               the result is zero or negative
>
>   atomic_dec_and_test_ofl(v, Label) -- decrement and return true when
>                                        the result is zero and goto Label
>                                        when the result is negative

Just to check, atomic_inc_ofl() tests the *old* value of `v`, and the other
cases check the *new* value of `v`?

For clarity, in the descriptions it might be worth:

  s/v/the old value of v/
  s/the result/the new value of v/

... which I think makes that clearer.

> Since the GCC 'Labels as Values' extention doesn't allow having the
> goto in an inline function, these new 'functions' must in fact be
> implemented as macro magic.

Oh; fun... :(

GCC allows the label to be passed into a *nested* function, so I had a play
with that, hoping we might be able to get scoping and evaluation with
something like:

| #define arch_${atomic}_dec_and_test_ofl(_v, _label) \\
| ({ \\
| 	bool __nested_arch_${atomic}_dec_and_test_ovf(${atomic}_t *v, void *label) \\
| 	{ \\
| 		${int} new = arch_${atomic}_dec_return(v); \\
| 		if (unlikely(new < 0)) \\
| 			goto label; \\
| 		if (unlikely(new == 0)) \\
| 			return true; \\
| 		return false; \\
| 	} \\
| 	__nested_arch_${atomic}_dec_and_test_ovf(_v, &&_label); \\
| })

... but GCC refused to inline things, and Clang doesn't appear to support
nested functions, so it does appear we're stuck with using macros...

It's a shame to have to use macros, since we have to take great care to
avoid multiple-evaluation and/or shadowing of variables, but we're probably
OK in practice given these are simple enough.

> This meant extending the atomic generation scripts to deal with
> wrapping macros instead of inline functions. Since
> xchg/cmpxchg/try_cmpxchg were already macro magic, there was existant
> code for that. While extending/improving that a few latent
> 'instrumentation' bugs were uncovered and 'accidentally' fixed.

I assume for non-RFC we can split that out into a preparatory patch. :)

Having played about a bit, I don't have any suggestions for improving the
scripting, and I think that looks OK.

Thanks,
Mark.
^ permalink raw reply	[flat|nested] 17+ messages in thread
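Expanding on the multiple-evaluation concern mentioned above: the hazard with
macro-based wrappers is that an argument with side effects could be evaluated
more than once. The sketch below (illustrative, not taken verbatim from the
series) shows why the generated instrumented wrappers capture the argument in
a __ai_ local before using it.

	/*
	 * If the instrumented wrapper expanded its argument twice, e.g.
	 *
	 *	instrument_atomic_read_write((v), sizeof(*(v)));
	 *	arch_atomic_inc_ofl((v), L);
	 *
	 * then a call such as atomic_inc_ofl(&objs[i++].refs, err) would
	 * increment 'i' twice.  Capturing the argument once avoids that:
	 */
	#define atomic_inc_ofl(v, L)						\
	({									\
		typeof(v) __ai_v = (v);		/* evaluate 'v' exactly once */	\
		instrument_atomic_read_write(__ai_v, sizeof(*__ai_v));		\
		arch_atomic_inc_ofl(__ai_v, L);					\
	})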
* Re: [RFC][PATCH 1/5] atomic: Introduce atomic_{inc,dec,dec_and_test}_ofl()
  2021-12-09 12:42 ` Mark Rutland
@ 2021-12-09 13:34   ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-12-09 13:34 UTC (permalink / raw)
To: Mark Rutland
Cc: will, boqun.feng, linux-kernel, x86, elver, keescook, hch, torvalds, axboe

On Thu, Dec 09, 2021 at 12:42:47PM +0000, Mark Rutland wrote:
> On Wed, Dec 08, 2021 at 07:36:56PM +0100, Peter Zijlstra wrote:
> > In order to facilitate architecture support for refcount_t, introduce
> > a number of new atomic primitives that have a uaccess style exception
> > for overflow.
> >
> > Notably:
> >
> >   atomic_inc_ofl(v, Label) -- increment and goto Label when
> >                               v is zero or negative.
> >
> >   atomic_dec_ofl(v, Label) -- decrement and goto Label when
> >                               the result is zero or negative
> >
> >   atomic_dec_and_test_ofl(v, Label) -- decrement and return true when
> >                                        the result is zero and goto Label
> >                                        when the result is negative
>
> Just to check, atomic_inc_ofl() tests the *old* value of `v`, and the other
> cases check the *new* value of `v`?
>
> For clarity, in the descriptions it might be worth:
>
>   s/v/the old value of v/
>   s/the result/the new value of v/
>
> ... which I think makes that clearer.

Right, I'll clarify.

> > Since the GCC 'Labels as Values' extention doesn't allow having the
> > goto in an inline function, these new 'functions' must in fact be
> > implemented as macro magic.
>
> Oh; fun... :(

Yeah, I tried all sorta things, it's all >.< close to working but then
GCC refuses to do the sensible thing.

> > This meant extending the atomic generation scripts to deal with
> > wrapping macros instead of inline functions. Since
> > xchg/cmpxchg/try_cmpxchg were already macro magic, there was existant
> > code for that. While extending/improving that a few latent
> > 'instrumentation' bugs were uncovered and 'accidentally' fixed.
>
> I assume for non-RFC we can split that out into a preparatory patch. :)

Sure, I can split it in two; one to add the infra and fix bugs, two to
introduce the new ops.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* [RFC][PATCH 2/5] refcount: Use atomic_*_ofl() 2021-12-08 18:36 [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra 2021-12-08 18:36 ` [RFC][PATCH 1/5] atomic: Introduce atomic_{inc,dec,dec_and_test}_ofl() Peter Zijlstra @ 2021-12-08 18:36 ` Peter Zijlstra 2021-12-08 19:19 ` Peter Zijlstra 2021-12-09 13:17 ` Mark Rutland 2021-12-08 18:36 ` [RFC][PATCH 3/5] refcount: Improve out-of-line code-gen Peter Zijlstra ` (3 subsequent siblings) 5 siblings, 2 replies; 17+ messages in thread From: Peter Zijlstra @ 2021-12-08 18:36 UTC (permalink / raw) To: will, boqun.feng Cc: linux-kernel, x86, peterz, mark.rutland, elver, keescook, hch, torvalds Use the shiny new atomic_*_ofl() functions in order to have better code-gen. Notably refcount_inc() case no longer distinguishes between inc-from-zero and inc-negative in the fast path, this improves code-gen: 4b88: b8 01 00 00 00 mov $0x1,%eax 4b8d: f0 0f c1 43 28 lock xadd %eax,0x28(%rbx) 4b92: 85 c0 test %eax,%eax 4b94: 74 1b je 4bb1 <alloc_perf_context+0xf1> 4b96: 8d 50 01 lea 0x1(%rax),%edx 4b99: 09 c2 or %eax,%edx 4b9b: 78 20 js 4bbd <alloc_perf_context+0xfd> to: 4768: b8 01 00 00 00 mov $0x1,%eax 476d: f0 0f c1 43 28 lock xadd %eax,0x28(%rbx) 4772: 85 c0 test %eax,%eax 4774: 7e 14 jle 478a <alloc_perf_context+0xea> without sacrificing on functionality; the only thing that suffers is the reported error condition, which might now 'spuriously' report 'saturated' instead of 'inc-from-zero'. refcount_dec_and_test() is also improved: aa40: b8 ff ff ff ff mov $0xffffffff,%eax aa45: f0 0f c1 07 lock xadd %eax,(%rdi) aa49: 83 f8 01 cmp $0x1,%eax aa4c: 74 05 je aa53 <ring_buffer_put+0x13> aa4e: 85 c0 test %eax,%eax aa50: 7e 1e jle aa70 <ring_buffer_put+0x30> aa52: c3 ret to: a980: b8 ff ff ff ff mov $0xffffffff,%eax a985: f0 0f c1 07 lock xadd %eax,(%rdi) a989: 83 e8 01 sub $0x1,%eax a98c: 78 20 js a9ae <ring_buffer_put+0x2e> a98e: 74 01 je a991 <ring_buffer_put+0x11> a990: c3 ret XXX atomic_dec_and_test_ofl() is strictly stronger ordered than refcount_dec_and_test() is defined -- Power and Arrghh64 ? 
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- include/linux/refcount.h | 15 ++++++++++++--- lib/refcount.c | 5 +++++ 2 files changed, 17 insertions(+), 3 deletions(-) --- a/include/linux/refcount.h +++ b/include/linux/refcount.h @@ -264,7 +264,10 @@ static inline void __refcount_inc(refcou */ static inline void refcount_inc(refcount_t *r) { - __refcount_inc(r, NULL); + atomic_inc_ofl(&r->refs, Eoverflow); + return; +Eoverflow: + refcount_warn_saturate(r, REFCOUNT_ADD_OVF); } static inline __must_check bool __refcount_sub_and_test(int i, refcount_t *r, int *oldp) @@ -330,7 +333,10 @@ static inline __must_check bool __refcou */ static inline __must_check bool refcount_dec_and_test(refcount_t *r) { - return __refcount_dec_and_test(r, NULL); + return atomic_dec_and_test_ofl(&r->refs, Eoverflow); +Eoverflow: + refcount_warn_saturate(r, REFCOUNT_SUB_UAF); + return false; } static inline void __refcount_dec(refcount_t *r, int *oldp) @@ -356,7 +362,10 @@ static inline void __refcount_dec(refcou */ static inline void refcount_dec(refcount_t *r) { - __refcount_dec(r, NULL); + atomic_dec_ofl(&r->refs, Eoverflow); + return; +Eoverflow: + refcount_warn_saturate(r, REFCOUNT_DEC_LEAK); } extern __must_check bool refcount_dec_if_one(refcount_t *r); --- a/lib/refcount.c +++ b/lib/refcount.c @@ -12,8 +12,13 @@ void refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t) { + int old = refcount_read(r); refcount_set(r, REFCOUNT_SATURATED); + /* racy; who cares */ + if (old == 1 && t == REFCOUNT_ADD_OVF) + t = REFCOUNT_ADD_UAF; + switch (t) { case REFCOUNT_ADD_NOT_ZERO_OVF: REFCOUNT_WARN("saturated; leaking memory"); ^ permalink raw reply [flat|nested] 17+ messages in thread
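For a sense of how this connects to the listings above, here is a rough
hand-expansion (illustrative only, based on the generic fallback rather than
the arch-specific asm-goto versions; the function name is made up) of what
refcount_dec_and_test() reduces to at a call site. Keeping the warn path
behind a cold label is what lets the fast path collapse to a single
conditional branch.

	/* Approximate expansion of refcount_dec_and_test() with the generic fallback. */
	static inline bool example_dec_and_test(refcount_t *r)
	{
		int new = atomic_dec_return(&r->refs);	/* lock xadd / ldaddal */

		if (unlikely(new < 0))
			goto Eoverflow;			/* underflow: cold path */
		return new == 0;			/* true when the last reference dropped */

	Eoverflow:
		refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
		return false;
	}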
* Re: [RFC][PATCH 2/5] refcount: Use atomic_*_ofl()
  2021-12-08 18:36 ` [RFC][PATCH 2/5] refcount: Use atomic_*_ofl() Peter Zijlstra
@ 2021-12-08 19:19   ` Peter Zijlstra
  2021-12-08 20:56     ` Peter Zijlstra
  2021-12-09 13:17   ` Mark Rutland
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-12-08 19:19 UTC (permalink / raw)
  To: will, boqun.feng
  Cc: linux-kernel, x86, mark.rutland, elver, keescook, hch, torvalds

On Wed, Dec 08, 2021 at 07:36:57PM +0100, Peter Zijlstra wrote:
> Use the shiny new atomic_*_ofl() functions in order to have better
> code-gen.
>

*sigh*, so I forgot to adjust the other primitives to the slightly
tweaked range of these new functions..

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [RFC][PATCH 2/5] refcount: Use atomic_*_ofl()
  2021-12-08 19:19 ` Peter Zijlstra
@ 2021-12-08 20:56   ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-12-08 20:56 UTC (permalink / raw)
  To: will, boqun.feng
  Cc: linux-kernel, x86, mark.rutland, elver, keescook, hch, torvalds

On Wed, Dec 08, 2021 at 08:19:35PM +0100, Peter Zijlstra wrote:
> On Wed, Dec 08, 2021 at 07:36:57PM +0100, Peter Zijlstra wrote:
> > Use the shiny new atomic_*_ofl() functions in order to have better
> > code-gen.
>
> *sigh*, so I forgot to adjust the other primitives to the slightly
> tweaked range of these new functions..

Looks like it all just works. The add/sub stuff needs to check before
and after anyway, and it saturating slightly earlier isn't a problem.

The only difference I found is the below, so I'll just fold that in.

--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -346,7 +346,7 @@ static inline void __refcount_dec(refcou
 	if (oldp)
 		*oldp = old;
 
-	if (unlikely(old <= 1))
+	if (unlikely(old - 1 <= 0))
 		refcount_warn_saturate(r, REFCOUNT_DEC_LEAK);
 }

^ permalink raw reply	[flat|nested] 17+ messages in thread
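[Editor's note] As a sanity check on the fold, a tiny standalone userspace
sketch (not part of the thread) shows the two conditions agree for every
value except INT_MIN, where the subtraction wraps; the kernel is built with
-fno-strict-overflow, so that wrap is well defined there:

	#include <limits.h>
	#include <stdio.h>

	int main(void)
	{
		/* boundary values around the interesting range */
		int samples[] = { INT_MIN, INT_MIN + 1, -1, 0, 1, 2, INT_MAX };
		unsigned int i;

		for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
			int old  = samples[i];
			int orig = (old <= 1);		/* original check */
			int fold = (old - 1 <= 0);	/* folded check; wraps at INT_MIN */

			printf("%11d: orig=%d fold=%d%s\n", old, orig, fold,
			       orig != fold ? "  <- differ" : "");
		}
		return 0;
	}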
* Re: [RFC][PATCH 2/5] refcount: Use atomic_*_ofl()
  2021-12-08 18:36 ` [RFC][PATCH 2/5] refcount: Use atomic_*_ofl() Peter Zijlstra
  2021-12-08 19:19   ` Peter Zijlstra
@ 2021-12-09 13:17   ` Mark Rutland
  2021-12-09 17:00     ` Mark Rutland
  1 sibling, 1 reply; 17+ messages in thread
From: Mark Rutland @ 2021-12-09 13:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: will, boqun.feng, linux-kernel, x86, elver, keescook, hch, torvalds

On Wed, Dec 08, 2021 at 07:36:57PM +0100, Peter Zijlstra wrote:
> Use the shiny new atomic_*_ofl() functions in order to have better
> code-gen.
>
> Notably, the refcount_inc() case no longer distinguishes between
> inc-from-zero and inc-negative in the fast path; this improves
> code-gen from:
>
>  4b88: b8 01 00 00 00        mov    $0x1,%eax
>  4b8d: f0 0f c1 43 28        lock xadd %eax,0x28(%rbx)
>  4b92: 85 c0                 test   %eax,%eax
>  4b94: 74 1b                 je     4bb1 <alloc_perf_context+0xf1>
>  4b96: 8d 50 01              lea    0x1(%rax),%edx
>  4b99: 09 c2                 or     %eax,%edx
>  4b9b: 78 20                 js     4bbd <alloc_perf_context+0xfd>
>
> to:
>
>  4768: b8 01 00 00 00        mov    $0x1,%eax
>  476d: f0 0f c1 43 28        lock xadd %eax,0x28(%rbx)
>  4772: 85 c0                 test   %eax,%eax
>  4774: 7e 14                 jle    478a <alloc_perf_context+0xea>

For comparison, I generated the same for arm64 below with kernel.org
crosstool GCC 11.1.0 and defconfig.

For arm64 there's an existing sub-optimality for inc/dec where the
register for `1` or `-1` gets generated with a `MOV;MOV` chain or
`MOV;NEG` chain rather than a single `MOV` in either case. I think
that's due to the way we build LSE/LL-SC variants of add() and build a
common inc() atop, and the compiler just loses the opportunity to
constant-fold. I'll take a look at how to make that neater.

Regardless, the code goes from:

  51f4: 52800024   mov   w4, #0x1                  // #1
  ...
  5224: 2a0403e1   mov   w1, w4
  5228: b8210001   ldadd w1, w1, [x0]
  522c: 34000261   cbz   w1, 5278 <alloc_perf_context+0xf8>
  5230: 11000422   add   w2, w1, #0x1
  5234: 2a010041   orr   w1, w2, w1
  5238: 37f80181   tbnz  w1, #31, 5268 <alloc_perf_context+0xe8>

to:

  40e8: 52800024   mov     w4, #0x1                // #1
  ...
  4118: 2a0403e1   mov     w1, w4
  411c: b8e10001   ldaddal w1, w1, [x0]
  4120: 7100003f   cmp     w1, #0x0
  4124: 5400018d   b.le    4154 <alloc_perf_context+0xe0>

> without sacrificing functionality; the only thing that suffers is
> the reported error condition, which might now 'spuriously' report
> 'saturated' instead of 'inc-from-zero'.
>
> refcount_dec_and_test() is also improved:
>
>  aa40: b8 ff ff ff ff        mov    $0xffffffff,%eax
>  aa45: f0 0f c1 07           lock xadd %eax,(%rdi)
>  aa49: 83 f8 01              cmp    $0x1,%eax
>  aa4c: 74 05                 je     aa53 <ring_buffer_put+0x13>
>  aa4e: 85 c0                 test   %eax,%eax
>  aa50: 7e 1e                 jle    aa70 <ring_buffer_put+0x30>
>  aa52: c3                    ret
>
> to:
>
>  a980: b8 ff ff ff ff        mov    $0xffffffff,%eax
>  a985: f0 0f c1 07           lock xadd %eax,(%rdi)
>  a989: 83 e8 01              sub    $0x1,%eax
>  a98c: 78 20                 js     a9ae <ring_buffer_put+0x2e>
>  a98e: 74 01                 je     a991 <ring_buffer_put+0x11>
>  a990: c3                    ret

For arm64 (with your fixlet applied) that goes from:

  c42c: 52800021   mov    w1, #0x1                 // #1
  c430: 4b0103e1   neg    w1, w1
  c434: b8610001   ldaddl w1, w1, [x0]
  c438: 7100043f   cmp    w1, #0x1
  c43c: 54000140   b.eq   c464 <ring_buffer_put+0x50>  // b.none
  c440: 7100003f   cmp    w1, #0x0
  c444: 5400028d   b.le   c494 <ring_buffer_put+0x80>

to:

  c3dc: 52800021   mov     w1, #0x1                // #1
  c3e0: 4b0103e1   neg     w1, w1
  c3e4: b8e10002   ldaddal w1, w2, [x0]
  c3e8: 0b020021   add     w1, w1, w2
  c3ec: 7100003f   cmp     w1, #0x0
  c3f0: 5400012b   b.lt    c414 <ring_buffer_put+0x50>  // b.tstop
  c3f4: 540001a0   b.eq    c428 <ring_buffer_put+0x64>  // b.none

I think the add here is due to the change in your fixlet:

  -	if (unlikely(old <= 1))
  +	if (unlikely(old - 1 <= 0))

> XXX atomic_dec_and_test_ofl() is strictly stronger ordered than
> refcount_dec_and_test() is defined -- Power and Arrghh64 ?

I'll leave the ordering to Will.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [RFC][PATCH 2/5] refcount: Use atomic_*_ofl()
  2021-12-09 13:17 ` Mark Rutland
@ 2021-12-09 17:00   ` Mark Rutland
  0 siblings, 0 replies; 17+ messages in thread
From: Mark Rutland @ 2021-12-09 17:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: will, boqun.feng, linux-kernel, x86, elver, keescook, hch, torvalds

On Thu, Dec 09, 2021 at 01:17:33PM +0000, Mark Rutland wrote:
> On Wed, Dec 08, 2021 at 07:36:57PM +0100, Peter Zijlstra wrote:
> > Use the shiny new atomic_*_ofl() functions in order to have better
> > code-gen.
> >
> > Notably, the refcount_inc() case no longer distinguishes between
> > inc-from-zero and inc-negative in the fast path; this improves
> > code-gen from:
> >
> >  4b88: b8 01 00 00 00       mov    $0x1,%eax
> >  4b8d: f0 0f c1 43 28       lock xadd %eax,0x28(%rbx)
> >  4b92: 85 c0                test   %eax,%eax
> >  4b94: 74 1b                je     4bb1 <alloc_perf_context+0xf1>
> >  4b96: 8d 50 01             lea    0x1(%rax),%edx
> >  4b99: 09 c2                or     %eax,%edx
> >  4b9b: 78 20                js     4bbd <alloc_perf_context+0xfd>
> >
> > to:
> >
> >  4768: b8 01 00 00 00       mov    $0x1,%eax
> >  476d: f0 0f c1 43 28       lock xadd %eax,0x28(%rbx)
> >  4772: 85 c0                test   %eax,%eax
> >  4774: 7e 14                jle    478a <alloc_perf_context+0xea>
>
> For comparison, I generated the same for arm64 below with kernel.org
> crosstool GCC 11.1.0 and defconfig.
>
> For arm64 there's an existing sub-optimality for inc/dec where the
> register for `1` or `-1` gets generated with a `MOV;MOV` chain or
> `MOV;NEG` chain rather than a single `MOV` in either case. I think
> that's due to the way we build LSE/LL-SC variants of add() and build a
> common inc() atop, and the compiler just loses the opportunity to
> constant-fold. I'll take a look at how to make that neater.

With some improvements to arm64's LSE atomics, that becomes a comparable
sequence to x86's:

  2df8: 52800021   mov     w1, #0x1                // #1
  ...
  2e20: b8e10002   ldaddal w1, w2, [x0]
  2e24: 7100005f   cmp     w2, #0x0
  2e28: 5400012d   b.le    2e4c <alloc_perf_context+0xc8>

> > without sacrificing functionality; the only thing that suffers is
> > the reported error condition, which might now 'spuriously' report
> > 'saturated' instead of 'inc-from-zero'.
> >
> > refcount_dec_and_test() is also improved:
> >
> >  aa40: b8 ff ff ff ff       mov    $0xffffffff,%eax
> >  aa45: f0 0f c1 07          lock xadd %eax,(%rdi)
> >  aa49: 83 f8 01             cmp    $0x1,%eax
> >  aa4c: 74 05                je     aa53 <ring_buffer_put+0x13>
> >  aa4e: 85 c0                test   %eax,%eax
> >  aa50: 7e 1e                jle    aa70 <ring_buffer_put+0x30>
> >  aa52: c3                   ret
> >
> > to:
> >
> >  a980: b8 ff ff ff ff       mov    $0xffffffff,%eax
> >  a985: f0 0f c1 07          lock xadd %eax,(%rdi)
> >  a989: 83 e8 01             sub    $0x1,%eax
> >  a98c: 78 20                js     a9ae <ring_buffer_put+0x2e>
> >  a98e: 74 01                je     a991 <ring_buffer_put+0x11>
> >  a990: c3                   ret

Likewise I can get the arm64 equivalent down to:

  bebc: 12800001   mov     w1, #0xffffffff         // #-1
  ...
  becc: b8e10001   ldaddal w1, w1, [x0]
  bed0: 71000421   subs    w1, w1, #0x1
  bed4: 540000c4   b.mi    beec <ring_buffer_put+0x3c>  // b.first
  bed8: 54000120   b.eq    befc <ring_buffer_put+0x4c>  // b.none

I've pushed my WIP patches to:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/atomics/improvements
  git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/atomics/improvements

... and I'll try to get those cleaned up and posted soon.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* [RFC][PATCH 3/5] refcount: Improve out-of-line code-gen
  2021-12-08 18:36 [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra
  2021-12-08 18:36 ` [RFC][PATCH 1/5] atomic: Introduce atomic_{inc,dec,dec_and_test}_ofl() Peter Zijlstra
  2021-12-08 18:36 ` [RFC][PATCH 2/5] refcount: Use atomic_*_ofl() Peter Zijlstra
@ 2021-12-08 18:36 ` Peter Zijlstra
  2021-12-09  8:33   ` Peter Zijlstra
  2021-12-08 18:36 ` [RFC][PATCH 4/5] atomic,x86: Implement atomic_dec_and_test_ofl() Peter Zijlstra
  ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-12-08 18:36 UTC (permalink / raw)
  To: will, boqun.feng
  Cc: linux-kernel, x86, peterz, mark.rutland, elver, keescook, hch, torvalds

Allow a number of ops to tail-call refcount_warn_saturate() in order
to generate smaller out-of-line code.

   text    data    bss     dec    hex filename
  97341    4985   1116  103442  19412 defconfig-build/kernel/events/core.o
  97299    4985   1116  103400  193e8 defconfig-build/kernel/events/core.o

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/refcount.h | 9 ++++-----
 lib/refcount.c           | 4 +++-
 2 files changed, 7 insertions(+), 6 deletions(-)

--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -124,7 +124,7 @@ enum refcount_saturation_type {
 	REFCOUNT_DEC_LEAK,
 };
 
-void refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t);
+bool refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t);
 
 /**
  * refcount_set - set a refcount's value
@@ -160,7 +160,7 @@ static inline __must_check bool __refcou
 		*oldp = old;
 
 	if (unlikely(old < 0 || old + i < 0))
-		refcount_warn_saturate(r, REFCOUNT_ADD_NOT_ZERO_OVF);
+		return refcount_warn_saturate(r, REFCOUNT_ADD_NOT_ZERO_OVF);
 
 	return old;
 }
@@ -283,7 +283,7 @@ static inline __must_check bool __refcou
 	}
 
 	if (unlikely(old < 0 || old - i < 0))
-		refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
+		return refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
 
 	return false;
 }
@@ -335,8 +335,7 @@ static inline __must_check bool refcount
 {
 	return atomic_dec_and_test_ofl(&r->refs, Eoverflow);
 Eoverflow:
-	refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
-	return false;
+	return refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
 }
 
 static inline void __refcount_dec(refcount_t *r, int *oldp)
--- a/lib/refcount.c
+++ b/lib/refcount.c
@@ -10,7 +10,7 @@
 
 #define REFCOUNT_WARN(str) WARN_ONCE(1, "refcount_t: " str ".\n")
 
-void refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t)
+bool refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t)
 {
 	int old = refcount_read(r);
 	refcount_set(r, REFCOUNT_SATURATED);
@@ -38,6 +38,8 @@ void refcount_warn_saturate(refcount_t *
 	default:
 		REFCOUNT_WARN("unknown saturation event!?");
 	}
+
+	return false;
 }
 EXPORT_SYMBOL(refcount_warn_saturate);

^ permalink raw reply	[flat|nested] 17+ messages in thread
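[Editor's note] The tail-call idea being exploited here, shown in isolation
as a generic, hypothetical sketch (helper names invented, not kernel code):
making the cold path's result the function's own return value lets the
compiler turn the out-of-line 'call + ret' into a single 'jmp'.

	bool slowpath_warn(void);	/* out-of-line; always returns false */

	bool op_a(int *p)
	{
		if (__builtin_expect(*p <= 0, 0))
			slowpath_warn();	/* call, then fall through to the return */
		return false;
	}

	bool op_b(int *p)
	{
		if (__builtin_expect(*p <= 0, 0))
			return slowpath_warn();	/* may become a tail 'jmp slowpath_warn' */
		return false;
	}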
* Re: [RFC][PATCH 3/5] refcount: Improve out-of-line code-gen
  2021-12-08 18:36 ` [RFC][PATCH 3/5] refcount: Improve out-of-line code-gen Peter Zijlstra
@ 2021-12-09  8:33   ` Peter Zijlstra
  2021-12-09 17:51     ` Linus Torvalds
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-12-09 8:33 UTC (permalink / raw)
  To: will, boqun.feng
  Cc: linux-kernel, x86, mark.rutland, elver, keescook, hch, torvalds, axboe

On Wed, Dec 08, 2021 at 07:36:58PM +0100, Peter Zijlstra wrote:
> Allow a number of ops to tail-call refcount_warn_saturate() in order
> to generate smaller out-of-line code.
>
>    text    data    bss     dec    hex filename
>   97341    4985   1116  103442  19412 defconfig-build/kernel/events/core.o
>   97299    4985   1116  103400  193e8 defconfig-build/kernel/events/core.o

This patch also makes GCC do worse code-gen on the fast path, so I'll
drop it. For some obscure raisin it causes this:

  ring_buffer_put:
    a950: f0 ff 0f              lock decl (%rdi)
    a953: 7c 20                 jl     a975 <ring_buffer_put+0x25>
    a955: 74 01                 je     a958 <ring_buffer_put+0x8>
    a957: c3                    ret

to become this:

  ring_buffer_put:
    a940: 53                    push   %rbx
    a941: 48 89 fb              mov    %rdi,%rbx
    a944: f0 ff 0f              lock decl (%rdi)
    a947: 7c 04                 jl     a94d <ring_buffer_put+0xd>
    a949: 74 10                 je     a95b <ring_buffer_put+0x1b>
    a94b: 5b                    pop    %rbx
    a94c: c3                    ret

Which is just unacceptable...

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [RFC][PATCH 3/5] refcount: Improve out-of-line code-gen
  2021-12-09  8:33 ` Peter Zijlstra
@ 2021-12-09 17:51   ` Linus Torvalds
  0 siblings, 0 replies; 17+ messages in thread
From: Linus Torvalds @ 2021-12-09 17:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Boqun Feng, Linux Kernel Mailing List,
      the arch/x86 maintainers, Mark Rutland, Marco Elver, Kees Cook,
      Christoph Hellwig, Jens Axboe

On Thu, Dec 9, 2021 at 12:33 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> For some obscure raisin it causes this:  [snip]

Looks like the code generation at one point thought that it needs to
get the return value from the slow-path function call, and by the time
it had optimized that away it had already ended up with enough
temporaries that it needed to spill things into %rbx.

That looks like a fairly "internal to gcc" thing, but I think dropping
that patch is indeed the right thing to do.

When you made it do

	return refcount_warn_saturate(r, ..);

it now means that those inline functions don't obviously always return
'false' any more, so the compiler did end up needing a variable for the
return value at one point, and it really was a much more complex
decision.

After it has inlined that top-level function, the return value is not
visible to the compiler because it doesn't see that
refcount_warn_saturate() always returns false.

The fact that quite often that whole refcount_warn_saturate() call
ended up then being in a cold part of the code, and could then be
optimized to a tailcall if that was the last thing generated, doesn't
end up helping in the general case, and instead only hurts: the
compiler doesn't see what the return value is for the warning case, so
it might end up doing other things in the place that the function was
inlined into.

I think the original patch was likely a mis-optimization triggered by
your test-case being a function that *only* did that
refcount_add_not_zero(), and returned the value. It didn't actually
have other code that then depended on the return value of it.

            Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread
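[Editor's note] Side by side, the two inline-function shapes being contrasted
look roughly like the following (a simplified sketch modelled on patches 2/5
and 3/5, not an exact copy of either):

	/* Shape A (patch 2/5): the warning path ends in a compile-time
	 * 'false', which the caller can see after inlining. */
	static inline bool put_shape_a(refcount_t *r)
	{
		return atomic_dec_and_test_ofl(&r->refs, overflow);
	overflow:
		refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
		return false;
	}

	/* Shape B (patch 3/5): the result now depends on an opaque
	 * out-of-line call, so the caller may keep extra state (e.g. the
	 * pointer in %rbx) alive across the fast path. */
	static inline bool put_shape_b(refcount_t *r)
	{
		return atomic_dec_and_test_ofl(&r->refs, overflow);
	overflow:
		return refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
	}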
* [RFC][PATCH 4/5] atomic,x86: Implement atomic_dec_and_test_ofl()
  2021-12-08 18:36 [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra
  ` (2 preceding siblings ...)
  2021-12-08 18:36 ` [RFC][PATCH 3/5] refcount: Improve out-of-line code-gen Peter Zijlstra
@ 2021-12-08 18:36 ` Peter Zijlstra
  2021-12-08 18:37 ` [RFC][PATCH 5/5] atomic: Document the atomic_{}_ofl() functions Peter Zijlstra
  2021-12-09  8:25 ` [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra
  5 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-12-08 18:36 UTC (permalink / raw)
  To: will, boqun.feng
  Cc: linux-kernel, x86, peterz, mark.rutland, elver, keescook, hch, torvalds

Provide a better implementation of atomic_{dec,dec_and_test}_ofl() by
making use of the atomic-op condition codes. This further improves the
fast path code, from:

  a980: b8 ff ff ff ff        mov    $0xffffffff,%eax
  a985: f0 0f c1 07           lock xadd %eax,(%rdi)
  a989: 83 e8 01              sub    $0x1,%eax
  a98c: 78 20                 js     a9ae <ring_buffer_put+0x2e>
  a98e: 74 01                 je     a991 <ring_buffer_put+0x11>
  a990: c3                    ret

to:

  ab81: 48 89 fb              mov    %rdi,%rbx
  ab84: f0 ff 0f              lock decl (%rdi)
  ab87: 7c 04                 jl     ab8d <ring_buffer_put+0xd>
  ab89: 74 10                 je     ab9b <ring_buffer_put+0x1b>
  ab8b: c3                    ret

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/atomic.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -263,6 +263,29 @@ static __always_inline int arch_atomic_f
 }
 #define arch_atomic_fetch_xor arch_atomic_fetch_xor
 
+#define arch_atomic_dec_ofl(_v, _label)					\
+	asm_volatile_goto(LOCK_PREFIX "decl %[var]\n\t"			\
+		      "jle %l1"						\
+		      : : [var] "m" ((_v)->counter)			\
+		      : "memory"					\
+		      : _label)
+
+#define arch_atomic_dec_and_test_ofl(_v, _label)			\
+({									\
+	__label__ __zero;						\
+	__label__ __out;						\
+	bool __ret = false;						\
+	asm_volatile_goto(LOCK_PREFIX "decl %[var]\n\t"			\
+		      "jl %l2\n\t"					\
+		      "je %l[__zero]"					\
+		      : : [var] "m" ((_v)->counter)			\
+		      : "memory"					\
+		      : __zero, _label);				\
+	goto __out;							\
+__zero:	__ret = true;							\
+__out:	__ret;								\
+})
+
 #ifdef CONFIG_X86_32
 # include <asm/atomic64_32.h>
 #else

^ permalink raw reply	[flat|nested] 17+ messages in thread
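[Editor's note] The posted diff only covers atomic_t. Purely to illustrate
how the pattern generalises, a 64-bit analogue might look like the following
untested sketch (not part of the series), simply swapping decl for decq and
atomic_t for atomic64_t:

	/* Sketch only: same asm-goto shape as arch_atomic_dec_ofl() above,
	 * with the label taken when the decrement result is <= 0. */
	#define arch_atomic64_dec_ofl(_v, _label)			\
		asm_volatile_goto(LOCK_PREFIX "decq %[var]\n\t"		\
			      "jle %l1"					\
			      : : [var] "m" ((_v)->counter)		\
			      : "memory"				\
			      : _label)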
* [RFC][PATCH 5/5] atomic: Document the atomic_{}_ofl() functions
  2021-12-08 18:36 [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra
  ` (3 preceding siblings ...)
  2021-12-08 18:36 ` [RFC][PATCH 4/5] atomic,x86: Implement atomic_dec_and_test_ofl() Peter Zijlstra
@ 2021-12-08 18:37 ` Peter Zijlstra
  2021-12-09  8:25 ` [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra
  5 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-12-08 18:37 UTC (permalink / raw)
  To: will, boqun.feng
  Cc: linux-kernel, x86, peterz, mark.rutland, elver, keescook, hch, torvalds

Them special, them cause confusion, them needing docs for reading.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 Documentation/atomic_t.txt | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

--- a/Documentation/atomic_t.txt
+++ b/Documentation/atomic_t.txt
@@ -44,6 +44,10 @@ The 'full' API consists of (atomic64_ an
   atomic_add_unless(), atomic_inc_not_zero()
   atomic_sub_and_test(), atomic_dec_and_test()
 
+Reference count with overflow (as used by refcount_t):
+
+  atomic_inc_ofl(), atomic_dec_ofl()
+  atomic_dec_and_test_ofl()
 
 Misc:
 
@@ -157,6 +161,22 @@ atomic variable) can be fully ordered an
 visible.
 
 
+Overflow ops:
+
+The atomic_{}_ofl() ops are similar to their !_ofl() brethren with the
+notable exception that they take a label as their final argument to jump to
+when the atomic op overflows.
+
+Overflow for inc is zero-or-negative on the value prior to increment.
+Overflow for dec is zero-or-negative on the value after the decrement.
+Overflow for dec_and_test is negative on the value after the decrement.
+
+These semantics match the reference count use-case (for which they were
+created). Specifically, incrementing from zero is a failure because 0 means
+the object is freed (IOW use-after-free). Decrementing to zero is a failure
+because it goes undetected (see dec_and_test) and the object would leak.
+
+
 ORDERING  (go read memory-barriers.txt first)
 --------

^ permalink raw reply	[flat|nested] 17+ messages in thread
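[Editor's note] To tie the documented semantics to concrete usage, here is a
small sketch of a get/put pair built directly on the _ofl primitives. The
struct and helper names are hypothetical and not code from the series:

	struct obj {
		atomic_t usage;		/* open-coded refcount for illustration */
	};

	static inline void obj_get(struct obj *o)
	{
		/* label taken when usage was zero or negative before the
		 * increment, i.e. a get on an already-freed object */
		atomic_inc_ofl(&o->usage, bad_get);
		return;
	bad_get:
		WARN_ONCE(1, "obj: get on freed object\n");
	}

	static inline void obj_put(struct obj *o)
	{
		/* true only on a clean 1 -> 0 transition; a negative result
		 * (double put) jumps to the label instead */
		if (atomic_dec_and_test_ofl(&o->usage, bad_put))
			kfree(o);
		return;
	bad_put:
		WARN_ONCE(1, "obj: put on freed object\n");
	}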
* Re: [RFC][PATCH 0/5] refcount: Improve code-gen
  2021-12-08 18:36 [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra
  ` (4 preceding siblings ...)
  2021-12-08 18:37 ` [RFC][PATCH 5/5] atomic: Document the atomic_{}_ofl() functions Peter Zijlstra
@ 2021-12-09  8:25 ` Peter Zijlstra
  2021-12-09 16:19   ` Jens Axboe
  5 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-12-09 8:25 UTC (permalink / raw)
  To: will, boqun.feng
  Cc: linux-kernel, x86, mark.rutland, elver, keescook, hch, torvalds, axboe

On Wed, Dec 08, 2021 at 07:36:55PM +0100, Peter Zijlstra wrote:
> Hi,
>
> Improves the refcount_t code-gen; I've still got to go through the latest
> thing Linus suggested, but figured I should get these patches out to see if
> there's other concerns etc..

Bah, I forgot to Cc Jens, let me bounce him the lot.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [RFC][PATCH 0/5] refcount: Improve code-gen
  2021-12-09  8:25 ` [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra
@ 2021-12-09 16:19   ` Jens Axboe
  2021-12-09 16:51     ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2021-12-09 16:19 UTC (permalink / raw)
  To: Peter Zijlstra, will, boqun.feng
  Cc: linux-kernel, x86, mark.rutland, elver, keescook, hch, torvalds

On 12/9/21 1:25 AM, Peter Zijlstra wrote:
> On Wed, Dec 08, 2021 at 07:36:55PM +0100, Peter Zijlstra wrote:
>> Hi,
>>
>> Improves the refcount_t code-gen; I've still got to go through the latest
>> thing Linus suggested, but figured I should get these patches out to see
>> if there's other concerns etc..
>
> Bah, I forgot to Cc Jens, let me bounce him the lot.

Traveling the next few days, just put me on v2 and I should have time to
look at this starting next week. Bouncing only helps for initial messages,
I'll have to dig for followups :-)

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [RFC][PATCH 0/5] refcount: Improve code-gen
  2021-12-09 16:19 ` Jens Axboe
@ 2021-12-09 16:51   ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-12-09 16:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: will, boqun.feng, linux-kernel, x86, mark.rutland, elver, keescook, hch, torvalds

On Thu, Dec 09, 2021 at 09:19:58AM -0700, Jens Axboe wrote:
> On 12/9/21 1:25 AM, Peter Zijlstra wrote:
> > On Wed, Dec 08, 2021 at 07:36:55PM +0100, Peter Zijlstra wrote:
> >> Hi,
> >>
> >> Improves the refcount_t code-gen; I've still got to go through the latest
> >> thing Linus suggested, but figured I should get these patches out to see
> >> if there's other concerns etc..
> >
> > Bah, I forgot to Cc Jens, let me bounce him the lot.
>
> Traveling the next few days, just put me on v2 and I should have time to
> look at this starting next week. Bouncing only helps for initial messages,
> I'll have to dig for followups :-)

Hmm.. I tagged the whole thread and did tag-bounce. Anyway, I'll be sure
to copy you on v2. Thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread
end of thread, other threads: [~2021-12-09 17:53 UTC | newest]

Thread overview: 17+ messages
2021-12-08 18:36 [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra
2021-12-08 18:36 ` [RFC][PATCH 1/5] atomic: Introduce atomic_{inc,dec,dec_and_test}_ofl() Peter Zijlstra
2021-12-09 12:42   ` Mark Rutland
2021-12-09 13:34     ` Peter Zijlstra
2021-12-08 18:36 ` [RFC][PATCH 2/5] refcount: Use atomic_*_ofl() Peter Zijlstra
2021-12-08 19:19   ` Peter Zijlstra
2021-12-08 20:56     ` Peter Zijlstra
2021-12-09 13:17   ` Mark Rutland
2021-12-09 17:00     ` Mark Rutland
2021-12-08 18:36 ` [RFC][PATCH 3/5] refcount: Improve out-of-line code-gen Peter Zijlstra
2021-12-09  8:33   ` Peter Zijlstra
2021-12-09 17:51     ` Linus Torvalds
2021-12-08 18:36 ` [RFC][PATCH 4/5] atomic,x86: Implement atomic_dec_and_test_ofl() Peter Zijlstra
2021-12-08 18:37 ` [RFC][PATCH 5/5] atomic: Document the atomic_{}_ofl() functions Peter Zijlstra
2021-12-09  8:25 ` [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra
2021-12-09 16:19   ` Jens Axboe
2021-12-09 16:51     ` Peter Zijlstra