* [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg
@ 2023-04-05 14:17 Uros Bizjak
  2023-04-05 14:17 ` [PATCH v2 1/5] locking/atomic: Add generic try_cmpxchg{,64}_local support Uros Bizjak
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Uros Bizjak @ 2023-04-05 14:17 UTC (permalink / raw)
  To: linux-alpha, loongarch, linux-mips, linuxppc-dev, x86,
	linux-arch, linux-perf-users, linux-kernel
  Cc: Uros Bizjak, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Huacai Chen, WANG Xuerui, Thomas Bogendoerfer, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Arnd Bergmann,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
	Will Deacon, Boqun Feng, Jiaxun Yang, Jun Yi

Add generic and target specific support for local{,64}_try_cmpxchg
and wire up support for all targets that use local_t infrastructure.

The series enables x86 targets to emit a specialized instruction
sequence for local_try_cmpxchg, and also for local64_try_cmpxchg
on x86_64.

The last patch changes __perf_output_begin in kernel/events/ring_buffer.c
to use the new locking primitive, improving the generated code from

     4b3:	48 8b 82 e8 00 00 00 	mov    0xe8(%rdx),%rax
     4ba:	48 8b b8 08 04 00 00 	mov    0x408(%rax),%rdi
     4c1:	8b 42 1c             	mov    0x1c(%rdx),%eax
     4c4:	48 8b 4a 28          	mov    0x28(%rdx),%rcx
     4c8:	85 c0                	test   %eax,%eax
     ...
     4ef:	48 89 c8             	mov    %rcx,%rax
     4f2:	48 0f b1 7a 28       	cmpxchg %rdi,0x28(%rdx)
     4f7:	48 39 c1             	cmp    %rax,%rcx
     4fa:	75 b7                	jne    4b3 <...>

to

     4b2:	48 8b 4a 28          	mov    0x28(%rdx),%rcx
     4b6:	48 8b 82 e8 00 00 00 	mov    0xe8(%rdx),%rax
     4bd:	48 8b b0 08 04 00 00 	mov    0x408(%rax),%rsi
     4c4:	8b 42 1c             	mov    0x1c(%rdx),%eax
     4c7:	85 c0                	test   %eax,%eax
     ...
     4d4:	48 89 c8             	mov    %rcx,%rax
     4d7:	48 0f b1 72 28       	cmpxchg %rsi,0x28(%rdx)
     4dc:	0f 85 d0 00 00 00    	jne    5b2 <...>
     ...
     5b2:	48 89 c1             	mov    %rax,%rcx
     5b5:	e9 fc fe ff ff       	jmp    4b6 <...>

Please note that in addition to the removed compare, the load from
0x28(%rdx) is moved out of the loop and the code is rearranged
according to the likely/unlikely annotations in the source.
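
At the source level, the win comes from replacing the open-coded
cmpxchg loop with a try_cmpxchg loop; roughly (a simplified sketch of
the __perf_output_begin change from the last patch):

	/* before: re-reads rb->head and compares the result by hand */
	do {
		tail = READ_ONCE(rb->user_page->data_tail);
		offset = head = local_read(&rb->head);
		/* ... compute new head ... */
	} while (local_cmpxchg(&rb->head, offset, head) != offset);

	/* after: a failed cmpxchg hands back the fresh value in "offset" */
	offset = local_read(&rb->head);
	do {
		tail = READ_ONCE(rb->user_page->data_tail);
		head = offset;
		/* ... compute new head ... */
	} while (!local_try_cmpxchg(&rb->head, &offset, head));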
---
v2:

Implement target-specific support for local_try_cmpxchg and
local_cmpxchg using typed C wrappers that call their _local
counterparts and provide additional checking of their input
arguments.

Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Jun Yi <yijun@loongson.cn>

Uros Bizjak (5):
  locking/atomic: Add generic try_cmpxchg{,64}_local support
  locking/generic: Wire up local{,64}_try_cmpxchg
  locking/arch: Wire up local_try_cmpxchg
  locking/x86: Define arch_try_cmpxchg_local
  events: Illustrate the transition to local{,64}_try_cmpxchg

 arch/alpha/include/asm/local.h              | 12 +++++++++--
 arch/loongarch/include/asm/local.h          | 13 +++++++++--
 arch/mips/include/asm/local.h               | 13 +++++++++--
 arch/powerpc/include/asm/local.h            | 11 ++++++++++
 arch/x86/events/core.c                      |  9 ++++----
 arch/x86/include/asm/cmpxchg.h              |  6 ++++++
 arch/x86/include/asm/local.h                | 13 +++++++++--
 include/asm-generic/local.h                 |  1 +
 include/asm-generic/local64.h               | 12 ++++++++++-
 include/linux/atomic/atomic-arch-fallback.h | 24 ++++++++++++++++++++-
 include/linux/atomic/atomic-instrumented.h  | 20 ++++++++++++++++-
 kernel/events/ring_buffer.c                 |  5 +++--
 scripts/atomic/gen-atomic-fallback.sh       |  4 ++++
 scripts/atomic/gen-atomic-instrumented.sh   |  2 +-
 14 files changed, 126 insertions(+), 19 deletions(-)

-- 
2.39.2



* [PATCH v2 1/5] locking/atomic: Add generic try_cmpxchg{,64}_local support
  2023-04-05 14:17 [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg Uros Bizjak
@ 2023-04-05 14:17 ` Uros Bizjak
  2023-04-11 11:10   ` Mark Rutland
  2023-04-05 14:17 ` [PATCH v2 2/5] locking/generic: Wire up local{,64}_try_cmpxchg Uros Bizjak
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Uros Bizjak @ 2023-04-05 14:17 UTC (permalink / raw)
  To: linux-alpha, loongarch, linux-mips, linuxppc-dev, x86,
	linux-arch, linux-perf-users, linux-kernel
  Cc: Uros Bizjak, Will Deacon, Peter Zijlstra, Boqun Feng, Mark Rutland

Add generic support for try_cmpxchg{,64}_local and their fallbacks.

These provide the generic try_cmpxchg_local family of functions
from the arch_ prefixed versions, also adding explicit instrumentation.
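
The semantics of the generated fallback, written out as a plain
function for clarity (a sketch; the real version is the macro added
below, emitted by gen-atomic-fallback.sh):

	static inline bool sketch_try_cmpxchg_local(long *ptr, long *oldp, long new)
	{
		long o = *oldp;
		long r = cmpxchg_local(ptr, o, new);	/* value found at *ptr */

		if (unlikely(r != o))
			*oldp = r;	/* on failure, return the fresh value */
		return likely(r == o);	/* true if the exchange happened */
	}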

Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 include/linux/atomic/atomic-arch-fallback.h | 24 ++++++++++++++++++++-
 include/linux/atomic/atomic-instrumented.h  | 20 ++++++++++++++++-
 scripts/atomic/gen-atomic-fallback.sh       |  4 ++++
 scripts/atomic/gen-atomic-instrumented.sh   |  2 +-
 4 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/include/linux/atomic/atomic-arch-fallback.h b/include/linux/atomic/atomic-arch-fallback.h
index 77bc5522e61c..36c92851cdee 100644
--- a/include/linux/atomic/atomic-arch-fallback.h
+++ b/include/linux/atomic/atomic-arch-fallback.h
@@ -217,6 +217,28 @@
 
 #endif /* arch_try_cmpxchg64_relaxed */
 
+#ifndef arch_try_cmpxchg_local
+#define arch_try_cmpxchg_local(_ptr, _oldp, _new) \
+({ \
+	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	___r = arch_cmpxchg_local((_ptr), ___o, (_new)); \
+	if (unlikely(___r != ___o)) \
+		*___op = ___r; \
+	likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_local */
+
+#ifndef arch_try_cmpxchg64_local
+#define arch_try_cmpxchg64_local(_ptr, _oldp, _new) \
+({ \
+	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	___r = arch_cmpxchg64_local((_ptr), ___o, (_new)); \
+	if (unlikely(___r != ___o)) \
+		*___op = ___r; \
+	likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg64_local */
+
 #ifndef arch_atomic_read_acquire
 static __always_inline int
 arch_atomic_read_acquire(const atomic_t *v)
@@ -2456,4 +2478,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
 #endif
 
 #endif /* _LINUX_ATOMIC_FALLBACK_H */
-// b5e87bdd5ede61470c29f7a7e4de781af3770f09
+// 1f49bd4895a4b7a5383906649027205c52ec80ab
diff --git a/include/linux/atomic/atomic-instrumented.h b/include/linux/atomic/atomic-instrumented.h
index 7a139ec030b0..14a9212cc987 100644
--- a/include/linux/atomic/atomic-instrumented.h
+++ b/include/linux/atomic/atomic-instrumented.h
@@ -2066,6 +2066,24 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 	arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
 })
 
+#define try_cmpxchg_local(ptr, oldp, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	typeof(oldp) __ai_oldp = (oldp); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+	arch_try_cmpxchg_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
+#define try_cmpxchg64_local(ptr, oldp, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	typeof(oldp) __ai_oldp = (oldp); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+	arch_try_cmpxchg64_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
 #define cmpxchg_double(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
@@ -2083,4 +2101,4 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 })
 
 #endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
-// 764f741eb77a7ad565dc8d99ce2837d5542e8aee
+// 456e206c7e4e681126c482e4edcc6f46921ac731
diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
index 3a07695e3c89..6e853f0dad8d 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -225,6 +225,10 @@ for cmpxchg in "cmpxchg" "cmpxchg64"; do
 	gen_try_cmpxchg_fallbacks "${cmpxchg}"
 done
 
+for cmpxchg in "cmpxchg_local" "cmpxchg64_local"; do
+	gen_try_cmpxchg_fallback "${cmpxchg}" ""
+done
+
 grep '^[a-z]' "$1" | while read name meta args; do
 	gen_proto "${meta}" "${name}" "atomic" "int" ${args}
 done
diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
index 77c06526a574..c8165e9431bf 100755
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -173,7 +173,7 @@ for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg" "try_cmpxchg64"; do
 	done
 done
 
-for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg"; do
+for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg" "try_cmpxchg_local" "try_cmpxchg64_local" ; do
 	gen_xchg "${xchg}" "" ""
 	printf "\n"
 done
-- 
2.39.2



* [PATCH v2 2/5] locking/generic: Wire up local{,64}_try_cmpxchg
  2023-04-05 14:17 [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg Uros Bizjak
  2023-04-05 14:17 ` [PATCH v2 1/5] locking/atomic: Add generic try_cmpxchg{,64}_local support Uros Bizjak
@ 2023-04-05 14:17 ` Uros Bizjak
  2023-04-11 11:13   ` Mark Rutland
  2023-04-05 14:17 ` [PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg Uros Bizjak
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Uros Bizjak @ 2023-04-05 14:17 UTC (permalink / raw)
  To: linux-alpha, loongarch, linux-mips, linuxppc-dev, x86,
	linux-arch, linux-perf-users, linux-kernel
  Cc: Uros Bizjak, Arnd Bergmann

Implement generic support for local{,64}_try_cmpxchg.

Redirect to the atomic_ family of functions when the target
does not provide its own local.h definitions.

For 64-bit targets, implement local64_try_cmpxchg and
local64_cmpxchg using typed C wrappers that call the local_
family of functions and provide additional checking
of their input arguments.
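
With this wired up, generic code can use the same loop shape on all
targets; an illustrative sketch (assuming <asm/local64.h>, not code
from the tree):

	static inline void sketch_local64_add(local64_t *l, s64 delta)
	{
		s64 old = local64_read(l);
		s64 new;

		do {
			new = old + delta;
		} while (!local64_try_cmpxchg(l, &old, new));
	}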

Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 include/asm-generic/local.h   |  1 +
 include/asm-generic/local64.h | 12 +++++++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/local.h b/include/asm-generic/local.h
index fca7f1d84818..7f97018df66f 100644
--- a/include/asm-generic/local.h
+++ b/include/asm-generic/local.h
@@ -42,6 +42,7 @@ typedef struct
 #define local_inc_return(l) atomic_long_inc_return(&(l)->a)
 
 #define local_cmpxchg(l, o, n) atomic_long_cmpxchg((&(l)->a), (o), (n))
+#define local_try_cmpxchg(l, po, n) atomic_long_try_cmpxchg((&(l)->a), (po), (n))
 #define local_xchg(l, n) atomic_long_xchg((&(l)->a), (n))
 #define local_add_unless(l, _a, u) atomic_long_add_unless((&(l)->a), (_a), (u))
 #define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)
diff --git a/include/asm-generic/local64.h b/include/asm-generic/local64.h
index 765be0b7d883..14963a7a6253 100644
--- a/include/asm-generic/local64.h
+++ b/include/asm-generic/local64.h
@@ -42,7 +42,16 @@ typedef struct {
 #define local64_sub_return(i, l) local_sub_return((i), (&(l)->a))
 #define local64_inc_return(l)	local_inc_return(&(l)->a)
 
-#define local64_cmpxchg(l, o, n) local_cmpxchg((&(l)->a), (o), (n))
+static inline s64 local64_cmpxchg(local64_t *l, s64 old, s64 new)
+{
+	return local_cmpxchg(&l->a, old, new);
+}
+
+static inline bool local64_try_cmpxchg(local64_t *l, s64 *old, s64 new)
+{
+	return local_try_cmpxchg(&l->a, (long *)old, new);
+}
+
 #define local64_xchg(l, n)	local_xchg((&(l)->a), (n))
 #define local64_add_unless(l, _a, u) local_add_unless((&(l)->a), (_a), (u))
 #define local64_inc_not_zero(l)	local_inc_not_zero(&(l)->a)
@@ -81,6 +90,7 @@ typedef struct {
 #define local64_inc_return(l)	atomic64_inc_return(&(l)->a)
 
 #define local64_cmpxchg(l, o, n) atomic64_cmpxchg((&(l)->a), (o), (n))
+#define local64_try_cmpxchg(l, po, n) atomic64_try_cmpxchg((&(l)->a), (po), (n))
 #define local64_xchg(l, n)	atomic64_xchg((&(l)->a), (n))
 #define local64_add_unless(l, _a, u) atomic64_add_unless((&(l)->a), (_a), (u))
 #define local64_inc_not_zero(l)	atomic64_inc_not_zero(&(l)->a)
-- 
2.39.2



* [PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg
  2023-04-05 14:17 [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg Uros Bizjak
  2023-04-05 14:17 ` [PATCH v2 1/5] locking/atomic: Add generic try_cmpxchg{,64}_local support Uros Bizjak
  2023-04-05 14:17 ` [PATCH v2 2/5] locking/generic: Wire up local{,64}_try_cmpxchg Uros Bizjak
@ 2023-04-05 14:17 ` Uros Bizjak
  2023-04-12 11:32   ` Peter Zijlstra
  2023-05-17  7:41   ` Charlemagne Lasse
  2023-04-05 14:17 ` [PATCH v2 4/5] locking/x86: Define arch_try_cmpxchg_local Uros Bizjak
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 20+ messages in thread
From: Uros Bizjak @ 2023-04-05 14:17 UTC (permalink / raw)
  To: linux-alpha, loongarch, linux-mips, linuxppc-dev, x86,
	linux-arch, linux-perf-users, linux-kernel
  Cc: Uros Bizjak, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Huacai Chen, WANG Xuerui, Jiaxun Yang, Jun Yi,
	Thomas Bogendoerfer, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

Implement target-specific support for local_try_cmpxchg
and local_cmpxchg using typed C wrappers that call their
_local counterparts and provide additional checking of
their input arguments.
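
The "additional checking" means the expected argument types are now
spelled out in a prototype, so a mismatched caller gets a clear
diagnostic; e.g. (a hypothetical caller):

	local_t l;
	unsigned int old = 0;		/* should be long to match the prototype */

	local_try_cmpxchg(&l, &old, 1);	/* now warns: unsigned int * vs long * */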

Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Jun Yi <yijun@loongson.cn>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 arch/alpha/include/asm/local.h     | 12 ++++++++++--
 arch/loongarch/include/asm/local.h | 13 +++++++++++--
 arch/mips/include/asm/local.h      | 13 +++++++++++--
 arch/powerpc/include/asm/local.h   | 11 +++++++++++
 arch/x86/include/asm/local.h       | 13 +++++++++++--
 5 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/arch/alpha/include/asm/local.h b/arch/alpha/include/asm/local.h
index fab26a1c93d5..0fcaad642cc3 100644
--- a/arch/alpha/include/asm/local.h
+++ b/arch/alpha/include/asm/local.h
@@ -52,8 +52,16 @@ static __inline__ long local_sub_return(long i, local_t * l)
 	return result;
 }
 
-#define local_cmpxchg(l, o, n) \
-	(cmpxchg_local(&((l)->a.counter), (o), (n)))
+static __inline__ long local_cmpxchg(local_t *l, long old, long new)
+{
+	return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static __inline__ bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+	return try_cmpxchg_local(&l->a.counter, (s64 *)old, new);
+}
+
 #define local_xchg(l, n) (xchg_local(&((l)->a.counter), (n)))
 
 /**
diff --git a/arch/loongarch/include/asm/local.h b/arch/loongarch/include/asm/local.h
index 65fbbae9fc4d..83e995b30e47 100644
--- a/arch/loongarch/include/asm/local.h
+++ b/arch/loongarch/include/asm/local.h
@@ -56,8 +56,17 @@ static inline long local_sub_return(long i, local_t *l)
 	return result;
 }
 
-#define local_cmpxchg(l, o, n) \
-	((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+static inline long local_cmpxchg(local_t *l, long old, long new)
+{
+	return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+	typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+	return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
 #define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))
 
 /**
diff --git a/arch/mips/include/asm/local.h b/arch/mips/include/asm/local.h
index 08366b1fd273..5daf6fe8e3e9 100644
--- a/arch/mips/include/asm/local.h
+++ b/arch/mips/include/asm/local.h
@@ -94,8 +94,17 @@ static __inline__ long local_sub_return(long i, local_t * l)
 	return result;
 }
 
-#define local_cmpxchg(l, o, n) \
-	((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+static __inline__ long local_cmpxchg(local_t *l, long old, long new)
+{
+	return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static __inline__ bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+	typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+	return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
 #define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))
 
 /**
diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
index bc4bd19b7fc2..45492fb5bf22 100644
--- a/arch/powerpc/include/asm/local.h
+++ b/arch/powerpc/include/asm/local.h
@@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
 	return t;
 }
 
+static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
+{
+	long o = *po, r;
+
+	r = local_cmpxchg(l, o, n);
+	if (unlikely(r != o))
+		*po = r;
+
+	return likely(r == o);
+}
+
 static __inline__ long local_xchg(local_t *l, long n)
 {
 	long t;
diff --git a/arch/x86/include/asm/local.h b/arch/x86/include/asm/local.h
index 349a47acaa4a..56d4ef604b91 100644
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -120,8 +120,17 @@ static inline long local_sub_return(long i, local_t *l)
 #define local_inc_return(l)  (local_add_return(1, l))
 #define local_dec_return(l)  (local_sub_return(1, l))
 
-#define local_cmpxchg(l, o, n) \
-	(cmpxchg_local(&((l)->a.counter), (o), (n)))
+static inline long local_cmpxchg(local_t *l, long old, long new)
+{
+	return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+	typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+	return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
 /* Always has a lock prefix */
 #define local_xchg(l, n) (xchg(&((l)->a.counter), (n)))
 
-- 
2.39.2



* [PATCH v2 4/5] locking/x86: Define arch_try_cmpxchg_local
  2023-04-05 14:17 [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg Uros Bizjak
                   ` (2 preceding siblings ...)
  2023-04-05 14:17 ` [PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg Uros Bizjak
@ 2023-04-05 14:17 ` Uros Bizjak
  2023-04-05 14:17 ` [PATCH v2 5/5] events: Illustrate the transition to local{,64}_try_cmpxchg Uros Bizjak
  2023-04-05 16:37 ` [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg Dave Hansen
  5 siblings, 0 replies; 20+ messages in thread
From: Uros Bizjak @ 2023-04-05 14:17 UTC (permalink / raw)
  To: linux-alpha, loongarch, linux-mips, linuxppc-dev, x86,
	linux-arch, linux-perf-users, linux-kernel
  Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

Define a target-specific arch_try_cmpxchg_local. This
definition overrides the generic arch_try_cmpxchg_local
fallback and enables a target-specific implementation
of try_cmpxchg_local.
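
On x86 the _local variant simply drops the LOCK prefix while keeping
the single cmpxchg instruction, so the ZF result feeds the loop
condition directly. Roughly (a hand-written sketch of the shape the
macro generates, not the kernel's actual __raw_try_cmpxchg):

	static inline bool sketch_try_cmpxchg_local(long *ptr, long *oldp, long new)
	{
		bool ok;

		/* no LOCK prefix: atomic only wrt interrupts on this CPU */
		asm volatile("cmpxchg %3, %1"
			     : "=@ccz" (ok), "+m" (*ptr), "+a" (*oldp)
			     : "r" (new)
			     : "memory");
		return ok;
	}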

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 arch/x86/include/asm/cmpxchg.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 94fbe6ae7431..540573f515b7 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -221,9 +221,15 @@ extern void __add_wrong_size(void)
 #define __try_cmpxchg(ptr, pold, new, size)				\
 	__raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)
 
+#define __try_cmpxchg_local(ptr, pold, new, size)			\
+	__raw_try_cmpxchg((ptr), (pold), (new), (size), "")
+
 #define arch_try_cmpxchg(ptr, pold, new) 				\
 	__try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)))
 
+#define arch_try_cmpxchg_local(ptr, pold, new)				\
+	__try_cmpxchg_local((ptr), (pold), (new), sizeof(*(ptr)))
+
 /*
  * xadd() adds "inc" to "*ptr" and atomically returns the previous
  * value of "*ptr".
-- 
2.39.2



* [PATCH v2 5/5] events: Illustrate the transition to local{,64}_try_cmpxchg
  2023-04-05 14:17 [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg Uros Bizjak
                   ` (3 preceding siblings ...)
  2023-04-05 14:17 ` [PATCH v2 4/5] locking/x86: Define arch_try_cmpxchg_local Uros Bizjak
@ 2023-04-05 14:17 ` Uros Bizjak
  2023-04-05 16:37 ` [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg Dave Hansen
  5 siblings, 0 replies; 20+ messages in thread
From: Uros Bizjak @ 2023-04-05 14:17 UTC (permalink / raw)
  To: linux-alpha, loongarch, linux-mips, linuxppc-dev, x86,
	linux-arch, linux-perf-users, linux-kernel
  Cc: Uros Bizjak

This patch illustrates the transition to local{,64}_try_cmpxchg.
It is not intended to be merged as-is.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 arch/x86/events/core.c      | 9 ++++-----
 kernel/events/ring_buffer.c | 5 +++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d096b04bf80e..d9310e9363f1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -129,13 +129,12 @@ u64 x86_perf_event_update(struct perf_event *event)
 	 * exchange a new raw count - then add that new-prev delta
 	 * count to the generic event atomically:
 	 */
-again:
 	prev_raw_count = local64_read(&hwc->prev_count);
-	rdpmcl(hwc->event_base_rdpmc, new_raw_count);
 
-	if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
-					new_raw_count) != prev_raw_count)
-		goto again;
+	do {
+		rdpmcl(hwc->event_base_rdpmc, new_raw_count);
+	} while (!local64_try_cmpxchg(&hwc->prev_count, &prev_raw_count,
+				      new_raw_count));
 
 	/*
 	 * Now we have the new raw value and have updated the prev
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 273a0fe7910a..111ab85ee97d 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -191,9 +191,10 @@ __perf_output_begin(struct perf_output_handle *handle,
 
 	perf_output_get_handle(handle);
 
+	offset = local_read(&rb->head);
 	do {
 		tail = READ_ONCE(rb->user_page->data_tail);
-		offset = head = local_read(&rb->head);
+		head = offset;
 		if (!rb->overwrite) {
 			if (unlikely(!ring_buffer_has_space(head, tail,
 							    perf_data_size(rb),
@@ -217,7 +218,7 @@ __perf_output_begin(struct perf_output_handle *handle,
 			head += size;
 		else
 			head -= size;
-	} while (local_cmpxchg(&rb->head, offset, head) != offset);
+	} while (!local_try_cmpxchg(&rb->head, &offset, head));
 
 	if (backward) {
 		offset = head;
-- 
2.39.2



* Re: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg
  2023-04-05 14:17 [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg Uros Bizjak
                   ` (4 preceding siblings ...)
  2023-04-05 14:17 ` [PATCH v2 5/5] events: Illustrate the transition to local{,64}_try_cmpxchg Uros Bizjak
@ 2023-04-05 16:37 ` Dave Hansen
  2023-04-05 18:53   ` Uros Bizjak
                     ` (2 more replies)
  5 siblings, 3 replies; 20+ messages in thread
From: Dave Hansen @ 2023-04-05 16:37 UTC (permalink / raw)
  To: Uros Bizjak, linux-alpha, loongarch, linux-mips, linuxppc-dev,
	x86, linux-arch, linux-perf-users, linux-kernel
  Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner, Huacai Chen,
	WANG Xuerui, Thomas Bogendoerfer, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Arnd Bergmann,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
	Will Deacon, Boqun Feng, Jiaxun Yang, Jun Yi

On 4/5/23 07:17, Uros Bizjak wrote:
> Add generic and target specific support for local{,64}_try_cmpxchg
> and wire up support for all targets that use local_t infrastructure.

I feel like I'm missing some context.

What are the actual end user visible effects of this series?  Is there a
measurable decrease in perf overhead?  Why go to all this trouble for
perf?  Who else will use local_try_cmpxchg()?

I'm all for improving things, and perf is an important user.  But, if
the goal here is improving performance, it would be nice to see at least
a stab at quantifying the performance delta.


* Re: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg
  2023-04-05 16:37 ` [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg Dave Hansen
@ 2023-04-05 18:53   ` Uros Bizjak
  2023-04-06  8:25   ` David Laight
  2023-04-11 11:35   ` Mark Rutland
  2 siblings, 0 replies; 20+ messages in thread
From: Uros Bizjak @ 2023-04-05 18:53 UTC (permalink / raw)
  To: Dave Hansen, Steven Rostedt
  Cc: linux-alpha, loongarch, linux-mips, linuxppc-dev, x86,
	linux-arch, linux-perf-users, linux-kernel, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Huacai Chen, WANG Xuerui,
	Thomas Bogendoerfer, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Arnd Bergmann, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Will Deacon, Boqun Feng,
	Jiaxun Yang, Jun Yi

On Wed, Apr 5, 2023 at 6:37 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 4/5/23 07:17, Uros Bizjak wrote:
> > Add generic and target specific support for local{,64}_try_cmpxchg
> > and wire up support for all targets that use local_t infrastructure.
>
> I feel like I'm missing some context.
>
> What are the actual end user visible effects of this series?  Is there a
> measurable decrease in perf overhead?  Why go to all this trouble for
> perf?  Who else will use local_try_cmpxchg()?

This functionality was requested by perf people [1], so perhaps Steven
can give us some concrete examples. In general, apart from the removal
of the unneeded compare instruction on x86, usage of try_cmpxchg also
results in slightly better code on non-x86 targets [2], since the code
now correctly identifies the fast path through the cmpxchg loop.

Also important is that the try_cmpxchg code reuses the result of the
cmpxchg instruction in the loop, so a read from memory in the loop is
eliminated. When reviewing the cmpxchg usage sites, I found numerous
places where an unnecessary read from memory was present in the loop;
two examples can be seen in the last patch of this series.

Also, using try_cmpxchg prevents inconsistencies in the cmpxchg loop,
where the result of the cmpxchg is compared with the wrong "old" value
- one such bug is still lurking in the x86 APIC code, please see [3].
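
Schematically (not code from the tree; func() stands for whatever
computes the new value), the two loop shapes compare as follows:

	/* open-coded: an extra load per iteration plus a manual compare */
	do {
		old = local_read(&l);
		new = func(old);
	} while (local_cmpxchg(&l, old, new) != old);

	/* try_cmpxchg: a failed cmpxchg result becomes the next "old" */
	old = local_read(&l);
	do {
		new = func(old);
	} while (!local_try_cmpxchg(&l, &old, new));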

Please note that apart from the perf subsystem, the events subsystem can
also be improved by using local_try_cmpxchg. This is the reason the
last patch includes a change in arch/x86/events/core.c.

> I'm all for improving things, and perf is an important user.  But, if
> the goal here is improving performance, it would be nice to see at least
> a stab at quantifying the performance delta.

[1] https://lore.kernel.org/lkml/20230301131831.6c8d4ff5@gandalf.local.home/
[2] https://lore.kernel.org/lkml/Yo91omfDZtTgXhyn@FVFF77S0Q05N.cambridge.arm.com/
[3] https://lore.kernel.org/lkml/20230227160917.107820-1-ubizjak@gmail.com/

Uros.


* RE: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg
  2023-04-05 16:37 ` [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg Dave Hansen
  2023-04-05 18:53   ` Uros Bizjak
@ 2023-04-06  8:25   ` David Laight
  2023-04-06  8:38     ` Uros Bizjak
  2023-04-11 11:35   ` Mark Rutland
  2 siblings, 1 reply; 20+ messages in thread
From: David Laight @ 2023-04-06  8:25 UTC (permalink / raw)
  To: 'Dave Hansen',
	Uros Bizjak, linux-alpha, loongarch, linux-mips, linuxppc-dev,
	x86, linux-arch, linux-perf-users, linux-kernel
  Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner, Huacai Chen,
	WANG Xuerui, Thomas Bogendoerfer, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Arnd Bergmann,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
	Will Deacon, Boqun Feng, Jiaxun Yang, Jun Yi

From: Dave Hansen
> Sent: 05 April 2023 17:37
> 
> On 4/5/23 07:17, Uros Bizjak wrote:
> > Add generic and target specific support for local{,64}_try_cmpxchg
> > and wire up support for all targets that use local_t infrastructure.
> 
> I feel like I'm missing some context.
> 
> What are the actual end user visible effects of this series?  Is there a
> measurable decrease in perf overhead?  Why go to all this trouble for
> perf?  Who else will use local_try_cmpxchg()?

I'm assuming the local_xxx operations only have to be safe wrt interrupts?
On x86 it is possible that an alternate instruction sequence
that doesn't use a locked instruction may actually be faster!

Although, maybe, any kind of locked cmpxchg just needs to ensure
the cache line isn't 'stolen', so apart from possible slight
delays on another cpu that gets a cache miss for the line it
all makes little difference.
A cache line miss costs a lot anyway, line bouncing costs more,
and both are best avoided.
So is there actually much of a benefit at all?

Clearly the try_cmpxchg help - but that is a different issue.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* Re: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg
  2023-04-06  8:25   ` David Laight
@ 2023-04-06  8:38     ` Uros Bizjak
  2023-04-06  9:01       ` David Laight
  0 siblings, 1 reply; 20+ messages in thread
From: Uros Bizjak @ 2023-04-06  8:38 UTC (permalink / raw)
  To: David Laight
  Cc: Dave Hansen, linux-alpha, loongarch, linux-mips, linuxppc-dev,
	x86, linux-arch, linux-perf-users, linux-kernel,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Huacai Chen,
	WANG Xuerui, Thomas Bogendoerfer, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Arnd Bergmann,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
	Will Deacon, Boqun Feng, Jiaxun Yang, Jun Yi

On Thu, Apr 6, 2023 at 10:26 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Dave Hansen
> > Sent: 05 April 2023 17:37
> >
> > On 4/5/23 07:17, Uros Bizjak wrote:
> > > Add generic and target specific support for local{,64}_try_cmpxchg
> > > and wire up support for all targets that use local_t infrastructure.
> >
> > I feel like I'm missing some context.
> >
> > What are the actual end user visible effects of this series?  Is there a
> > measurable decrease in perf overhead?  Why go to all this trouble for
> > perf?  Who else will use local_try_cmpxchg()?
>
> > I'm assuming the local_xxx operations only have to be safe wrt interrupts?
> On x86 it is possible that an alternate instruction sequence
> that doesn't use a locked instruction may actually be faster!

Please note that "local" functions do not use the lock prefix. Only the
atomic properties of the cmpxchg instruction are exploited, since it
only needs to be safe wrt interrupts.

Uros.

> Although, maybe, any kind of locked cmpxchg just needs to ensure
> the cache line isn't 'stolen', so apart from possible slight
> delays on another cpu that gets a cache miss for the line in
> all makes little difference.
> The cache line miss costs a lot anyway, line bouncing more
> and is best avoided.
> So is there actually much of a benefit at all?
>
> Clearly the try_cmpxchg help - but that is a different issue.
>
>         David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)


* RE: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg
  2023-04-06  8:38     ` Uros Bizjak
@ 2023-04-06  9:01       ` David Laight
  0 siblings, 0 replies; 20+ messages in thread
From: David Laight @ 2023-04-06  9:01 UTC (permalink / raw)
  To: 'Uros Bizjak'
  Cc: Dave Hansen, linux-alpha, loongarch, linux-mips, linuxppc-dev,
	x86, linux-arch, linux-perf-users, linux-kernel,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Huacai Chen,
	WANG Xuerui, Thomas Bogendoerfer, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Arnd Bergmann,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
	Will Deacon, Boqun Feng, Jiaxun Yang, Jun Yi

From: Uros Bizjak
> Sent: 06 April 2023 09:39
> 
> On Thu, Apr 6, 2023 at 10:26 AM David Laight <David.Laight@aculab.com> wrote:
> >
> > From: Dave Hansen
> > > Sent: 05 April 2023 17:37
> > >
> > > On 4/5/23 07:17, Uros Bizjak wrote:
> > > > Add generic and target specific support for local{,64}_try_cmpxchg
> > > > and wire up support for all targets that use local_t infrastructure.
> > >
> > > I feel like I'm missing some context.
> > >
> > > What are the actual end user visible effects of this series?  Is there a
> > > measurable decrease in perf overhead?  Why go to all this trouble for
> > > perf?  Who else will use local_try_cmpxchg()?
> >
> > I'm assuming the local_xxx operations only have to be safe wrt interrupts?
> > On x86 it is possible that an alternate instruction sequence
> > that doesn't use a locked instruction may actually be faster!
> 
> Please note that "local" functions do not use the lock prefix. Only the
> atomic properties of the cmpxchg instruction are exploited, since it
> only needs to be safe wrt interrupts.

Gah, I was assuming that LOCK was implied - like it is for xchg
and all the bit instructions.

In any case I suspect it makes little difference unless the
locked variant affects the instruction pipeline.
In fact, you may want to stop the cacheline being invalidated
between the read and write in order to avoid an extra cache
line bounce.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* Re: [PATCH v2 1/5] locking/atomic: Add generic try_cmpxchg{,64}_local support
  2023-04-05 14:17 ` [PATCH v2 1/5] locking/atomic: Add generic try_cmpxchg{,64}_local support Uros Bizjak
@ 2023-04-11 11:10   ` Mark Rutland
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Rutland @ 2023-04-11 11:10 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: linux-alpha, loongarch, linux-mips, linuxppc-dev, x86,
	linux-arch, linux-perf-users, linux-kernel, Will Deacon,
	Peter Zijlstra, Boqun Feng

On Wed, Apr 05, 2023 at 04:17:06PM +0200, Uros Bizjak wrote:
> Add generic support for try_cmpxchg{,64}_local and their fallbacks.
> 
> These provide the generic try_cmpxchg_local family of functions
> from the arch_ prefixed versions, also adding explicit instrumentation.
> 
> Cc: Will Deacon <will@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  include/linux/atomic/atomic-arch-fallback.h | 24 ++++++++++++++++++++-
>  include/linux/atomic/atomic-instrumented.h  | 20 ++++++++++++++++-
>  scripts/atomic/gen-atomic-fallback.sh       |  4 ++++
>  scripts/atomic/gen-atomic-instrumented.sh   |  2 +-
>  4 files changed, 47 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/atomic/atomic-arch-fallback.h b/include/linux/atomic/atomic-arch-fallback.h
> index 77bc5522e61c..36c92851cdee 100644
> --- a/include/linux/atomic/atomic-arch-fallback.h
> +++ b/include/linux/atomic/atomic-arch-fallback.h
> @@ -217,6 +217,28 @@
>  
>  #endif /* arch_try_cmpxchg64_relaxed */
>  
> +#ifndef arch_try_cmpxchg_local
> +#define arch_try_cmpxchg_local(_ptr, _oldp, _new) \
> +({ \
> +	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> +	___r = arch_cmpxchg_local((_ptr), ___o, (_new)); \
> +	if (unlikely(___r != ___o)) \
> +		*___op = ___r; \
> +	likely(___r == ___o); \
> +})
> +#endif /* arch_try_cmpxchg_local */
> +
> +#ifndef arch_try_cmpxchg64_local
> +#define arch_try_cmpxchg64_local(_ptr, _oldp, _new) \
> +({ \
> +	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> +	___r = arch_cmpxchg64_local((_ptr), ___o, (_new)); \
> +	if (unlikely(___r != ___o)) \
> +		*___op = ___r; \
> +	likely(___r == ___o); \
> +})
> +#endif /* arch_try_cmpxchg64_local */
> +
>  #ifndef arch_atomic_read_acquire
>  static __always_inline int
>  arch_atomic_read_acquire(const atomic_t *v)
> @@ -2456,4 +2478,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
>  #endif
>  
>  #endif /* _LINUX_ATOMIC_FALLBACK_H */
> -// b5e87bdd5ede61470c29f7a7e4de781af3770f09
> +// 1f49bd4895a4b7a5383906649027205c52ec80ab
> diff --git a/include/linux/atomic/atomic-instrumented.h b/include/linux/atomic/atomic-instrumented.h
> index 7a139ec030b0..14a9212cc987 100644
> --- a/include/linux/atomic/atomic-instrumented.h
> +++ b/include/linux/atomic/atomic-instrumented.h
> @@ -2066,6 +2066,24 @@ atomic_long_dec_if_positive(atomic_long_t *v)
>  	arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
>  })
>  
> +#define try_cmpxchg_local(ptr, oldp, ...) \
> +({ \
> +	typeof(ptr) __ai_ptr = (ptr); \
> +	typeof(oldp) __ai_oldp = (oldp); \
> +	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
> +	arch_try_cmpxchg_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
> +})
> +
> +#define try_cmpxchg64_local(ptr, oldp, ...) \
> +({ \
> +	typeof(ptr) __ai_ptr = (ptr); \
> +	typeof(oldp) __ai_oldp = (oldp); \
> +	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> +	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
> +	arch_try_cmpxchg64_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
> +})
> +
>  #define cmpxchg_double(ptr, ...) \
>  ({ \
>  	typeof(ptr) __ai_ptr = (ptr); \
> @@ -2083,4 +2101,4 @@ atomic_long_dec_if_positive(atomic_long_t *v)
>  })
>  
>  #endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
> -// 764f741eb77a7ad565dc8d99ce2837d5542e8aee
> +// 456e206c7e4e681126c482e4edcc6f46921ac731
> diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
> index 3a07695e3c89..6e853f0dad8d 100755
> --- a/scripts/atomic/gen-atomic-fallback.sh
> +++ b/scripts/atomic/gen-atomic-fallback.sh
> @@ -225,6 +225,10 @@ for cmpxchg in "cmpxchg" "cmpxchg64"; do
>  	gen_try_cmpxchg_fallbacks "${cmpxchg}"
>  done
>  
> +for cmpxchg in "cmpxchg_local" "cmpxchg64_local"; do
> +	gen_try_cmpxchg_fallback "${cmpxchg}" ""
> +done
> +
>  grep '^[a-z]' "$1" | while read name meta args; do
>  	gen_proto "${meta}" "${name}" "atomic" "int" ${args}
>  done
> diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
> index 77c06526a574..c8165e9431bf 100755
> --- a/scripts/atomic/gen-atomic-instrumented.sh
> +++ b/scripts/atomic/gen-atomic-instrumented.sh
> @@ -173,7 +173,7 @@ for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg" "try_cmpxchg64"; do
>  	done
>  done
>  
> -for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg"; do
> +for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg" "try_cmpxchg_local" "try_cmpxchg64_local" ; do
>  	gen_xchg "${xchg}" "" ""
>  	printf "\n"
>  done
> -- 
> 2.39.2
> 


* Re: [PATCH v2 2/5] locking/generic: Wire up local{,64}_try_cmpxchg
  2023-04-05 14:17 ` [PATCH v2 2/5] locking/generic: Wire up local{,64}_try_cmpxchg Uros Bizjak
@ 2023-04-11 11:13   ` Mark Rutland
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Rutland @ 2023-04-11 11:13 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: linux-alpha, loongarch, linux-mips, linuxppc-dev, x86,
	linux-arch, linux-perf-users, linux-kernel, Arnd Bergmann

On Wed, Apr 05, 2023 at 04:17:07PM +0200, Uros Bizjak wrote:
> Implement generic support for local{,64}_try_cmpxchg.
> 
> Redirect to the atomic_ family of functions when the target
> does not provide its own local.h definitions.
> 
> For 64-bit targets, implement local64_try_cmpxchg and
> local64_cmpxchg using typed C wrappers that call the local_
> family of functions and provide additional checking
> of their input arguments.
> 
> Cc: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  include/asm-generic/local.h   |  1 +
>  include/asm-generic/local64.h | 12 +++++++++++-
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/include/asm-generic/local.h b/include/asm-generic/local.h
> index fca7f1d84818..7f97018df66f 100644
> --- a/include/asm-generic/local.h
> +++ b/include/asm-generic/local.h
> @@ -42,6 +42,7 @@ typedef struct
>  #define local_inc_return(l) atomic_long_inc_return(&(l)->a)
>  
>  #define local_cmpxchg(l, o, n) atomic_long_cmpxchg((&(l)->a), (o), (n))
> +#define local_try_cmpxchg(l, po, n) atomic_long_try_cmpxchg((&(l)->a), (po), (n))
>  #define local_xchg(l, n) atomic_long_xchg((&(l)->a), (n))
>  #define local_add_unless(l, _a, u) atomic_long_add_unless((&(l)->a), (_a), (u))
>  #define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)
> diff --git a/include/asm-generic/local64.h b/include/asm-generic/local64.h
> index 765be0b7d883..14963a7a6253 100644
> --- a/include/asm-generic/local64.h
> +++ b/include/asm-generic/local64.h
> @@ -42,7 +42,16 @@ typedef struct {
>  #define local64_sub_return(i, l) local_sub_return((i), (&(l)->a))
>  #define local64_inc_return(l)	local_inc_return(&(l)->a)
>  
> -#define local64_cmpxchg(l, o, n) local_cmpxchg((&(l)->a), (o), (n))
> +static inline s64 local64_cmpxchg(local64_t *l, s64 old, s64 new)
> +{
> +	return local_cmpxchg(&l->a, old, new);
> +}
> +
> +static inline bool local64_try_cmpxchg(local64_t *l, s64 *old, s64 new)
> +{
> +	return local_try_cmpxchg(&l->a, (long *)old, new);
> +}
> +
>  #define local64_xchg(l, n)	local_xchg((&(l)->a), (n))
>  #define local64_add_unless(l, _a, u) local_add_unless((&(l)->a), (_a), (u))
>  #define local64_inc_not_zero(l)	local_inc_not_zero(&(l)->a)
> @@ -81,6 +90,7 @@ typedef struct {
>  #define local64_inc_return(l)	atomic64_inc_return(&(l)->a)
>  
>  #define local64_cmpxchg(l, o, n) atomic64_cmpxchg((&(l)->a), (o), (n))
> +#define local64_try_cmpxchg(l, po, n) atomic64_try_cmpxchg((&(l)->a), (po), (n))
>  #define local64_xchg(l, n)	atomic64_xchg((&(l)->a), (n))
>  #define local64_add_unless(l, _a, u) atomic64_add_unless((&(l)->a), (_a), (u))
>  #define local64_inc_not_zero(l)	atomic64_inc_not_zero(&(l)->a)
> -- 
> 2.39.2
> 


* Re: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg
  2023-04-05 16:37 ` [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg Dave Hansen
  2023-04-05 18:53   ` Uros Bizjak
  2023-04-06  8:25   ` David Laight
@ 2023-04-11 11:35   ` Mark Rutland
  2023-04-11 13:43     ` Dave Hansen
  2 siblings, 1 reply; 20+ messages in thread
From: Mark Rutland @ 2023-04-11 11:35 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Uros Bizjak, linux-alpha, loongarch, linux-mips, linuxppc-dev,
	x86, linux-arch, linux-perf-users, linux-kernel,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Huacai Chen,
	WANG Xuerui, Thomas Bogendoerfer, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Arnd Bergmann,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Will Deacon, Boqun Feng,
	Jiaxun Yang, Jun Yi

On Wed, Apr 05, 2023 at 09:37:04AM -0700, Dave Hansen wrote:
> On 4/5/23 07:17, Uros Bizjak wrote:
> > Add generic and target specific support for local{,64}_try_cmpxchg
> > and wire up support for all targets that use local_t infrastructure.
> 
> I feel like I'm missing some context.
> 
> What are the actual end user visible effects of this series?  Is there a
> measurable decrease in perf overhead?  Why go to all this trouble for
> perf?  Who else will use local_try_cmpxchg()?

Overall, the theory is that it can generate slightly better code (e.g. by
reusing the flags on x86). In practice, that might be in the noise, but as
demonstrated in prior postings the code generation is no worse than before.

From my perspective, the more important part is that this aligns local_t with
the other atomic*_t APIs, which all have ${atomictype}_try_cmpxchg(), and for
consistency/legibility/maintainability it's nice to be able to use the same
code patterns, e.g.

	${inttype} new, old = ${atomictype}_read(ptr);
	do {
		...
		new = do_something_with(old);
	} while (!${atomictype}_try_cmpxchg(ptr, &old, new));

> I'm all for improving things, and perf is an important user.  But, if
> the goal here is improving performance, it would be nice to see at least
> a stab at quantifying the performance delta.

IIUC, Steve's original request for local_try_cmpxchg() was a combination of a
theoretical performance benefit and a more general preference to use
try_cmpxchg() for consistency / better structure of the source code:

  https://lore.kernel.org/lkml/20230301131831.6c8d4ff5@gandalf.local.home/

I agree it'd be nice to have performance figures, but I think those would only
need to demonstrate a lack of a regression rather than a performance
improvement, and I think it's fairly clear from eyeballing the generated
instructions that a regression isn't likely.

Thanks,
Mark.


* Re: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg
  2023-04-11 11:35   ` Mark Rutland
@ 2023-04-11 13:43     ` Dave Hansen
  2023-04-11 21:34       ` David Laight
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Hansen @ 2023-04-11 13:43 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Uros Bizjak, linux-alpha, loongarch, linux-mips, linuxppc-dev,
	x86, linux-arch, linux-perf-users, linux-kernel,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Huacai Chen,
	WANG Xuerui, Thomas Bogendoerfer, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Arnd Bergmann,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Will Deacon, Boqun Feng,
	Jiaxun Yang, Jun Yi

On 4/11/23 04:35, Mark Rutland wrote:
> I agree it'd be nice to have performance figures, but I think those would only
> need to demonstrate a lack of a regression rather than a performance
> improvement, and I think it's fairly clear from eyeballing the generated
> instructions that a regression isn't likely.

Thanks for the additional context.

I totally agree that there's zero burden here to show a performance
increase.  If anyone can think of a quick way to do _some_ kind of
benchmark on the code being changed and just show that it's free of
brown paper bags, it would be appreciated.  Nothing crazy, just think of
one workload (synthetic or not) that will stress the paths being changed
and run it with and without these changes.  Make sure there are no
surprises.

I also agree that it's unlikely to be brown paper bag material.


* RE: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg
  2023-04-11 13:43     ` Dave Hansen
@ 2023-04-11 21:34       ` David Laight
  0 siblings, 0 replies; 20+ messages in thread
From: David Laight @ 2023-04-11 21:34 UTC (permalink / raw)
  To: 'Dave Hansen', Mark Rutland
  Cc: Uros Bizjak, linux-alpha, loongarch, linux-mips, linuxppc-dev,
	x86, linux-arch, linux-perf-users, linux-kernel,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Huacai Chen,
	WANG Xuerui, Thomas Bogendoerfer, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Arnd Bergmann,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Will Deacon, Boqun Feng,
	Jiaxun Yang, Jun Yi

From: Dave Hansen
> Sent: 11 April 2023 14:44
> 
> On 4/11/23 04:35, Mark Rutland wrote:
> > I agree it'd be nice to have performance figures, but I think those would only
> > need to demonstrate a lack of a regression rather than a performance
> > improvement, and I think it's fairly clear from eyeballing the generated
> > instructions that a regression isn't likely.
> 
> Thanks for the additional context.
> 
> I totally agree that there's zero burden here to show a performance
> increase.  If anyone can think of a quick way to do _some_ kind of
> benchmark on the code being changed and just show that it's free of
> brown paper bags, it would be appreciated.  Nothing crazy, just think of
> one workload (synthetic or not) that will stress the paths being changed
> and run it with and without these changes.  Make sure there are not
> surprises.
> 
> I also agree that it's unlikely to be brown paper bag material.

The only thing I can think of is that, on x86, the locked
variant may actually be faster!
Both require exclusive access to the cache line (the unlocked
variant always does the write! [1]).
So if the cache line is contended between cpus the unlocked
variant might ping-pong the cache line twice!
Of course, if the line is shared like that then performance
is horrid.

[1] I checked on an uncached PCIe address on which I can monitor
the TLP. The write always happens, so you can use cmpxchg16b
with a 'known bad value' to do a 16 byte read as a single TLP
(without using an SSE register).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* Re: [PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg
  2023-04-05 14:17 ` [PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg Uros Bizjak
@ 2023-04-12 11:32   ` Peter Zijlstra
  2023-04-12 13:37     ` Uros Bizjak
  2023-05-17  7:41   ` Charlemagne Lasse
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2023-04-12 11:32 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: linux-alpha, loongarch, linux-mips, linuxppc-dev, x86,
	linux-arch, linux-perf-users, linux-kernel, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Huacai Chen, WANG Xuerui,
	Jiaxun Yang, Jun Yi, Thomas Bogendoerfer, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin

On Wed, Apr 05, 2023 at 04:17:08PM +0200, Uros Bizjak wrote:
> diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
> index bc4bd19b7fc2..45492fb5bf22 100644
> --- a/arch/powerpc/include/asm/local.h
> +++ b/arch/powerpc/include/asm/local.h
> @@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
>  	return t;
>  }
>  
> +static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
> +{
> +	long o = *po, r;
> +
> +	r = local_cmpxchg(l, o, n);
> +	if (unlikely(r != o))
> +		*po = r;
> +
> +	return likely(r == o);
> +}
> +

Why is the ppc one different from the rest? Why can't it use the
try_cmpxchg_local() fallback and needs to have it open-coded?


* Re: [PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg
  2023-04-12 11:32   ` Peter Zijlstra
@ 2023-04-12 13:37     ` Uros Bizjak
  2023-04-12 13:40       ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Uros Bizjak @ 2023-04-12 13:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-alpha, loongarch, linux-mips, linuxppc-dev, x86,
	linux-arch, linux-perf-users, linux-kernel, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Huacai Chen, WANG Xuerui,
	Jiaxun Yang, Jun Yi, Thomas Bogendoerfer, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin

On Wed, Apr 12, 2023 at 1:33 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Apr 05, 2023 at 04:17:08PM +0200, Uros Bizjak wrote:
> > diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
> > index bc4bd19b7fc2..45492fb5bf22 100644
> > --- a/arch/powerpc/include/asm/local.h
> > +++ b/arch/powerpc/include/asm/local.h
> > @@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
> >       return t;
> >  }
> >
> > +static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
> > +{
> > +     long o = *po, r;
> > +
> > +     r = local_cmpxchg(l, o, n);
> > +     if (unlikely(r != o))
> > +             *po = r;
> > +
> > +     return likely(r == o);
> > +}
> > +
>
> Why is the ppc one different from the rest? Why can't it use the
> try_cmpxchg_local() fallback and needs to have it open-coded?

Please note that ppc directly defines local_cmpxchg in a way that
bypasses the cmpxchg_local/arch_cmpxchg_local machinery. The patch
takes the same approach for local_try_cmpxchg, because the fallbacks
use the arch_cmpxchg_local definitions.

PPC should be converted to use arch_cmpxchg_local (to also enable
instrumentation), but that is outside the scope of the proposed patchset.

Uros.


* Re: [PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg
  2023-04-12 13:37     ` Uros Bizjak
@ 2023-04-12 13:40       ` Peter Zijlstra
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2023-04-12 13:40 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: linux-alpha, loongarch, linux-mips, linuxppc-dev, x86,
	linux-arch, linux-perf-users, linux-kernel, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Huacai Chen, WANG Xuerui,
	Jiaxun Yang, Jun Yi, Thomas Bogendoerfer, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin

On Wed, Apr 12, 2023 at 03:37:50PM +0200, Uros Bizjak wrote:
> On Wed, Apr 12, 2023 at 1:33 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Apr 05, 2023 at 04:17:08PM +0200, Uros Bizjak wrote:
> > > diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
> > > index bc4bd19b7fc2..45492fb5bf22 100644
> > > --- a/arch/powerpc/include/asm/local.h
> > > +++ b/arch/powerpc/include/asm/local.h
> > > @@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
> > >       return t;
> > >  }
> > >
> > > +static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
> > > +{
> > > +     long o = *po, r;
> > > +
> > > +     r = local_cmpxchg(l, o, n);
> > > +     if (unlikely(r != o))
> > > +             *po = r;
> > > +
> > > +     return likely(r == o);
> > > +}
> > > +
> >
> > Why is the ppc one different from the rest? Why can't it use the
> > try_cmpxchg_local() fallback and needs to have it open-coded?
> 
> Please note that ppc directly defines local_cmpxchg that bypasses
> cmpxchg_local/arch_cmpxchg_local machinery. The patch takes the same
> approach for local_try_cmpxchg, because fallbacks are using
> arch_cmpxchg_local definitions.
> 
> PPC should be converted to use arch_cmpxchg_local (to also enable
> instrumentation), but this is not the scope of the proposed patchset.

Ah indeed. Thanks!


* Re: [PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg
  2023-04-05 14:17 ` [PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg Uros Bizjak
  2023-04-12 11:32   ` Peter Zijlstra
@ 2023-05-17  7:41   ` Charlemagne Lasse
  1 sibling, 0 replies; 20+ messages in thread
From: Charlemagne Lasse @ 2023-05-17  7:41 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: linux-alpha, loongarch, linux-mips, linuxppc-dev, x86,
	linux-arch, linux-perf-users, linux-kernel, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Huacai Chen, WANG Xuerui,
	Jiaxun Yang, Jun Yi, Thomas Bogendoerfer, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin

> +static __inline__ bool local_try_cmpxchg(local_t *l, long *old, long new)
> +{
> +       typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
> +       return try_cmpxchg_local(&l->a.counter, __old, new);
> +}
> +

This patch then causes the following sparse warnings:

    ./arch/x86/include/asm/local.h:131:16: warning: symbol '__old' shadows an earlier one
    ./arch/x86/include/asm/local.h:130:30: originally declared here

This is then visible in all kinds of builds, which makes it hard to
spot actual problems with sparse.

