[PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg()

linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg()
@ 2024-04-01 21:39 Paul E. McKenney
  2024-04-01 21:39 ` [PATCH RFC cmpxchg 1/8] lib: Add one-byte and two-byte cmpxchg() emulation functions Paul E. McKenney
  2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
  0 siblings, 2 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-01 21:39 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, torvalds

Hello!

This series provides emulation functions for one-byte and two-byte
cmpxchg, and uses them for those architectures not supporting these
in hardware.  The emulation is in terms of the fully ordered four-byte
cmpxchg() that is supplied by all of these architectures.  These were
tested by making x86 forget that it can do these natively:

88a9b3f7a924 ("EXP arch/x86: Test one-byte cmpxchg emulation")

This commit is local to -rcu and is of course not intended for mainline.

If accepted, RCU Tasks will use this capability in place of the current
rcu_trc_cmpxchg_need_qs() open-coding of this emulation.

1.	Add one-byte and two-byte cmpxchg() emulation functions.

2.	sparc: Emulate one-byte and two-byte cmpxchg.

3.	ARC: Emulate one-byte and two-byte cmpxchg.

4.	csky: Emulate one-byte and two-byte cmpxchg.

5.	sh: Emulate one-byte and two-byte cmpxchg.

6.	xtensa: Emulate one-byte and two-byte cmpxchg.

7.	parisc: Emulate two-byte cmpxchg.

8.	riscv: Emulate one-byte and two-byte cmpxchg.

						Thanx, Paul

------------------------------------------------------------------------

 arch/Kconfig                        |    3 +
 arch/arc/Kconfig                    |    1 
 arch/arc/include/asm/cmpxchg.h      |   38 ++++++++++++++----
 arch/csky/Kconfig                   |    1 
 arch/csky/include/asm/cmpxchg.h     |   18 ++++++++
 arch/parisc/Kconfig                 |    1 
 arch/parisc/include/asm/cmpxchg.h   |    1 
 arch/riscv/Kconfig                  |    1 
 arch/riscv/include/asm/cmpxchg.h    |   25 ++++++++++++
 arch/sh/Kconfig                     |    1 
 arch/sh/include/asm/cmpxchg.h       |    4 +
 arch/sparc/Kconfig                  |    1 
 arch/sparc/include/asm/cmpxchg_32.h |    6 ++
 arch/xtensa/Kconfig                 |    1 
 arch/xtensa/include/asm/cmpxchg.h   |    3 +
 include/linux/cmpxchg-emu.h         |   16 +++++++
 lib/Makefile                        |    1 
 lib/cmpxchg-emu.c                   |   74 ++++++++++++++++++++++++++++++++++++
 18 files changed, 187 insertions(+), 9 deletions(-)

^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH RFC cmpxchg 1/8] lib: Add one-byte and two-byte cmpxchg() emulation functions
  2024-04-01 21:39 [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
@ 2024-04-01 21:39 ` Paul E. McKenney
  2024-04-02 13:07   ` Marco Elver
  2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
  1 sibling, 1 reply; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-01 21:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Paul E. McKenney, Marco Elver, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra (Intel),
	Douglas Anderson, Petr Mladek, linux-arch

Architectures are required to provide four-byte cmpxchg() and 64-bit
architectures are additionally required to provide eight-byte cmpxchg().
However, there are cases where one-byte and two-byte cmpxchg()
would be extremely useful.  Therefore, provide cmpxchg_emu_u8() and
cmpxchg_emu_u16() that emulate one-byte and two-byte cmpxchg() in terms
of four-byte cmpxchg().

Note that these emulations are fully ordered, and can (for example)
cause one-byte cmpxchg_relaxed() to incur the overhead of full ordering.
If this causes problems for a given architecture, that architecture is
free to provide its own lighter-weight primitives.

[ paulmck: Apply Marco Elver feedback. ]
[ paulmck: Apply kernel test robot feedback. ]

Link: https://lore.kernel.org/all/0733eb10-5e7a-4450-9b8a-527b97c842ff@paulmck-laptop/

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: <linux-arch@vger.kernel.org>
---
 arch/Kconfig                |  3 ++
 include/linux/cmpxchg-emu.h | 16 ++++++++
 lib/Makefile                |  1 +
 lib/cmpxchg-emu.c           | 74 +++++++++++++++++++++++++++++++++++++
 4 files changed, 94 insertions(+)
 create mode 100644 include/linux/cmpxchg-emu.h
 create mode 100644 lib/cmpxchg-emu.c

diff --git a/arch/Kconfig b/arch/Kconfig
index ae4a4f37bbf08..01093c60952a5 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1609,4 +1609,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
 	# strict alignment always, even with -falign-functions.
 	def_bool CC_HAS_MIN_FUNCTION_ALIGNMENT || CC_IS_CLANG
 
+config ARCH_NEED_CMPXCHG_1_2_EMU
+	bool
+
 endmenu
diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
new file mode 100644
index 0000000000000..fee8171fa05eb
--- /dev/null
+++ b/include/linux/cmpxchg-emu.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Emulated 1-byte and 2-byte cmpxchg operations for architectures
+ * lacking direct support for these sizes.  These are implemented in terms
+ * of 4-byte cmpxchg operations.
+ *
+ * Copyright (C) 2024 Paul E. McKenney.
+ */
+
+#ifndef __LINUX_CMPXCHG_EMU_H
+#define __LINUX_CMPXCHG_EMU_H
+
+uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
+uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new);
+
+#endif /* __LINUX_CMPXCHG_EMU_H */
diff --git a/lib/Makefile b/lib/Makefile
index ffc6b2341b45a..1d93b61a7ecbe 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -236,6 +236,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
 lib-$(CONFIG_GENERIC_BUG) += bug.o
 
 obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
+obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_2_EMU) += cmpxchg-emu.o
 
 obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
 #ensure exported functions have prototypes
diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
new file mode 100644
index 0000000000000..a88c4f3c88430
--- /dev/null
+++ b/lib/cmpxchg-emu.c
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Emulated 1-byte and 2-byte cmpxchg operations for architectures
+ * lacking direct support for these sizes.  These are implemented in terms
+ * of 4-byte cmpxchg operations.
+ *
+ * Copyright (C) 2024 Paul E. McKenney.
+ */
+
+#include <linux/types.h>
+#include <linux/export.h>
+#include <linux/instrumented.h>
+#include <linux/atomic.h>
+#include <linux/panic.h>
+#include <linux/bug.h>
+#include <asm-generic/rwonce.h>
+#include <linux/cmpxchg-emu.h>
+
+union u8_32 {
+	u8 b[4];
+	u32 w;
+};
+
+/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
+uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
+{
+	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
+	int i = ((uintptr_t)p) & 0x3;
+	union u8_32 old32;
+	union u8_32 new32;
+	u32 ret;
+
+	ret = READ_ONCE(*p32);
+	do {
+		old32.w = ret;
+		if (old32.b[i] != old)
+			return old32.b[i];
+		new32.w = old32.w;
+		new32.b[i] = new;
+		instrument_atomic_read_write(p, 1);
+		ret = data_race(cmpxchg(p32, old32.w, new32.w));
+	} while (ret != old32.w);
+	return old;
+}
+EXPORT_SYMBOL_GPL(cmpxchg_emu_u8);
+
+union u16_32 {
+	u16 h[2];
+	u32 w;
+};
+
+/* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */
+uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new)
+{
+	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
+	int i = (((uintptr_t)p) & 0x2) / 2;
+	union u16_32 old32;
+	union u16_32 new32;
+	u32 ret;
+
+	WARN_ON_ONCE(((uintptr_t)p) & 0x1);
+	ret = READ_ONCE(*p32);
+	do {
+		old32.w = ret;
+		if (old32.h[i] != old)
+			return old32.h[i];
+		new32.w = old32.w;
+		new32.h[i] = new;
+		instrument_atomic_read_write(p, 2);
+		ret = data_race(cmpxchg(p32, old32.w, new32.w));
+	} while (ret != old32.w);
+	return old;
+}
+EXPORT_SYMBOL_GPL(cmpxchg_emu_u16);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC cmpxchg 1/8] lib: Add one-byte and two-byte cmpxchg() emulation functions
  2024-04-01 21:39 ` [PATCH RFC cmpxchg 1/8] lib: Add one-byte and two-byte cmpxchg() emulation functions Paul E. McKenney
@ 2024-04-02 13:07   ` Marco Elver
  2024-04-02 17:15     ` Paul E. McKenney
  0 siblings, 1 reply; 71+ messages in thread
From: Marco Elver @ 2024-04-02 13:07 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, kernel-team, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Douglas Anderson, Petr Mladek, linux-arch

On Mon, 1 Apr 2024 at 23:39, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> Architectures are required to provide four-byte cmpxchg() and 64-bit
> architectures are additionally required to provide eight-byte cmpxchg().
> However, there are cases where one-byte and two-byte cmpxchg()
> would be extremely useful.  Therefore, provide cmpxchg_emu_u8() and
> cmpxchg_emu_u16() that emulate one-byte and two-byte cmpxchg() in terms
> of four-byte cmpxchg().
>
> Note that these emulations are fully ordered, and can (for example)
> cause one-byte cmpxchg_relaxed() to incur the overhead of full ordering.
> If this causes problems for a given architecture, that architecture is
> free to provide its own lighter-weight primitives.
>
> [ paulmck: Apply Marco Elver feedback. ]
> [ paulmck: Apply kernel test robot feedback. ]
>
> Link: https://lore.kernel.org/all/0733eb10-5e7a-4450-9b8a-527b97c842ff@paulmck-laptop/
>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: Marco Elver <elver@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Douglas Anderson <dianders@chromium.org>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: <linux-arch@vger.kernel.org>

Acked-by: Marco Elver <elver@google.com>

> ---
>  arch/Kconfig                |  3 ++
>  include/linux/cmpxchg-emu.h | 16 ++++++++
>  lib/Makefile                |  1 +
>  lib/cmpxchg-emu.c           | 74 +++++++++++++++++++++++++++++++++++++
>  4 files changed, 94 insertions(+)
>  create mode 100644 include/linux/cmpxchg-emu.h
>  create mode 100644 lib/cmpxchg-emu.c
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index ae4a4f37bbf08..01093c60952a5 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1609,4 +1609,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
>         # strict alignment always, even with -falign-functions.
>         def_bool CC_HAS_MIN_FUNCTION_ALIGNMENT || CC_IS_CLANG
>
> +config ARCH_NEED_CMPXCHG_1_2_EMU
> +       bool
> +
>  endmenu
> diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
> new file mode 100644
> index 0000000000000..fee8171fa05eb
> --- /dev/null
> +++ b/include/linux/cmpxchg-emu.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +/*
> + * Emulated 1-byte and 2-byte cmpxchg operations for architectures
> + * lacking direct support for these sizes.  These are implemented in terms
> + * of 4-byte cmpxchg operations.
> + *
> + * Copyright (C) 2024 Paul E. McKenney.
> + */
> +
> +#ifndef __LINUX_CMPXCHG_EMU_H
> +#define __LINUX_CMPXCHG_EMU_H
> +
> +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
> +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new);
> +
> +#endif /* __LINUX_CMPXCHG_EMU_H */
> diff --git a/lib/Makefile b/lib/Makefile
> index ffc6b2341b45a..1d93b61a7ecbe 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -236,6 +236,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
>  lib-$(CONFIG_GENERIC_BUG) += bug.o
>
>  obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
> +obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_2_EMU) += cmpxchg-emu.o
>
>  obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
>  #ensure exported functions have prototypes
> diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
> new file mode 100644
> index 0000000000000..a88c4f3c88430
> --- /dev/null
> +++ b/lib/cmpxchg-emu.c
> @@ -0,0 +1,74 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +/*
> + * Emulated 1-byte and 2-byte cmpxchg operations for architectures
> + * lacking direct support for these sizes.  These are implemented in terms
> + * of 4-byte cmpxchg operations.
> + *
> + * Copyright (C) 2024 Paul E. McKenney.
> + */
> +
> +#include <linux/types.h>
> +#include <linux/export.h>
> +#include <linux/instrumented.h>
> +#include <linux/atomic.h>
> +#include <linux/panic.h>
> +#include <linux/bug.h>
> +#include <asm-generic/rwonce.h>
> +#include <linux/cmpxchg-emu.h>
> +
> +union u8_32 {
> +       u8 b[4];
> +       u32 w;
> +};
> +
> +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
> +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> +{
> +       u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> +       int i = ((uintptr_t)p) & 0x3;
> +       union u8_32 old32;
> +       union u8_32 new32;
> +       u32 ret;
> +
> +       ret = READ_ONCE(*p32);
> +       do {
> +               old32.w = ret;
> +               if (old32.b[i] != old)
> +                       return old32.b[i];
> +               new32.w = old32.w;
> +               new32.b[i] = new;
> +               instrument_atomic_read_write(p, 1);
> +               ret = data_race(cmpxchg(p32, old32.w, new32.w));
> +       } while (ret != old32.w);
> +       return old;
> +}
> +EXPORT_SYMBOL_GPL(cmpxchg_emu_u8);
> +
> +union u16_32 {
> +       u16 h[2];
> +       u32 w;
> +};
> +
> +/* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */
> +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new)
> +{
> +       u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> +       int i = (((uintptr_t)p) & 0x2) / 2;
> +       union u16_32 old32;
> +       union u16_32 new32;
> +       u32 ret;
> +
> +       WARN_ON_ONCE(((uintptr_t)p) & 0x1);
> +       ret = READ_ONCE(*p32);
> +       do {
> +               old32.w = ret;
> +               if (old32.h[i] != old)
> +                       return old32.h[i];
> +               new32.w = old32.w;
> +               new32.h[i] = new;
> +               instrument_atomic_read_write(p, 2);
> +               ret = data_race(cmpxchg(p32, old32.w, new32.w));
> +       } while (ret != old32.w);
> +       return old;
> +}
> +EXPORT_SYMBOL_GPL(cmpxchg_emu_u16);
> --
> 2.40.1
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC cmpxchg 1/8] lib: Add one-byte and two-byte cmpxchg() emulation functions
  2024-04-02 13:07   ` Marco Elver
@ 2024-04-02 17:15     ` Paul E. McKenney
  0 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-02 17:15 UTC (permalink / raw)
  To: Marco Elver
  Cc: linux-kernel, kernel-team, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Douglas Anderson, Petr Mladek, linux-arch

On Tue, Apr 02, 2024 at 03:07:22PM +0200, Marco Elver wrote:
> On Mon, 1 Apr 2024 at 23:39, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > Architectures are required to provide four-byte cmpxchg() and 64-bit
> > architectures are additionally required to provide eight-byte cmpxchg().
> > However, there are cases where one-byte and two-byte cmpxchg()
> > would be extremely useful.  Therefore, provide cmpxchg_emu_u8() and
> > cmpxchg_emu_u16() that emulate one-byte and two-byte cmpxchg() in terms
> > of four-byte cmpxchg().
> >
> > Note that these emulations are fully ordered, and can (for example)
> > cause one-byte cmpxchg_relaxed() to incur the overhead of full ordering.
> > If this causes problems for a given architecture, that architecture is
> > free to provide its own lighter-weight primitives.
> >
> > [ paulmck: Apply Marco Elver feedback. ]
> > [ paulmck: Apply kernel test robot feedback. ]
> >
> > Link: https://lore.kernel.org/all/0733eb10-5e7a-4450-9b8a-527b97c842ff@paulmck-laptop/
> >
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Cc: Marco Elver <elver@google.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> > Cc: Douglas Anderson <dianders@chromium.org>
> > Cc: Petr Mladek <pmladek@suse.com>
> > Cc: <linux-arch@vger.kernel.org>
> 
> Acked-by: Marco Elver <elver@google.com>

Thank you!  I will apply on my next rebase.

							Thanx, Paul

> > ---
> >  arch/Kconfig                |  3 ++
> >  include/linux/cmpxchg-emu.h | 16 ++++++++
> >  lib/Makefile                |  1 +
> >  lib/cmpxchg-emu.c           | 74 +++++++++++++++++++++++++++++++++++++
> >  4 files changed, 94 insertions(+)
> >  create mode 100644 include/linux/cmpxchg-emu.h
> >  create mode 100644 lib/cmpxchg-emu.c
> >
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index ae4a4f37bbf08..01093c60952a5 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1609,4 +1609,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
> >         # strict alignment always, even with -falign-functions.
> >         def_bool CC_HAS_MIN_FUNCTION_ALIGNMENT || CC_IS_CLANG
> >
> > +config ARCH_NEED_CMPXCHG_1_2_EMU
> > +       bool
> > +
> >  endmenu
> > diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
> > new file mode 100644
> > index 0000000000000..fee8171fa05eb
> > --- /dev/null
> > +++ b/include/linux/cmpxchg-emu.h
> > @@ -0,0 +1,16 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ */
> > +/*
> > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures
> > + * lacking direct support for these sizes.  These are implemented in terms
> > + * of 4-byte cmpxchg operations.
> > + *
> > + * Copyright (C) 2024 Paul E. McKenney.
> > + */
> > +
> > +#ifndef __LINUX_CMPXCHG_EMU_H
> > +#define __LINUX_CMPXCHG_EMU_H
> > +
> > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
> > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new);
> > +
> > +#endif /* __LINUX_CMPXCHG_EMU_H */
> > diff --git a/lib/Makefile b/lib/Makefile
> > index ffc6b2341b45a..1d93b61a7ecbe 100644
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -236,6 +236,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
> >  lib-$(CONFIG_GENERIC_BUG) += bug.o
> >
> >  obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
> > +obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_2_EMU) += cmpxchg-emu.o
> >
> >  obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
> >  #ensure exported functions have prototypes
> > diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
> > new file mode 100644
> > index 0000000000000..a88c4f3c88430
> > --- /dev/null
> > +++ b/lib/cmpxchg-emu.c
> > @@ -0,0 +1,74 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ */
> > +/*
> > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures
> > + * lacking direct support for these sizes.  These are implemented in terms
> > + * of 4-byte cmpxchg operations.
> > + *
> > + * Copyright (C) 2024 Paul E. McKenney.
> > + */
> > +
> > +#include <linux/types.h>
> > +#include <linux/export.h>
> > +#include <linux/instrumented.h>
> > +#include <linux/atomic.h>
> > +#include <linux/panic.h>
> > +#include <linux/bug.h>
> > +#include <asm-generic/rwonce.h>
> > +#include <linux/cmpxchg-emu.h>
> > +
> > +union u8_32 {
> > +       u8 b[4];
> > +       u32 w;
> > +};
> > +
> > +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
> > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> > +{
> > +       u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> > +       int i = ((uintptr_t)p) & 0x3;
> > +       union u8_32 old32;
> > +       union u8_32 new32;
> > +       u32 ret;
> > +
> > +       ret = READ_ONCE(*p32);
> > +       do {
> > +               old32.w = ret;
> > +               if (old32.b[i] != old)
> > +                       return old32.b[i];
> > +               new32.w = old32.w;
> > +               new32.b[i] = new;
> > +               instrument_atomic_read_write(p, 1);
> > +               ret = data_race(cmpxchg(p32, old32.w, new32.w));
> > +       } while (ret != old32.w);
> > +       return old;
> > +}
> > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u8);
> > +
> > +union u16_32 {
> > +       u16 h[2];
> > +       u32 w;
> > +};
> > +
> > +/* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */
> > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new)
> > +{
> > +       u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> > +       int i = (((uintptr_t)p) & 0x2) / 2;
> > +       union u16_32 old32;
> > +       union u16_32 new32;
> > +       u32 ret;
> > +
> > +       WARN_ON_ONCE(((uintptr_t)p) & 0x1);
> > +       ret = READ_ONCE(*p32);
> > +       do {
> > +               old32.w = ret;
> > +               if (old32.h[i] != old)
> > +                       return old32.h[i];
> > +               new32.w = old32.w;
> > +               new32.h[i] = new;
> > +               instrument_atomic_read_write(p, 2);
> > +               ret = data_race(cmpxchg(p32, old32.w, new32.w));
> > +       } while (ret != old32.w);
> > +       return old;
> > +}
> > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u16);
> > --
> > 2.40.1
> >

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg()
  2024-04-01 21:39 [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
  2024-04-01 21:39 ` [PATCH RFC cmpxchg 1/8] lib: Add one-byte and two-byte cmpxchg() emulation functions Paul E. McKenney
@ 2024-04-08 17:47 ` Paul E. McKenney
  2024-04-08 17:49   ` [PATCH cmpxchg 01/14] sparc32: make __cmpxchg_u32() return u32 Paul E. McKenney
                     ` (14 more replies)
  1 sibling, 15 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-08 17:47 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, torvalds, Arnd Bergmann

Hello!

This series provides emulation functions for one-byte cmpxchg, and uses it
for those architectures not supporting this in hardware.  The emulation
is in terms of the fully ordered four-byte cmpxchg() that is supplied
by all of these architectures.  This was tested by making x86 forget
that it can do one-byte cmpxchg() natively:

88a9b3f7a924 ("EXP arch/x86: Test one-byte cmpxchg emulation")

This commit is local to -rcu and is of course not intended for mainline.

If accepted, RCU Tasks will use this capability in place of the current
rcu_trc_cmpxchg_need_qs() open-coding of this emulation.

1.	sparc32: make __cmpxchg_u32() return u32, courtesy of Al Viro.

2.	sparc32: make the first argument of __cmpxchg_u64() volatile u64 *,
	courtesy of Al Viro.

3.	sparc32: unify __cmpxchg_u{32,64}, courtesy of Al Viro.

4.	sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those
	sizes, courtesy of Al Viro.

5.	parisc: __cmpxchg_u32(): lift conversion into the callers, courtesy of
	Al Viro.

6.	parisc: unify implementations of __cmpxchg_u{8,32,64}, courtesy of
	Al Viro.

7.	parisc: add missing export of __cmpxchg_u8(), courtesy of Al Viro.

8.	parisc: add u16 support to cmpxchg(), courtesy of Al Viro.

9.	lib: Add one-byte emulation function.

10.	ARC: Emulate one-byte cmpxchg.

11.	csky: Emulate one-byte cmpxchg.

12.	sh: Emulate one-byte cmpxchg.

13.	xtensa: Emulate one-byte cmpxchg.

14.	riscv: Emulate one-byte cmpxchg.

Changes since v1:

o	Add native support for sparc32 and parisc, courtesy of Al Viro.

o	Remove two-byte emulation due to architectures that still do not
	support two-byte load and store instructions, per Arnd Bergmann
	feedback.  (Yes, there are a few systems out there that do not
	even support one-byte load instructions, but these are slated
	for removal anyway.)

o	Fix numerous casting bugs spotted by kernel test robot.

o	Fix SPDX header.  "//" for .c files and "/*" for .h files.
	I am sure that there is a good reason for this.  ;-)

						Thanx, Paul

------------------------------------------------------------------------

 arch/parisc/include/asm/cmpxchg.h     |   19 +++++-------
 arch/parisc/kernel/parisc_ksyms.c     |    1 
 arch/parisc/lib/bitops.c              |   52 +++++++++++-----------------------
 arch/sparc/include/asm/cmpxchg_32.h   |   18 +++++------
 arch/sparc/lib/atomic32.c             |   47 +++++++++++++-----------------
 b/arch/Kconfig                        |    3 +
 b/arch/arc/Kconfig                    |    1 
 b/arch/arc/include/asm/cmpxchg.h      |   32 +++++++++++++++-----
 b/arch/csky/Kconfig                   |    1 
 b/arch/csky/include/asm/cmpxchg.h     |   10 ++++++
 b/arch/parisc/include/asm/cmpxchg.h   |    3 -
 b/arch/parisc/kernel/parisc_ksyms.c   |    1 
 b/arch/parisc/lib/bitops.c            |    6 +--
 b/arch/riscv/Kconfig                  |    1 
 b/arch/riscv/include/asm/cmpxchg.h    |   13 ++++++++
 b/arch/sh/Kconfig                     |    1 
 b/arch/sh/include/asm/cmpxchg.h       |    2 +
 b/arch/sparc/include/asm/cmpxchg_32.h |    4 +-
 b/arch/sparc/lib/atomic32.c           |    4 +-
 b/arch/xtensa/Kconfig                 |    1 
 b/arch/xtensa/include/asm/cmpxchg.h   |    2 +
 b/include/linux/cmpxchg-emu.h         |   15 +++++++++
 b/lib/Makefile                        |    1 
 b/lib/cmpxchg-emu.c                   |   45 +++++++++++++++++++++++++++++
 24 files changed, 184 insertions(+), 99 deletions(-)

^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH cmpxchg 01/14] sparc32: make __cmpxchg_u32() return u32
  2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
@ 2024-04-08 17:49   ` Paul E. McKenney
  2024-04-08 17:49   ` [PATCH cmpxchg 02/14] sparc32: make the first argument of __cmpxchg_u64() volatile u64 * Paul E. McKenney
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-08 17:49 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, torvalds,
	Arnd Bergmann, Al Viro, Paul E . McKenney

From: Al Viro <viro@zeniv.linux.org.uk>

Conversion between u32 and unsigned long is tautological there,
and the only use of return value is to return it from
__cmpxchg() (which return unsigned long).

Get rid of explicit casts in __cmpxchg_u32() call, while we are
at it - normal conversions for arguments will do just fine.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/sparc/include/asm/cmpxchg_32.h | 4 ++--
 arch/sparc/lib/atomic32.c           | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/include/asm/cmpxchg_32.h b/arch/sparc/include/asm/cmpxchg_32.h
index d0af82c240b73..2a05cb236480c 100644
--- a/arch/sparc/include/asm/cmpxchg_32.h
+++ b/arch/sparc/include/asm/cmpxchg_32.h
@@ -39,7 +39,7 @@ static __always_inline unsigned long __arch_xchg(unsigned long x, __volatile__ v
 /* bug catcher for when unsupported size is used - won't link */
 void __cmpxchg_called_with_bad_pointer(void);
 /* we only need to support cmpxchg of a u32 on sparc */
-unsigned long __cmpxchg_u32(volatile u32 *m, u32 old, u32 new_);
+u32 __cmpxchg_u32(volatile u32 *m, u32 old, u32 new_);
 
 /* don't worry...optimizer will get rid of most of this */
 static inline unsigned long
@@ -47,7 +47,7 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new_, int size)
 {
 	switch (size) {
 	case 4:
-		return __cmpxchg_u32((u32 *)ptr, (u32)old, (u32)new_);
+		return __cmpxchg_u32(ptr, old, new_);
 	default:
 		__cmpxchg_called_with_bad_pointer();
 		break;
diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index cf80d1ae352be..d90d756123d81 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -159,7 +159,7 @@ unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
 }
 EXPORT_SYMBOL(sp32___change_bit);
 
-unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
+u32 __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
 {
 	unsigned long flags;
 	u32 prev;
@@ -169,7 +169,7 @@ unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
 		*ptr = new;
 	spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);
 
-	return (unsigned long)prev;
+	return prev;
 }
 EXPORT_SYMBOL(__cmpxchg_u32);
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH cmpxchg 02/14] sparc32: make the first argument of __cmpxchg_u64() volatile u64 *
  2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
  2024-04-08 17:49   ` [PATCH cmpxchg 01/14] sparc32: make __cmpxchg_u32() return u32 Paul E. McKenney
@ 2024-04-08 17:49   ` Paul E. McKenney
  2024-04-08 17:49   ` [PATCH cmpxchg 03/14] sparc32: unify __cmpxchg_u{32,64} Paul E. McKenney
                     ` (12 subsequent siblings)
  14 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-08 17:49 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, torvalds,
	Arnd Bergmann, Al Viro, Paul E . McKenney

From: Al Viro <viro@zeniv.linux.org.uk>

... to match all cmpxchg variants.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/sparc/include/asm/cmpxchg_32.h | 2 +-
 arch/sparc/lib/atomic32.c           | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/include/asm/cmpxchg_32.h b/arch/sparc/include/asm/cmpxchg_32.h
index 2a05cb236480c..05d5f86a56dc2 100644
--- a/arch/sparc/include/asm/cmpxchg_32.h
+++ b/arch/sparc/include/asm/cmpxchg_32.h
@@ -63,7 +63,7 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new_, int size)
 			(unsigned long)_n_, sizeof(*(ptr)));		\
 })
 
-u64 __cmpxchg_u64(u64 *ptr, u64 old, u64 new);
+u64 __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new);
 #define arch_cmpxchg64(ptr, old, new)	__cmpxchg_u64(ptr, old, new)
 
 #include <asm-generic/cmpxchg-local.h>
diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index d90d756123d81..e15affbbb5238 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -173,7 +173,7 @@ u32 __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
 }
 EXPORT_SYMBOL(__cmpxchg_u32);
 
-u64 __cmpxchg_u64(u64 *ptr, u64 old, u64 new)
+u64 __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new)
 {
 	unsigned long flags;
 	u64 prev;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH cmpxchg 03/14] sparc32: unify __cmpxchg_u{32,64}
  2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
  2024-04-08 17:49   ` [PATCH cmpxchg 01/14] sparc32: make __cmpxchg_u32() return u32 Paul E. McKenney
  2024-04-08 17:49   ` [PATCH cmpxchg 02/14] sparc32: make the first argument of __cmpxchg_u64() volatile u64 * Paul E. McKenney
@ 2024-04-08 17:49   ` Paul E. McKenney
  2024-04-08 17:49   ` [PATCH cmpxchg 04/14] sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those sizes Paul E. McKenney
                     ` (11 subsequent siblings)
  14 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-08 17:49 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, torvalds,
	Arnd Bergmann, Al Viro, Paul E . McKenney

From: Al Viro <viro@zeniv.linux.org.uk>

Add a macro that expands to one of those when given u32 or u64
as an argument - atomic32.c has a lot of similar stuff already.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/sparc/lib/atomic32.c | 41 +++++++++++++++------------------------
 1 file changed, 16 insertions(+), 25 deletions(-)

diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index e15affbbb5238..0d215a772428e 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -159,32 +159,23 @@ unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
 }
 EXPORT_SYMBOL(sp32___change_bit);
 
-u32 __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
-{
-	unsigned long flags;
-	u32 prev;
-
-	spin_lock_irqsave(ATOMIC_HASH(ptr), flags);
-	if ((prev = *ptr) == old)
-		*ptr = new;
-	spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);
-
-	return prev;
-}
+#define CMPXCHG(T)						\
+	T __cmpxchg_##T(volatile T *ptr, T old, T new)		\
+	{							\
+		unsigned long flags;				\
+		T prev;						\
+								\
+		spin_lock_irqsave(ATOMIC_HASH(ptr), flags);	\
+		if ((prev = *ptr) == old)			\
+			*ptr = new;				\
+		spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);\
+								\
+		return prev;					\
+	}
+
+CMPXCHG(u32)
+CMPXCHG(u64)
 EXPORT_SYMBOL(__cmpxchg_u32);
-
-u64 __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new)
-{
-	unsigned long flags;
-	u64 prev;
-
-	spin_lock_irqsave(ATOMIC_HASH(ptr), flags);
-	if ((prev = *ptr) == old)
-		*ptr = new;
-	spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);
-
-	return prev;
-}
 EXPORT_SYMBOL(__cmpxchg_u64);
 
 unsigned long __xchg_u32(volatile u32 *ptr, u32 new)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH cmpxchg 04/14] sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those sizes
  2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
                     ` (2 preceding siblings ...)
  2024-04-08 17:49   ` [PATCH cmpxchg 03/14] sparc32: unify __cmpxchg_u{32,64} Paul E. McKenney
@ 2024-04-08 17:49   ` Paul E. McKenney
  2024-04-08 17:49   ` [PATCH cmpxchg 05/14] parisc: __cmpxchg_u32(): lift conversion into the callers Paul E. McKenney
                     ` (10 subsequent siblings)
  14 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-08 17:49 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, torvalds,
	Arnd Bergmann, Al Viro, Paul E . McKenney

From: Al Viro <viro@zeniv.linux.org.uk>

trivial now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/sparc/include/asm/cmpxchg_32.h | 16 +++++++---------
 arch/sparc/lib/atomic32.c           |  4 ++++
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/sparc/include/asm/cmpxchg_32.h b/arch/sparc/include/asm/cmpxchg_32.h
index 05d5f86a56dc2..8c1a3ca34eeb7 100644
--- a/arch/sparc/include/asm/cmpxchg_32.h
+++ b/arch/sparc/include/asm/cmpxchg_32.h
@@ -38,21 +38,19 @@ static __always_inline unsigned long __arch_xchg(unsigned long x, __volatile__ v
 
 /* bug catcher for when unsupported size is used - won't link */
 void __cmpxchg_called_with_bad_pointer(void);
-/* we only need to support cmpxchg of a u32 on sparc */
+u8 __cmpxchg_u8(volatile u8 *m, u8 old, u8 new_);
+u16 __cmpxchg_u16(volatile u16 *m, u16 old, u16 new_);
 u32 __cmpxchg_u32(volatile u32 *m, u32 old, u32 new_);
 
 /* don't worry...optimizer will get rid of most of this */
 static inline unsigned long
 __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new_, int size)
 {
-	switch (size) {
-	case 4:
-		return __cmpxchg_u32(ptr, old, new_);
-	default:
-		__cmpxchg_called_with_bad_pointer();
-		break;
-	}
-	return old;
+	return
+		size == 1 ? __cmpxchg_u8(ptr, old, new_) :
+		size == 2 ? __cmpxchg_u16(ptr, old, new_) :
+		size == 4 ? __cmpxchg_u32(ptr, old, new_) :
+			(__cmpxchg_called_with_bad_pointer(), old);
 }
 
 #define arch_cmpxchg(ptr, o, n)						\
diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index 0d215a772428e..8ae880ebf07aa 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -173,8 +173,12 @@ EXPORT_SYMBOL(sp32___change_bit);
 		return prev;					\
 	}
 
+CMPXCHG(u8)
+CMPXCHG(u16)
 CMPXCHG(u32)
 CMPXCHG(u64)
+EXPORT_SYMBOL(__cmpxchg_u8);
+EXPORT_SYMBOL(__cmpxchg_u16);
 EXPORT_SYMBOL(__cmpxchg_u32);
 EXPORT_SYMBOL(__cmpxchg_u64);
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH cmpxchg 05/14] parisc: __cmpxchg_u32(): lift conversion into the callers
  2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
                     ` (3 preceding siblings ...)
  2024-04-08 17:49   ` [PATCH cmpxchg 04/14] sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those sizes Paul E. McKenney
@ 2024-04-08 17:49   ` Paul E. McKenney
  2024-04-08 17:49   ` [PATCH cmpxchg 06/14] parisc: unify implementations of __cmpxchg_u{8,32,64} Paul E. McKenney
                     ` (9 subsequent siblings)
  14 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-08 17:49 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, torvalds,
	Arnd Bergmann, Al Viro, Paul E . McKenney

From: Al Viro <viro@zeniv.linux.org.uk>

__cmpxchg_u32() return value is unsigned int explicitly cast to
unsigned long.  Both callers are returns from functions that
return unsigned long; might as well have __cmpxchg_u32()
return that unsigned int (aka u32) and let the callers convert
implicitly.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/parisc/include/asm/cmpxchg.h | 3 +--
 arch/parisc/lib/bitops.c          | 6 +++---
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/parisc/include/asm/cmpxchg.h b/arch/parisc/include/asm/cmpxchg.h
index c1d776bb16b4e..0924ebc576d28 100644
--- a/arch/parisc/include/asm/cmpxchg.h
+++ b/arch/parisc/include/asm/cmpxchg.h
@@ -57,8 +57,7 @@ __arch_xchg(unsigned long x, volatile void *ptr, int size)
 extern void __cmpxchg_called_with_bad_pointer(void);
 
 /* __cmpxchg_u32/u64 defined in arch/parisc/lib/bitops.c */
-extern unsigned long __cmpxchg_u32(volatile unsigned int *m, unsigned int old,
-				   unsigned int new_);
+extern u32 __cmpxchg_u32(volatile u32 *m, u32 old, u32 new_);
 extern u64 __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new_);
 extern u8 __cmpxchg_u8(volatile u8 *ptr, u8 old, u8 new_);
 
diff --git a/arch/parisc/lib/bitops.c b/arch/parisc/lib/bitops.c
index 36a3141990746..ae2231d921985 100644
--- a/arch/parisc/lib/bitops.c
+++ b/arch/parisc/lib/bitops.c
@@ -68,16 +68,16 @@ u64 notrace __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new)
 	return prev;
 }
 
-unsigned long notrace __cmpxchg_u32(volatile unsigned int *ptr, unsigned int old, unsigned int new)
+u32 notrace __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
 {
 	unsigned long flags;
-	unsigned int prev;
+	u32 prev;
 
 	_atomic_spin_lock_irqsave(ptr, flags);
 	if ((prev = *ptr) == old)
 		*ptr = new;
 	_atomic_spin_unlock_irqrestore(ptr, flags);
-	return (unsigned long)prev;
+	return prev;
 }
 
 u8 notrace __cmpxchg_u8(volatile u8 *ptr, u8 old, u8 new)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH cmpxchg 06/14] parisc: unify implementations of __cmpxchg_u{8,32,64}
  2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
                     ` (4 preceding siblings ...)
  2024-04-08 17:49   ` [PATCH cmpxchg 05/14] parisc: __cmpxchg_u32(): lift conversion into the callers Paul E. McKenney
@ 2024-04-08 17:49   ` Paul E. McKenney
  2024-04-08 17:49   ` [PATCH cmpxchg 07/14] parisc: add missing export of __cmpxchg_u8() Paul E. McKenney
                     ` (8 subsequent siblings)
  14 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-08 17:49 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, torvalds,
	Arnd Bergmann, Al Viro, Paul E . McKenney

From: Al Viro <viro@zeniv.linux.org.uk>

identical except for type name involved

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/parisc/lib/bitops.c | 51 +++++++++++++---------------------------
 1 file changed, 16 insertions(+), 35 deletions(-)

diff --git a/arch/parisc/lib/bitops.c b/arch/parisc/lib/bitops.c
index ae2231d921985..cae30a3eb6d9b 100644
--- a/arch/parisc/lib/bitops.c
+++ b/arch/parisc/lib/bitops.c
@@ -56,38 +56,19 @@ unsigned long notrace __xchg8(char x, volatile char *ptr)
 }
 
 
-u64 notrace __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new)
-{
-	unsigned long flags;
-	u64 prev;
-
-	_atomic_spin_lock_irqsave(ptr, flags);
-	if ((prev = *ptr) == old)
-		*ptr = new;
-	_atomic_spin_unlock_irqrestore(ptr, flags);
-	return prev;
-}
-
-u32 notrace __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
-{
-	unsigned long flags;
-	u32 prev;
-
-	_atomic_spin_lock_irqsave(ptr, flags);
-	if ((prev = *ptr) == old)
-		*ptr = new;
-	_atomic_spin_unlock_irqrestore(ptr, flags);
-	return prev;
-}
-
-u8 notrace __cmpxchg_u8(volatile u8 *ptr, u8 old, u8 new)
-{
-	unsigned long flags;
-	u8 prev;
-
-	_atomic_spin_lock_irqsave(ptr, flags);
-	if ((prev = *ptr) == old)
-		*ptr = new;
-	_atomic_spin_unlock_irqrestore(ptr, flags);
-	return prev;
-}
+#define CMPXCHG(T)						\
+	T notrace __cmpxchg_##T(volatile T *ptr, T old, T new)	\
+	{							\
+		unsigned long flags;				\
+		T prev;						\
+								\
+		_atomic_spin_lock_irqsave(ptr, flags);		\
+		if ((prev = *ptr) == old)			\
+			*ptr = new;				\
+		_atomic_spin_unlock_irqrestore(ptr, flags);	\
+		return prev;					\
+	}
+
+CMPXCHG(u64)
+CMPXCHG(u32)
+CMPXCHG(u8)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH cmpxchg 07/14] parisc: add missing export of __cmpxchg_u8()
  2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
                     ` (5 preceding siblings ...)
  2024-04-08 17:49   ` [PATCH cmpxchg 06/14] parisc: unify implementations of __cmpxchg_u{8,32,64} Paul E. McKenney
@ 2024-04-08 17:49   ` Paul E. McKenney
  2024-04-08 17:49   ` [PATCH cmpxchg 08/14] parisc: add u16 support to cmpxchg() Paul E. McKenney
                     ` (7 subsequent siblings)
  14 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-08 17:49 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, torvalds,
	Arnd Bergmann, Al Viro, Paul E . McKenney

From: Al Viro <viro@zeniv.linux.org.uk>

__cmpxchg_u8() had been added (initially) for the sake of
drivers/phy/ti/phy-tusb1210.c; the thing is, that drivers is
modular, so we need an export

Fixes: b344d6a83d01 "parisc: add support for cmpxchg on u8 pointers"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/parisc/kernel/parisc_ksyms.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/parisc/kernel/parisc_ksyms.c b/arch/parisc/kernel/parisc_ksyms.c
index 6f0c92e8149d8..dcf61cbd31470 100644
--- a/arch/parisc/kernel/parisc_ksyms.c
+++ b/arch/parisc/kernel/parisc_ksyms.c
@@ -22,6 +22,7 @@ EXPORT_SYMBOL(memset);
 #include <linux/atomic.h>
 EXPORT_SYMBOL(__xchg8);
 EXPORT_SYMBOL(__xchg32);
+EXPORT_SYMBOL(__cmpxchg_u8);
 EXPORT_SYMBOL(__cmpxchg_u32);
 EXPORT_SYMBOL(__cmpxchg_u64);
 #ifdef CONFIG_SMP
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH cmpxchg 08/14] parisc: add u16 support to cmpxchg()
  2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
                     ` (6 preceding siblings ...)
  2024-04-08 17:49   ` [PATCH cmpxchg 07/14] parisc: add missing export of __cmpxchg_u8() Paul E. McKenney
@ 2024-04-08 17:49   ` Paul E. McKenney
  2024-04-08 20:10     ` Linus Torvalds
  2024-04-08 17:49   ` [PATCH cmpxchg 09/14] lib: Add one-byte emulation function Paul E. McKenney
                     ` (6 subsequent siblings)
  14 siblings, 1 reply; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-08 17:49 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, torvalds,
	Arnd Bergmann, Al Viro, Paul E . McKenney

From: Al Viro <viro@zeniv.linux.org.uk>

Add (and export) __cmpxchg_u16(), teach __cmpxchg() to use it.

And get rid of manual truncation down to u8, etc. in there - the
only reason for those is to avoid bogus warnings about constant
truncation from sparse, and those are easy to avoid by turning
that switch into conditional expression.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/parisc/include/asm/cmpxchg.h | 19 +++++++++----------
 arch/parisc/kernel/parisc_ksyms.c |  1 +
 arch/parisc/lib/bitops.c          |  1 +
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/parisc/include/asm/cmpxchg.h b/arch/parisc/include/asm/cmpxchg.h
index 0924ebc576d28..bf0a0f1189eb2 100644
--- a/arch/parisc/include/asm/cmpxchg.h
+++ b/arch/parisc/include/asm/cmpxchg.h
@@ -56,25 +56,24 @@ __arch_xchg(unsigned long x, volatile void *ptr, int size)
 /* bug catcher for when unsupported size is used - won't link */
 extern void __cmpxchg_called_with_bad_pointer(void);
 
-/* __cmpxchg_u32/u64 defined in arch/parisc/lib/bitops.c */
+/* __cmpxchg_u... defined in arch/parisc/lib/bitops.c */
+extern u8 __cmpxchg_u8(volatile u8 *ptr, u8 old, u8 new_);
+extern u16 __cmpxchg_u16(volatile u16 *ptr, u16 old, u16 new_);
 extern u32 __cmpxchg_u32(volatile u32 *m, u32 old, u32 new_);
 extern u64 __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new_);
-extern u8 __cmpxchg_u8(volatile u8 *ptr, u8 old, u8 new_);
 
 /* don't worry...optimizer will get rid of most of this */
 static inline unsigned long
 __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new_, int size)
 {
-	switch (size) {
+	return
 #ifdef CONFIG_64BIT
-	case 8: return __cmpxchg_u64((u64 *)ptr, old, new_);
+		size == 8 ? __cmpxchg_u64(ptr, old, new_) :
 #endif
-	case 4: return __cmpxchg_u32((unsigned int *)ptr,
-				     (unsigned int)old, (unsigned int)new_);
-	case 1: return __cmpxchg_u8((u8 *)ptr, old & 0xff, new_ & 0xff);
-	}
-	__cmpxchg_called_with_bad_pointer();
-	return old;
+		size == 4 ? __cmpxchg_u32(ptr, old, new_) :
+		size == 2 ? __cmpxchg_u16(ptr, old, new_) :
+		size == 1 ? __cmpxchg_u8(ptr, old, new_) :
+			(__cmpxchg_called_with_bad_pointer(), old);
 }
 
 #define arch_cmpxchg(ptr, o, n)						 \
diff --git a/arch/parisc/kernel/parisc_ksyms.c b/arch/parisc/kernel/parisc_ksyms.c
index dcf61cbd31470..c1587aa35beb6 100644
--- a/arch/parisc/kernel/parisc_ksyms.c
+++ b/arch/parisc/kernel/parisc_ksyms.c
@@ -23,6 +23,7 @@ EXPORT_SYMBOL(memset);
 EXPORT_SYMBOL(__xchg8);
 EXPORT_SYMBOL(__xchg32);
 EXPORT_SYMBOL(__cmpxchg_u8);
+EXPORT_SYMBOL(__cmpxchg_u16);
 EXPORT_SYMBOL(__cmpxchg_u32);
 EXPORT_SYMBOL(__cmpxchg_u64);
 #ifdef CONFIG_SMP
diff --git a/arch/parisc/lib/bitops.c b/arch/parisc/lib/bitops.c
index cae30a3eb6d9b..9df8100506427 100644
--- a/arch/parisc/lib/bitops.c
+++ b/arch/parisc/lib/bitops.c
@@ -71,4 +71,5 @@ unsigned long notrace __xchg8(char x, volatile char *ptr)
 
 CMPXCHG(u64)
 CMPXCHG(u32)
+CMPXCHG(u16)
 CMPXCHG(u8)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH cmpxchg 09/14] lib: Add one-byte emulation function
  2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
                     ` (7 preceding siblings ...)
  2024-04-08 17:49   ` [PATCH cmpxchg 08/14] parisc: add u16 support to cmpxchg() Paul E. McKenney
@ 2024-04-08 17:49   ` Paul E. McKenney
  2024-04-08 17:49   ` [PATCH cmpxchg 10/14] ARC: Emulate one-byte cmpxchg Paul E. McKenney
                     ` (5 subsequent siblings)
  14 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-08 17:49 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, torvalds,
	Arnd Bergmann, Paul E. McKenney

Architectures are required to provide four-byte cmpxchg() and 64-bit
architectures are additionally required to provide eight-byte cmpxchg().
However, there are cases where one-byte cmpxchg() would be extremely
useful.  Therefore, provide cmpxchg_emu_u8() that emulates one-byte
cmpxchg() in terms of four-byte cmpxchg().

Note that this emulations is fully ordered, and can (for example) cause
one-byte cmpxchg_relaxed() to incur the overhead of full ordering.
If this causes problems for a given architecture, that architecture is
free to provide its own lighter-weight primitives.

[ paulmck: Apply Marco Elver feedback. ]
[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]

Link: https://lore.kernel.org/all/0733eb10-5e7a-4450-9b8a-527b97c842ff@paulmck-laptop/

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Marco Elver <elver@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-arch@vger.kernel.org>
---
 arch/Kconfig                |  3 +++
 include/linux/cmpxchg-emu.h | 15 +++++++++++++
 lib/Makefile                |  1 +
 lib/cmpxchg-emu.c           | 45 +++++++++++++++++++++++++++++++++++++
 4 files changed, 64 insertions(+)
 create mode 100644 include/linux/cmpxchg-emu.h
 create mode 100644 lib/cmpxchg-emu.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 9f066785bb71d..284663392eef8 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1609,4 +1609,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
 	# strict alignment always, even with -falign-functions.
 	def_bool CC_HAS_MIN_FUNCTION_ALIGNMENT || CC_IS_CLANG
 
+config ARCH_NEED_CMPXCHG_1_EMU
+	bool
+
 endmenu
diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
new file mode 100644
index 0000000000000..998deec67740a
--- /dev/null
+++ b/include/linux/cmpxchg-emu.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Emulated 1-byte and 2-byte cmpxchg operations for architectures
+ * lacking direct support for these sizes.  These are implemented in terms
+ * of 4-byte cmpxchg operations.
+ *
+ * Copyright (C) 2024 Paul E. McKenney.
+ */
+
+#ifndef __LINUX_CMPXCHG_EMU_H
+#define __LINUX_CMPXCHG_EMU_H
+
+uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
+
+#endif /* __LINUX_CMPXCHG_EMU_H */
diff --git a/lib/Makefile b/lib/Makefile
index ffc6b2341b45a..cc3d52fdb477d 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -236,6 +236,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
 lib-$(CONFIG_GENERIC_BUG) += bug.o
 
 obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
+obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_EMU) += cmpxchg-emu.o
 
 obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
 #ensure exported functions have prototypes
diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
new file mode 100644
index 0000000000000..27f6f97cb60dd
--- /dev/null
+++ b/lib/cmpxchg-emu.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Emulated 1-byte cmpxchg operation for architectures lacking direct
+ * support for this size.  This is implemented in terms of 4-byte cmpxchg
+ * operations.
+ *
+ * Copyright (C) 2024 Paul E. McKenney.
+ */
+
+#include <linux/types.h>
+#include <linux/export.h>
+#include <linux/instrumented.h>
+#include <linux/atomic.h>
+#include <linux/panic.h>
+#include <linux/bug.h>
+#include <asm-generic/rwonce.h>
+#include <linux/cmpxchg-emu.h>
+
+union u8_32 {
+	u8 b[4];
+	u32 w;
+};
+
+/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
+uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
+{
+	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
+	int i = ((uintptr_t)p) & 0x3;
+	union u8_32 old32;
+	union u8_32 new32;
+	u32 ret;
+
+	ret = READ_ONCE(*p32);
+	do {
+		old32.w = ret;
+		if (old32.b[i] != old)
+			return old32.b[i];
+		new32.w = old32.w;
+		new32.b[i] = new;
+		instrument_atomic_read_write(p, 1);
+		ret = data_race(cmpxchg(p32, old32.w, new32.w)); // Overridden above.
+	} while (ret != old32.w);
+	return old;
+}
+EXPORT_SYMBOL_GPL(cmpxchg_emu_u8);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH cmpxchg 10/14] ARC: Emulate one-byte cmpxchg
  2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
                     ` (8 preceding siblings ...)
  2024-04-08 17:49   ` [PATCH cmpxchg 09/14] lib: Add one-byte emulation function Paul E. McKenney
@ 2024-04-08 17:49   ` Paul E. McKenney
  2024-04-08 17:49   ` [PATCH cmpxchg 11/14] csky: " Paul E. McKenney
                     ` (4 subsequent siblings)
  14 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-08 17:49 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, torvalds,
	Arnd Bergmann, Paul E. McKenney, Vineet Gupta, Andi Shyti,
	Andrzej Hajda, Palmer Dabbelt, linux-snps-arc

Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on arc.

[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-snps-arc@lists.infradead.org>
---
 arch/arc/Kconfig               |  1 +
 arch/arc/include/asm/cmpxchg.h | 32 +++++++++++++++++++++++---------
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 99d2845f3feb9..5bf6137f0fd47 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -14,6 +14,7 @@ config ARC
 	select ARCH_HAS_SETUP_DMA_OPS
 	select ARCH_HAS_SYNC_DMA_FOR_CPU
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+	select ARCH_NEED_CMPXCHG_1_EMU
 	select ARCH_SUPPORTS_ATOMIC_RMW if ARC_HAS_LLSC
 	select ARCH_32BIT_OFF_T
 	select BUILDTIME_TABLE_SORT
diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h
index e138fde067dea..c3833e18389f4 100644
--- a/arch/arc/include/asm/cmpxchg.h
+++ b/arch/arc/include/asm/cmpxchg.h
@@ -46,6 +46,9 @@
 	__typeof__(*(ptr)) _prev_;					\
 									\
 	switch(sizeof((_p_))) {						\
+	case 1:								\
+		_prev_ = cmpxchg_emu_u8((volatile u8 *)_p_, _o_, _n_);	\
+		break;							\
 	case 4:								\
 		_prev_ = __cmpxchg(_p_, _o_, _n_);			\
 		break;							\
@@ -65,16 +68,27 @@
 	__typeof__(*(ptr)) _prev_;					\
 	unsigned long __flags;						\
 									\
-	BUILD_BUG_ON(sizeof(_p_) != 4);					\
+	switch(sizeof((_p_))) {						\
+	case 1:								\
+		__flags = cmpxchg_emu_u8((volatile u8 *)_p_, _o_, _n_);	\
+		_prev_ = (__typeof__(*(ptr)))__flags;			\
+		break;							\
+		break;							\
+	case 4:								\
+		/*							\
+		 * spin lock/unlock provide the needed smp_mb()		\
+		 * before/after						\
+		 */							\
+		atomic_ops_lock(__flags);				\
+		_prev_ = *_p_;						\
+		if (_prev_ == _o_)					\
+			*_p_ = _n_;					\
+		atomic_ops_unlock(__flags);				\
+		break;							\
+	default:							\
+		BUILD_BUG();						\
+	}								\
 									\
-	/*								\
-	 * spin lock/unlock provide the needed smp_mb() before/after	\
-	 */								\
-	atomic_ops_lock(__flags);					\
-	_prev_ = *_p_;							\
-	if (_prev_ == _o_)						\
-		*_p_ = _n_;						\
-	atomic_ops_unlock(__flags);					\
 	_prev_;								\
 })
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH cmpxchg 11/14] csky: Emulate one-byte cmpxchg
  2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
                     ` (9 preceding siblings ...)
  2024-04-08 17:49   ` [PATCH cmpxchg 10/14] ARC: Emulate one-byte cmpxchg Paul E. McKenney
@ 2024-04-08 17:49   ` Paul E. McKenney
  2024-04-08 17:49   ` [PATCH cmpxchg 12/14] sh: " Paul E. McKenney
                     ` (3 subsequent siblings)
  14 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-08 17:49 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, torvalds,
	Arnd Bergmann, Paul E. McKenney, Yujie Liu, Guo Ren, linux-csky

Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on csky.

[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]

Co-developed-by: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-csky@vger.kernel.org>
---
 arch/csky/Kconfig               |  1 +
 arch/csky/include/asm/cmpxchg.h | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index d3ac36751ad1f..5479707eb5d10 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -37,6 +37,7 @@ config CSKY
 	select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPTION
 	select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
 	select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
+	select ARCH_NEED_CMPXCHG_1_EMU
 	select ARCH_WANT_FRAME_POINTERS if !CPU_CK610 && $(cc-option,-mbacktrace)
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
 	select COMMON_CLK
diff --git a/arch/csky/include/asm/cmpxchg.h b/arch/csky/include/asm/cmpxchg.h
index 916043b845f14..db6dda47184e4 100644
--- a/arch/csky/include/asm/cmpxchg.h
+++ b/arch/csky/include/asm/cmpxchg.h
@@ -6,6 +6,7 @@
 #ifdef CONFIG_SMP
 #include <linux/bug.h>
 #include <asm/barrier.h>
+#include <linux/cmpxchg-emu.h>
 
 #define __xchg_relaxed(new, ptr, size)				\
 ({								\
@@ -61,6 +62,9 @@
 	__typeof__(old) __old = (old);				\
 	__typeof__(*(ptr)) __ret;				\
 	switch (size) {						\
+	case 1:							\
+		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
+		break;						\
 	case 4:							\
 		asm volatile (					\
 		"1:	ldex.w		%0, (%3) \n"		\
@@ -91,6 +95,9 @@
 	__typeof__(old) __old = (old);				\
 	__typeof__(*(ptr)) __ret;				\
 	switch (size) {						\
+	case 1:							\
+		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
+		break;						\
 	case 4:							\
 		asm volatile (					\
 		"1:	ldex.w		%0, (%3) \n"		\
@@ -122,6 +129,9 @@
 	__typeof__(old) __old = (old);				\
 	__typeof__(*(ptr)) __ret;				\
 	switch (size) {						\
+	case 1:							\
+		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
+		break;						\
 	case 4:							\
 		asm volatile (					\
 		RELEASE_FENCE					\
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH cmpxchg 12/14] sh: Emulate one-byte cmpxchg
  2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
                     ` (10 preceding siblings ...)
  2024-04-08 17:49   ` [PATCH cmpxchg 11/14] csky: " Paul E. McKenney
@ 2024-04-08 17:49   ` Paul E. McKenney
  2024-04-18  8:04     ` Geert Uytterhoeven
  2024-04-08 17:49   ` [PATCH cmpxchg 13/14] xtensa: " Paul E. McKenney
                     ` (2 subsequent siblings)
  14 siblings, 1 reply; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-08 17:49 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, torvalds,
	Arnd Bergmann, Paul E. McKenney, Andi Shyti, Palmer Dabbelt,
	Masami Hiramatsu, linux-sh

Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on sh.

[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-sh@vger.kernel.org>
---
 arch/sh/Kconfig               | 1 +
 arch/sh/include/asm/cmpxchg.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 2ad3e29f0ebec..f47e9ccf4efd2 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -16,6 +16,7 @@ config SUPERH
 	select ARCH_HIBERNATION_POSSIBLE if MMU
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_WANT_IPC_PARSE_VERSION
+	select ARCH_NEED_CMPXCHG_1_EMU
 	select CPU_NO_EFFICIENT_FFS
 	select DMA_DECLARE_COHERENT
 	select GENERIC_ATOMIC64
diff --git a/arch/sh/include/asm/cmpxchg.h b/arch/sh/include/asm/cmpxchg.h
index 5d617b3ef78f7..27a9040983cfe 100644
--- a/arch/sh/include/asm/cmpxchg.h
+++ b/arch/sh/include/asm/cmpxchg.h
@@ -56,6 +56,8 @@ static inline unsigned long __cmpxchg(volatile void * ptr, unsigned long old,
 		unsigned long new, int size)
 {
 	switch (size) {
+	case 1:
+		return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);
 	case 4:
 		return __cmpxchg_u32(ptr, old, new);
 	}
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH cmpxchg 13/14] xtensa: Emulate one-byte cmpxchg
  2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
                     ` (11 preceding siblings ...)
  2024-04-08 17:49   ` [PATCH cmpxchg 12/14] sh: " Paul E. McKenney
@ 2024-04-08 17:49   ` Paul E. McKenney
  2024-04-18  8:06     ` Geert Uytterhoeven
  2024-04-08 17:49   ` [PATCH cmpxchg 14/14] riscv: " Paul E. McKenney
  2024-05-01 22:58   ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Paul E. McKenney
  14 siblings, 1 reply; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-08 17:49 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, torvalds,
	Arnd Bergmann, Paul E. McKenney, Yujie Liu, Andi Shyti,
	Geert Uytterhoeven

Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on xtensa.

[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
---
 arch/xtensa/Kconfig               | 1 +
 arch/xtensa/include/asm/cmpxchg.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index f200a4ec044e6..d3db28f2f8110 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -14,6 +14,7 @@ config XTENSA
 	select ARCH_HAS_DMA_SET_UNCACHED if MMU
 	select ARCH_HAS_STRNCPY_FROM_USER if !KASAN
 	select ARCH_HAS_STRNLEN_USER
+	select ARCH_NEED_CMPXCHG_1_EMU
 	select ARCH_USE_MEMTEST
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/xtensa/include/asm/cmpxchg.h b/arch/xtensa/include/asm/cmpxchg.h
index 675a11ea8de76..29f8f594d5592 100644
--- a/arch/xtensa/include/asm/cmpxchg.h
+++ b/arch/xtensa/include/asm/cmpxchg.h
@@ -15,6 +15,7 @@
 
 #include <linux/bits.h>
 #include <linux/stringify.h>
+#include <linux/cmpxchg-emu.h>
 
 /*
  * cmpxchg
@@ -74,6 +75,7 @@ static __inline__ unsigned long
 __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, int size)
 {
 	switch (size) {
+	case 1:  return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);
 	case 4:  return __cmpxchg_u32(ptr, old, new);
 	default: __cmpxchg_called_with_bad_pointer();
 		 return old;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH cmpxchg 14/14] riscv: Emulate one-byte cmpxchg
  2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
                     ` (12 preceding siblings ...)
  2024-04-08 17:49   ` [PATCH cmpxchg 13/14] xtensa: " Paul E. McKenney
@ 2024-04-08 17:49   ` Paul E. McKenney
  2024-04-09 17:35     ` Andrea Parri
  2024-05-01 22:58   ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Paul E. McKenney
  14 siblings, 1 reply; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-08 17:49 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, torvalds,
	Arnd Bergmann, Paul E. McKenney, Yujie Liu, Andi Shyti,
	Andrzej Hajda, linux-riscv, Palmer Dabbelt

Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on riscv.

[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-riscv@lists.infradead.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
---
 arch/riscv/Kconfig               |  1 +
 arch/riscv/include/asm/cmpxchg.h | 13 +++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index be09c8836d56b..3bab9c5c0f465 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -44,6 +44,7 @@ config RISCV
 	select ARCH_HAS_UBSAN
 	select ARCH_HAS_VDSO_DATA
 	select ARCH_KEEP_MEMBLOCK if ACPI
+	select ARCH_NEED_CMPXCHG_1_EMU
 	select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
 	select ARCH_STACKWALK
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 2fee65cc84432..abcd5543b861b 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -9,6 +9,7 @@
 #include <linux/bug.h>
 
 #include <asm/fence.h>
+#include <linux/cmpxchg-emu.h>
 
 #define __xchg_relaxed(ptr, new, size)					\
 ({									\
@@ -170,6 +171,9 @@
 	__typeof__(*(ptr)) __ret;					\
 	register unsigned int __rc;					\
 	switch (size) {							\
+	case 1:								\
+		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
+		break;							\
 	case 4:								\
 		__asm__ __volatile__ (					\
 			"0:	lr.w %0, %2\n"				\
@@ -214,6 +218,9 @@
 	__typeof__(*(ptr)) __ret;					\
 	register unsigned int __rc;					\
 	switch (size) {							\
+	case 1:								\
+		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
+		break;							\
 	case 4:								\
 		__asm__ __volatile__ (					\
 			"0:	lr.w %0, %2\n"				\
@@ -260,6 +267,9 @@
 	__typeof__(*(ptr)) __ret;					\
 	register unsigned int __rc;					\
 	switch (size) {							\
+	case 1:								\
+		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
+		break;							\
 	case 4:								\
 		__asm__ __volatile__ (					\
 			RISCV_RELEASE_BARRIER				\
@@ -306,6 +316,9 @@
 	__typeof__(*(ptr)) __ret;					\
 	register unsigned int __rc;					\
 	switch (size) {							\
+	case 1:								\
+		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
+		break;							\
 	case 4:								\
 		__asm__ __volatile__ (					\
 			"0:	lr.w %0, %2\n"				\
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH cmpxchg 08/14] parisc: add u16 support to cmpxchg()
  2024-04-08 17:49   ` [PATCH cmpxchg 08/14] parisc: add u16 support to cmpxchg() Paul E. McKenney
@ 2024-04-08 20:10     ` Linus Torvalds
  2024-04-08 20:53       ` Paul E. McKenney
  0 siblings, 1 reply; 71+ messages in thread
From: Linus Torvalds @ 2024-04-08 20:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, Arnd Bergmann, Al Viro

On Mon, 8 Apr 2024 at 10:50, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> And get rid of manual truncation down to u8, etc. in there - the
> only reason for those is to avoid bogus warnings about constant
> truncation from sparse, and those are easy to avoid by turning
> that switch into conditional expression.

I support the use of the conditional, but why add the 16-bit case when
it turns out we don't want it after all?

                 Linus

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH cmpxchg 08/14] parisc: add u16 support to cmpxchg()
  2024-04-08 20:10     ` Linus Torvalds
@ 2024-04-08 20:53       ` Paul E. McKenney
  0 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-08 20:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, Arnd Bergmann, Al Viro

On Mon, Apr 08, 2024 at 01:10:40PM -0700, Linus Torvalds wrote:
> On Mon, 8 Apr 2024 at 10:50, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > And get rid of manual truncation down to u8, etc. in there - the
> > only reason for those is to avoid bogus warnings about constant
> > truncation from sparse, and those are easy to avoid by turning
> > that switch into conditional expression.
> 
> I support the use of the conditional, but why add the 16-bit case when
> it turns out we don't want it after all?

You are quite right that we do not want it for emulation.  However, this
commit is providing native parisc support for the full set of cases,
just like x86 already does.

Plus this native parisc/sparc32 support is harmless.  If someone adds a
16-bit cmpxchg() in core code, which (as you say) is a bug given that
some systems do not support 16-bit loads and stores, then kernel test
robot builds of arc, csky, sh, xtensa, and riscv will complain bitterly.

Plus I hope that ongoing removal of support for antique systems will allow
us to support 16-bit cmpxchg() in core code sooner rather than later.
(Hey, I can dream, can't I?)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH cmpxchg 14/14] riscv: Emulate one-byte cmpxchg
  2024-04-08 17:49   ` [PATCH cmpxchg 14/14] riscv: " Paul E. McKenney
@ 2024-04-09 17:35     ` Andrea Parri
  2024-04-09 18:08       ` Paul E. McKenney
  0 siblings, 1 reply; 71+ messages in thread
From: Andrea Parri @ 2024-04-09 17:35 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, torvalds, Arnd Bergmann, Yujie Liu, Andi Shyti,
	Andrzej Hajda, linux-riscv, Palmer Dabbelt

Hi Paul,

> @@ -170,6 +171,9 @@
>  	__typeof__(*(ptr)) __ret;					\
>  	register unsigned int __rc;					\
>  	switch (size) {							\
> +	case 1:								\
> +		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
> +		break;							\
>  	case 4:								\
>  		__asm__ __volatile__ (					\
>  			"0:	lr.w %0, %2\n"				\
> @@ -214,6 +218,9 @@
>  	__typeof__(*(ptr)) __ret;					\
>  	register unsigned int __rc;					\
>  	switch (size) {							\
> +	case 1:								\
> +		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> +		break;							\
>  	case 4:								\
>  		__asm__ __volatile__ (					\
>  			"0:	lr.w %0, %2\n"				\
> @@ -260,6 +267,9 @@
>  	__typeof__(*(ptr)) __ret;					\
>  	register unsigned int __rc;					\
>  	switch (size) {							\
> +	case 1:								\
> +		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> +		break;							\
>  	case 4:								\
>  		__asm__ __volatile__ (					\
>  			RISCV_RELEASE_BARRIER				\
> @@ -306,6 +316,9 @@
>  	__typeof__(*(ptr)) __ret;					\
>  	register unsigned int __rc;					\
>  	switch (size) {							\
> +	case 1:								\
> +		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> +		break;							\
>  	case 4:								\
>  		__asm__ __volatile__ (					\
>  			"0:	lr.w %0, %2\n"				\

Seems the last three are missing uintptr_t casts?

  Andrea

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH cmpxchg 14/14] riscv: Emulate one-byte cmpxchg
  2024-04-09 17:35     ` Andrea Parri
@ 2024-04-09 18:08       ` Paul E. McKenney
  0 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-09 18:08 UTC (permalink / raw)
  To: Andrea Parri
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, torvalds, Arnd Bergmann, Yujie Liu, Andi Shyti,
	Andrzej Hajda, linux-riscv, Palmer Dabbelt

On Tue, Apr 09, 2024 at 07:35:39PM +0200, Andrea Parri wrote:
> Hi Paul,
> 
> > @@ -170,6 +171,9 @@
> >  	__typeof__(*(ptr)) __ret;					\
> >  	register unsigned int __rc;					\
> >  	switch (size) {							\
> > +	case 1:								\
> > +		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
> > +		break;							\
> >  	case 4:								\
> >  		__asm__ __volatile__ (					\
> >  			"0:	lr.w %0, %2\n"				\
> > @@ -214,6 +218,9 @@
> >  	__typeof__(*(ptr)) __ret;					\
> >  	register unsigned int __rc;					\
> >  	switch (size) {							\
> > +	case 1:								\
> > +		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> > +		break;							\
> >  	case 4:								\
> >  		__asm__ __volatile__ (					\
> >  			"0:	lr.w %0, %2\n"				\
> > @@ -260,6 +267,9 @@
> >  	__typeof__(*(ptr)) __ret;					\
> >  	register unsigned int __rc;					\
> >  	switch (size) {							\
> > +	case 1:								\
> > +		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> > +		break;							\
> >  	case 4:								\
> >  		__asm__ __volatile__ (					\
> >  			RISCV_RELEASE_BARRIER				\
> > @@ -306,6 +316,9 @@
> >  	__typeof__(*(ptr)) __ret;					\
> >  	register unsigned int __rc;					\
> >  	switch (size) {							\
> > +	case 1:								\
> > +		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> > +		break;							\
> >  	case 4:								\
> >  		__asm__ __volatile__ (					\
> >  			"0:	lr.w %0, %2\n"				\
> 
> Seems the last three are missing uintptr_t casts?

Indeed they are, and good eyes!

However, Liu, Yujie beat you to it, and this commit contains the fix:

4d5c72a34948 ("riscv: Emulate one-byte cmpxchg")

							Thanx, Paul

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH cmpxchg 12/14] sh: Emulate one-byte cmpxchg
  2024-04-08 17:49   ` [PATCH cmpxchg 12/14] sh: " Paul E. McKenney
@ 2024-04-18  8:04     ` Geert Uytterhoeven
  0 siblings, 0 replies; 71+ messages in thread
From: Geert Uytterhoeven @ 2024-04-18  8:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, torvalds, Arnd Bergmann, Andi Shyti, Palmer Dabbelt,
	Masami Hiramatsu, linux-sh

Hi Paul,

On Mon, Apr 8, 2024 at 7:50 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on sh.
>
> [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Thanks for your patch!

> --- a/arch/sh/include/asm/cmpxchg.h
> +++ b/arch/sh/include/asm/cmpxchg.h
> @@ -56,6 +56,8 @@ static inline unsigned long __cmpxchg(volatile void * ptr, unsigned long old,
>                 unsigned long new, int size)
>  {
>         switch (size) {
> +       case 1:
> +               return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);

The cast is not needed.

>         case 4:
>                 return __cmpxchg_u32(ptr, old, new);
>         }

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH cmpxchg 13/14] xtensa: Emulate one-byte cmpxchg
  2024-04-08 17:49   ` [PATCH cmpxchg 13/14] xtensa: " Paul E. McKenney
@ 2024-04-18  8:06     ` Geert Uytterhoeven
  2024-04-18 23:21       ` Paul E. McKenney
  0 siblings, 1 reply; 71+ messages in thread
From: Geert Uytterhoeven @ 2024-04-18  8:06 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, torvalds, Arnd Bergmann, Yujie Liu, Andi Shyti

Hi Paul,

On Mon, Apr 8, 2024 at 7:49 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on xtensa.
>
> [ paulmck: Apply kernel test robot feedback. ]
> [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Thanks for your patch!

> --- a/arch/xtensa/include/asm/cmpxchg.h
> +++ b/arch/xtensa/include/asm/cmpxchg.h
> @@ -74,6 +75,7 @@ static __inline__ unsigned long
>  __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, int size)
>  {
>         switch (size) {
> +       case 1:  return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);

The cast is not needed.

>         case 4:  return __cmpxchg_u32(ptr, old, new);
>         default: __cmpxchg_called_with_bad_pointer();
>                  return old;

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH cmpxchg 13/14] xtensa: Emulate one-byte cmpxchg
  2024-04-18  8:06     ` Geert Uytterhoeven
@ 2024-04-18 23:21       ` Paul E. McKenney
  2024-04-19  5:07         ` Yujie Liu
  0 siblings, 1 reply; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-18 23:21 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, torvalds, Arnd Bergmann, Yujie Liu, Andi Shyti

On Thu, Apr 18, 2024 at 10:06:21AM +0200, Geert Uytterhoeven wrote:
> Hi Paul,
> 
> On Mon, Apr 8, 2024 at 7:49 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on xtensa.
> >
> > [ paulmck: Apply kernel test robot feedback. ]
> > [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
> >
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> Thanks for your patch!
> 
> > --- a/arch/xtensa/include/asm/cmpxchg.h
> > +++ b/arch/xtensa/include/asm/cmpxchg.h
> > @@ -74,6 +75,7 @@ static __inline__ unsigned long
> >  __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, int size)
> >  {
> >         switch (size) {
> > +       case 1:  return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);
> 
> The cast is not needed.

In both cases, kernel test robot yelled at me when it was not present.

Happy to resubmit without it, though, if that is a yell that I should
have ignored.

							Thanx, Paul

> >         case 4:  return __cmpxchg_u32(ptr, old, new);
> >         default: __cmpxchg_called_with_bad_pointer();
> >                  return old;
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> -- 
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH cmpxchg 13/14] xtensa: Emulate one-byte cmpxchg
  2024-04-18 23:21       ` Paul E. McKenney
@ 2024-04-19  5:07         ` Yujie Liu
  2024-04-19  8:02           ` Geert Uytterhoeven
  0 siblings, 1 reply; 71+ messages in thread
From: Yujie Liu @ 2024-04-19  5:07 UTC (permalink / raw)
  To: Paul E. McKenney, Geert Uytterhoeven
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, torvalds, Arnd Bergmann, Andi Shyti

On Thu, Apr 18, 2024 at 04:21:46PM -0700, Paul E. McKenney wrote:
> On Thu, Apr 18, 2024 at 10:06:21AM +0200, Geert Uytterhoeven wrote:
> > Hi Paul,
> > 
> > On Mon, Apr 8, 2024 at 7:49 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on xtensa.
> > >
> > > [ paulmck: Apply kernel test robot feedback. ]
> > > [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
> > >
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > 
> > Thanks for your patch!
> > 
> > > --- a/arch/xtensa/include/asm/cmpxchg.h
> > > +++ b/arch/xtensa/include/asm/cmpxchg.h
> > > @@ -74,6 +75,7 @@ static __inline__ unsigned long
> > >  __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, int size)
> > >  {
> > >         switch (size) {
> > > +       case 1:  return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);
> > 
> > The cast is not needed.
> 
> In both cases, kernel test robot yelled at me when it was not present.
> 
> Happy to resubmit without it, though, if that is a yell that I should
> have ignored.

FYI, kernel test robot did yell some reports on various architectures such as:

[1] https://lore.kernel.org/oe-kbuild-all/202403292321.T55etywH-lkp@intel.com/
[2] https://lore.kernel.org/oe-kbuild-all/202404040526.GVzaL2io-lkp@intel.com/
[3] https://lore.kernel.org/oe-kbuild-all/202404022106.mYwpypit-lkp@intel.com/

In brief, there were mainly three types of issues:

* The cmpxchg-emu.h header is missing
* The parameters of cmpxchg_emu_u8 need to be cast to corresponding types
* The return value of cmpxchg_emu_u8 needs to be cast to the "ret" type

As for this specific case of xtensa arch, the compiler doesn't warn
regardless of whether there is an explicit cast for "ptr" or not.
The "ptr" being passed in is "void *", and it seems that a "void *"
pointer can be automatically cast to any other type of pointer, so it
is not necessary to have an explicit cast of "u8 *".

As for the implementations of other architectures that don't pass the
"ptr" as "void *" (such as a macro implementation), the explicit cast to
"u8 *" may still be required.

Thanks,
Yujie

> 
> 							Thanx, Paul
> 
> > >         case 4:  return __cmpxchg_u32(ptr, old, new);
> > >         default: __cmpxchg_called_with_bad_pointer();
> > >                  return old;
> > 
> > Gr{oetje,eeting}s,
> > 
> >                         Geert
> > 
> > -- 
> > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> > 
> > In personal conversations with technical people, I call myself a hacker. But
> > when I'm talking to journalists I just say "programmer" or something like that.
> >                                 -- Linus Torvalds

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH cmpxchg 13/14] xtensa: Emulate one-byte cmpxchg
  2024-04-19  5:07         ` Yujie Liu
@ 2024-04-19  8:02           ` Geert Uytterhoeven
  2024-04-20 14:03             ` Paul E. McKenney
  0 siblings, 1 reply; 71+ messages in thread
From: Geert Uytterhoeven @ 2024-04-19  8:02 UTC (permalink / raw)
  To: Yujie Liu
  Cc: Paul E. McKenney, linux-arch, linux-kernel, elver, akpm, tglx,
	peterz, dianders, pmladek, torvalds, Arnd Bergmann, Andi Shyti

On Fri, Apr 19, 2024 at 7:14 AM Yujie Liu <yujie.liu@intel.com> wrote:
> On Thu, Apr 18, 2024 at 04:21:46PM -0700, Paul E. McKenney wrote:
> > On Thu, Apr 18, 2024 at 10:06:21AM +0200, Geert Uytterhoeven wrote:
> > > On Mon, Apr 8, 2024 at 7:49 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on xtensa.
> > > >
> > > > [ paulmck: Apply kernel test robot feedback. ]
> > > > [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
> > > >
> > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > >
> > > Thanks for your patch!
> > >
> > > > --- a/arch/xtensa/include/asm/cmpxchg.h
> > > > +++ b/arch/xtensa/include/asm/cmpxchg.h
> > > > @@ -74,6 +75,7 @@ static __inline__ unsigned long
> > > >  __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, int size)
> > > >  {
> > > >         switch (size) {
> > > > +       case 1:  return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);
> > >
> > > The cast is not needed.
> >
> > In both cases, kernel test robot yelled at me when it was not present.
> >
> > Happy to resubmit without it, though, if that is a yell that I should
> > have ignored.
>
> FYI, kernel test robot did yell some reports on various architectures such as:
>
> [1] https://lore.kernel.org/oe-kbuild-all/202403292321.T55etywH-lkp@intel.com/
> [2] https://lore.kernel.org/oe-kbuild-all/202404040526.GVzaL2io-lkp@intel.com/
> [3] https://lore.kernel.org/oe-kbuild-all/202404022106.mYwpypit-lkp@intel.com/
>
> In brief, there were mainly three types of issues:
>
> * The cmpxchg-emu.h header is missing
> * The parameters of cmpxchg_emu_u8 need to be cast to corresponding types
> * The return value of cmpxchg_emu_u8 needs to be cast to the "ret" type
>
> As for this specific case of xtensa arch, the compiler doesn't warn
> regardless of whether there is an explicit cast for "ptr" or not.
> The "ptr" being passed in is "void *", and it seems that a "void *"
> pointer can be automatically cast to any other type of pointer, so it
> is not necessary to have an explicit cast of "u8 *".
>
> As for the implementations of other architectures that don't pass the
> "ptr" as "void *" (such as a macro implementation), the explicit cast to
> "u8 *" may still be required.

Exactly.  On sh and xtensa, the original pointer is of type
"volatile void *", so no cast is needed.
On E.g. arc, the original pointer is of type "volatile __typeof__(ptr) _p_",
which is not always compatible with "volatile u8 *".

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH cmpxchg 13/14] xtensa: Emulate one-byte cmpxchg
  2024-04-19  8:02           ` Geert Uytterhoeven
@ 2024-04-20 14:03             ` Paul E. McKenney
  0 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-04-20 14:03 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Yujie Liu, linux-arch, linux-kernel, elver, akpm, tglx, peterz,
	dianders, pmladek, torvalds, Arnd Bergmann, Andi Shyti

On Fri, Apr 19, 2024 at 10:02:47AM +0200, Geert Uytterhoeven wrote:
> On Fri, Apr 19, 2024 at 7:14 AM Yujie Liu <yujie.liu@intel.com> wrote:
> > On Thu, Apr 18, 2024 at 04:21:46PM -0700, Paul E. McKenney wrote:
> > > On Thu, Apr 18, 2024 at 10:06:21AM +0200, Geert Uytterhoeven wrote:
> > > > On Mon, Apr 8, 2024 at 7:49 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on xtensa.
> > > > >
> > > > > [ paulmck: Apply kernel test robot feedback. ]
> > > > > [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
> > > > >
> > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > >
> > > > Thanks for your patch!
> > > >
> > > > > --- a/arch/xtensa/include/asm/cmpxchg.h
> > > > > +++ b/arch/xtensa/include/asm/cmpxchg.h
> > > > > @@ -74,6 +75,7 @@ static __inline__ unsigned long
> > > > >  __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, int size)
> > > > >  {
> > > > >         switch (size) {
> > > > > +       case 1:  return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);
> > > >
> > > > The cast is not needed.
> > >
> > > In both cases, kernel test robot yelled at me when it was not present.
> > >
> > > Happy to resubmit without it, though, if that is a yell that I should
> > > have ignored.
> >
> > FYI, kernel test robot did yell some reports on various architectures such as:
> >
> > [1] https://lore.kernel.org/oe-kbuild-all/202403292321.T55etywH-lkp@intel.com/
> > [2] https://lore.kernel.org/oe-kbuild-all/202404040526.GVzaL2io-lkp@intel.com/
> > [3] https://lore.kernel.org/oe-kbuild-all/202404022106.mYwpypit-lkp@intel.com/
> >
> > In brief, there were mainly three types of issues:
> >
> > * The cmpxchg-emu.h header is missing
> > * The parameters of cmpxchg_emu_u8 need to be cast to corresponding types
> > * The return value of cmpxchg_emu_u8 needs to be cast to the "ret" type
> >
> > As for this specific case of xtensa arch, the compiler doesn't warn
> > regardless of whether there is an explicit cast for "ptr" or not.
> > The "ptr" being passed in is "void *", and it seems that a "void *"
> > pointer can be automatically cast to any other type of pointer, so it
> > is not necessary to have an explicit cast of "u8 *".
> >
> > As for the implementations of other architectures that don't pass the
> > "ptr" as "void *" (such as a macro implementation), the explicit cast to
> > "u8 *" may still be required.
> 
> Exactly.  On sh and xtensa, the original pointer is of type
> "volatile void *", so no cast is needed.
> On E.g. arc, the original pointer is of type "volatile __typeof__(ptr) _p_",
> which is not always compatible with "volatile u8 *".

Very good, and thank you both!  I will spin updated patches and send
them out early this coming week.

							Thanx, Paul

> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> -- 
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg()
  2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
                     ` (13 preceding siblings ...)
  2024-04-08 17:49   ` [PATCH cmpxchg 14/14] riscv: " Paul E. McKenney
@ 2024-05-01 22:58   ` Paul E. McKenney
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 01/13] sparc32: make __cmpxchg_u32() return u32 Paul E. McKenney
                       ` (13 more replies)
  14 siblings, 14 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-01 22:58 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, torvalds, Arnd Bergmann

Hello!

This v2 series provides emulation functions for one-byte cmpxchg,
and uses it for those architectures not supporting this in hardware.
The emulation is in terms of the fully ordered four-byte cmpxchg() that
is supplied by all of these architectures.  This was tested by making
x86 forget that it can do one-byte cmpxchg() natively:

f0183ab28489 ("EXP arch/x86: Test one-byte cmpxchg emulation")

This commit is local to -rcu and is of course not intended for mainline.

If accepted, RCU Tasks will use this capability in place of the current
rcu_trc_cmpxchg_need_qs() open-coding of this emulation.

The patches are as follows:

1.	sparc32: make __cmpxchg_u32() return u32, courtesy of Al Viro.

2.	sparc32: make the first argument of __cmpxchg_u64() volatile u64 *,
	courtesy of Al Viro.

3.	sparc32: unify __cmpxchg_u{32,64}, courtesy of Al Viro.

4.	sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those
	sizes, courtesy of Al Viro.

5.	parisc: __cmpxchg_u32(): lift conversion into the callers,
	courtesy of Al Viro.

6.	parisc: unify implementations of __cmpxchg_u{8,32,64}, courtesy of
	Al Viro.

7.	parisc: add missing export of __cmpxchg_u8(), courtesy of Al Viro.

8.	parisc: add u16 support to cmpxchg(), courtesy of Al Viro.

9.	lib: Add one-byte emulation function.

10.	ARC: Emulate one-byte cmpxchg.

11.	csky: Emulate one-byte cmpxchg.

12.	sh: Emulate one-byte cmpxchg.

13.	xtensa: Emulate one-byte cmpxchg.

Changes since v2:

o	Dropped riscv patch in favor of alternative patch that
	provides native support.

o	Fixed yet more casting bugs spotted by kernel test robot
	and by Geert Uytterhoeven.

Changes since v1:

o	Add native support for sparc32 and parisc, courtesy of Al Viro.

o	Remove two-byte emulation due to architectures that still do not
	support two-byte load and store instructions, per Arnd Bergmann
	feedback.  (Yes, there are a few systems out there that do not
	even support one-byte load instructions, but these are slated
	for removal anyway.)

o	Fix numerous casting bugs spotted by kernel test robot.

o	Fix SPDX header.  "//" for .c files and "/*" for .h files.
	I am sure that there is a good reason for this.  ;-)

						Thanx, Paul

------------------------------------------------------------------------

 arch/parisc/include/asm/cmpxchg.h     |   19 +++++-------
 arch/parisc/kernel/parisc_ksyms.c     |    1 
 arch/parisc/lib/bitops.c              |   52 +++++++++++-----------------------
 arch/sparc/include/asm/cmpxchg_32.h   |   18 +++++------
 arch/sparc/lib/atomic32.c             |   47 +++++++++++++-----------------
 b/arch/Kconfig                        |    3 +
 b/arch/arc/Kconfig                    |    1 
 b/arch/arc/include/asm/cmpxchg.h      |   33 +++++++++++++++------
 b/arch/csky/Kconfig                   |    1 
 b/arch/csky/include/asm/cmpxchg.h     |   10 ++++++
 b/arch/parisc/include/asm/cmpxchg.h   |    3 -
 b/arch/parisc/kernel/parisc_ksyms.c   |    1 
 b/arch/parisc/lib/bitops.c            |    6 +--
 b/arch/sh/Kconfig                     |    1 
 b/arch/sh/include/asm/cmpxchg.h       |    3 +
 b/arch/sparc/include/asm/cmpxchg_32.h |    4 +-
 b/arch/sparc/lib/atomic32.c           |    4 +-
 b/arch/xtensa/Kconfig                 |    1 
 b/arch/xtensa/include/asm/cmpxchg.h   |    2 +
 b/include/linux/cmpxchg-emu.h         |   15 +++++++++
 b/lib/Makefile                        |    1 
 b/lib/cmpxchg-emu.c                   |   45 +++++++++++++++++++++++++++++
 22 files changed, 172 insertions(+), 99 deletions(-)

^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH v2 cmpxchg 01/13] sparc32: make __cmpxchg_u32() return u32
  2024-05-01 22:58   ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Paul E. McKenney
@ 2024-05-01 23:01     ` Paul E. McKenney
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 02/13] sparc32: make the first argument of __cmpxchg_u64() volatile u64 * Paul E. McKenney
                       ` (12 subsequent siblings)
  13 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-01 23:01 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, arnd, torvalds,
	kernel-team, Al Viro, Paul E . McKenney

From: Al Viro <viro@zeniv.linux.org.uk>

Conversion between u32 and unsigned long is tautological there,
and the only use of return value is to return it from
__cmpxchg() (which return unsigned long).

Get rid of explicit casts in __cmpxchg_u32() call, while we are
at it - normal conversions for arguments will do just fine.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/sparc/include/asm/cmpxchg_32.h | 4 ++--
 arch/sparc/lib/atomic32.c           | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/include/asm/cmpxchg_32.h b/arch/sparc/include/asm/cmpxchg_32.h
index d0af82c240b73..2a05cb236480c 100644
--- a/arch/sparc/include/asm/cmpxchg_32.h
+++ b/arch/sparc/include/asm/cmpxchg_32.h
@@ -39,7 +39,7 @@ static __always_inline unsigned long __arch_xchg(unsigned long x, __volatile__ v
 /* bug catcher for when unsupported size is used - won't link */
 void __cmpxchg_called_with_bad_pointer(void);
 /* we only need to support cmpxchg of a u32 on sparc */
-unsigned long __cmpxchg_u32(volatile u32 *m, u32 old, u32 new_);
+u32 __cmpxchg_u32(volatile u32 *m, u32 old, u32 new_);
 
 /* don't worry...optimizer will get rid of most of this */
 static inline unsigned long
@@ -47,7 +47,7 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new_, int size)
 {
 	switch (size) {
 	case 4:
-		return __cmpxchg_u32((u32 *)ptr, (u32)old, (u32)new_);
+		return __cmpxchg_u32(ptr, old, new_);
 	default:
 		__cmpxchg_called_with_bad_pointer();
 		break;
diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index cf80d1ae352be..d90d756123d81 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -159,7 +159,7 @@ unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
 }
 EXPORT_SYMBOL(sp32___change_bit);
 
-unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
+u32 __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
 {
 	unsigned long flags;
 	u32 prev;
@@ -169,7 +169,7 @@ unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
 		*ptr = new;
 	spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);
 
-	return (unsigned long)prev;
+	return prev;
 }
 EXPORT_SYMBOL(__cmpxchg_u32);
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 cmpxchg 02/13] sparc32: make the first argument of __cmpxchg_u64() volatile u64 *
  2024-05-01 22:58   ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Paul E. McKenney
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 01/13] sparc32: make __cmpxchg_u32() return u32 Paul E. McKenney
@ 2024-05-01 23:01     ` Paul E. McKenney
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 03/13] sparc32: unify __cmpxchg_u{32,64} Paul E. McKenney
                       ` (11 subsequent siblings)
  13 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-01 23:01 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, arnd, torvalds,
	kernel-team, Al Viro, Paul E . McKenney

From: Al Viro <viro@zeniv.linux.org.uk>

... to match all cmpxchg variants.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/sparc/include/asm/cmpxchg_32.h | 2 +-
 arch/sparc/lib/atomic32.c           | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/include/asm/cmpxchg_32.h b/arch/sparc/include/asm/cmpxchg_32.h
index 2a05cb236480c..05d5f86a56dc2 100644
--- a/arch/sparc/include/asm/cmpxchg_32.h
+++ b/arch/sparc/include/asm/cmpxchg_32.h
@@ -63,7 +63,7 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new_, int size)
 			(unsigned long)_n_, sizeof(*(ptr)));		\
 })
 
-u64 __cmpxchg_u64(u64 *ptr, u64 old, u64 new);
+u64 __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new);
 #define arch_cmpxchg64(ptr, old, new)	__cmpxchg_u64(ptr, old, new)
 
 #include <asm-generic/cmpxchg-local.h>
diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index d90d756123d81..e15affbbb5238 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -173,7 +173,7 @@ u32 __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
 }
 EXPORT_SYMBOL(__cmpxchg_u32);
 
-u64 __cmpxchg_u64(u64 *ptr, u64 old, u64 new)
+u64 __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new)
 {
 	unsigned long flags;
 	u64 prev;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 cmpxchg 03/13] sparc32: unify __cmpxchg_u{32,64}
  2024-05-01 22:58   ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Paul E. McKenney
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 01/13] sparc32: make __cmpxchg_u32() return u32 Paul E. McKenney
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 02/13] sparc32: make the first argument of __cmpxchg_u64() volatile u64 * Paul E. McKenney
@ 2024-05-01 23:01     ` Paul E. McKenney
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 04/13] sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those sizes Paul E. McKenney
                       ` (10 subsequent siblings)
  13 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-01 23:01 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, arnd, torvalds,
	kernel-team, Al Viro, Paul E . McKenney

From: Al Viro <viro@zeniv.linux.org.uk>

Add a macro that expands to one of those when given u32 or u64
as an argument - atomic32.c has a lot of similar stuff already.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/sparc/lib/atomic32.c | 41 +++++++++++++++------------------------
 1 file changed, 16 insertions(+), 25 deletions(-)

diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index e15affbbb5238..0d215a772428e 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -159,32 +159,23 @@ unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
 }
 EXPORT_SYMBOL(sp32___change_bit);
 
-u32 __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
-{
-	unsigned long flags;
-	u32 prev;
-
-	spin_lock_irqsave(ATOMIC_HASH(ptr), flags);
-	if ((prev = *ptr) == old)
-		*ptr = new;
-	spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);
-
-	return prev;
-}
+#define CMPXCHG(T)						\
+	T __cmpxchg_##T(volatile T *ptr, T old, T new)		\
+	{							\
+		unsigned long flags;				\
+		T prev;						\
+								\
+		spin_lock_irqsave(ATOMIC_HASH(ptr), flags);	\
+		if ((prev = *ptr) == old)			\
+			*ptr = new;				\
+		spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);\
+								\
+		return prev;					\
+	}
+
+CMPXCHG(u32)
+CMPXCHG(u64)
 EXPORT_SYMBOL(__cmpxchg_u32);
-
-u64 __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new)
-{
-	unsigned long flags;
-	u64 prev;
-
-	spin_lock_irqsave(ATOMIC_HASH(ptr), flags);
-	if ((prev = *ptr) == old)
-		*ptr = new;
-	spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);
-
-	return prev;
-}
 EXPORT_SYMBOL(__cmpxchg_u64);
 
 unsigned long __xchg_u32(volatile u32 *ptr, u32 new)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 cmpxchg 04/13] sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those sizes
  2024-05-01 22:58   ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Paul E. McKenney
                       ` (2 preceding siblings ...)
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 03/13] sparc32: unify __cmpxchg_u{32,64} Paul E. McKenney
@ 2024-05-01 23:01     ` Paul E. McKenney
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 05/13] parisc: __cmpxchg_u32(): lift conversion into the callers Paul E. McKenney
                       ` (9 subsequent siblings)
  13 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-01 23:01 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, arnd, torvalds,
	kernel-team, Al Viro, Paul E . McKenney

From: Al Viro <viro@zeniv.linux.org.uk>

trivial now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/sparc/include/asm/cmpxchg_32.h | 16 +++++++---------
 arch/sparc/lib/atomic32.c           |  4 ++++
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/sparc/include/asm/cmpxchg_32.h b/arch/sparc/include/asm/cmpxchg_32.h
index 05d5f86a56dc2..8c1a3ca34eeb7 100644
--- a/arch/sparc/include/asm/cmpxchg_32.h
+++ b/arch/sparc/include/asm/cmpxchg_32.h
@@ -38,21 +38,19 @@ static __always_inline unsigned long __arch_xchg(unsigned long x, __volatile__ v
 
 /* bug catcher for when unsupported size is used - won't link */
 void __cmpxchg_called_with_bad_pointer(void);
-/* we only need to support cmpxchg of a u32 on sparc */
+u8 __cmpxchg_u8(volatile u8 *m, u8 old, u8 new_);
+u16 __cmpxchg_u16(volatile u16 *m, u16 old, u16 new_);
 u32 __cmpxchg_u32(volatile u32 *m, u32 old, u32 new_);
 
 /* don't worry...optimizer will get rid of most of this */
 static inline unsigned long
 __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new_, int size)
 {
-	switch (size) {
-	case 4:
-		return __cmpxchg_u32(ptr, old, new_);
-	default:
-		__cmpxchg_called_with_bad_pointer();
-		break;
-	}
-	return old;
+	return
+		size == 1 ? __cmpxchg_u8(ptr, old, new_) :
+		size == 2 ? __cmpxchg_u16(ptr, old, new_) :
+		size == 4 ? __cmpxchg_u32(ptr, old, new_) :
+			(__cmpxchg_called_with_bad_pointer(), old);
 }
 
 #define arch_cmpxchg(ptr, o, n)						\
diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index 0d215a772428e..8ae880ebf07aa 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -173,8 +173,12 @@ EXPORT_SYMBOL(sp32___change_bit);
 		return prev;					\
 	}
 
+CMPXCHG(u8)
+CMPXCHG(u16)
 CMPXCHG(u32)
 CMPXCHG(u64)
+EXPORT_SYMBOL(__cmpxchg_u8);
+EXPORT_SYMBOL(__cmpxchg_u16);
 EXPORT_SYMBOL(__cmpxchg_u32);
 EXPORT_SYMBOL(__cmpxchg_u64);
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 cmpxchg 05/13] parisc: __cmpxchg_u32(): lift conversion into the callers
  2024-05-01 22:58   ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Paul E. McKenney
                       ` (3 preceding siblings ...)
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 04/13] sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those sizes Paul E. McKenney
@ 2024-05-01 23:01     ` Paul E. McKenney
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 06/13] parisc: unify implementations of __cmpxchg_u{8,32,64} Paul E. McKenney
                       ` (8 subsequent siblings)
  13 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-01 23:01 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, arnd, torvalds,
	kernel-team, Al Viro, Paul E . McKenney

From: Al Viro <viro@zeniv.linux.org.uk>

__cmpxchg_u32() return value is unsigned int explicitly cast to
unsigned long.  Both callers are returns from functions that
return unsigned long; might as well have __cmpxchg_u32()
return that unsigned int (aka u32) and let the callers convert
implicitly.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/parisc/include/asm/cmpxchg.h | 3 +--
 arch/parisc/lib/bitops.c          | 6 +++---
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/parisc/include/asm/cmpxchg.h b/arch/parisc/include/asm/cmpxchg.h
index c1d776bb16b4e..0924ebc576d28 100644
--- a/arch/parisc/include/asm/cmpxchg.h
+++ b/arch/parisc/include/asm/cmpxchg.h
@@ -57,8 +57,7 @@ __arch_xchg(unsigned long x, volatile void *ptr, int size)
 extern void __cmpxchg_called_with_bad_pointer(void);
 
 /* __cmpxchg_u32/u64 defined in arch/parisc/lib/bitops.c */
-extern unsigned long __cmpxchg_u32(volatile unsigned int *m, unsigned int old,
-				   unsigned int new_);
+extern u32 __cmpxchg_u32(volatile u32 *m, u32 old, u32 new_);
 extern u64 __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new_);
 extern u8 __cmpxchg_u8(volatile u8 *ptr, u8 old, u8 new_);
 
diff --git a/arch/parisc/lib/bitops.c b/arch/parisc/lib/bitops.c
index 36a3141990746..ae2231d921985 100644
--- a/arch/parisc/lib/bitops.c
+++ b/arch/parisc/lib/bitops.c
@@ -68,16 +68,16 @@ u64 notrace __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new)
 	return prev;
 }
 
-unsigned long notrace __cmpxchg_u32(volatile unsigned int *ptr, unsigned int old, unsigned int new)
+u32 notrace __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
 {
 	unsigned long flags;
-	unsigned int prev;
+	u32 prev;
 
 	_atomic_spin_lock_irqsave(ptr, flags);
 	if ((prev = *ptr) == old)
 		*ptr = new;
 	_atomic_spin_unlock_irqrestore(ptr, flags);
-	return (unsigned long)prev;
+	return prev;
 }
 
 u8 notrace __cmpxchg_u8(volatile u8 *ptr, u8 old, u8 new)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 cmpxchg 06/13] parisc: unify implementations of __cmpxchg_u{8,32,64}
  2024-05-01 22:58   ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Paul E. McKenney
                       ` (4 preceding siblings ...)
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 05/13] parisc: __cmpxchg_u32(): lift conversion into the callers Paul E. McKenney
@ 2024-05-01 23:01     ` Paul E. McKenney
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 07/13] parisc: add missing export of __cmpxchg_u8() Paul E. McKenney
                       ` (7 subsequent siblings)
  13 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-01 23:01 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, arnd, torvalds,
	kernel-team, Al Viro, Paul E . McKenney

From: Al Viro <viro@zeniv.linux.org.uk>

identical except for type name involved

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/parisc/lib/bitops.c | 51 +++++++++++++---------------------------
 1 file changed, 16 insertions(+), 35 deletions(-)

diff --git a/arch/parisc/lib/bitops.c b/arch/parisc/lib/bitops.c
index ae2231d921985..cae30a3eb6d9b 100644
--- a/arch/parisc/lib/bitops.c
+++ b/arch/parisc/lib/bitops.c
@@ -56,38 +56,19 @@ unsigned long notrace __xchg8(char x, volatile char *ptr)
 }
 
 
-u64 notrace __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new)
-{
-	unsigned long flags;
-	u64 prev;
-
-	_atomic_spin_lock_irqsave(ptr, flags);
-	if ((prev = *ptr) == old)
-		*ptr = new;
-	_atomic_spin_unlock_irqrestore(ptr, flags);
-	return prev;
-}
-
-u32 notrace __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
-{
-	unsigned long flags;
-	u32 prev;
-
-	_atomic_spin_lock_irqsave(ptr, flags);
-	if ((prev = *ptr) == old)
-		*ptr = new;
-	_atomic_spin_unlock_irqrestore(ptr, flags);
-	return prev;
-}
-
-u8 notrace __cmpxchg_u8(volatile u8 *ptr, u8 old, u8 new)
-{
-	unsigned long flags;
-	u8 prev;
-
-	_atomic_spin_lock_irqsave(ptr, flags);
-	if ((prev = *ptr) == old)
-		*ptr = new;
-	_atomic_spin_unlock_irqrestore(ptr, flags);
-	return prev;
-}
+#define CMPXCHG(T)						\
+	T notrace __cmpxchg_##T(volatile T *ptr, T old, T new)	\
+	{							\
+		unsigned long flags;				\
+		T prev;						\
+								\
+		_atomic_spin_lock_irqsave(ptr, flags);		\
+		if ((prev = *ptr) == old)			\
+			*ptr = new;				\
+		_atomic_spin_unlock_irqrestore(ptr, flags);	\
+		return prev;					\
+	}
+
+CMPXCHG(u64)
+CMPXCHG(u32)
+CMPXCHG(u8)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 cmpxchg 07/13] parisc: add missing export of __cmpxchg_u8()
  2024-05-01 22:58   ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Paul E. McKenney
                       ` (5 preceding siblings ...)
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 06/13] parisc: unify implementations of __cmpxchg_u{8,32,64} Paul E. McKenney
@ 2024-05-01 23:01     ` Paul E. McKenney
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 08/13] parisc: add u16 support to cmpxchg() Paul E. McKenney
                       ` (6 subsequent siblings)
  13 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-01 23:01 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, arnd, torvalds,
	kernel-team, Al Viro, Paul E . McKenney

From: Al Viro <viro@zeniv.linux.org.uk>

__cmpxchg_u8() had been added (initially) for the sake of
drivers/phy/ti/phy-tusb1210.c; the thing is, that drivers is
modular, so we need an export

Fixes: b344d6a83d01 "parisc: add support for cmpxchg on u8 pointers"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/parisc/kernel/parisc_ksyms.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/parisc/kernel/parisc_ksyms.c b/arch/parisc/kernel/parisc_ksyms.c
index 6f0c92e8149d8..dcf61cbd31470 100644
--- a/arch/parisc/kernel/parisc_ksyms.c
+++ b/arch/parisc/kernel/parisc_ksyms.c
@@ -22,6 +22,7 @@ EXPORT_SYMBOL(memset);
 #include <linux/atomic.h>
 EXPORT_SYMBOL(__xchg8);
 EXPORT_SYMBOL(__xchg32);
+EXPORT_SYMBOL(__cmpxchg_u8);
 EXPORT_SYMBOL(__cmpxchg_u32);
 EXPORT_SYMBOL(__cmpxchg_u64);
 #ifdef CONFIG_SMP
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 cmpxchg 08/13] parisc: add u16 support to cmpxchg()
  2024-05-01 22:58   ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Paul E. McKenney
                       ` (6 preceding siblings ...)
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 07/13] parisc: add missing export of __cmpxchg_u8() Paul E. McKenney
@ 2024-05-01 23:01     ` Paul E. McKenney
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 09/13] lib: Add one-byte emulation function Paul E. McKenney
                       ` (5 subsequent siblings)
  13 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-01 23:01 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, arnd, torvalds,
	kernel-team, Al Viro, Paul E . McKenney

From: Al Viro <viro@zeniv.linux.org.uk>

Add (and export) __cmpxchg_u16(), teach __cmpxchg() to use it.

And get rid of manual truncation down to u8, etc. in there - the
only reason for those is to avoid bogus warnings about constant
truncation from sparse, and those are easy to avoid by turning
that switch into conditional expression.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/parisc/include/asm/cmpxchg.h | 19 +++++++++----------
 arch/parisc/kernel/parisc_ksyms.c |  1 +
 arch/parisc/lib/bitops.c          |  1 +
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/parisc/include/asm/cmpxchg.h b/arch/parisc/include/asm/cmpxchg.h
index 0924ebc576d28..bf0a0f1189eb2 100644
--- a/arch/parisc/include/asm/cmpxchg.h
+++ b/arch/parisc/include/asm/cmpxchg.h
@@ -56,25 +56,24 @@ __arch_xchg(unsigned long x, volatile void *ptr, int size)
 /* bug catcher for when unsupported size is used - won't link */
 extern void __cmpxchg_called_with_bad_pointer(void);
 
-/* __cmpxchg_u32/u64 defined in arch/parisc/lib/bitops.c */
+/* __cmpxchg_u... defined in arch/parisc/lib/bitops.c */
+extern u8 __cmpxchg_u8(volatile u8 *ptr, u8 old, u8 new_);
+extern u16 __cmpxchg_u16(volatile u16 *ptr, u16 old, u16 new_);
 extern u32 __cmpxchg_u32(volatile u32 *m, u32 old, u32 new_);
 extern u64 __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new_);
-extern u8 __cmpxchg_u8(volatile u8 *ptr, u8 old, u8 new_);
 
 /* don't worry...optimizer will get rid of most of this */
 static inline unsigned long
 __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new_, int size)
 {
-	switch (size) {
+	return
 #ifdef CONFIG_64BIT
-	case 8: return __cmpxchg_u64((u64 *)ptr, old, new_);
+		size == 8 ? __cmpxchg_u64(ptr, old, new_) :
 #endif
-	case 4: return __cmpxchg_u32((unsigned int *)ptr,
-				     (unsigned int)old, (unsigned int)new_);
-	case 1: return __cmpxchg_u8((u8 *)ptr, old & 0xff, new_ & 0xff);
-	}
-	__cmpxchg_called_with_bad_pointer();
-	return old;
+		size == 4 ? __cmpxchg_u32(ptr, old, new_) :
+		size == 2 ? __cmpxchg_u16(ptr, old, new_) :
+		size == 1 ? __cmpxchg_u8(ptr, old, new_) :
+			(__cmpxchg_called_with_bad_pointer(), old);
 }
 
 #define arch_cmpxchg(ptr, o, n)						 \
diff --git a/arch/parisc/kernel/parisc_ksyms.c b/arch/parisc/kernel/parisc_ksyms.c
index dcf61cbd31470..c1587aa35beb6 100644
--- a/arch/parisc/kernel/parisc_ksyms.c
+++ b/arch/parisc/kernel/parisc_ksyms.c
@@ -23,6 +23,7 @@ EXPORT_SYMBOL(memset);
 EXPORT_SYMBOL(__xchg8);
 EXPORT_SYMBOL(__xchg32);
 EXPORT_SYMBOL(__cmpxchg_u8);
+EXPORT_SYMBOL(__cmpxchg_u16);
 EXPORT_SYMBOL(__cmpxchg_u32);
 EXPORT_SYMBOL(__cmpxchg_u64);
 #ifdef CONFIG_SMP
diff --git a/arch/parisc/lib/bitops.c b/arch/parisc/lib/bitops.c
index cae30a3eb6d9b..9df8100506427 100644
--- a/arch/parisc/lib/bitops.c
+++ b/arch/parisc/lib/bitops.c
@@ -71,4 +71,5 @@ unsigned long notrace __xchg8(char x, volatile char *ptr)
 
 CMPXCHG(u64)
 CMPXCHG(u32)
+CMPXCHG(u16)
 CMPXCHG(u8)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 cmpxchg 09/13] lib: Add one-byte emulation function
  2024-05-01 22:58   ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Paul E. McKenney
                       ` (7 preceding siblings ...)
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 08/13] parisc: add u16 support to cmpxchg() Paul E. McKenney
@ 2024-05-01 23:01     ` Paul E. McKenney
  2024-05-13 14:44       ` Boqun Feng
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 10/13] ARC: Emulate one-byte cmpxchg Paul E. McKenney
                       ` (4 subsequent siblings)
  13 siblings, 1 reply; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-01 23:01 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, arnd, torvalds,
	kernel-team, Paul E. McKenney

Architectures are required to provide four-byte cmpxchg() and 64-bit
architectures are additionally required to provide eight-byte cmpxchg().
However, there are cases where one-byte cmpxchg() would be extremely
useful.  Therefore, provide cmpxchg_emu_u8() that emulates one-byte
cmpxchg() in terms of four-byte cmpxchg().

Note that this emulations is fully ordered, and can (for example) cause
one-byte cmpxchg_relaxed() to incur the overhead of full ordering.
If this causes problems for a given architecture, that architecture is
free to provide its own lighter-weight primitives.

[ paulmck: Apply Marco Elver feedback. ]
[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]

Link: https://lore.kernel.org/all/0733eb10-5e7a-4450-9b8a-527b97c842ff@paulmck-laptop/

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Marco Elver <elver@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-arch@vger.kernel.org>
---
 arch/Kconfig                |  3 +++
 include/linux/cmpxchg-emu.h | 15 +++++++++++++
 lib/Makefile                |  1 +
 lib/cmpxchg-emu.c           | 45 +++++++++++++++++++++++++++++++++++++
 4 files changed, 64 insertions(+)
 create mode 100644 include/linux/cmpxchg-emu.h
 create mode 100644 lib/cmpxchg-emu.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 9f066785bb71d..284663392eef8 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1609,4 +1609,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
 	# strict alignment always, even with -falign-functions.
 	def_bool CC_HAS_MIN_FUNCTION_ALIGNMENT || CC_IS_CLANG
 
+config ARCH_NEED_CMPXCHG_1_EMU
+	bool
+
 endmenu
diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
new file mode 100644
index 0000000000000..998deec67740a
--- /dev/null
+++ b/include/linux/cmpxchg-emu.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Emulated 1-byte and 2-byte cmpxchg operations for architectures
+ * lacking direct support for these sizes.  These are implemented in terms
+ * of 4-byte cmpxchg operations.
+ *
+ * Copyright (C) 2024 Paul E. McKenney.
+ */
+
+#ifndef __LINUX_CMPXCHG_EMU_H
+#define __LINUX_CMPXCHG_EMU_H
+
+uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
+
+#endif /* __LINUX_CMPXCHG_EMU_H */
diff --git a/lib/Makefile b/lib/Makefile
index ffc6b2341b45a..cc3d52fdb477d 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -236,6 +236,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
 lib-$(CONFIG_GENERIC_BUG) += bug.o
 
 obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
+obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_EMU) += cmpxchg-emu.o
 
 obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
 #ensure exported functions have prototypes
diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
new file mode 100644
index 0000000000000..27f6f97cb60dd
--- /dev/null
+++ b/lib/cmpxchg-emu.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Emulated 1-byte cmpxchg operation for architectures lacking direct
+ * support for this size.  This is implemented in terms of 4-byte cmpxchg
+ * operations.
+ *
+ * Copyright (C) 2024 Paul E. McKenney.
+ */
+
+#include <linux/types.h>
+#include <linux/export.h>
+#include <linux/instrumented.h>
+#include <linux/atomic.h>
+#include <linux/panic.h>
+#include <linux/bug.h>
+#include <asm-generic/rwonce.h>
+#include <linux/cmpxchg-emu.h>
+
+union u8_32 {
+	u8 b[4];
+	u32 w;
+};
+
+/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
+uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
+{
+	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
+	int i = ((uintptr_t)p) & 0x3;
+	union u8_32 old32;
+	union u8_32 new32;
+	u32 ret;
+
+	ret = READ_ONCE(*p32);
+	do {
+		old32.w = ret;
+		if (old32.b[i] != old)
+			return old32.b[i];
+		new32.w = old32.w;
+		new32.b[i] = new;
+		instrument_atomic_read_write(p, 1);
+		ret = data_race(cmpxchg(p32, old32.w, new32.w)); // Overridden above.
+	} while (ret != old32.w);
+	return old;
+}
+EXPORT_SYMBOL_GPL(cmpxchg_emu_u8);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 cmpxchg 10/13] ARC: Emulate one-byte cmpxchg
  2024-05-01 22:58   ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Paul E. McKenney
                       ` (8 preceding siblings ...)
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 09/13] lib: Add one-byte emulation function Paul E. McKenney
@ 2024-05-01 23:01     ` Paul E. McKenney
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 11/13] csky: " Paul E. McKenney
                       ` (3 subsequent siblings)
  13 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-01 23:01 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, arnd, torvalds,
	kernel-team, Paul E. McKenney, Vineet Gupta, Andi Shyti,
	Andrzej Hajda, Palmer Dabbelt, linux-snps-arc

Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on arc.

[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
[ paulmck: Apply feedback from Naresh Kamboju. ]
[ paulmck: Apply kernel test robot feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: <linux-snps-arc@lists.infradead.org>
---
 arch/arc/Kconfig               |  1 +
 arch/arc/include/asm/cmpxchg.h | 33 ++++++++++++++++++++++++---------
 2 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 99d2845f3feb9..5bf6137f0fd47 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -14,6 +14,7 @@ config ARC
 	select ARCH_HAS_SETUP_DMA_OPS
 	select ARCH_HAS_SYNC_DMA_FOR_CPU
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+	select ARCH_NEED_CMPXCHG_1_EMU
 	select ARCH_SUPPORTS_ATOMIC_RMW if ARC_HAS_LLSC
 	select ARCH_32BIT_OFF_T
 	select BUILDTIME_TABLE_SORT
diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h
index e138fde067dea..2102ce076f28b 100644
--- a/arch/arc/include/asm/cmpxchg.h
+++ b/arch/arc/include/asm/cmpxchg.h
@@ -8,6 +8,7 @@
 
 #include <linux/build_bug.h>
 #include <linux/types.h>
+#include <linux/cmpxchg-emu.h>
 
 #include <asm/barrier.h>
 #include <asm/smp.h>
@@ -46,6 +47,9 @@
 	__typeof__(*(ptr)) _prev_;					\
 									\
 	switch(sizeof((_p_))) {						\
+	case 1:								\
+		_prev_ = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)_p_, (uintptr_t)_o_, (uintptr_t)_n_);	\
+		break;							\
 	case 4:								\
 		_prev_ = __cmpxchg(_p_, _o_, _n_);			\
 		break;							\
@@ -65,16 +69,27 @@
 	__typeof__(*(ptr)) _prev_;					\
 	unsigned long __flags;						\
 									\
-	BUILD_BUG_ON(sizeof(_p_) != 4);					\
+	switch(sizeof((_p_))) {						\
+	case 1:								\
+		__flags = cmpxchg_emu_u8((volatile u8 *)_p_, (uintptr_t)_o_, (uintptr_t)_n_);	\
+		_prev_ = (__typeof__(*(ptr)))__flags;			\
+		break;							\
+		break;							\
+	case 4:								\
+		/*							\
+		 * spin lock/unlock provide the needed smp_mb()		\
+		 * before/after						\
+		 */							\
+		atomic_ops_lock(__flags);				\
+		_prev_ = *_p_;						\
+		if (_prev_ == _o_)					\
+			*_p_ = _n_;					\
+		atomic_ops_unlock(__flags);				\
+		break;							\
+	default:							\
+		BUILD_BUG();						\
+	}								\
 									\
-	/*								\
-	 * spin lock/unlock provide the needed smp_mb() before/after	\
-	 */								\
-	atomic_ops_lock(__flags);					\
-	_prev_ = *_p_;							\
-	if (_prev_ == _o_)						\
-		*_p_ = _n_;						\
-	atomic_ops_unlock(__flags);					\
 	_prev_;								\
 })
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 cmpxchg 11/13] csky: Emulate one-byte cmpxchg
  2024-05-01 22:58   ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Paul E. McKenney
                       ` (9 preceding siblings ...)
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 10/13] ARC: Emulate one-byte cmpxchg Paul E. McKenney
@ 2024-05-01 23:01     ` Paul E. McKenney
  2024-05-11  6:42       ` Guo Ren
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 12/13] sh: " Paul E. McKenney
                       ` (2 subsequent siblings)
  13 siblings, 1 reply; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-01 23:01 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, arnd, torvalds,
	kernel-team, Paul E. McKenney, Yujie Liu, Guo Ren, linux-csky

Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on csky.

[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]

Co-developed-by: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-csky@vger.kernel.org>
---
 arch/csky/Kconfig               |  1 +
 arch/csky/include/asm/cmpxchg.h | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index d3ac36751ad1f..5479707eb5d10 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -37,6 +37,7 @@ config CSKY
 	select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPTION
 	select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
 	select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
+	select ARCH_NEED_CMPXCHG_1_EMU
 	select ARCH_WANT_FRAME_POINTERS if !CPU_CK610 && $(cc-option,-mbacktrace)
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
 	select COMMON_CLK
diff --git a/arch/csky/include/asm/cmpxchg.h b/arch/csky/include/asm/cmpxchg.h
index 916043b845f14..db6dda47184e4 100644
--- a/arch/csky/include/asm/cmpxchg.h
+++ b/arch/csky/include/asm/cmpxchg.h
@@ -6,6 +6,7 @@
 #ifdef CONFIG_SMP
 #include <linux/bug.h>
 #include <asm/barrier.h>
+#include <linux/cmpxchg-emu.h>
 
 #define __xchg_relaxed(new, ptr, size)				\
 ({								\
@@ -61,6 +62,9 @@
 	__typeof__(old) __old = (old);				\
 	__typeof__(*(ptr)) __ret;				\
 	switch (size) {						\
+	case 1:							\
+		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
+		break;						\
 	case 4:							\
 		asm volatile (					\
 		"1:	ldex.w		%0, (%3) \n"		\
@@ -91,6 +95,9 @@
 	__typeof__(old) __old = (old);				\
 	__typeof__(*(ptr)) __ret;				\
 	switch (size) {						\
+	case 1:							\
+		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
+		break;						\
 	case 4:							\
 		asm volatile (					\
 		"1:	ldex.w		%0, (%3) \n"		\
@@ -122,6 +129,9 @@
 	__typeof__(old) __old = (old);				\
 	__typeof__(*(ptr)) __ret;				\
 	switch (size) {						\
+	case 1:							\
+		__ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
+		break;						\
 	case 4:							\
 		asm volatile (					\
 		RELEASE_FENCE					\
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg
  2024-05-01 22:58   ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Paul E. McKenney
                       ` (10 preceding siblings ...)
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 11/13] csky: " Paul E. McKenney
@ 2024-05-01 23:01     ` Paul E. McKenney
  2024-05-02  4:52       ` John Paul Adrian Glaubitz
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 13/13] xtensa: " Paul E. McKenney
  2024-05-02 20:01     ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Al Viro
  13 siblings, 1 reply; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-01 23:01 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, arnd, torvalds,
	kernel-team, Paul E. McKenney, Andi Shyti, Palmer Dabbelt,
	Masami Hiramatsu, linux-sh

Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on sh.

[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
[ paulmck: Apply feedback from Naresh Kamboju. ]
[ Apply Geert Uytterhoeven feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-sh@vger.kernel.org>
---
 arch/sh/Kconfig               | 1 +
 arch/sh/include/asm/cmpxchg.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 2ad3e29f0ebec..f47e9ccf4efd2 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -16,6 +16,7 @@ config SUPERH
 	select ARCH_HIBERNATION_POSSIBLE if MMU
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_WANT_IPC_PARSE_VERSION
+	select ARCH_NEED_CMPXCHG_1_EMU
 	select CPU_NO_EFFICIENT_FFS
 	select DMA_DECLARE_COHERENT
 	select GENERIC_ATOMIC64
diff --git a/arch/sh/include/asm/cmpxchg.h b/arch/sh/include/asm/cmpxchg.h
index 5d617b3ef78f7..1e5dc5ccf7bf5 100644
--- a/arch/sh/include/asm/cmpxchg.h
+++ b/arch/sh/include/asm/cmpxchg.h
@@ -9,6 +9,7 @@
 
 #include <linux/compiler.h>
 #include <linux/types.h>
+#include <linux/cmpxchg-emu.h>
 
 #if defined(CONFIG_GUSA_RB)
 #include <asm/cmpxchg-grb.h>
@@ -56,6 +57,8 @@ static inline unsigned long __cmpxchg(volatile void * ptr, unsigned long old,
 		unsigned long new, int size)
 {
 	switch (size) {
+	case 1:
+		return cmpxchg_emu_u8(ptr, old, new);
 	case 4:
 		return __cmpxchg_u32(ptr, old, new);
 	}
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 cmpxchg 13/13] xtensa: Emulate one-byte cmpxchg
  2024-05-01 22:58   ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Paul E. McKenney
                       ` (11 preceding siblings ...)
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 12/13] sh: " Paul E. McKenney
@ 2024-05-01 23:01     ` Paul E. McKenney
  2024-05-02 20:01     ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Al Viro
  13 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-01 23:01 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, arnd, torvalds,
	kernel-team, Paul E. McKenney, Yujie Liu, Andi Shyti,
	Geert Uytterhoeven

Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on xtensa.

[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
[ Apply Geert Uytterhoeven feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
---
 arch/xtensa/Kconfig               | 1 +
 arch/xtensa/include/asm/cmpxchg.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index f200a4ec044e6..d3db28f2f8110 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -14,6 +14,7 @@ config XTENSA
 	select ARCH_HAS_DMA_SET_UNCACHED if MMU
 	select ARCH_HAS_STRNCPY_FROM_USER if !KASAN
 	select ARCH_HAS_STRNLEN_USER
+	select ARCH_NEED_CMPXCHG_1_EMU
 	select ARCH_USE_MEMTEST
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/xtensa/include/asm/cmpxchg.h b/arch/xtensa/include/asm/cmpxchg.h
index 675a11ea8de76..95e33a913962d 100644
--- a/arch/xtensa/include/asm/cmpxchg.h
+++ b/arch/xtensa/include/asm/cmpxchg.h
@@ -15,6 +15,7 @@
 
 #include <linux/bits.h>
 #include <linux/stringify.h>
+#include <linux/cmpxchg-emu.h>
 
 /*
  * cmpxchg
@@ -74,6 +75,7 @@ static __inline__ unsigned long
 __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, int size)
 {
 	switch (size) {
+	case 1:  return cmpxchg_emu_u8(ptr, old, new);
 	case 4:  return __cmpxchg_u32(ptr, old, new);
 	default: __cmpxchg_called_with_bad_pointer();
 		 return old;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 12/13] sh: " Paul E. McKenney
@ 2024-05-02  4:52       ` John Paul Adrian Glaubitz
  2024-05-02  5:06         ` Paul E. McKenney
  0 siblings, 1 reply; 71+ messages in thread
From: John Paul Adrian Glaubitz @ 2024-05-02  4:52 UTC (permalink / raw)
  To: Paul E. McKenney, linux-arch, linux-kernel
  Cc: elver, akpm, tglx, peterz, dianders, pmladek, arnd, torvalds,
	kernel-team, Andi Shyti, Palmer Dabbelt, Masami Hiramatsu,
	linux-sh

Hi Paul,

On Wed, 2024-05-01 at 16:01 -0700, Paul E. McKenney wrote:
> Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on sh.
> 
> [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
> [ paulmck: Apply feedback from Naresh Kamboju. ]
> [ Apply Geert Uytterhoeven feedback. ]
> 
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: Palmer Dabbelt <palmer@rivosinc.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: <linux-sh@vger.kernel.org>
> ---
>  arch/sh/Kconfig               | 1 +
>  arch/sh/include/asm/cmpxchg.h | 3 +++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
> index 2ad3e29f0ebec..f47e9ccf4efd2 100644
> --- a/arch/sh/Kconfig
> +++ b/arch/sh/Kconfig
> @@ -16,6 +16,7 @@ config SUPERH
>  	select ARCH_HIBERNATION_POSSIBLE if MMU
>  	select ARCH_MIGHT_HAVE_PC_PARPORT
>  	select ARCH_WANT_IPC_PARSE_VERSION
> +	select ARCH_NEED_CMPXCHG_1_EMU
>  	select CPU_NO_EFFICIENT_FFS
>  	select DMA_DECLARE_COHERENT
>  	select GENERIC_ATOMIC64
> diff --git a/arch/sh/include/asm/cmpxchg.h b/arch/sh/include/asm/cmpxchg.h
> index 5d617b3ef78f7..1e5dc5ccf7bf5 100644
> --- a/arch/sh/include/asm/cmpxchg.h
> +++ b/arch/sh/include/asm/cmpxchg.h
> @@ -9,6 +9,7 @@
>  
>  #include <linux/compiler.h>
>  #include <linux/types.h>
> +#include <linux/cmpxchg-emu.h>
>  
>  #if defined(CONFIG_GUSA_RB)
>  #include <asm/cmpxchg-grb.h>
> @@ -56,6 +57,8 @@ static inline unsigned long __cmpxchg(volatile void * ptr, unsigned long old,
>  		unsigned long new, int size)
>  {
>  	switch (size) {
> +	case 1:
> +		return cmpxchg_emu_u8(ptr, old, new);
>  	case 4:
>  		return __cmpxchg_u32(ptr, old, new);
>  	}

Thanks for the patch. However, I don't quite understand its purpose.

There is already a case for 8-byte cmpxchg in the switch statement below:

        case 1:                                         \
                __xchg__res = xchg_u8(__xchg_ptr, x);   \
                break;

Does cmpxchg_emu_u8() have any advantages over the native xchg_u8()?

Thanks,
Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg
  2024-05-02  4:52       ` John Paul Adrian Glaubitz
@ 2024-05-02  5:06         ` Paul E. McKenney
  2024-05-02  5:11           ` John Paul Adrian Glaubitz
  2024-05-02  5:42           ` D. Jeff Dionne
  0 siblings, 2 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-02  5:06 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, arnd, torvalds, kernel-team, Andi Shyti, Palmer Dabbelt,
	Masami Hiramatsu, linux-sh

On Thu, May 02, 2024 at 06:52:53AM +0200, John Paul Adrian Glaubitz wrote:
> Hi Paul,
> 
> On Wed, 2024-05-01 at 16:01 -0700, Paul E. McKenney wrote:
> > Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on sh.
> > 
> > [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
> > [ paulmck: Apply feedback from Naresh Kamboju. ]
> > [ Apply Geert Uytterhoeven feedback. ]
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Cc: Andi Shyti <andi.shyti@linux.intel.com>
> > Cc: Palmer Dabbelt <palmer@rivosinc.com>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: <linux-sh@vger.kernel.org>
> > ---
> >  arch/sh/Kconfig               | 1 +
> >  arch/sh/include/asm/cmpxchg.h | 3 +++
> >  2 files changed, 4 insertions(+)
> > 
> > diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
> > index 2ad3e29f0ebec..f47e9ccf4efd2 100644
> > --- a/arch/sh/Kconfig
> > +++ b/arch/sh/Kconfig
> > @@ -16,6 +16,7 @@ config SUPERH
> >  	select ARCH_HIBERNATION_POSSIBLE if MMU
> >  	select ARCH_MIGHT_HAVE_PC_PARPORT
> >  	select ARCH_WANT_IPC_PARSE_VERSION
> > +	select ARCH_NEED_CMPXCHG_1_EMU
> >  	select CPU_NO_EFFICIENT_FFS
> >  	select DMA_DECLARE_COHERENT
> >  	select GENERIC_ATOMIC64
> > diff --git a/arch/sh/include/asm/cmpxchg.h b/arch/sh/include/asm/cmpxchg.h
> > index 5d617b3ef78f7..1e5dc5ccf7bf5 100644
> > --- a/arch/sh/include/asm/cmpxchg.h
> > +++ b/arch/sh/include/asm/cmpxchg.h
> > @@ -9,6 +9,7 @@
> >  
> >  #include <linux/compiler.h>
> >  #include <linux/types.h>
> > +#include <linux/cmpxchg-emu.h>
> >  
> >  #if defined(CONFIG_GUSA_RB)
> >  #include <asm/cmpxchg-grb.h>
> > @@ -56,6 +57,8 @@ static inline unsigned long __cmpxchg(volatile void * ptr, unsigned long old,
> >  		unsigned long new, int size)
> >  {
> >  	switch (size) {
> > +	case 1:
> > +		return cmpxchg_emu_u8(ptr, old, new);
> >  	case 4:
> >  		return __cmpxchg_u32(ptr, old, new);
> >  	}
> 
> Thanks for the patch. However, I don't quite understand its purpose.
> 
> There is already a case for 8-byte cmpxchg in the switch statement below:
> 
>         case 1:                                         \
>                 __xchg__res = xchg_u8(__xchg_ptr, x);   \
>                 break;
> 
> Does cmpxchg_emu_u8() have any advantages over the native xchg_u8()?

That would be 8-bit xchg() rather than 8-byte cmpxchg(), correct?

Or am I missing something subtle here that makes sh also support one-byte
(8-bit) cmpxchg()?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg
  2024-05-02  5:06         ` Paul E. McKenney
@ 2024-05-02  5:11           ` John Paul Adrian Glaubitz
  2024-05-02 13:33             ` Paul E. McKenney
  2024-05-02  5:42           ` D. Jeff Dionne
  1 sibling, 1 reply; 71+ messages in thread
From: John Paul Adrian Glaubitz @ 2024-05-02  5:11 UTC (permalink / raw)
  To: paulmck
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, arnd, torvalds, kernel-team, Andi Shyti, Palmer Dabbelt,
	Masami Hiramatsu, linux-sh

On Wed, 2024-05-01 at 22:06 -0700, Paul E. McKenney wrote:
> > Does cmpxchg_emu_u8() have any advantages over the native xchg_u8()?
> 
> That would be 8-bit xchg() rather than 8-byte cmpxchg(), correct?

Indeed. I realized this after sending my reply.

> Or am I missing something subtle here that makes sh also support one-byte
> (8-bit) cmpxchg()?

Is there an explanation available that explains the rationale behind the
series, so I can learn more about it?

Also, I am opposed to removing Alpha entirely as it's still being actively
maintained in Debian and Gentoo and works well.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg
  2024-05-02  5:06         ` Paul E. McKenney
  2024-05-02  5:11           ` John Paul Adrian Glaubitz
@ 2024-05-02  5:42           ` D. Jeff Dionne
  2024-05-02 11:30             ` Arnd Bergmann
  1 sibling, 1 reply; 71+ messages in thread
From: D. Jeff Dionne @ 2024-05-02  5:42 UTC (permalink / raw)
  To: paulmck
  Cc: John Paul Adrian Glaubitz, linux-arch, linux-kernel, elver, akpm,
	tglx, peterz, dianders, pmladek, arnd, torvalds, kernel-team,
	Andi Shyti, Palmer Dabbelt, Masami Hiramatsu, linux-sh

On May 2, 2024, at 14:07, Paul E. McKenney <paulmck@kernel.org> wrote:

> That would be 8-bit xchg() rather than 8-byte cmpxchg(), correct?
> 
> Or am I missing something subtle here that makes sh also support one-byte
> (8-bit) cmpxchg()?

The native SH atomic operation is test and set TAS.B.  J2 adds a compare and swap CAS.L instruction, carefully chosen for patent free prior art (s360, IIRC).

The (relatively expensive) encoding space we allocated for CAS.L does not contain size bits.

Not all SH4 patents had expired when J2 was under development, but now have (watch this space).  Not sure (me myself) if there are more atomic operations in sh4.

Cheers,
J

> 
>                            Thanx, Paul

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg
  2024-05-02  5:42           ` D. Jeff Dionne
@ 2024-05-02 11:30             ` Arnd Bergmann
  0 siblings, 0 replies; 71+ messages in thread
From: Arnd Bergmann @ 2024-05-02 11:30 UTC (permalink / raw)
  To: D. Jeff Dionne, Paul E. McKenney
  Cc: John Paul Adrian Glaubitz, Linux-Arch, linux-kernel, Marco Elver,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Doug Anderson,
	Petr Mladek, Linus Torvalds, kernel-team, Andi Shyti,
	Palmer Dabbelt, Masami Hiramatsu, linux-sh

On Thu, May 2, 2024, at 07:42, D. Jeff Dionne wrote:
> On May 2, 2024, at 14:07, Paul E. McKenney <paulmck@kernel.org> wrote:
>
>> That would be 8-bit xchg() rather than 8-byte cmpxchg(), correct?
>> 
>> Or am I missing something subtle here that makes sh also support one-byte
>> (8-bit) cmpxchg()?
>
> The native SH atomic operation is test and set TAS.B.  J2 adds a 
> compare and swap CAS.L instruction, carefully chosen for patent free 
> prior art (s360, IIRC).
>
> The (relatively expensive) encoding space we allocated for CAS.L does 
> not contain size bits.
>
> Not all SH4 patents had expired when J2 was under development, but now 
> have (watch this space).  Not sure (me myself) if there are more atomic 
> operations in sh4.

SH4A supports MIPS R4000 style LL/SC instructions, but it looks like
the older SH4 does not.

      Arnd

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg
  2024-05-02  5:11           ` John Paul Adrian Glaubitz
@ 2024-05-02 13:33             ` Paul E. McKenney
  2024-05-02 20:53               ` Al Viro
  2024-05-02 21:50               ` Arnd Bergmann
  0 siblings, 2 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-02 13:33 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, arnd, torvalds, kernel-team, Andi Shyti, Palmer Dabbelt,
	Masami Hiramatsu, linux-sh

On Thu, May 02, 2024 at 07:11:52AM +0200, John Paul Adrian Glaubitz wrote:
> On Wed, 2024-05-01 at 22:06 -0700, Paul E. McKenney wrote:
> > > Does cmpxchg_emu_u8() have any advantages over the native xchg_u8()?
> > 
> > That would be 8-bit xchg() rather than 8-byte cmpxchg(), correct?
> 
> Indeed. I realized this after sending my reply.

No problem, as I do know that feeling!

> > Or am I missing something subtle here that makes sh also support one-byte
> > (8-bit) cmpxchg()?
> 
> Is there an explanation available that explains the rationale behind the
> series, so I can learn more about it?

We have some places in mainline that need one-byte cmpxchg(), so this
series provides emulation for architectures that do not support this
notion.

> Also, I am opposed to removing Alpha entirely as it's still being actively
> maintained in Debian and Gentoo and works well.

Understood, and this sort of compatibility consideration is why this
version of this patchset does not emulate two-byte (16-bit) cmpxchg()
operations.  The original (RFC) series did emulate these, which does
not work on a few architectures that do not provide 16-bit load/store
instructions, hence no 16-bit support in this series.

So this one-byte-only series affects only Alpha systems lacking
single-byte load/store instructions.  If I understand correctly, Alpha
21164A (EV56) and later *do* have single-byte load/store instructions,
and thus are still just fine.  In fact, it looks like EV56 also has
two-byte load/store instructions, and so would have been OK with
the original one-/two-byte RFC series.

Arnd will not be shy about correcting me if I am wrong.  ;-)

> Adrian
> 
> -- 
>  .''`.  John Paul Adrian Glaubitz
> : :' :  Debian Developer
> `. `'   Physicist
>   `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg()
  2024-05-01 22:58   ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Paul E. McKenney
                       ` (12 preceding siblings ...)
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 13/13] xtensa: " Paul E. McKenney
@ 2024-05-02 20:01     ` Al Viro
  2024-05-02 21:20       ` Paul E. McKenney
  13 siblings, 1 reply; 71+ messages in thread
From: Al Viro @ 2024-05-02 20:01 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, torvalds, Arnd Bergmann

On Wed, May 01, 2024 at 03:58:14PM -0700, Paul E. McKenney wrote:
> Hello!
> 
> This v2 series provides emulation functions for one-byte cmpxchg,
       ^^
...

> Changes since v2:
                ^^

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg
  2024-05-02 13:33             ` Paul E. McKenney
@ 2024-05-02 20:53               ` Al Viro
  2024-05-02 21:01                 ` alpha cmpxchg.h (was Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg) Al Viro
  2024-05-02 21:18                 ` [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg Paul E. McKenney
  2024-05-02 21:50               ` Arnd Bergmann
  1 sibling, 2 replies; 71+ messages in thread
From: Al Viro @ 2024-05-02 20:53 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: John Paul Adrian Glaubitz, linux-arch, linux-kernel, elver, akpm,
	tglx, peterz, dianders, pmladek, arnd, torvalds, kernel-team,
	Andi Shyti, Palmer Dabbelt, Masami Hiramatsu, linux-sh

On Thu, May 02, 2024 at 06:33:49AM -0700, Paul E. McKenney wrote:

> Understood, and this sort of compatibility consideration is why this
> version of this patchset does not emulate two-byte (16-bit) cmpxchg()
> operations.  The original (RFC) series did emulate these, which does
> not work on a few architectures that do not provide 16-bit load/store
> instructions, hence no 16-bit support in this series.
> 
> So this one-byte-only series affects only Alpha systems lacking
> single-byte load/store instructions.  If I understand correctly, Alpha
> 21164A (EV56) and later *do* have single-byte load/store instructions,
> and thus are still just fine.  In fact, it looks like EV56 also has
> two-byte load/store instructions, and so would have been OK with
> the original one-/two-byte RFC series.

Wait a sec.  On Alpha we already implement 16bit and 8bit xchg and cmpxchg.
See arch/alpha/include/asm/xchg.h:
static inline unsigned long
____cmpxchg(_u16, volatile short *m, unsigned short old, unsigned short new)
{
       unsigned long prev, tmp, cmp, addr64;

       __asm__ __volatile__(
       "       andnot  %5,7,%4\n"
       "       inswl   %1,%5,%1\n"
       "1:     ldq_l   %2,0(%4)\n"
       "       extwl   %2,%5,%0\n"
       "       cmpeq   %0,%6,%3\n"
       "       beq     %3,2f\n"
       "       mskwl   %2,%5,%2\n"
       "       or      %1,%2,%2\n"
       "       stq_c   %2,0(%4)\n"
       "       beq     %2,3f\n"
       "2:\n"
       ".subsection 2\n"
       "3:     br      1b\n"
       ".previous"
       : "=&r" (prev), "=&r" (new), "=&r" (tmp), "=&r" (cmp), "=&r" (addr64)
       : "r" ((long)m), "Ir" (old), "1" (new) : "memory");

       return prev;
}

Load-locked and store-conditional are done on 64bit value, with
16bit operations done in registers.  This is what 16bit store
(assignment to unsigned short *) turns into with
        stw $17,0($16)		// *(u16*)r16 = r17
and without -mbwx
        insql $17,$16,$17	// r17 = r17 << (8 * (r16 & 7))
        ldq_u $1,0($16)		// r1 = *(u64 *)(r16 & ~7)
	mskwl $1,$16,$1		// r1 &= ~(0xffff << (8 * (r16 & 7))
	bis $17,$1,$17		// r17 |= r1
	stq_u $17,0($16)	// *(u64 *)(r16 & ~7) = r17

What's more, load-locked/store-conditional doesn't have 16bit and 8bit
variants on any Alphas - it's always 32bit (ldl_l) or 64bit (ldq_l).

What BWX adds is load/store byte/word, load/store byte/word unaligned
and sign-extend byte/word.  IOW, it's absolutely irrelevant for
cmpxchg (or xchg) purposes.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* alpha cmpxchg.h (was Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg)
  2024-05-02 20:53               ` Al Viro
@ 2024-05-02 21:01                 ` Al Viro
  2024-05-02 22:16                   ` Linus Torvalds
  2024-05-02 21:18                 ` [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg Paul E. McKenney
  1 sibling, 1 reply; 71+ messages in thread
From: Al Viro @ 2024-05-02 21:01 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: John Paul Adrian Glaubitz, linux-arch, linux-kernel, elver, akpm,
	tglx, peterz, dianders, pmladek, arnd, torvalds, kernel-team,
	Andi Shyti, Palmer Dabbelt, Masami Hiramatsu, linux-sh,
	linux-alpha

On Thu, May 02, 2024 at 09:53:45PM +0100, Al Viro wrote:
> What's more, load-locked/store-conditional doesn't have 16bit and 8bit
> variants on any Alphas - it's always 32bit (ldl_l) or 64bit (ldq_l).
> 
> What BWX adds is load/store byte/word, load/store byte/word unaligned
> and sign-extend byte/word.  IOW, it's absolutely irrelevant for
> cmpxchg (or xchg) purposes.

FWIW, I do have a cmpxchg-related patch for alpha - the mess with xchg.h
(parametrized double include) is no longer needed, and hadn't been since
2018 (fbfcd0199170 "locking/xchg/alpha: Remove superfluous memory barriers
from the _local() variants" was the point when the things settled down).
Only tangentially related to your stuff, but it makes the damn thing
easier to follow.


commit e992b5436ccd504b07a390118cf2be686355b957
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Mon Apr 8 17:43:37 2024 -0400

    alpha: no need to include asm/xchg.h twice
    
    We used to generate different helpers for local and full
    {cmp,}xchg(); these days the barriers are in arch_{cmp,}xchg()
    instead and generated helpers are identical for local and
    full cases.  No need for those parametrized includes of
    asm/xchg.h - we might as well insert its contents directly
    in asm/cmpxchg.h and do it only once.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/arch/alpha/include/asm/cmpxchg.h b/arch/alpha/include/asm/cmpxchg.h
index 91d4a4d9258c..ae1b96479d0c 100644
--- a/arch/alpha/include/asm/cmpxchg.h
+++ b/arch/alpha/include/asm/cmpxchg.h
@@ -3,17 +3,232 @@
 #define _ALPHA_CMPXCHG_H
 
 /*
- * Atomic exchange routines.
+ * Atomic exchange.
+ * Since it can be used to implement critical sections
+ * it must clobber "memory" (also for interrupts in UP).
  */
 
-#define ____xchg(type, args...)		__arch_xchg ## type ## _local(args)
-#define ____cmpxchg(type, args...)	__cmpxchg ## type ## _local(args)
-#include <asm/xchg.h>
+static inline unsigned long
+____xchg_u8(volatile char *m, unsigned long val)
+{
+	unsigned long ret, tmp, addr64;
+
+	__asm__ __volatile__(
+	"	andnot	%4,7,%3\n"
+	"	insbl	%1,%4,%1\n"
+	"1:	ldq_l	%2,0(%3)\n"
+	"	extbl	%2,%4,%0\n"
+	"	mskbl	%2,%4,%2\n"
+	"	or	%1,%2,%2\n"
+	"	stq_c	%2,0(%3)\n"
+	"	beq	%2,2f\n"
+	".subsection 2\n"
+	"2:	br	1b\n"
+	".previous"
+	: "=&r" (ret), "=&r" (val), "=&r" (tmp), "=&r" (addr64)
+	: "r" ((long)m), "1" (val) : "memory");
+
+	return ret;
+}
+
+static inline unsigned long
+____xchg_u16(volatile short *m, unsigned long val)
+{
+	unsigned long ret, tmp, addr64;
+
+	__asm__ __volatile__(
+	"	andnot	%4,7,%3\n"
+	"	inswl	%1,%4,%1\n"
+	"1:	ldq_l	%2,0(%3)\n"
+	"	extwl	%2,%4,%0\n"
+	"	mskwl	%2,%4,%2\n"
+	"	or	%1,%2,%2\n"
+	"	stq_c	%2,0(%3)\n"
+	"	beq	%2,2f\n"
+	".subsection 2\n"
+	"2:	br	1b\n"
+	".previous"
+	: "=&r" (ret), "=&r" (val), "=&r" (tmp), "=&r" (addr64)
+	: "r" ((long)m), "1" (val) : "memory");
+
+	return ret;
+}
+
+static inline unsigned long
+____xchg_u32(volatile int *m, unsigned long val)
+{
+	unsigned long dummy;
+
+	__asm__ __volatile__(
+	"1:	ldl_l %0,%4\n"
+	"	bis $31,%3,%1\n"
+	"	stl_c %1,%2\n"
+	"	beq %1,2f\n"
+	".subsection 2\n"
+	"2:	br 1b\n"
+	".previous"
+	: "=&r" (val), "=&r" (dummy), "=m" (*m)
+	: "rI" (val), "m" (*m) : "memory");
+
+	return val;
+}
+
+static inline unsigned long
+____xchg_u64(volatile long *m, unsigned long val)
+{
+	unsigned long dummy;
+
+	__asm__ __volatile__(
+	"1:	ldq_l %0,%4\n"
+	"	bis $31,%3,%1\n"
+	"	stq_c %1,%2\n"
+	"	beq %1,2f\n"
+	".subsection 2\n"
+	"2:	br 1b\n"
+	".previous"
+	: "=&r" (val), "=&r" (dummy), "=m" (*m)
+	: "rI" (val), "m" (*m) : "memory");
+
+	return val;
+}
+
+/* This function doesn't exist, so you'll get a linker error
+   if something tries to do an invalid xchg().  */
+extern void __xchg_called_with_bad_pointer(void);
+
+static __always_inline unsigned long
+____xchg(volatile void *ptr, unsigned long x, int size)
+{
+	return
+		size == 1 ? ____xchg_u8(ptr, x) :
+		size == 2 ? ____xchg_u16(ptr, x) :
+		size == 4 ? ____xchg_u32(ptr, x) :
+		size == 8 ? ____xchg_u64(ptr, x) :
+			(__xchg_called_with_bad_pointer(), x);
+}
+
+/*
+ * Atomic compare and exchange.  Compare OLD with MEM, if identical,
+ * store NEW in MEM.  Return the initial value in MEM.  Success is
+ * indicated by comparing RETURN with OLD.
+ */
+
+static inline unsigned long
+____cmpxchg_u8(volatile char *m, unsigned char old, unsigned char new)
+{
+	unsigned long prev, tmp, cmp, addr64;
+
+	__asm__ __volatile__(
+	"	andnot	%5,7,%4\n"
+	"	insbl	%1,%5,%1\n"
+	"1:	ldq_l	%2,0(%4)\n"
+	"	extbl	%2,%5,%0\n"
+	"	cmpeq	%0,%6,%3\n"
+	"	beq	%3,2f\n"
+	"	mskbl	%2,%5,%2\n"
+	"	or	%1,%2,%2\n"
+	"	stq_c	%2,0(%4)\n"
+	"	beq	%2,3f\n"
+	"2:\n"
+	".subsection 2\n"
+	"3:	br	1b\n"
+	".previous"
+	: "=&r" (prev), "=&r" (new), "=&r" (tmp), "=&r" (cmp), "=&r" (addr64)
+	: "r" ((long)m), "Ir" (old), "1" (new) : "memory");
+
+	return prev;
+}
+
+static inline unsigned long
+____cmpxchg_u16(volatile short *m, unsigned short old, unsigned short new)
+{
+	unsigned long prev, tmp, cmp, addr64;
+
+	__asm__ __volatile__(
+	"	andnot	%5,7,%4\n"
+	"	inswl	%1,%5,%1\n"
+	"1:	ldq_l	%2,0(%4)\n"
+	"	extwl	%2,%5,%0\n"
+	"	cmpeq	%0,%6,%3\n"
+	"	beq	%3,2f\n"
+	"	mskwl	%2,%5,%2\n"
+	"	or	%1,%2,%2\n"
+	"	stq_c	%2,0(%4)\n"
+	"	beq	%2,3f\n"
+	"2:\n"
+	".subsection 2\n"
+	"3:	br	1b\n"
+	".previous"
+	: "=&r" (prev), "=&r" (new), "=&r" (tmp), "=&r" (cmp), "=&r" (addr64)
+	: "r" ((long)m), "Ir" (old), "1" (new) : "memory");
+
+	return prev;
+}
+
+static inline unsigned long
+____cmpxchg_u32(volatile int *m, int old, int new)
+{
+	unsigned long prev, cmp;
+
+	__asm__ __volatile__(
+	"1:	ldl_l %0,%5\n"
+	"	cmpeq %0,%3,%1\n"
+	"	beq %1,2f\n"
+	"	mov %4,%1\n"
+	"	stl_c %1,%2\n"
+	"	beq %1,3f\n"
+	"2:\n"
+	".subsection 2\n"
+	"3:	br 1b\n"
+	".previous"
+	: "=&r"(prev), "=&r"(cmp), "=m"(*m)
+	: "r"((long) old), "r"(new), "m"(*m) : "memory");
+
+	return prev;
+}
+
+static inline unsigned long
+____cmpxchg_u64(volatile long *m, unsigned long old, unsigned long new)
+{
+	unsigned long prev, cmp;
+
+	__asm__ __volatile__(
+	"1:	ldq_l %0,%5\n"
+	"	cmpeq %0,%3,%1\n"
+	"	beq %1,2f\n"
+	"	mov %4,%1\n"
+	"	stq_c %1,%2\n"
+	"	beq %1,3f\n"
+	"2:\n"
+	".subsection 2\n"
+	"3:	br 1b\n"
+	".previous"
+	: "=&r"(prev), "=&r"(cmp), "=m"(*m)
+	: "r"((long) old), "r"(new), "m"(*m) : "memory");
+
+	return prev;
+}
+
+/* This function doesn't exist, so you'll get a linker error
+   if something tries to do an invalid cmpxchg().  */
+extern void __cmpxchg_called_with_bad_pointer(void);
+
+static __always_inline unsigned long
+____cmpxchg(volatile void *ptr, unsigned long old, unsigned long new,
+	      int size)
+{
+	return
+		size == 1 ? ____cmpxchg_u8(ptr, old, new) :
+		size == 2 ? ____cmpxchg_u16(ptr, old, new) :
+		size == 4 ? ____cmpxchg_u32(ptr, old, new) :
+		size == 8 ? ____cmpxchg_u64(ptr, old, new) :
+			(__cmpxchg_called_with_bad_pointer(), old);
+}
 
 #define xchg_local(ptr, x)						\
 ({									\
 	__typeof__(*(ptr)) _x_ = (x);					\
-	(__typeof__(*(ptr))) __arch_xchg_local((ptr), (unsigned long)_x_,\
+	(__typeof__(*(ptr))) ____xchg((ptr), (unsigned long)_x_,	\
 					       sizeof(*(ptr)));		\
 })
 
@@ -21,7 +236,7 @@
 ({									\
 	__typeof__(*(ptr)) _o_ = (o);					\
 	__typeof__(*(ptr)) _n_ = (n);					\
-	(__typeof__(*(ptr))) __cmpxchg_local((ptr), (unsigned long)_o_,	\
+	(__typeof__(*(ptr))) ____cmpxchg((ptr), (unsigned long)_o_,	\
 					  (unsigned long)_n_,		\
 					  sizeof(*(ptr)));		\
 })
@@ -32,12 +247,6 @@
 	cmpxchg_local((ptr), (o), (n));					\
 })
 
-#undef ____xchg
-#undef ____cmpxchg
-#define ____xchg(type, args...)		__arch_xchg ##type(args)
-#define ____cmpxchg(type, args...)	__cmpxchg ##type(args)
-#include <asm/xchg.h>
-
 /*
  * The leading and the trailing memory barriers guarantee that these
  * operations are fully ordered.
@@ -48,7 +257,7 @@
 	__typeof__(*(ptr)) _x_ = (x);					\
 	smp_mb();							\
 	__ret = (__typeof__(*(ptr)))					\
-		__arch_xchg((ptr), (unsigned long)_x_, sizeof(*(ptr)));	\
+		____xchg((ptr), (unsigned long)_x_, sizeof(*(ptr)));	\
 	smp_mb();							\
 	__ret;								\
 })
@@ -59,7 +268,7 @@
 	__typeof__(*(ptr)) _o_ = (o);					\
 	__typeof__(*(ptr)) _n_ = (n);					\
 	smp_mb();							\
-	__ret = (__typeof__(*(ptr))) __cmpxchg((ptr),			\
+	__ret = (__typeof__(*(ptr))) ____cmpxchg((ptr),			\
 		(unsigned long)_o_, (unsigned long)_n_, sizeof(*(ptr)));\
 	smp_mb();							\
 	__ret;								\
@@ -71,6 +280,4 @@
 	arch_cmpxchg((ptr), (o), (n));					\
 })
 
-#undef ____cmpxchg
-
 #endif /* _ALPHA_CMPXCHG_H */
diff --git a/arch/alpha/include/asm/xchg.h b/arch/alpha/include/asm/xchg.h
deleted file mode 100644
index 7adb80c6746a..000000000000
--- a/arch/alpha/include/asm/xchg.h
+++ /dev/null
@@ -1,246 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ALPHA_CMPXCHG_H
-#error Do not include xchg.h directly!
-#else
-/*
- * xchg/xchg_local and cmpxchg/cmpxchg_local share the same code
- * except that local version do not have the expensive memory barrier.
- * So this file is included twice from asm/cmpxchg.h.
- */
-
-/*
- * Atomic exchange.
- * Since it can be used to implement critical sections
- * it must clobber "memory" (also for interrupts in UP).
- */
-
-static inline unsigned long
-____xchg(_u8, volatile char *m, unsigned long val)
-{
-	unsigned long ret, tmp, addr64;
-
-	__asm__ __volatile__(
-	"	andnot	%4,7,%3\n"
-	"	insbl	%1,%4,%1\n"
-	"1:	ldq_l	%2,0(%3)\n"
-	"	extbl	%2,%4,%0\n"
-	"	mskbl	%2,%4,%2\n"
-	"	or	%1,%2,%2\n"
-	"	stq_c	%2,0(%3)\n"
-	"	beq	%2,2f\n"
-	".subsection 2\n"
-	"2:	br	1b\n"
-	".previous"
-	: "=&r" (ret), "=&r" (val), "=&r" (tmp), "=&r" (addr64)
-	: "r" ((long)m), "1" (val) : "memory");
-
-	return ret;
-}
-
-static inline unsigned long
-____xchg(_u16, volatile short *m, unsigned long val)
-{
-	unsigned long ret, tmp, addr64;
-
-	__asm__ __volatile__(
-	"	andnot	%4,7,%3\n"
-	"	inswl	%1,%4,%1\n"
-	"1:	ldq_l	%2,0(%3)\n"
-	"	extwl	%2,%4,%0\n"
-	"	mskwl	%2,%4,%2\n"
-	"	or	%1,%2,%2\n"
-	"	stq_c	%2,0(%3)\n"
-	"	beq	%2,2f\n"
-	".subsection 2\n"
-	"2:	br	1b\n"
-	".previous"
-	: "=&r" (ret), "=&r" (val), "=&r" (tmp), "=&r" (addr64)
-	: "r" ((long)m), "1" (val) : "memory");
-
-	return ret;
-}
-
-static inline unsigned long
-____xchg(_u32, volatile int *m, unsigned long val)
-{
-	unsigned long dummy;
-
-	__asm__ __volatile__(
-	"1:	ldl_l %0,%4\n"
-	"	bis $31,%3,%1\n"
-	"	stl_c %1,%2\n"
-	"	beq %1,2f\n"
-	".subsection 2\n"
-	"2:	br 1b\n"
-	".previous"
-	: "=&r" (val), "=&r" (dummy), "=m" (*m)
-	: "rI" (val), "m" (*m) : "memory");
-
-	return val;
-}
-
-static inline unsigned long
-____xchg(_u64, volatile long *m, unsigned long val)
-{
-	unsigned long dummy;
-
-	__asm__ __volatile__(
-	"1:	ldq_l %0,%4\n"
-	"	bis $31,%3,%1\n"
-	"	stq_c %1,%2\n"
-	"	beq %1,2f\n"
-	".subsection 2\n"
-	"2:	br 1b\n"
-	".previous"
-	: "=&r" (val), "=&r" (dummy), "=m" (*m)
-	: "rI" (val), "m" (*m) : "memory");
-
-	return val;
-}
-
-/* This function doesn't exist, so you'll get a linker error
-   if something tries to do an invalid xchg().  */
-extern void __xchg_called_with_bad_pointer(void);
-
-static __always_inline unsigned long
-____xchg(, volatile void *ptr, unsigned long x, int size)
-{
-	switch (size) {
-		case 1:
-			return ____xchg(_u8, ptr, x);
-		case 2:
-			return ____xchg(_u16, ptr, x);
-		case 4:
-			return ____xchg(_u32, ptr, x);
-		case 8:
-			return ____xchg(_u64, ptr, x);
-	}
-	__xchg_called_with_bad_pointer();
-	return x;
-}
-
-/*
- * Atomic compare and exchange.  Compare OLD with MEM, if identical,
- * store NEW in MEM.  Return the initial value in MEM.  Success is
- * indicated by comparing RETURN with OLD.
- */
-
-static inline unsigned long
-____cmpxchg(_u8, volatile char *m, unsigned char old, unsigned char new)
-{
-	unsigned long prev, tmp, cmp, addr64;
-
-	__asm__ __volatile__(
-	"	andnot	%5,7,%4\n"
-	"	insbl	%1,%5,%1\n"
-	"1:	ldq_l	%2,0(%4)\n"
-	"	extbl	%2,%5,%0\n"
-	"	cmpeq	%0,%6,%3\n"
-	"	beq	%3,2f\n"
-	"	mskbl	%2,%5,%2\n"
-	"	or	%1,%2,%2\n"
-	"	stq_c	%2,0(%4)\n"
-	"	beq	%2,3f\n"
-	"2:\n"
-	".subsection 2\n"
-	"3:	br	1b\n"
-	".previous"
-	: "=&r" (prev), "=&r" (new), "=&r" (tmp), "=&r" (cmp), "=&r" (addr64)
-	: "r" ((long)m), "Ir" (old), "1" (new) : "memory");
-
-	return prev;
-}
-
-static inline unsigned long
-____cmpxchg(_u16, volatile short *m, unsigned short old, unsigned short new)
-{
-	unsigned long prev, tmp, cmp, addr64;
-
-	__asm__ __volatile__(
-	"	andnot	%5,7,%4\n"
-	"	inswl	%1,%5,%1\n"
-	"1:	ldq_l	%2,0(%4)\n"
-	"	extwl	%2,%5,%0\n"
-	"	cmpeq	%0,%6,%3\n"
-	"	beq	%3,2f\n"
-	"	mskwl	%2,%5,%2\n"
-	"	or	%1,%2,%2\n"
-	"	stq_c	%2,0(%4)\n"
-	"	beq	%2,3f\n"
-	"2:\n"
-	".subsection 2\n"
-	"3:	br	1b\n"
-	".previous"
-	: "=&r" (prev), "=&r" (new), "=&r" (tmp), "=&r" (cmp), "=&r" (addr64)
-	: "r" ((long)m), "Ir" (old), "1" (new) : "memory");
-
-	return prev;
-}
-
-static inline unsigned long
-____cmpxchg(_u32, volatile int *m, int old, int new)
-{
-	unsigned long prev, cmp;
-
-	__asm__ __volatile__(
-	"1:	ldl_l %0,%5\n"
-	"	cmpeq %0,%3,%1\n"
-	"	beq %1,2f\n"
-	"	mov %4,%1\n"
-	"	stl_c %1,%2\n"
-	"	beq %1,3f\n"
-	"2:\n"
-	".subsection 2\n"
-	"3:	br 1b\n"
-	".previous"
-	: "=&r"(prev), "=&r"(cmp), "=m"(*m)
-	: "r"((long) old), "r"(new), "m"(*m) : "memory");
-
-	return prev;
-}
-
-static inline unsigned long
-____cmpxchg(_u64, volatile long *m, unsigned long old, unsigned long new)
-{
-	unsigned long prev, cmp;
-
-	__asm__ __volatile__(
-	"1:	ldq_l %0,%5\n"
-	"	cmpeq %0,%3,%1\n"
-	"	beq %1,2f\n"
-	"	mov %4,%1\n"
-	"	stq_c %1,%2\n"
-	"	beq %1,3f\n"
-	"2:\n"
-	".subsection 2\n"
-	"3:	br 1b\n"
-	".previous"
-	: "=&r"(prev), "=&r"(cmp), "=m"(*m)
-	: "r"((long) old), "r"(new), "m"(*m) : "memory");
-
-	return prev;
-}
-
-/* This function doesn't exist, so you'll get a linker error
-   if something tries to do an invalid cmpxchg().  */
-extern void __cmpxchg_called_with_bad_pointer(void);
-
-static __always_inline unsigned long
-____cmpxchg(, volatile void *ptr, unsigned long old, unsigned long new,
-	      int size)
-{
-	switch (size) {
-		case 1:
-			return ____cmpxchg(_u8, ptr, old, new);
-		case 2:
-			return ____cmpxchg(_u16, ptr, old, new);
-		case 4:
-			return ____cmpxchg(_u32, ptr, old, new);
-		case 8:
-			return ____cmpxchg(_u64, ptr, old, new);
-	}
-	__cmpxchg_called_with_bad_pointer();
-	return old;
-}
-
-#endif

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg
  2024-05-02 20:53               ` Al Viro
  2024-05-02 21:01                 ` alpha cmpxchg.h (was Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg) Al Viro
@ 2024-05-02 21:18                 ` Paul E. McKenney
  2024-05-02 22:07                   ` Al Viro
  1 sibling, 1 reply; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-02 21:18 UTC (permalink / raw)
  To: Al Viro
  Cc: John Paul Adrian Glaubitz, linux-arch, linux-kernel, elver, akpm,
	tglx, peterz, dianders, pmladek, arnd, torvalds, kernel-team,
	Andi Shyti, Palmer Dabbelt, Masami Hiramatsu, linux-sh

On Thu, May 02, 2024 at 09:53:45PM +0100, Al Viro wrote:
> On Thu, May 02, 2024 at 06:33:49AM -0700, Paul E. McKenney wrote:
> 
> > Understood, and this sort of compatibility consideration is why this
> > version of this patchset does not emulate two-byte (16-bit) cmpxchg()
> > operations.  The original (RFC) series did emulate these, which does
> > not work on a few architectures that do not provide 16-bit load/store
> > instructions, hence no 16-bit support in this series.
> > 
> > So this one-byte-only series affects only Alpha systems lacking
> > single-byte load/store instructions.  If I understand correctly, Alpha
> > 21164A (EV56) and later *do* have single-byte load/store instructions,
> > and thus are still just fine.  In fact, it looks like EV56 also has
> > two-byte load/store instructions, and so would have been OK with
> > the original one-/two-byte RFC series.
> 
> Wait a sec.  On Alpha we already implement 16bit and 8bit xchg and cmpxchg.
> See arch/alpha/include/asm/xchg.h:
> static inline unsigned long
> ____cmpxchg(_u16, volatile short *m, unsigned short old, unsigned short new)
> {
>        unsigned long prev, tmp, cmp, addr64;
> 
>        __asm__ __volatile__(
>        "       andnot  %5,7,%4\n"
>        "       inswl   %1,%5,%1\n"
>        "1:     ldq_l   %2,0(%4)\n"
>        "       extwl   %2,%5,%0\n"
>        "       cmpeq   %0,%6,%3\n"
>        "       beq     %3,2f\n"
>        "       mskwl   %2,%5,%2\n"
>        "       or      %1,%2,%2\n"
>        "       stq_c   %2,0(%4)\n"
>        "       beq     %2,3f\n"
>        "2:\n"
>        ".subsection 2\n"
>        "3:     br      1b\n"
>        ".previous"
>        : "=&r" (prev), "=&r" (new), "=&r" (tmp), "=&r" (cmp), "=&r" (addr64)
>        : "r" ((long)m), "Ir" (old), "1" (new) : "memory");
> 
>        return prev;
> }
> 
> Load-locked and store-conditional are done on 64bit value, with
> 16bit operations done in registers.  This is what 16bit store
> (assignment to unsigned short *) turns into with
>         stw $17,0($16)		// *(u16*)r16 = r17
> and without -mbwx
>         insql $17,$16,$17	// r17 = r17 << (8 * (r16 & 7))
>         ldq_u $1,0($16)		// r1 = *(u64 *)(r16 & ~7)
> 	mskwl $1,$16,$1		// r1 &= ~(0xffff << (8 * (r16 & 7))
> 	bis $17,$1,$17		// r17 |= r1
> 	stq_u $17,0($16)	// *(u64 *)(r16 & ~7) = r17
> 
> What's more, load-locked/store-conditional doesn't have 16bit and 8bit
> variants on any Alphas - it's always 32bit (ldl_l) or 64bit (ldq_l).
> 
> What BWX adds is load/store byte/word, load/store byte/word unaligned
> and sign-extend byte/word.  IOW, it's absolutely irrelevant for
> cmpxchg (or xchg) purposes.

If you are only ever doing atomic read-modify-write operations on the
byte in question, then agreed, you don't care about byte loads and stores.

But there are use cases that do mix smp_store_release() with cmpxchg(),
and those use cases won't work unless at least byte store is implemented.
Or I suppose that we could use cmpxchg() instead of smp_store_release(),
but that is wasteful for architectures that do support byte stores.

So EV56 adds the byte loads and stores needed for those use cases.

Or am I missing your point?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg()
  2024-05-02 20:01     ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Al Viro
@ 2024-05-02 21:20       ` Paul E. McKenney
  0 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-02 21:20 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, torvalds, Arnd Bergmann

On Thu, May 02, 2024 at 09:01:53PM +0100, Al Viro wrote:
> On Wed, May 01, 2024 at 03:58:14PM -0700, Paul E. McKenney wrote:
> > Hello!
> > 
> > This v2 series provides emulation functions for one-byte cmpxchg,
>        ^^
> ...
> 
> > Changes since v2:

Good catch, and I clearly wasn't paying attention.  It should be
"Changes since v1" and then "Changes since RFC".

Again, apologies for my confusion!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg
  2024-05-02 13:33             ` Paul E. McKenney
  2024-05-02 20:53               ` Al Viro
@ 2024-05-02 21:50               ` Arnd Bergmann
  1 sibling, 0 replies; 71+ messages in thread
From: Arnd Bergmann @ 2024-05-02 21:50 UTC (permalink / raw)
  To: Paul E. McKenney, John Paul Adrian Glaubitz
  Cc: Linux-Arch, linux-kernel, Marco Elver, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Doug Anderson, Petr Mladek,
	Linus Torvalds, kernel-team, Andi Shyti, Palmer Dabbelt,
	Masami Hiramatsu, linux-sh, Alexander Viro

On Thu, May 2, 2024, at 15:33, Paul E. McKenney wrote:
> On Thu, May 02, 2024 at 07:11:52AM +0200, John Paul Adrian Glaubitz wrote:
>> On Wed, 2024-05-01 at 22:06 -0700, Paul E. McKenney wrote:
>> > > Does cmpxchg_emu_u8() have any advantages over the native xchg_u8()?
>> > 
>> > That would be 8-bit xchg() rather than 8-byte cmpxchg(), correct?
>> 
>> Indeed. I realized this after sending my reply.
>
> So this one-byte-only series affects only Alpha systems lacking
> single-byte load/store instructions.  If I understand correctly, Alpha
> 21164A (EV56) and later *do* have single-byte load/store instructions,
> and thus are still just fine.  In fact, it looks like EV56 also has
> two-byte load/store instructions, and so would have been OK with
> the original one-/two-byte RFC series.

Correct, the only other architecture I'm aware of that is missing
16-bit load/store entirely is ARMv3.

> Arnd will not be shy about correcting me if I am wrong.  ;-)

I'll take this as a reminder to send out my EV4/EV5 removal
series. I've merged my patches with Al's bugfixes and rebased
all on top of 6.9-rc now. It's a bit late now, so I'll
send this tomorrow:

https://git.kernel.org/pub/scm/linux/kernel/garch/alpha/include/asm/cmpxchg.hit/arnd/asm-generic.git/log/?h=alpha-cleanup-6.9

     Arnd

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg
  2024-05-02 21:18                 ` [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg Paul E. McKenney
@ 2024-05-02 22:07                   ` Al Viro
  2024-05-02 23:12                     ` Paul E. McKenney
  0 siblings, 1 reply; 71+ messages in thread
From: Al Viro @ 2024-05-02 22:07 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: John Paul Adrian Glaubitz, linux-arch, linux-kernel, elver, akpm,
	tglx, peterz, dianders, pmladek, arnd, torvalds, kernel-team,
	Andi Shyti, Palmer Dabbelt, Masami Hiramatsu, linux-sh

On Thu, May 02, 2024 at 02:18:48PM -0700, Paul E. McKenney wrote:

> If you are only ever doing atomic read-modify-write operations on the
> byte in question, then agreed, you don't care about byte loads and stores.
> 
> But there are use cases that do mix smp_store_release() with cmpxchg(),
> and those use cases won't work unless at least byte store is implemented.
> Or I suppose that we could use cmpxchg() instead of smp_store_release(),
> but that is wasteful for architectures that do support byte stores.
> 
> So EV56 adds the byte loads and stores needed for those use cases.
> 
> Or am I missing your point?

arch/alpha/include/cmpxchg.h:
#define arch_cmpxchg(ptr, o, n)                                         \
({                                                                      \
        __typeof__(*(ptr)) __ret;                                       \
        __typeof__(*(ptr)) _o_ = (o);                                   \
        __typeof__(*(ptr)) _n_ = (n);                                   \
        smp_mb();                                                       \
        __ret = (__typeof__(*(ptr))) __cmpxchg((ptr),                   \
                (unsigned long)_o_, (unsigned long)_n_, sizeof(*(ptr)));\
        smp_mb();                                                       \
        __ret;                                                          \
})

Are those smp_mb() in there enough?

I'm probably missing your point, though - what mix of cmpxchg and
smp_store_release on 8bit values?

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: alpha cmpxchg.h (was Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg)
  2024-05-02 21:01                 ` alpha cmpxchg.h (was Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg) Al Viro
@ 2024-05-02 22:16                   ` Linus Torvalds
  0 siblings, 0 replies; 71+ messages in thread
From: Linus Torvalds @ 2024-05-02 22:16 UTC (permalink / raw)
  To: Al Viro
  Cc: Paul E. McKenney, John Paul Adrian Glaubitz, linux-arch,
	linux-kernel, elver, akpm, tglx, peterz, dianders, pmladek, arnd,
	kernel-team, Andi Shyti, Palmer Dabbelt, Masami Hiramatsu,
	linux-sh, linux-alpha

On Thu, 2 May 2024 at 14:01, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> +static inline unsigned long
> +____xchg_u8(volatile char *m, unsigned long val)
> +{
> +       unsigned long ret, tmp, addr64;
> +
> +       __asm__ __volatile__(
> +       "       andnot  %4,7,%3\n"
> +       "       insbl   %1,%4,%1\n"
> +       "1:     ldq_l   %2,0(%3)\n"
> +       "       extbl   %2,%4,%0\n"
> +       "       mskbl   %2,%4,%2\n"
> +       "       or      %1,%2,%2\n"
> +       "       stq_c   %2,0(%3)\n"
> +       "       beq     %2,2f\n"
> +       ".subsection 2\n"
> +       "2:     br      1b\n"
> +       ".previous"
> +       : "=&r" (ret), "=&r" (val), "=&r" (tmp), "=&r" (addr64)
> +       : "r" ((long)m), "1" (val) : "memory");
> +
> +       return ret;
> +}

Side note: if you move this around, I think you should just uninline
it too and turn it into a function call.

This inline asm doesn't actually take any advantage of the inlining.
The main reason to inline something like this is that you could then
deal with different compile-time alignments better than using the
generic software sequence. But that's not what the inline asm actually
does, and it uses the worst-case code sequence for inserting the byte.

Put that together with "byte and word xchg are rare", and it really
smells to me like we shouldn't be inlining this.

Now, the 32-bit and 64-bit cases are different - more common, but also
much simpler code sequences. They seem worth inlining.

That said, maybe for alpha, the "just move code around" is better than
"fix up old bad decisions" just because the effort is lower.

              Linus

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg
  2024-05-02 22:07                   ` Al Viro
@ 2024-05-02 23:12                     ` Paul E. McKenney
  2024-05-02 23:24                       ` Al Viro
  2024-05-02 23:32                       ` Linus Torvalds
  0 siblings, 2 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-02 23:12 UTC (permalink / raw)
  To: Al Viro
  Cc: John Paul Adrian Glaubitz, linux-arch, linux-kernel, elver, akpm,
	tglx, peterz, dianders, pmladek, arnd, torvalds, kernel-team,
	Andi Shyti, Palmer Dabbelt, Masami Hiramatsu, linux-sh

On Thu, May 02, 2024 at 11:07:57PM +0100, Al Viro wrote:
> On Thu, May 02, 2024 at 02:18:48PM -0700, Paul E. McKenney wrote:
> 
> > If you are only ever doing atomic read-modify-write operations on the
> > byte in question, then agreed, you don't care about byte loads and stores.
> > 
> > But there are use cases that do mix smp_store_release() with cmpxchg(),
> > and those use cases won't work unless at least byte store is implemented.
> > Or I suppose that we could use cmpxchg() instead of smp_store_release(),
> > but that is wasteful for architectures that do support byte stores.
> > 
> > So EV56 adds the byte loads and stores needed for those use cases.
> > 
> > Or am I missing your point?
> 
> arch/alpha/include/cmpxchg.h:
> #define arch_cmpxchg(ptr, o, n)                                         \
> ({                                                                      \
>         __typeof__(*(ptr)) __ret;                                       \
>         __typeof__(*(ptr)) _o_ = (o);                                   \
>         __typeof__(*(ptr)) _n_ = (n);                                   \
>         smp_mb();                                                       \
>         __ret = (__typeof__(*(ptr))) __cmpxchg((ptr),                   \
>                 (unsigned long)_o_, (unsigned long)_n_, sizeof(*(ptr)));\
>         smp_mb();                                                       \
>         __ret;                                                          \
> })
> 
> Are those smp_mb() in there enough?
> 
> I'm probably missing your point, though - what mix of cmpxchg and
> smp_store_release on 8bit values?

One of RCU's state machines uses smp_store_release() to start the
state machine (only one task gets to do this) and cmpxchg() to update
state beyond that point.  And the state is 8 bits so that it and other
state fits into 32 bits to allow a single check for multiple conditions
elsewhere.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg
  2024-05-02 23:12                     ` Paul E. McKenney
@ 2024-05-02 23:24                       ` Al Viro
  2024-05-02 23:45                         ` Paul E. McKenney
  2024-05-02 23:32                       ` Linus Torvalds
  1 sibling, 1 reply; 71+ messages in thread
From: Al Viro @ 2024-05-02 23:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: John Paul Adrian Glaubitz, linux-arch, linux-kernel, elver, akpm,
	tglx, peterz, dianders, pmladek, arnd, torvalds, kernel-team,
	Andi Shyti, Palmer Dabbelt, Masami Hiramatsu, linux-sh

On Thu, May 02, 2024 at 04:12:44PM -0700, Paul E. McKenney wrote:

> > I'm probably missing your point, though - what mix of cmpxchg and
> > smp_store_release on 8bit values?
> 
> One of RCU's state machines uses smp_store_release() to start the
> state machine (only one task gets to do this) and cmpxchg() to update
> state beyond that point.  And the state is 8 bits so that it and other
> state fits into 32 bits to allow a single check for multiple conditions
> elsewhere.

Humm...  smp_store_release() of 8bit on old alpha is mb + fetch 64bit + replace
8 bits + store 64bit...

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg
  2024-05-02 23:12                     ` Paul E. McKenney
  2024-05-02 23:24                       ` Al Viro
@ 2024-05-02 23:32                       ` Linus Torvalds
  2024-05-03  0:16                         ` Paul E. McKenney
  1 sibling, 1 reply; 71+ messages in thread
From: Linus Torvalds @ 2024-05-02 23:32 UTC (permalink / raw)
  To: paulmck
  Cc: Al Viro, John Paul Adrian Glaubitz, linux-arch, linux-kernel,
	elver, akpm, tglx, peterz, dianders, pmladek, arnd, kernel-team,
	Andi Shyti, Palmer Dabbelt, Masami Hiramatsu, linux-sh

On Thu, 2 May 2024 at 16:12, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> One of RCU's state machines uses smp_store_release() to start the
> state machine (only one task gets to do this) and cmpxchg() to update
> state beyond that point.  And the state is 8 bits so that it and other
> state fits into 32 bits to allow a single check for multiple conditions
> elsewhere.

Note that since alpha lacks the release-acquire model, it's always
going to be a full memory barrier before the store.

And then the store turns into a load-mask-store for older alphas.

So it's going to be a complete mess from a performance standpoint regardless.

Happily, I doubt anybody really cares.

I've occasionally wondered if we have situations where the
"smp_store_release()" only cares about previous *writes* being ordered
(ie a "smp_wmb()+WRITE_ONCE" would be sufficient).

It makes no difference on x86 (all stores are relases), power64 (wmb
and store_release are both LWSYNC) or arm64 (str is documentated to be
cheaper than DMB).

On alpha, smp_wmb()+WRITE_ONCE() is cheaper than smp_store_release(),
but nobody sane cares.

But *if* we have a situation where the "smp_store_release()" might be
just a "previous writes need to be visible" rather than ordering
previous reads too, we could maybe introduce that kind of op. I
_think_ the RCU writes tend to be of that kind?

                    Linus

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg
  2024-05-02 23:24                       ` Al Viro
@ 2024-05-02 23:45                         ` Paul E. McKenney
  0 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-02 23:45 UTC (permalink / raw)
  To: Al Viro
  Cc: John Paul Adrian Glaubitz, linux-arch, linux-kernel, elver, akpm,
	tglx, peterz, dianders, pmladek, arnd, torvalds, kernel-team,
	Andi Shyti, Palmer Dabbelt, Masami Hiramatsu, linux-sh

On Fri, May 03, 2024 at 12:24:47AM +0100, Al Viro wrote:
> On Thu, May 02, 2024 at 04:12:44PM -0700, Paul E. McKenney wrote:
> 
> > > I'm probably missing your point, though - what mix of cmpxchg and
> > > smp_store_release on 8bit values?
> > 
> > One of RCU's state machines uses smp_store_release() to start the
> > state machine (only one task gets to do this) and cmpxchg() to update
> > state beyond that point.  And the state is 8 bits so that it and other
> > state fits into 32 bits to allow a single check for multiple conditions
> > elsewhere.
> 
> Humm...  smp_store_release() of 8bit on old alpha is mb + fetch 64bit + replace
> 8 bits + store 64bit...

Agreed, which is why Arnd is moving his patches ahead.  (He and I
discussed this some weeks back, so not a surprise for him.)

For my part, I dropped 16-bit cmpxchg emulation when moving from the
RFC series to v1.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg
  2024-05-02 23:32                       ` Linus Torvalds
@ 2024-05-03  0:16                         ` Paul E. McKenney
  0 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-03  0:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, John Paul Adrian Glaubitz, linux-arch, linux-kernel,
	elver, akpm, tglx, peterz, dianders, pmladek, arnd, kernel-team,
	Andi Shyti, Palmer Dabbelt, Masami Hiramatsu, linux-sh

On Thu, May 02, 2024 at 04:32:35PM -0700, Linus Torvalds wrote:
> On Thu, 2 May 2024 at 16:12, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > One of RCU's state machines uses smp_store_release() to start the
> > state machine (only one task gets to do this) and cmpxchg() to update
> > state beyond that point.  And the state is 8 bits so that it and other
> > state fits into 32 bits to allow a single check for multiple conditions
> > elsewhere.
> 
> Note that since alpha lacks the release-acquire model, it's always
> going to be a full memory barrier before the store.
> 
> And then the store turns into a load-mask-store for older alphas.
> 
> So it's going to be a complete mess from a performance standpoint regardless.

And on those older machines, a mess functionally because the other
three bytes in that same 32-bit word can be concurrently updated.
Hence Arnd's patch being necessary here.

EV56 and later all have single-byte stores, so they are OK.  They were
introduced in the mid-1990s, so even they are antiques.  ;-)

> Happily, I doubt anybody really cares.

Here is hoping!

> I've occasionally wondered if we have situations where the
> "smp_store_release()" only cares about previous *writes* being ordered
> (ie a "smp_wmb()+WRITE_ONCE" would be sufficient).

Back in the day, rcu_assign_pointer() worked this way.  But later there
were a few use cases where ordering prior reads was needed.

And in this case, we just barely need that full store-release
functionality.  There is a preceding mutex lock-unlock pair that provides
a full barrier post-boot on almost all systems.

> It makes no difference on x86 (all stores are relases), power64 (wmb
> and store_release are both LWSYNC) or arm64 (str is documentated to be
> cheaper than DMB).
> 
> On alpha, smp_wmb()+WRITE_ONCE() is cheaper than smp_store_release(),
> but nobody sane cares.
> 
> But *if* we have a situation where the "smp_store_release()" might be
> just a "previous writes need to be visible" rather than ordering
> previous reads too, we could maybe introduce that kind of op. I
> _think_ the RCU writes tend to be of that kind?

Most of the time, rcu_assign_pointer() only needs to order prior writes,
not both reads and writes.  In theory, we could make an something like
an rcu_assign_pointer_reads_too(), though hopefully with a shorter name,
and go back to smp_wmb() for rcu_assign_pointer().

But in practice, I am having a really hard time convincing myself that
it would be worth it.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 11/13] csky: Emulate one-byte cmpxchg
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 11/13] csky: " Paul E. McKenney
@ 2024-05-11  6:42       ` Guo Ren
  2024-05-11 14:49         ` Paul E. McKenney
  0 siblings, 1 reply; 71+ messages in thread
From: Guo Ren @ 2024-05-11  6:42 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, arnd, torvalds, kernel-team, Yujie Liu, linux-csky

On Thu, May 2, 2024 at 7:01 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on csky.
>
> [ paulmck: Apply kernel test robot feedback. ]
> [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
>
> Co-developed-by: Yujie Liu <yujie.liu@intel.com>
> Signed-off-by: Yujie Liu <yujie.liu@intel.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Tested-by: Yujie Liu <yujie.liu@intel.com>
> Cc: Guo Ren <guoren@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: <linux-csky@vger.kernel.org>
> ---
>  arch/csky/Kconfig               |  1 +
>  arch/csky/include/asm/cmpxchg.h | 10 ++++++++++
>  2 files changed, 11 insertions(+)
>
> diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
> index d3ac36751ad1f..5479707eb5d10 100644
> --- a/arch/csky/Kconfig
> +++ b/arch/csky/Kconfig
> @@ -37,6 +37,7 @@ config CSKY
>         select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPTION
>         select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
>         select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
> +       select ARCH_NEED_CMPXCHG_1_EMU
>         select ARCH_WANT_FRAME_POINTERS if !CPU_CK610 && $(cc-option,-mbacktrace)
>         select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
>         select COMMON_CLK
> diff --git a/arch/csky/include/asm/cmpxchg.h b/arch/csky/include/asm/cmpxchg.h
> index 916043b845f14..db6dda47184e4 100644
> --- a/arch/csky/include/asm/cmpxchg.h
> +++ b/arch/csky/include/asm/cmpxchg.h
> @@ -6,6 +6,7 @@
>  #ifdef CONFIG_SMP
>  #include <linux/bug.h>
>  #include <asm/barrier.h>
> +#include <linux/cmpxchg-emu.h>
>
>  #define __xchg_relaxed(new, ptr, size)                         \
>  ({                                                             \
> @@ -61,6 +62,9 @@
>         __typeof__(old) __old = (old);                          \
>         __typeof__(*(ptr)) __ret;                               \
>         switch (size) {                                         \
> +       case 1:                                                 \
> +               __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
> +               break;                                          \
>         case 4:                                                 \
>                 asm volatile (                                  \
>                 "1:     ldex.w          %0, (%3) \n"            \
> @@ -91,6 +95,9 @@
>         __typeof__(old) __old = (old);                          \
>         __typeof__(*(ptr)) __ret;                               \
>         switch (size) {                                         \
> +       case 1:                                                 \
> +               __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
> +               break;                                          \
>         case 4:                                                 \
>                 asm volatile (                                  \
>                 "1:     ldex.w          %0, (%3) \n"            \
> @@ -122,6 +129,9 @@
>         __typeof__(old) __old = (old);                          \
>         __typeof__(*(ptr)) __ret;                               \
>         switch (size) {                                         \
> +       case 1:                                                 \
> +               __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
> +               break;                                          \
>         case 4:                                                 \
>                 asm volatile (                                  \
>                 RELEASE_FENCE                                   \
> --
> 2.40.1
>
Reviewed-by: Guo Ren <guoren@kernel.org>

I will optimize it after ARCH_NEED_CMPXCHG_1_EMU is merged.

-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 11/13] csky: Emulate one-byte cmpxchg
  2024-05-11  6:42       ` Guo Ren
@ 2024-05-11 14:49         ` Paul E. McKenney
  0 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-11 14:49 UTC (permalink / raw)
  To: Guo Ren
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, arnd, torvalds, kernel-team, Yujie Liu, linux-csky

On Sat, May 11, 2024 at 02:42:17PM +0800, Guo Ren wrote:
> On Thu, May 2, 2024 at 7:01 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on csky.
> >
> > [ paulmck: Apply kernel test robot feedback. ]
> > [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
> >
> > Co-developed-by: Yujie Liu <yujie.liu@intel.com>
> > Signed-off-by: Yujie Liu <yujie.liu@intel.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Tested-by: Yujie Liu <yujie.liu@intel.com>
> > Cc: Guo Ren <guoren@kernel.org>
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: <linux-csky@vger.kernel.org>
> > ---
> >  arch/csky/Kconfig               |  1 +
> >  arch/csky/include/asm/cmpxchg.h | 10 ++++++++++
> >  2 files changed, 11 insertions(+)
> >
> > diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
> > index d3ac36751ad1f..5479707eb5d10 100644
> > --- a/arch/csky/Kconfig
> > +++ b/arch/csky/Kconfig
> > @@ -37,6 +37,7 @@ config CSKY
> >         select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPTION
> >         select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
> >         select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
> > +       select ARCH_NEED_CMPXCHG_1_EMU
> >         select ARCH_WANT_FRAME_POINTERS if !CPU_CK610 && $(cc-option,-mbacktrace)
> >         select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
> >         select COMMON_CLK
> > diff --git a/arch/csky/include/asm/cmpxchg.h b/arch/csky/include/asm/cmpxchg.h
> > index 916043b845f14..db6dda47184e4 100644
> > --- a/arch/csky/include/asm/cmpxchg.h
> > +++ b/arch/csky/include/asm/cmpxchg.h
> > @@ -6,6 +6,7 @@
> >  #ifdef CONFIG_SMP
> >  #include <linux/bug.h>
> >  #include <asm/barrier.h>
> > +#include <linux/cmpxchg-emu.h>
> >
> >  #define __xchg_relaxed(new, ptr, size)                         \
> >  ({                                                             \
> > @@ -61,6 +62,9 @@
> >         __typeof__(old) __old = (old);                          \
> >         __typeof__(*(ptr)) __ret;                               \
> >         switch (size) {                                         \
> > +       case 1:                                                 \
> > +               __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
> > +               break;                                          \
> >         case 4:                                                 \
> >                 asm volatile (                                  \
> >                 "1:     ldex.w          %0, (%3) \n"            \
> > @@ -91,6 +95,9 @@
> >         __typeof__(old) __old = (old);                          \
> >         __typeof__(*(ptr)) __ret;                               \
> >         switch (size) {                                         \
> > +       case 1:                                                 \
> > +               __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
> > +               break;                                          \
> >         case 4:                                                 \
> >                 asm volatile (                                  \
> >                 "1:     ldex.w          %0, (%3) \n"            \
> > @@ -122,6 +129,9 @@
> >         __typeof__(old) __old = (old);                          \
> >         __typeof__(*(ptr)) __ret;                               \
> >         switch (size) {                                         \
> > +       case 1:                                                 \
> > +               __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
> > +               break;                                          \
> >         case 4:                                                 \
> >                 asm volatile (                                  \
> >                 RELEASE_FENCE                                   \
> > --
> > 2.40.1
> >
> Reviewed-by: Guo Ren <guoren@kernel.org>
> 
> I will optimize it after ARCH_NEED_CMPXCHG_1_EMU is merged.

Thank you very much!  I have applied this and added it to my pull
request for the upcoming merge window.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 09/13] lib: Add one-byte emulation function
  2024-05-01 23:01     ` [PATCH v2 cmpxchg 09/13] lib: Add one-byte emulation function Paul E. McKenney
@ 2024-05-13 14:44       ` Boqun Feng
  2024-05-13 15:41         ` Paul E. McKenney
  0 siblings, 1 reply; 71+ messages in thread
From: Boqun Feng @ 2024-05-13 14:44 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, arnd, torvalds, kernel-team

On Wed, May 01, 2024 at 04:01:26PM -0700, Paul E. McKenney wrote:
> Architectures are required to provide four-byte cmpxchg() and 64-bit
> architectures are additionally required to provide eight-byte cmpxchg().
> However, there are cases where one-byte cmpxchg() would be extremely
> useful.  Therefore, provide cmpxchg_emu_u8() that emulates one-byte
> cmpxchg() in terms of four-byte cmpxchg().
> 
> Note that this emulations is fully ordered, and can (for example) cause
> one-byte cmpxchg_relaxed() to incur the overhead of full ordering.
> If this causes problems for a given architecture, that architecture is
> free to provide its own lighter-weight primitives.
> 
> [ paulmck: Apply Marco Elver feedback. ]
> [ paulmck: Apply kernel test robot feedback. ]
> [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
> 
> Link: https://lore.kernel.org/all/0733eb10-5e7a-4450-9b8a-527b97c842ff@paulmck-laptop/
> 
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Acked-by: Marco Elver <elver@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Douglas Anderson <dianders@chromium.org>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: <linux-arch@vger.kernel.org>
> ---
>  arch/Kconfig                |  3 +++
>  include/linux/cmpxchg-emu.h | 15 +++++++++++++
>  lib/Makefile                |  1 +
>  lib/cmpxchg-emu.c           | 45 +++++++++++++++++++++++++++++++++++++
>  4 files changed, 64 insertions(+)
>  create mode 100644 include/linux/cmpxchg-emu.h
>  create mode 100644 lib/cmpxchg-emu.c
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 9f066785bb71d..284663392eef8 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1609,4 +1609,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
>  	# strict alignment always, even with -falign-functions.
>  	def_bool CC_HAS_MIN_FUNCTION_ALIGNMENT || CC_IS_CLANG
>  
> +config ARCH_NEED_CMPXCHG_1_EMU
> +	bool
> +
>  endmenu
> diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
> new file mode 100644
> index 0000000000000..998deec67740a
> --- /dev/null
> +++ b/include/linux/cmpxchg-emu.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +/*
> + * Emulated 1-byte and 2-byte cmpxchg operations for architectures
> + * lacking direct support for these sizes.  These are implemented in terms
> + * of 4-byte cmpxchg operations.
> + *
> + * Copyright (C) 2024 Paul E. McKenney.
> + */
> +
> +#ifndef __LINUX_CMPXCHG_EMU_H
> +#define __LINUX_CMPXCHG_EMU_H
> +
> +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
> +
> +#endif /* __LINUX_CMPXCHG_EMU_H */
> diff --git a/lib/Makefile b/lib/Makefile
> index ffc6b2341b45a..cc3d52fdb477d 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -236,6 +236,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
>  lib-$(CONFIG_GENERIC_BUG) += bug.o
>  
>  obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
> +obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_EMU) += cmpxchg-emu.o
>  
>  obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
>  #ensure exported functions have prototypes
> diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
> new file mode 100644
> index 0000000000000..27f6f97cb60dd
> --- /dev/null
> +++ b/lib/cmpxchg-emu.c
> @@ -0,0 +1,45 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Emulated 1-byte cmpxchg operation for architectures lacking direct
> + * support for this size.  This is implemented in terms of 4-byte cmpxchg
> + * operations.
> + *
> + * Copyright (C) 2024 Paul E. McKenney.
> + */
> +
> +#include <linux/types.h>
> +#include <linux/export.h>
> +#include <linux/instrumented.h>
> +#include <linux/atomic.h>
> +#include <linux/panic.h>
> +#include <linux/bug.h>
> +#include <asm-generic/rwonce.h>
> +#include <linux/cmpxchg-emu.h>
> +
> +union u8_32 {
> +	u8 b[4];
> +	u32 w;
> +};
> +
> +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
> +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> +{
> +	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> +	int i = ((uintptr_t)p) & 0x3;
> +	union u8_32 old32;
> +	union u8_32 new32;
> +	u32 ret;
> +
> +	ret = READ_ONCE(*p32);
> +	do {
> +		old32.w = ret;
> +		if (old32.b[i] != old)
> +			return old32.b[i];
> +		new32.w = old32.w;
> +		new32.b[i] = new;
> +		instrument_atomic_read_write(p, 1);
> +		ret = data_race(cmpxchg(p32, old32.w, new32.w)); // Overridden above.

Just out of curiosity, why is this `data_race` needed? cmpxchg is atomic
so there should be no chance for a data race?

Regards,
Boqun

> +	} while (ret != old32.w);
> +	return old;
> +}
> +EXPORT_SYMBOL_GPL(cmpxchg_emu_u8);
> -- 
> 2.40.1
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 09/13] lib: Add one-byte emulation function
  2024-05-13 14:44       ` Boqun Feng
@ 2024-05-13 15:41         ` Paul E. McKenney
  2024-05-13 15:57           ` Boqun Feng
  0 siblings, 1 reply; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-13 15:41 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, arnd, torvalds, kernel-team

On Mon, May 13, 2024 at 07:44:00AM -0700, Boqun Feng wrote:
> On Wed, May 01, 2024 at 04:01:26PM -0700, Paul E. McKenney wrote:
> > Architectures are required to provide four-byte cmpxchg() and 64-bit
> > architectures are additionally required to provide eight-byte cmpxchg().
> > However, there are cases where one-byte cmpxchg() would be extremely
> > useful.  Therefore, provide cmpxchg_emu_u8() that emulates one-byte
> > cmpxchg() in terms of four-byte cmpxchg().
> > 
> > Note that this emulations is fully ordered, and can (for example) cause
> > one-byte cmpxchg_relaxed() to incur the overhead of full ordering.
> > If this causes problems for a given architecture, that architecture is
> > free to provide its own lighter-weight primitives.
> > 
> > [ paulmck: Apply Marco Elver feedback. ]
> > [ paulmck: Apply kernel test robot feedback. ]
> > [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
> > 
> > Link: https://lore.kernel.org/all/0733eb10-5e7a-4450-9b8a-527b97c842ff@paulmck-laptop/
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Acked-by: Marco Elver <elver@google.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> > Cc: Douglas Anderson <dianders@chromium.org>
> > Cc: Petr Mladek <pmladek@suse.com>
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: <linux-arch@vger.kernel.org>
> > ---
> >  arch/Kconfig                |  3 +++
> >  include/linux/cmpxchg-emu.h | 15 +++++++++++++
> >  lib/Makefile                |  1 +
> >  lib/cmpxchg-emu.c           | 45 +++++++++++++++++++++++++++++++++++++
> >  4 files changed, 64 insertions(+)
> >  create mode 100644 include/linux/cmpxchg-emu.h
> >  create mode 100644 lib/cmpxchg-emu.c
> > 
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 9f066785bb71d..284663392eef8 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1609,4 +1609,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
> >  	# strict alignment always, even with -falign-functions.
> >  	def_bool CC_HAS_MIN_FUNCTION_ALIGNMENT || CC_IS_CLANG
> >  
> > +config ARCH_NEED_CMPXCHG_1_EMU
> > +	bool
> > +
> >  endmenu
> > diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
> > new file mode 100644
> > index 0000000000000..998deec67740a
> > --- /dev/null
> > +++ b/include/linux/cmpxchg-emu.h
> > @@ -0,0 +1,15 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ */
> > +/*
> > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures
> > + * lacking direct support for these sizes.  These are implemented in terms
> > + * of 4-byte cmpxchg operations.
> > + *
> > + * Copyright (C) 2024 Paul E. McKenney.
> > + */
> > +
> > +#ifndef __LINUX_CMPXCHG_EMU_H
> > +#define __LINUX_CMPXCHG_EMU_H
> > +
> > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
> > +
> > +#endif /* __LINUX_CMPXCHG_EMU_H */
> > diff --git a/lib/Makefile b/lib/Makefile
> > index ffc6b2341b45a..cc3d52fdb477d 100644
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -236,6 +236,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
> >  lib-$(CONFIG_GENERIC_BUG) += bug.o
> >  
> >  obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
> > +obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_EMU) += cmpxchg-emu.o
> >  
> >  obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
> >  #ensure exported functions have prototypes
> > diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
> > new file mode 100644
> > index 0000000000000..27f6f97cb60dd
> > --- /dev/null
> > +++ b/lib/cmpxchg-emu.c
> > @@ -0,0 +1,45 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +/*
> > + * Emulated 1-byte cmpxchg operation for architectures lacking direct
> > + * support for this size.  This is implemented in terms of 4-byte cmpxchg
> > + * operations.
> > + *
> > + * Copyright (C) 2024 Paul E. McKenney.
> > + */
> > +
> > +#include <linux/types.h>
> > +#include <linux/export.h>
> > +#include <linux/instrumented.h>
> > +#include <linux/atomic.h>
> > +#include <linux/panic.h>
> > +#include <linux/bug.h>
> > +#include <asm-generic/rwonce.h>
> > +#include <linux/cmpxchg-emu.h>
> > +
> > +union u8_32 {
> > +	u8 b[4];
> > +	u32 w;
> > +};
> > +
> > +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
> > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> > +{
> > +	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> > +	int i = ((uintptr_t)p) & 0x3;
> > +	union u8_32 old32;
> > +	union u8_32 new32;
> > +	u32 ret;
> > +
> > +	ret = READ_ONCE(*p32);
> > +	do {
> > +		old32.w = ret;
> > +		if (old32.b[i] != old)
> > +			return old32.b[i];
> > +		new32.w = old32.w;
> > +		new32.b[i] = new;
> > +		instrument_atomic_read_write(p, 1);
> > +		ret = data_race(cmpxchg(p32, old32.w, new32.w)); // Overridden above.
> 
> Just out of curiosity, why is this `data_race` needed? cmpxchg is atomic
> so there should be no chance for a data race?

That is what I thought, too.  ;-)

The problem is that the cmpxchg() covers 32 bits, and so without that
data_race(), KCSAN would complain about data races with perfectly
legitimate concurrent accesses to the other three bytes.

The instrument_atomic_read_write(p, 1) beforehand tells KCSAN to complain
about concurrent accesses, but only to that one byte.

							Thanx, Paul

> Regards,
> Boqun
> 
> > +	} while (ret != old32.w);
> > +	return old;
> > +}
> > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u8);
> > -- 
> > 2.40.1
> > 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 09/13] lib: Add one-byte emulation function
  2024-05-13 15:41         ` Paul E. McKenney
@ 2024-05-13 15:57           ` Boqun Feng
  2024-05-13 21:19             ` Boqun Feng
  0 siblings, 1 reply; 71+ messages in thread
From: Boqun Feng @ 2024-05-13 15:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, arnd, torvalds, kernel-team, Mark Rutland

On Mon, May 13, 2024 at 08:41:27AM -0700, Paul E. McKenney wrote:
[...]
> > > +#include <linux/types.h>
> > > +#include <linux/export.h>
> > > +#include <linux/instrumented.h>
> > > +#include <linux/atomic.h>
> > > +#include <linux/panic.h>
> > > +#include <linux/bug.h>
> > > +#include <asm-generic/rwonce.h>
> > > +#include <linux/cmpxchg-emu.h>
> > > +
> > > +union u8_32 {
> > > +	u8 b[4];
> > > +	u32 w;
> > > +};
> > > +
> > > +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
> > > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> > > +{
> > > +	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> > > +	int i = ((uintptr_t)p) & 0x3;
> > > +	union u8_32 old32;
> > > +	union u8_32 new32;
> > > +	u32 ret;
> > > +
> > > +	ret = READ_ONCE(*p32);
> > > +	do {
> > > +		old32.w = ret;
> > > +		if (old32.b[i] != old)
> > > +			return old32.b[i];
> > > +		new32.w = old32.w;
> > > +		new32.b[i] = new;
> > > +		instrument_atomic_read_write(p, 1);
> > > +		ret = data_race(cmpxchg(p32, old32.w, new32.w)); // Overridden above.
> > 
> > Just out of curiosity, why is this `data_race` needed? cmpxchg is atomic
> > so there should be no chance for a data race?
> 
> That is what I thought, too.  ;-)
> 
> The problem is that the cmpxchg() covers 32 bits, and so without that
> data_race(), KCSAN would complain about data races with perfectly
> legitimate concurrent accesses to the other three bytes.
> 
> The instrument_atomic_read_write(p, 1) beforehand tells KCSAN to complain
> about concurrent accesses, but only to that one byte.
> 

Oh, I see. For that purpose, maybe we can just use raw_cmpxchg() here,
i.e. a cmpxchg() without any instrument in it. Cc Mark in case I'm
missing something.

Regards,
Boqun

> 							Thanx, Paul
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 09/13] lib: Add one-byte emulation function
  2024-05-13 15:57           ` Boqun Feng
@ 2024-05-13 21:19             ` Boqun Feng
  2024-05-14 14:22               ` Paul E. McKenney
  0 siblings, 1 reply; 71+ messages in thread
From: Boqun Feng @ 2024-05-13 21:19 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, arnd, torvalds, kernel-team, Mark Rutland

On Mon, May 13, 2024 at 08:57:16AM -0700, Boqun Feng wrote:
> On Mon, May 13, 2024 at 08:41:27AM -0700, Paul E. McKenney wrote:
> [...]
> > > > +#include <linux/types.h>
> > > > +#include <linux/export.h>
> > > > +#include <linux/instrumented.h>
> > > > +#include <linux/atomic.h>
> > > > +#include <linux/panic.h>
> > > > +#include <linux/bug.h>
> > > > +#include <asm-generic/rwonce.h>
> > > > +#include <linux/cmpxchg-emu.h>
> > > > +
> > > > +union u8_32 {
> > > > +	u8 b[4];
> > > > +	u32 w;
> > > > +};
> > > > +
> > > > +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
> > > > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> > > > +{
> > > > +	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> > > > +	int i = ((uintptr_t)p) & 0x3;
> > > > +	union u8_32 old32;
> > > > +	union u8_32 new32;
> > > > +	u32 ret;
> > > > +
> > > > +	ret = READ_ONCE(*p32);
> > > > +	do {
> > > > +		old32.w = ret;
> > > > +		if (old32.b[i] != old)
> > > > +			return old32.b[i];
> > > > +		new32.w = old32.w;
> > > > +		new32.b[i] = new;
> > > > +		instrument_atomic_read_write(p, 1);
> > > > +		ret = data_race(cmpxchg(p32, old32.w, new32.w)); // Overridden above.
> > > 
> > > Just out of curiosity, why is this `data_race` needed? cmpxchg is atomic
> > > so there should be no chance for a data race?
> > 
> > That is what I thought, too.  ;-)
> > 
> > The problem is that the cmpxchg() covers 32 bits, and so without that
> > data_race(), KCSAN would complain about data races with perfectly
> > legitimate concurrent accesses to the other three bytes.
> > 
> > The instrument_atomic_read_write(p, 1) beforehand tells KCSAN to complain
> > about concurrent accesses, but only to that one byte.
> > 
> 
> Oh, I see. For that purpose, maybe we can just use raw_cmpxchg() here,
> i.e. a cmpxchg() without any instrument in it. Cc Mark in case I'm
> missing something.
> 

I just realized that the KCSAN instrumentation is already done in
cmpxchg() layer:

	#define cmpxchg(ptr, ...) \
	({ \
		typeof(ptr) __ai_ptr = (ptr); \
		kcsan_mb(); \
		instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
		raw_cmpxchg(__ai_ptr, __VA_ARGS__); \
	})

and, this function is lower in the layer, so it shouldn't have the
instrumentation itself. How about the following (based on today's RCU
dev branch)?

Regards,
Boqun

-------------------------------------------->8
Subject: [PATCH] lib: cmpxchg-emu: Make cmpxchg_emu_u8() noinstr

Currently, cmpxchg_emu_u8() is called via cmpxchg() or raw_cmpxchg()
which already makes the instrumentation decision:

* cmpxchg() case:

	cmpxchg():
	  kcsan_mb();
	  instrument_atomic_read_write(...);
	  raw_cmpxchg():
	    arch_cmpxchg():
	      cmpxchg_emu_u8();

... should have KCSAN instrumentation.

* raw_cmpxchg() case:

	raw_cmpxchg():
	  arch_cmpxchg():
	    cmpxchg_emu_u8();

... shouldn't have KCSAN instrumentation.

Therefore it's redundant to put KCSAN instrumentation in
cmpxchg_emu_u8() (along with the data_race() to get away the
instrumentation).

So make cmpxchg_emu_u8() a noinstr function, and remove the KCSAN
instrumentation inside it.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 include/linux/cmpxchg-emu.h |  4 +++-
 lib/cmpxchg-emu.c           | 14 ++++++++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
index 998deec67740..c4c85f41d9f4 100644
--- a/include/linux/cmpxchg-emu.h
+++ b/include/linux/cmpxchg-emu.h
@@ -10,6 +10,8 @@
 #ifndef __LINUX_CMPXCHG_EMU_H
 #define __LINUX_CMPXCHG_EMU_H
 
-uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
+#include <linux/compiler.h>
+
+noinstr uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
 
 #endif /* __LINUX_CMPXCHG_EMU_H */
diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
index 27f6f97cb60d..788c22cd4462 100644
--- a/lib/cmpxchg-emu.c
+++ b/lib/cmpxchg-emu.c
@@ -21,8 +21,13 @@ union u8_32 {
 	u32 w;
 };
 
-/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
-uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
+/*
+ * Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg.
+ *
+ * This function is marked as 'noinstr' as the instrumentation should be done at
+ * outer layer.
+ */
+noinstr uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
 {
 	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
 	int i = ((uintptr_t)p) & 0x3;
@@ -37,8 +42,9 @@ uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
 			return old32.b[i];
 		new32.w = old32.w;
 		new32.b[i] = new;
-		instrument_atomic_read_write(p, 1);
-		ret = data_race(cmpxchg(p32, old32.w, new32.w)); // Overridden above.
+
+		// raw_cmpxchg() is used here to avoid instrumentation.
+		ret = raw_cmpxchg(p32, old32.w, new32.w); // Overridden above.
 	} while (ret != old32.w);
 	return old;
 }
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 09/13] lib: Add one-byte emulation function
  2024-05-13 21:19             ` Boqun Feng
@ 2024-05-14 14:22               ` Paul E. McKenney
  2024-05-14 14:53                 ` Boqun Feng
  0 siblings, 1 reply; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-14 14:22 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, arnd, torvalds, kernel-team, Mark Rutland

On Mon, May 13, 2024 at 02:19:37PM -0700, Boqun Feng wrote:
> On Mon, May 13, 2024 at 08:57:16AM -0700, Boqun Feng wrote:
> > On Mon, May 13, 2024 at 08:41:27AM -0700, Paul E. McKenney wrote:
> > [...]
> > > > > +#include <linux/types.h>
> > > > > +#include <linux/export.h>
> > > > > +#include <linux/instrumented.h>
> > > > > +#include <linux/atomic.h>
> > > > > +#include <linux/panic.h>
> > > > > +#include <linux/bug.h>
> > > > > +#include <asm-generic/rwonce.h>
> > > > > +#include <linux/cmpxchg-emu.h>
> > > > > +
> > > > > +union u8_32 {
> > > > > +	u8 b[4];
> > > > > +	u32 w;
> > > > > +};
> > > > > +
> > > > > +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
> > > > > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> > > > > +{
> > > > > +	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> > > > > +	int i = ((uintptr_t)p) & 0x3;
> > > > > +	union u8_32 old32;
> > > > > +	union u8_32 new32;
> > > > > +	u32 ret;
> > > > > +
> > > > > +	ret = READ_ONCE(*p32);
> > > > > +	do {
> > > > > +		old32.w = ret;
> > > > > +		if (old32.b[i] != old)
> > > > > +			return old32.b[i];
> > > > > +		new32.w = old32.w;
> > > > > +		new32.b[i] = new;
> > > > > +		instrument_atomic_read_write(p, 1);
> > > > > +		ret = data_race(cmpxchg(p32, old32.w, new32.w)); // Overridden above.
> > > > 
> > > > Just out of curiosity, why is this `data_race` needed? cmpxchg is atomic
> > > > so there should be no chance for a data race?
> > > 
> > > That is what I thought, too.  ;-)
> > > 
> > > The problem is that the cmpxchg() covers 32 bits, and so without that
> > > data_race(), KCSAN would complain about data races with perfectly
> > > legitimate concurrent accesses to the other three bytes.
> > > 
> > > The instrument_atomic_read_write(p, 1) beforehand tells KCSAN to complain
> > > about concurrent accesses, but only to that one byte.
> > > 
> > 
> > Oh, I see. For that purpose, maybe we can just use raw_cmpxchg() here,
> > i.e. a cmpxchg() without any instrument in it. Cc Mark in case I'm
> > missing something.
> > 
> 
> I just realized that the KCSAN instrumentation is already done in
> cmpxchg() layer:
> 
> 	#define cmpxchg(ptr, ...) \
> 	({ \
> 		typeof(ptr) __ai_ptr = (ptr); \
> 		kcsan_mb(); \
> 		instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
> 		raw_cmpxchg(__ai_ptr, __VA_ARGS__); \
> 	})
> 
> and, this function is lower in the layer, so it shouldn't have the
> instrumentation itself. How about the following (based on today's RCU
> dev branch)?

The raw_cmpxchg() looks nicer than the added data_race()!

One question below, though.

							Thanx, Paul

> Regards,
> Boqun
> 
> -------------------------------------------->8
> Subject: [PATCH] lib: cmpxchg-emu: Make cmpxchg_emu_u8() noinstr
> 
> Currently, cmpxchg_emu_u8() is called via cmpxchg() or raw_cmpxchg()
> which already makes the instrumentation decision:
> 
> * cmpxchg() case:
> 
> 	cmpxchg():
> 	  kcsan_mb();
> 	  instrument_atomic_read_write(...);
> 	  raw_cmpxchg():
> 	    arch_cmpxchg():
> 	      cmpxchg_emu_u8();
> 
> ... should have KCSAN instrumentation.
> 
> * raw_cmpxchg() case:
> 
> 	raw_cmpxchg():
> 	  arch_cmpxchg():
> 	    cmpxchg_emu_u8();
> 
> ... shouldn't have KCSAN instrumentation.
> 
> Therefore it's redundant to put KCSAN instrumentation in
> cmpxchg_emu_u8() (along with the data_race() to get away the
> instrumentation).
> 
> So make cmpxchg_emu_u8() a noinstr function, and remove the KCSAN
> instrumentation inside it.
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>  include/linux/cmpxchg-emu.h |  4 +++-
>  lib/cmpxchg-emu.c           | 14 ++++++++++----
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
> index 998deec67740..c4c85f41d9f4 100644
> --- a/include/linux/cmpxchg-emu.h
> +++ b/include/linux/cmpxchg-emu.h
> @@ -10,6 +10,8 @@
>  #ifndef __LINUX_CMPXCHG_EMU_H
>  #define __LINUX_CMPXCHG_EMU_H
>  
> -uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
> +#include <linux/compiler.h>
> +
> +noinstr uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
>  
>  #endif /* __LINUX_CMPXCHG_EMU_H */
> diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
> index 27f6f97cb60d..788c22cd4462 100644
> --- a/lib/cmpxchg-emu.c
> +++ b/lib/cmpxchg-emu.c
> @@ -21,8 +21,13 @@ union u8_32 {
>  	u32 w;
>  };
>  
> -/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
> -uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> +/*
> + * Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg.
> + *
> + * This function is marked as 'noinstr' as the instrumentation should be done at
> + * outer layer.
> + */
> +noinstr uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
>  {
>  	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
>  	int i = ((uintptr_t)p) & 0x3;
> @@ -37,8 +42,9 @@ uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
>  			return old32.b[i];
>  		new32.w = old32.w;
>  		new32.b[i] = new;
> -		instrument_atomic_read_write(p, 1);

Don't we need to keep that instrument_atomic_read_write() in order
to allow KCSAN to detect data races with plain C-language reads?
Or is that being handled some other way?

> -		ret = data_race(cmpxchg(p32, old32.w, new32.w)); // Overridden above.
> +
> +		// raw_cmpxchg() is used here to avoid instrumentation.
> +		ret = raw_cmpxchg(p32, old32.w, new32.w); // Overridden above.
>  	} while (ret != old32.w);
>  	return old;
>  }
> -- 
> 2.44.0
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 09/13] lib: Add one-byte emulation function
  2024-05-14 14:22               ` Paul E. McKenney
@ 2024-05-14 14:53                 ` Boqun Feng
  2024-05-14 15:02                   ` Paul E. McKenney
  0 siblings, 1 reply; 71+ messages in thread
From: Boqun Feng @ 2024-05-14 14:53 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, arnd, torvalds, kernel-team, Mark Rutland

On Tue, May 14, 2024 at 07:22:47AM -0700, Paul E. McKenney wrote:
> On Mon, May 13, 2024 at 02:19:37PM -0700, Boqun Feng wrote:
> > On Mon, May 13, 2024 at 08:57:16AM -0700, Boqun Feng wrote:
> > > On Mon, May 13, 2024 at 08:41:27AM -0700, Paul E. McKenney wrote:
> > > [...]
> > > > > > +#include <linux/types.h>
> > > > > > +#include <linux/export.h>
> > > > > > +#include <linux/instrumented.h>
> > > > > > +#include <linux/atomic.h>
> > > > > > +#include <linux/panic.h>
> > > > > > +#include <linux/bug.h>
> > > > > > +#include <asm-generic/rwonce.h>
> > > > > > +#include <linux/cmpxchg-emu.h>
> > > > > > +
> > > > > > +union u8_32 {
> > > > > > +	u8 b[4];
> > > > > > +	u32 w;
> > > > > > +};
> > > > > > +
> > > > > > +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
> > > > > > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> > > > > > +{
> > > > > > +	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> > > > > > +	int i = ((uintptr_t)p) & 0x3;
> > > > > > +	union u8_32 old32;
> > > > > > +	union u8_32 new32;
> > > > > > +	u32 ret;
> > > > > > +
> > > > > > +	ret = READ_ONCE(*p32);
> > > > > > +	do {
> > > > > > +		old32.w = ret;
> > > > > > +		if (old32.b[i] != old)
> > > > > > +			return old32.b[i];
> > > > > > +		new32.w = old32.w;
> > > > > > +		new32.b[i] = new;
> > > > > > +		instrument_atomic_read_write(p, 1);
> > > > > > +		ret = data_race(cmpxchg(p32, old32.w, new32.w)); // Overridden above.
> > > > > 
> > > > > Just out of curiosity, why is this `data_race` needed? cmpxchg is atomic
> > > > > so there should be no chance for a data race?
> > > > 
> > > > That is what I thought, too.  ;-)
> > > > 
> > > > The problem is that the cmpxchg() covers 32 bits, and so without that
> > > > data_race(), KCSAN would complain about data races with perfectly
> > > > legitimate concurrent accesses to the other three bytes.
> > > > 
> > > > The instrument_atomic_read_write(p, 1) beforehand tells KCSAN to complain
> > > > about concurrent accesses, but only to that one byte.
> > > > 
> > > 
> > > Oh, I see. For that purpose, maybe we can just use raw_cmpxchg() here,
> > > i.e. a cmpxchg() without any instrument in it. Cc Mark in case I'm
> > > missing something.
> > > 
> > 
> > I just realized that the KCSAN instrumentation is already done in
> > cmpxchg() layer:
> > 
> > 	#define cmpxchg(ptr, ...) \
> > 	({ \
> > 		typeof(ptr) __ai_ptr = (ptr); \
> > 		kcsan_mb(); \
> > 		instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
> > 		raw_cmpxchg(__ai_ptr, __VA_ARGS__); \
> > 	})
> > 
> > and, this function is lower in the layer, so it shouldn't have the
> > instrumentation itself. How about the following (based on today's RCU
> > dev branch)?
> 
> The raw_cmpxchg() looks nicer than the added data_race()!
> 
> One question below, though.
> 
> 							Thanx, Paul
> 
> > Regards,
> > Boqun
> > 
> > -------------------------------------------->8
> > Subject: [PATCH] lib: cmpxchg-emu: Make cmpxchg_emu_u8() noinstr
> > 
> > Currently, cmpxchg_emu_u8() is called via cmpxchg() or raw_cmpxchg()
> > which already makes the instrumentation decision:
> > 
> > * cmpxchg() case:
> > 
> > 	cmpxchg():
> > 	  kcsan_mb();
> > 	  instrument_atomic_read_write(...);
> > 	  raw_cmpxchg():
> > 	    arch_cmpxchg():
> > 	      cmpxchg_emu_u8();
> > 
> > ... should have KCSAN instrumentation.
> > 
> > * raw_cmpxchg() case:
> > 
> > 	raw_cmpxchg():
> > 	  arch_cmpxchg():
> > 	    cmpxchg_emu_u8();
> > 
> > ... shouldn't have KCSAN instrumentation.
> > 
> > Therefore it's redundant to put KCSAN instrumentation in
> > cmpxchg_emu_u8() (along with the data_race() to get away the
> > instrumentation).
> > 
> > So make cmpxchg_emu_u8() a noinstr function, and remove the KCSAN
> > instrumentation inside it.
> > 
> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > ---
> >  include/linux/cmpxchg-emu.h |  4 +++-
> >  lib/cmpxchg-emu.c           | 14 ++++++++++----
> >  2 files changed, 13 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
> > index 998deec67740..c4c85f41d9f4 100644
> > --- a/include/linux/cmpxchg-emu.h
> > +++ b/include/linux/cmpxchg-emu.h
> > @@ -10,6 +10,8 @@
> >  #ifndef __LINUX_CMPXCHG_EMU_H
> >  #define __LINUX_CMPXCHG_EMU_H
> >  
> > -uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
> > +#include <linux/compiler.h>
> > +
> > +noinstr uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
> >  
> >  #endif /* __LINUX_CMPXCHG_EMU_H */
> > diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
> > index 27f6f97cb60d..788c22cd4462 100644
> > --- a/lib/cmpxchg-emu.c
> > +++ b/lib/cmpxchg-emu.c
> > @@ -21,8 +21,13 @@ union u8_32 {
> >  	u32 w;
> >  };
> >  
> > -/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
> > -uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> > +/*
> > + * Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg.
> > + *
> > + * This function is marked as 'noinstr' as the instrumentation should be done at
> > + * outer layer.
> > + */
> > +noinstr uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> >  {
> >  	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> >  	int i = ((uintptr_t)p) & 0x3;
> > @@ -37,8 +42,9 @@ uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> >  			return old32.b[i];
> >  		new32.w = old32.w;
> >  		new32.b[i] = new;
> > -		instrument_atomic_read_write(p, 1);
> 
> Don't we need to keep that instrument_atomic_read_write() in order
> to allow KCSAN to detect data races with plain C-language reads?
> Or is that being handled some other way?
> 

I think that's already covered by the current code, cmpxchg_emu_u8() is
called from cmpxchg() macro, which has a:

	instrument_atomic_read_write(p, sizeof(*p));

in it, and in the cmpxchg((u8*), ..) case, 'sizeof(*p)' is obviously 1
;-)

Regards,
Boqun

> > -		ret = data_race(cmpxchg(p32, old32.w, new32.w)); // Overridden above.
> > +
> > +		// raw_cmpxchg() is used here to avoid instrumentation.
> > +		ret = raw_cmpxchg(p32, old32.w, new32.w); // Overridden above.
> >  	} while (ret != old32.w);
> >  	return old;
> >  }
> > -- 
> > 2.44.0
> > 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 cmpxchg 09/13] lib: Add one-byte emulation function
  2024-05-14 14:53                 ` Boqun Feng
@ 2024-05-14 15:02                   ` Paul E. McKenney
  0 siblings, 0 replies; 71+ messages in thread
From: Paul E. McKenney @ 2024-05-14 15:02 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, arnd, torvalds, kernel-team, Mark Rutland

On Tue, May 14, 2024 at 07:53:20AM -0700, Boqun Feng wrote:
> On Tue, May 14, 2024 at 07:22:47AM -0700, Paul E. McKenney wrote:
> > On Mon, May 13, 2024 at 02:19:37PM -0700, Boqun Feng wrote:
> > > On Mon, May 13, 2024 at 08:57:16AM -0700, Boqun Feng wrote:
> > > > On Mon, May 13, 2024 at 08:41:27AM -0700, Paul E. McKenney wrote:
> > > > [...]
> > > > > > > +#include <linux/types.h>
> > > > > > > +#include <linux/export.h>
> > > > > > > +#include <linux/instrumented.h>
> > > > > > > +#include <linux/atomic.h>
> > > > > > > +#include <linux/panic.h>
> > > > > > > +#include <linux/bug.h>
> > > > > > > +#include <asm-generic/rwonce.h>
> > > > > > > +#include <linux/cmpxchg-emu.h>
> > > > > > > +
> > > > > > > +union u8_32 {
> > > > > > > +	u8 b[4];
> > > > > > > +	u32 w;
> > > > > > > +};
> > > > > > > +
> > > > > > > +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
> > > > > > > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> > > > > > > +{
> > > > > > > +	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> > > > > > > +	int i = ((uintptr_t)p) & 0x3;
> > > > > > > +	union u8_32 old32;
> > > > > > > +	union u8_32 new32;
> > > > > > > +	u32 ret;
> > > > > > > +
> > > > > > > +	ret = READ_ONCE(*p32);
> > > > > > > +	do {
> > > > > > > +		old32.w = ret;
> > > > > > > +		if (old32.b[i] != old)
> > > > > > > +			return old32.b[i];
> > > > > > > +		new32.w = old32.w;
> > > > > > > +		new32.b[i] = new;
> > > > > > > +		instrument_atomic_read_write(p, 1);
> > > > > > > +		ret = data_race(cmpxchg(p32, old32.w, new32.w)); // Overridden above.
> > > > > > 
> > > > > > Just out of curiosity, why is this `data_race` needed? cmpxchg is atomic
> > > > > > so there should be no chance for a data race?
> > > > > 
> > > > > That is what I thought, too.  ;-)
> > > > > 
> > > > > The problem is that the cmpxchg() covers 32 bits, and so without that
> > > > > data_race(), KCSAN would complain about data races with perfectly
> > > > > legitimate concurrent accesses to the other three bytes.
> > > > > 
> > > > > The instrument_atomic_read_write(p, 1) beforehand tells KCSAN to complain
> > > > > about concurrent accesses, but only to that one byte.
> > > > > 
> > > > 
> > > > Oh, I see. For that purpose, maybe we can just use raw_cmpxchg() here,
> > > > i.e. a cmpxchg() without any instrument in it. Cc Mark in case I'm
> > > > missing something.
> > > > 
> > > 
> > > I just realized that the KCSAN instrumentation is already done in
> > > cmpxchg() layer:
> > > 
> > > 	#define cmpxchg(ptr, ...) \
> > > 	({ \
> > > 		typeof(ptr) __ai_ptr = (ptr); \
> > > 		kcsan_mb(); \
> > > 		instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
> > > 		raw_cmpxchg(__ai_ptr, __VA_ARGS__); \
> > > 	})
> > > 
> > > and, this function is lower in the layer, so it shouldn't have the
> > > instrumentation itself. How about the following (based on today's RCU
> > > dev branch)?
> > 
> > The raw_cmpxchg() looks nicer than the added data_race()!
> > 
> > One question below, though.
> > 
> > 							Thanx, Paul
> > 
> > > Regards,
> > > Boqun
> > > 
> > > -------------------------------------------->8
> > > Subject: [PATCH] lib: cmpxchg-emu: Make cmpxchg_emu_u8() noinstr
> > > 
> > > Currently, cmpxchg_emu_u8() is called via cmpxchg() or raw_cmpxchg()
> > > which already makes the instrumentation decision:
> > > 
> > > * cmpxchg() case:
> > > 
> > > 	cmpxchg():
> > > 	  kcsan_mb();
> > > 	  instrument_atomic_read_write(...);
> > > 	  raw_cmpxchg():
> > > 	    arch_cmpxchg():
> > > 	      cmpxchg_emu_u8();
> > > 
> > > ... should have KCSAN instrumentation.
> > > 
> > > * raw_cmpxchg() case:
> > > 
> > > 	raw_cmpxchg():
> > > 	  arch_cmpxchg():
> > > 	    cmpxchg_emu_u8();
> > > 
> > > ... shouldn't have KCSAN instrumentation.
> > > 
> > > Therefore it's redundant to put KCSAN instrumentation in
> > > cmpxchg_emu_u8() (along with the data_race() to get away the
> > > instrumentation).
> > > 
> > > So make cmpxchg_emu_u8() a noinstr function, and remove the KCSAN
> > > instrumentation inside it.
> > > 
> > > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > > ---
> > >  include/linux/cmpxchg-emu.h |  4 +++-
> > >  lib/cmpxchg-emu.c           | 14 ++++++++++----
> > >  2 files changed, 13 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
> > > index 998deec67740..c4c85f41d9f4 100644
> > > --- a/include/linux/cmpxchg-emu.h
> > > +++ b/include/linux/cmpxchg-emu.h
> > > @@ -10,6 +10,8 @@
> > >  #ifndef __LINUX_CMPXCHG_EMU_H
> > >  #define __LINUX_CMPXCHG_EMU_H
> > >  
> > > -uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
> > > +#include <linux/compiler.h>
> > > +
> > > +noinstr uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
> > >  
> > >  #endif /* __LINUX_CMPXCHG_EMU_H */
> > > diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
> > > index 27f6f97cb60d..788c22cd4462 100644
> > > --- a/lib/cmpxchg-emu.c
> > > +++ b/lib/cmpxchg-emu.c
> > > @@ -21,8 +21,13 @@ union u8_32 {
> > >  	u32 w;
> > >  };
> > >  
> > > -/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
> > > -uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> > > +/*
> > > + * Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg.
> > > + *
> > > + * This function is marked as 'noinstr' as the instrumentation should be done at
> > > + * outer layer.
> > > + */
> > > +noinstr uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> > >  {
> > >  	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> > >  	int i = ((uintptr_t)p) & 0x3;
> > > @@ -37,8 +42,9 @@ uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> > >  			return old32.b[i];
> > >  		new32.w = old32.w;
> > >  		new32.b[i] = new;
> > > -		instrument_atomic_read_write(p, 1);
> > 
> > Don't we need to keep that instrument_atomic_read_write() in order
> > to allow KCSAN to detect data races with plain C-language reads?
> > Or is that being handled some other way?
> > 
> 
> I think that's already covered by the current code, cmpxchg_emu_u8() is
> called from cmpxchg() macro, which has a:
> 
> 	instrument_atomic_read_write(p, sizeof(*p));
> 
> in it, and in the cmpxchg((u8*), ..) case, 'sizeof(*p)' is obviously 1
> ;-)

I believe you, but could you nevertheless please test it?  ;-)

							Thanx, Paul

> Regards,
> Boqun
> 
> > > -		ret = data_race(cmpxchg(p32, old32.w, new32.w)); // Overridden above.
> > > +
> > > +		// raw_cmpxchg() is used here to avoid instrumentation.
> > > +		ret = raw_cmpxchg(p32, old32.w, new32.w); // Overridden above.
> > >  	} while (ret != old32.w);
> > >  	return old;
> > >  }
> > > -- 
> > > 2.44.0
> > > 

^ permalink raw reply	[flat|nested] 71+ messages in thread

end of thread, other threads:[~2024-05-14 15:02 UTC | newest]

Thread overview: 71+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-01 21:39 [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
2024-04-01 21:39 ` [PATCH RFC cmpxchg 1/8] lib: Add one-byte and two-byte cmpxchg() emulation functions Paul E. McKenney
2024-04-02 13:07   ` Marco Elver
2024-04-02 17:15     ` Paul E. McKenney
2024-04-08 17:47 ` [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
2024-04-08 17:49   ` [PATCH cmpxchg 01/14] sparc32: make __cmpxchg_u32() return u32 Paul E. McKenney
2024-04-08 17:49   ` [PATCH cmpxchg 02/14] sparc32: make the first argument of __cmpxchg_u64() volatile u64 * Paul E. McKenney
2024-04-08 17:49   ` [PATCH cmpxchg 03/14] sparc32: unify __cmpxchg_u{32,64} Paul E. McKenney
2024-04-08 17:49   ` [PATCH cmpxchg 04/14] sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those sizes Paul E. McKenney
2024-04-08 17:49   ` [PATCH cmpxchg 05/14] parisc: __cmpxchg_u32(): lift conversion into the callers Paul E. McKenney
2024-04-08 17:49   ` [PATCH cmpxchg 06/14] parisc: unify implementations of __cmpxchg_u{8,32,64} Paul E. McKenney
2024-04-08 17:49   ` [PATCH cmpxchg 07/14] parisc: add missing export of __cmpxchg_u8() Paul E. McKenney
2024-04-08 17:49   ` [PATCH cmpxchg 08/14] parisc: add u16 support to cmpxchg() Paul E. McKenney
2024-04-08 20:10     ` Linus Torvalds
2024-04-08 20:53       ` Paul E. McKenney
2024-04-08 17:49   ` [PATCH cmpxchg 09/14] lib: Add one-byte emulation function Paul E. McKenney
2024-04-08 17:49   ` [PATCH cmpxchg 10/14] ARC: Emulate one-byte cmpxchg Paul E. McKenney
2024-04-08 17:49   ` [PATCH cmpxchg 11/14] csky: " Paul E. McKenney
2024-04-08 17:49   ` [PATCH cmpxchg 12/14] sh: " Paul E. McKenney
2024-04-18  8:04     ` Geert Uytterhoeven
2024-04-08 17:49   ` [PATCH cmpxchg 13/14] xtensa: " Paul E. McKenney
2024-04-18  8:06     ` Geert Uytterhoeven
2024-04-18 23:21       ` Paul E. McKenney
2024-04-19  5:07         ` Yujie Liu
2024-04-19  8:02           ` Geert Uytterhoeven
2024-04-20 14:03             ` Paul E. McKenney
2024-04-08 17:49   ` [PATCH cmpxchg 14/14] riscv: " Paul E. McKenney
2024-04-09 17:35     ` Andrea Parri
2024-04-09 18:08       ` Paul E. McKenney
2024-05-01 22:58   ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Paul E. McKenney
2024-05-01 23:01     ` [PATCH v2 cmpxchg 01/13] sparc32: make __cmpxchg_u32() return u32 Paul E. McKenney
2024-05-01 23:01     ` [PATCH v2 cmpxchg 02/13] sparc32: make the first argument of __cmpxchg_u64() volatile u64 * Paul E. McKenney
2024-05-01 23:01     ` [PATCH v2 cmpxchg 03/13] sparc32: unify __cmpxchg_u{32,64} Paul E. McKenney
2024-05-01 23:01     ` [PATCH v2 cmpxchg 04/13] sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those sizes Paul E. McKenney
2024-05-01 23:01     ` [PATCH v2 cmpxchg 05/13] parisc: __cmpxchg_u32(): lift conversion into the callers Paul E. McKenney
2024-05-01 23:01     ` [PATCH v2 cmpxchg 06/13] parisc: unify implementations of __cmpxchg_u{8,32,64} Paul E. McKenney
2024-05-01 23:01     ` [PATCH v2 cmpxchg 07/13] parisc: add missing export of __cmpxchg_u8() Paul E. McKenney
2024-05-01 23:01     ` [PATCH v2 cmpxchg 08/13] parisc: add u16 support to cmpxchg() Paul E. McKenney
2024-05-01 23:01     ` [PATCH v2 cmpxchg 09/13] lib: Add one-byte emulation function Paul E. McKenney
2024-05-13 14:44       ` Boqun Feng
2024-05-13 15:41         ` Paul E. McKenney
2024-05-13 15:57           ` Boqun Feng
2024-05-13 21:19             ` Boqun Feng
2024-05-14 14:22               ` Paul E. McKenney
2024-05-14 14:53                 ` Boqun Feng
2024-05-14 15:02                   ` Paul E. McKenney
2024-05-01 23:01     ` [PATCH v2 cmpxchg 10/13] ARC: Emulate one-byte cmpxchg Paul E. McKenney
2024-05-01 23:01     ` [PATCH v2 cmpxchg 11/13] csky: " Paul E. McKenney
2024-05-11  6:42       ` Guo Ren
2024-05-11 14:49         ` Paul E. McKenney
2024-05-01 23:01     ` [PATCH v2 cmpxchg 12/13] sh: " Paul E. McKenney
2024-05-02  4:52       ` John Paul Adrian Glaubitz
2024-05-02  5:06         ` Paul E. McKenney
2024-05-02  5:11           ` John Paul Adrian Glaubitz
2024-05-02 13:33             ` Paul E. McKenney
2024-05-02 20:53               ` Al Viro
2024-05-02 21:01                 ` alpha cmpxchg.h (was Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg) Al Viro
2024-05-02 22:16                   ` Linus Torvalds
2024-05-02 21:18                 ` [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg Paul E. McKenney
2024-05-02 22:07                   ` Al Viro
2024-05-02 23:12                     ` Paul E. McKenney
2024-05-02 23:24                       ` Al Viro
2024-05-02 23:45                         ` Paul E. McKenney
2024-05-02 23:32                       ` Linus Torvalds
2024-05-03  0:16                         ` Paul E. McKenney
2024-05-02 21:50               ` Arnd Bergmann
2024-05-02  5:42           ` D. Jeff Dionne
2024-05-02 11:30             ` Arnd Bergmann
2024-05-01 23:01     ` [PATCH v2 cmpxchg 13/13] xtensa: " Paul E. McKenney
2024-05-02 20:01     ` [PATCH v2 cmpxchg 0/8] Provide emulation for one--byte cmpxchg() Al Viro
2024-05-02 21:20       ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).