* [PATCH v2 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
@ 2022-06-10 11:34 Alexander Lobakin
  2022-06-10 11:34 ` [PATCH v2 1/6] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr() Alexander Lobakin
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Alexander Lobakin @ 2022-06-10 11:34 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

While I was working on converting some structure fields from a fixed
type to a bitmap, I started observing a code size increase not only in
places where the code works with the converted structure fields, but
also where the converted variables were on the stack. For example, the
following code:

	DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
	unsigned long bar = BIT(BAR_BIT);
	unsigned long baz = 0;

	__set_bit(FOO_BIT, foo);
	baz |= BIT(BAZ_BIT);

	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo)));
	BUILD_BUG_ON(!__builtin_constant_p(bar & BIT(BAR_BIT)));
	BUILD_BUG_ON(!__builtin_constant_p(baz & BIT(BAZ_BIT)));

triggers the first assertion on x86_64, which means the compiler is
unable to evaluate the expression to a compile-time constant when an
architecture-specific bitop is used, even when the result is obvious.
I found that this is because many architecture-specific non-atomic
bitop implementations use inline asm or other hacks which are faster
or more robust when working with "real" variables (i.e. fields of
structures etc.), but which the compilers have no clue how to optimize
out when called on compile-time constants.

So, in order to let the compiler optimize out such cases, expand the
test_bit() and __*_bit() definitions with a compile-time condition
check, so that they pick the generic C non-atomic bitop
implementations when all of the passed arguments are compile-time
constants. The result is then a compile-time constant as well, and
the compiler produces more efficient and simpler code in 100% of such
cases (and nothing changes when at least one argument is not a
compile-time constant).
The condition itself:

if (__builtin_constant_p(nr) &&		/* <- bit position is constant */
    __builtin_constant_p(!!addr) &&	/* <- compiler knows bitmap addr is
					      always either NULL or not */
    addr &&				/* <- bitmap addr is not NULL */
    __builtin_constant_p(*addr))	/* <- compiler knows the value of
					      the target bitmap */
	/* then pick the generic C variant */
else
	/* old code path, arch-specific */

I also tried __is_constexpr() as suggested by Andy, but it was
always returning 0 ('not a constant') for the 2nd, 3rd and 4th
conditions.
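
For reference, a minimal sketch of how such a dispatching macro could
look, built on the const_*() variants introduced later in the series
(illustrative only; the exact form is in the last patch):

	#define bitop(op, nr, addr)					  \
		((__builtin_constant_p(nr) &&				  \
		  __builtin_constant_p((uintptr_t)(addr) !=		  \
				       (uintptr_t)NULL) &&		  \
		  (uintptr_t)(addr) != (uintptr_t)NULL &&		  \
		  __builtin_constant_p(*(const unsigned long *)(addr))) ? \
		 const##op(nr, addr) : op(nr, addr))

	#define test_bit(nr, addr)	bitop(_test_bit, nr, addr)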

The savings are architecture, compiler and compiler flags dependent,
for example, on x86_64 -O2:

GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)

and ARM64 (courtesy of Mark[0]):

GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)

And the following:

	DECLARE_BITMAP(flags, __IP_TUNNEL_FLAG_NUM) = { };
	__be16 tun_flags;

	__set_bit(IP_TUNNEL_CSUM_BIT, flags);

	tun_flags = cpu_to_be16(*flags & U16_MAX);

	if (test_bit(IP_TUNNEL_VTI_BIT, flags))
		tun_flags |= VTI_ISVTI;

	BUILD_BUG_ON(!__builtin_constant_p(tun_flags));

doesn't blow up anymore, so we can now e.g. use fixed bitmaps in
compile-time assertions.

The series has been in intel-next for a while with no reported issues.

From v1[1]:
* change 'gen_' prefixes to 'generic_' to disambiguate from
  'generated' etc. (Mark);
* define a separate 'const_' set to use in the optimization to keep
  the generic test_bit() atomic-safe (Marco);
* unify arch_{test,__*}_bit() as well and include them in the type
  check;
* add more relevant and up-to-date bloat-o-meter results, including
  ARM64 (me, Mark);
* pick a couple '*-by' tags (Mark, Yury).

Also available on my open GH[2].

[0] https://lore.kernel.org/all/Yp4GQFQYD32Rs9Cw@FVFF77S0Q05N
[1] https://lore.kernel.org/all/20220606114908.962562-1-alexandr.lobakin@intel.com
[2] https://github.com/alobakin/linux/commits/bitops

Alexander Lobakin (6):
  ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
  bitops: always define asm-generic non-atomic bitops
  bitops: unify non-atomic bitops prototypes across architectures
  bitops: define const_*() versions of the non-atomics
  bitops: wrap non-atomic bitops with a transparent macro
  bitops: let optimize out non-atomic bitops on compile-time constants

 arch/alpha/include/asm/bitops.h               |  28 ++--
 arch/hexagon/include/asm/bitops.h             |  23 ++-
 arch/ia64/include/asm/bitops.h                |  40 ++---
 arch/ia64/include/asm/processor.h             |   2 +-
 arch/m68k/include/asm/bitops.h                |  47 ++++--
 arch/sh/include/asm/bitops-op32.h             |  32 ++--
 arch/sparc/include/asm/bitops_32.h            |  18 +-
 arch/sparc/lib/atomic32.c                     |  12 +-
 arch/x86/include/asm/bitops.h                 |  22 +--
 .../asm-generic/bitops/generic-non-atomic.h   | 155 ++++++++++++++++++
 .../bitops/instrumented-non-atomic.h          |  35 ++--
 include/asm-generic/bitops/non-atomic.h       | 123 ++------------
 include/linux/bitops.h                        |  50 ++++++
 tools/include/asm-generic/bitops/non-atomic.h |  34 ++--
 tools/include/linux/bitops.h                  |  16 ++
 15 files changed, 407 insertions(+), 230 deletions(-)
 create mode 100644 include/asm-generic/bitops/generic-non-atomic.h

base-commit: 874c8ca1e60b2c564a48f7e7acc40d328d5c8733
-- 
2.36.1



* [PATCH v2 1/6] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
  2022-06-10 11:34 [PATCH v2 0/6] bitops: let optimize out non-atomic bitops on compile-time constants Alexander Lobakin
@ 2022-06-10 11:34 ` Alexander Lobakin
  2022-06-10 12:18   ` David Laight
                     ` (2 more replies)
  2022-06-10 11:34 ` [PATCH v2 2/6] bitops: always define asm-generic non-atomic bitops Alexander Lobakin
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 30+ messages in thread
From: Alexander Lobakin @ 2022-06-10 11:34 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel, stable, kernel test robot

test_bit(), like any other bitmap op, takes `unsigned long *` as its
second argument (a pointer to the actual bitmap), as any bitmap
itself is an array of unsigned longs. However, the ia64_get_irr()
code passes a reference to a `u64` as the second argument.
This works with the ia64 bitops implementation because it takes
`void *` as the second argument and casts it later on.
It also works with the bitmap API itself because `unsigned long`
has the same size on ia64 as `u64` (`unsigned long long`), but from
the compiler's PoV the two types are different.
Define @irr as `unsigned long` to fix that. This implies no
functional changes. The mismatch has been hidden for 16 years!
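
A reduced illustration of the warning (hypothetical code, not taken
from the patch):

	static bool irr0_bit_set(unsigned int bit)
	{
		/* u64 is `unsigned long long` on ia64: same width as
		 * `unsigned long`, but a distinct type to the compiler
		 */
		u64 irr = ia64_getreg(_IA64_REG_CR_IRR0);

		return test_bit(bit, &irr); /* -Wincompatible-pointer-types */
	}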

Fixes: a58786917ce2 ("[IA64] avoid broken SAL_CACHE_FLUSH implementations")
Cc: stable@vger.kernel.org # 2.6.16+
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 arch/ia64/include/asm/processor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 7cbce290f4e5..757c2f6d8d4b 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -538,7 +538,7 @@ ia64_get_irr(unsigned int vector)
 {
 	unsigned int reg = vector / 64;
 	unsigned int bit = vector % 64;
-	u64 irr;
+	unsigned long irr;
 
 	switch (reg) {
 	case 0: irr = ia64_getreg(_IA64_REG_CR_IRR0); break;
-- 
2.36.1



* [PATCH v2 2/6] bitops: always define asm-generic non-atomic bitops
  2022-06-10 11:34 [PATCH v2 0/6] bitops: let optimize out non-atomic bitops on compile-time constants Alexander Lobakin
  2022-06-10 11:34 ` [PATCH v2 1/6] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr() Alexander Lobakin
@ 2022-06-10 11:34 ` Alexander Lobakin
  2022-06-10 13:50   ` Andy Shevchenko
  2022-06-10 11:34 ` [PATCH v2 3/6] bitops: unify non-atomic bitops prototypes across architectures Alexander Lobakin
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Alexander Lobakin @ 2022-06-10 11:34 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

Move the generic non-atomic bitops from the asm-generic header, which
gets included only when there are no architecture-specific
alternatives, to a separate independent file to make them always
available.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 .../asm-generic/bitops/generic-non-atomic.h   | 124 ++++++++++++++++++
 include/asm-generic/bitops/non-atomic.h       | 109 ++-------------
 2 files changed, 132 insertions(+), 101 deletions(-)
 create mode 100644 include/asm-generic/bitops/generic-non-atomic.h

diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
new file mode 100644
index 000000000000..808bc4469886
--- /dev/null
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -0,0 +1,124 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
+#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
+
+#include <linux/bits.h>
+
+#ifndef _LINUX_BITOPS_H
+#error only <linux/bitops.h> can be included directly
+#endif
+
+/*
+ * Generic definitions for bit operations, should not be used in regular code
+ * directly.
+ */
+
+/**
+ * generic___set_bit - Set a bit in memory
+ * @nr: the bit to set
+ * @addr: the address to start counting from
+ *
+ * Unlike set_bit(), this function is non-atomic and may be reordered.
+ * If it's called on the same region of memory simultaneously, the effect
+ * may be that only one operation succeeds.
+ */
+static __always_inline void
+generic___set_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p  |= mask;
+}
+
+static __always_inline void
+generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p &= ~mask;
+}
+
+/**
+ * generic___change_bit - Toggle a bit in memory
+ * @nr: the bit to change
+ * @addr: the address to start counting from
+ *
+ * Unlike change_bit(), this function is non-atomic and may be reordered.
+ * If it's called on the same region of memory simultaneously, the effect
+ * may be that only one operation succeeds.
+ */
+static __always_inline
+void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p ^= mask;
+}
+
+/**
+ * generic___test_and_set_bit - Set a bit and return its old value
+ * @nr: Bit to set
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */
+static __always_inline int
+generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old | mask;
+	return (old & mask) != 0;
+}
+
+/**
+ * generic___test_and_clear_bit - Clear a bit and return its old value
+ * @nr: Bit to clear
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */
+static __always_inline int
+generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old & ~mask;
+	return (old & mask) != 0;
+}
+
+/* WARNING: non atomic and it can be reordered! */
+static __always_inline int
+generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old ^ mask;
+	return (old & mask) != 0;
+}
+
+/**
+ * generic_test_bit - Determine whether a bit is set
+ * @nr: bit number to test
+ * @addr: Address to start counting from
+ */
+static __always_inline int
+generic_test_bit(unsigned int nr, const volatile unsigned long *addr)
+{
+	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
+}
+
+#endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index 078cc68be2f1..a05bc090a6a3 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -2,121 +2,28 @@
 #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 
+#include <asm-generic/bitops/generic-non-atomic.h>
 #include <asm/types.h>
 
-/**
- * arch___set_bit - Set a bit in memory
- * @nr: the bit to set
- * @addr: the address to start counting from
- *
- * Unlike set_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
- */
-static __always_inline void
-arch___set_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
-	*p  |= mask;
-}
+#define arch___set_bit generic___set_bit
 #define __set_bit arch___set_bit
 
-static __always_inline void
-arch___clear_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
-	*p &= ~mask;
-}
+#define arch___clear_bit generic___clear_bit
 #define __clear_bit arch___clear_bit
 
-/**
- * arch___change_bit - Toggle a bit in memory
- * @nr: the bit to change
- * @addr: the address to start counting from
- *
- * Unlike change_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
- */
-static __always_inline
-void arch___change_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
-	*p ^= mask;
-}
+#define arch___change_bit generic___change_bit
 #define __change_bit arch___change_bit
 
-/**
- * arch___test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static __always_inline int
-arch___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-	unsigned long old = *p;
-
-	*p = old | mask;
-	return (old & mask) != 0;
-}
+#define arch___test_and_set_bit generic___test_and_set_bit
 #define __test_and_set_bit arch___test_and_set_bit
 
-/**
- * arch___test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to clear
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static __always_inline int
-arch___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-	unsigned long old = *p;
-
-	*p = old & ~mask;
-	return (old & mask) != 0;
-}
+#define arch___test_and_clear_bit generic___test_and_clear_bit
 #define __test_and_clear_bit arch___test_and_clear_bit
 
-/* WARNING: non atomic and it can be reordered! */
-static __always_inline int
-arch___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-	unsigned long old = *p;
-
-	*p = old ^ mask;
-	return (old & mask) != 0;
-}
+#define arch___test_and_change_bit generic___test_and_change_bit
 #define __test_and_change_bit arch___test_and_change_bit
 
-/**
- * arch_test_bit - Determine whether a bit is set
- * @nr: bit number to test
- * @addr: Address to start counting from
- */
-static __always_inline int
-arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
-{
-	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
-}
+#define arch_test_bit generic_test_bit
 #define test_bit arch_test_bit
 
 #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
-- 
2.36.1



* [PATCH v2 3/6] bitops: unify non-atomic bitops prototypes across architectures
  2022-06-10 11:34 [PATCH v2 0/6] bitops: let optimize out non-atomic bitops on compile-time constants Alexander Lobakin
  2022-06-10 11:34 ` [PATCH v2 1/6] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr() Alexander Lobakin
  2022-06-10 11:34 ` [PATCH v2 2/6] bitops: always define asm-generic non-atomic bitops Alexander Lobakin
@ 2022-06-10 11:34 ` Alexander Lobakin
  2022-06-10 11:34 ` [PATCH v2 4/6] bitops: define const_*() versions of the non-atomics Alexander Lobakin
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Alexander Lobakin @ 2022-06-10 11:34 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

Currently, there is a mess with the prototypes of the non-atomic
bitops across the different architectures:

ret	bool, int, unsigned long
nr	int, long, unsigned int, unsigned long
addr	volatile unsigned long *, volatile void *

Thankfully, it doesn't provoke any bugs, but it can sometimes make
the compiler angry at the most inconvenient moments.
Adjust all the prototypes to the following standard:

ret	bool				retval can be only 0 or 1
nr	unsigned long			native; signed makes no sense
addr	volatile unsigned long *	bitmaps are arrays of ulongs

Finally, add some static assertions in order to prevent people from
making a mess in this room again.
I also applied the __always_inline attribute consistently, so that
they always get resolved to the actual operations.
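
For instance, all implementations now converge on the same shape, and
the new assertions pin each of them to its generic_*() counterpart, so
any future divergence breaks the build. A sketch of what the checks
expand to:

	static __always_inline bool
	__test_and_set_bit(unsigned long nr, volatile unsigned long *addr);

	static_assert(__same_type(__test_and_set_bit,
				  generic___test_and_set_bit));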

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
---
 arch/alpha/include/asm/bitops.h               | 28 +++++------
 arch/hexagon/include/asm/bitops.h             | 23 +++++----
 arch/ia64/include/asm/bitops.h                | 28 +++++------
 arch/m68k/include/asm/bitops.h                | 47 +++++++++++++------
 arch/sh/include/asm/bitops-op32.h             | 24 ++++++----
 arch/x86/include/asm/bitops.h                 | 22 +++++----
 .../asm-generic/bitops/generic-non-atomic.h   | 24 +++++-----
 .../bitops/instrumented-non-atomic.h          | 21 ++++++---
 include/linux/bitops.h                        | 17 +++++++
 tools/include/asm-generic/bitops/non-atomic.h | 24 ++++++----
 10 files changed, 162 insertions(+), 96 deletions(-)

diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index e1d8483a45f2..381ad2eae4b4 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -46,8 +46,8 @@ set_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static inline void
-__set_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -82,8 +82,8 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ void
-__clear_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -118,8 +118,8 @@ change_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ void
-__change_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -186,8 +186,8 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
 /*
  * WARNING: non atomic version.
  */
-static inline int
-__test_and_set_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -230,8 +230,8 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static inline int
-__test_and_clear_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -272,8 +272,8 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ int
-__test_and_change_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -283,8 +283,8 @@ __test_and_change_bit(unsigned long nr, volatile void * addr)
 	return (old & mask) != 0;
 }
 
-static inline int
-test_bit(int nr, const volatile void * addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
 }
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index 75d6ba3643b8..a3bfe3a8d4b7 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -127,38 +127,45 @@ static inline void change_bit(int nr, volatile void *addr)
  * be atomic, particularly for things like slab_lock and slab_unlock.
  *
  */
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_clear_bit(nr, addr);
 }
 
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_set_bit(nr, addr);
 }
 
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_change_bit(nr, addr);
 }
 
 /*  Apparently, at least some of these are allowed to be non-atomic  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_clear_bit(nr, addr);
 }
 
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_set_bit(nr, addr);
 }
 
-static inline int __test_and_change_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_change_bit(nr, addr);
 }
 
-static inline int __test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	int retval;
 
@@ -172,7 +179,7 @@ static inline int __test_bit(int nr, const volatile unsigned long *addr)
 	return retval;
 }
 
-#define test_bit(nr, addr) __test_bit(nr, addr)
+#define __test_bit(nr, addr) test_bit(nr, addr)
 
 /*
  * ffz - find first zero in word.
diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
index 577be93c0818..4267a217a503 100644
--- a/arch/ia64/include/asm/bitops.h
+++ b/arch/ia64/include/asm/bitops.h
@@ -61,8 +61,8 @@ set_bit (int nr, volatile void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__set_bit (int nr, volatile void *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
 }
@@ -143,8 +143,8 @@ __clear_bit_unlock(int nr, void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__clear_bit (int nr, volatile void *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
 }
@@ -183,8 +183,8 @@ change_bit (int nr, volatile void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__change_bit (int nr, volatile void *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
 }
@@ -232,8 +232,8 @@ test_and_set_bit (int nr, volatile void *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __inline__ int
-__test_and_set_bit (int nr, volatile void *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -277,8 +277,8 @@ test_and_clear_bit (int nr, volatile void *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __inline__ int
-__test_and_clear_bit(int nr, volatile void * addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -320,8 +320,8 @@ test_and_change_bit (int nr, volatile void *addr)
  *
  * This operation is non-atomic and can be reordered.
  */
-static __inline__ int
-__test_and_change_bit (int nr, void *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 old, bit = (1 << (nr & 31));
 	__u32 *m = (__u32 *) addr + (nr >> 5);
@@ -331,8 +331,8 @@ __test_and_change_bit (int nr, void *addr)
 	return (old & bit) != 0;
 }
 
-static __inline__ int
-test_bit (int nr, const volatile void *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
 }
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 51283db53667..9d44bd4713cb 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -65,8 +65,11 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
 				bfset_mem_set_bit(nr, vaddr))
 #endif
 
-#define __set_bit(nr, vaddr)	set_bit(nr, vaddr)
-
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	set_bit(nr, addr);
+}
 
 static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
 {
@@ -105,8 +108,11 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
 				bfclr_mem_clear_bit(nr, vaddr))
 #endif
 
-#define __clear_bit(nr, vaddr)	clear_bit(nr, vaddr)
-
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	clear_bit(nr, addr);
+}
 
 static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
 {
@@ -145,12 +151,16 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
 				bfchg_mem_change_bit(nr, vaddr))
 #endif
 
-#define __change_bit(nr, vaddr)	change_bit(nr, vaddr)
-
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	change_bit(nr, addr);
+}
 
-static inline int test_bit(int nr, const volatile unsigned long *vaddr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
-	return (vaddr[nr >> 5] & (1UL << (nr & 31))) != 0;
+	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
 }
 
 
@@ -201,8 +211,11 @@ static inline int bfset_mem_test_and_set_bit(int nr,
 					bfset_mem_test_and_set_bit(nr, vaddr))
 #endif
 
-#define __test_and_set_bit(nr, vaddr)	test_and_set_bit(nr, vaddr)
-
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_set_bit(nr, addr);
+}
 
 static inline int bclr_reg_test_and_clear_bit(int nr,
 					      volatile unsigned long *vaddr)
@@ -251,8 +264,11 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
 					bfclr_mem_test_and_clear_bit(nr, vaddr))
 #endif
 
-#define __test_and_clear_bit(nr, vaddr)	test_and_clear_bit(nr, vaddr)
-
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_clear_bit(nr, addr);
+}
 
 static inline int bchg_reg_test_and_change_bit(int nr,
 					       volatile unsigned long *vaddr)
@@ -301,8 +317,11 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
 					bfchg_mem_test_and_change_bit(nr, vaddr))
 #endif
 
-#define __test_and_change_bit(nr, vaddr) test_and_change_bit(nr, vaddr)
-
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_change_bit(nr, addr);
+}
 
 /*
  *	The true 68020 and more advanced processors support the "bfffo"
diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
index cfe5465acce7..dcd85866a394 100644
--- a/arch/sh/include/asm/bitops-op32.h
+++ b/arch/sh/include/asm/bitops-op32.h
@@ -2,6 +2,8 @@
 #ifndef __ASM_SH_BITOPS_OP32_H
 #define __ASM_SH_BITOPS_OP32_H
 
+#include <linux/bits.h>
+
 /*
  * The bit modifying instructions on SH-2A are only capable of working
  * with a 3-bit immediate, which signifies the shift position for the bit
@@ -16,7 +18,8 @@
 #define BYTE_OFFSET(nr)		((nr) % BITS_PER_BYTE)
 #endif
 
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -33,7 +36,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
 	}
 }
 
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -60,7 +64,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -87,7 +92,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -106,7 +112,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -117,8 +124,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
-					    volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -133,7 +140,8 @@ static inline int __test_and_change_bit(int nr,
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index a288ecd230ab..973c6bd17f98 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -63,7 +63,7 @@ arch_set_bit(long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-arch___set_bit(long nr, volatile unsigned long *addr)
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	asm volatile(__ASM_SIZE(bts) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
 }
@@ -89,7 +89,7 @@ arch_clear_bit_unlock(long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-arch___clear_bit(long nr, volatile unsigned long *addr)
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	asm volatile(__ASM_SIZE(btr) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
 }
@@ -114,7 +114,7 @@ arch___clear_bit_unlock(long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-arch___change_bit(long nr, volatile unsigned long *addr)
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	asm volatile(__ASM_SIZE(btc) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
 }
@@ -145,7 +145,7 @@ arch_test_and_set_bit_lock(long nr, volatile unsigned long *addr)
 }
 
 static __always_inline bool
-arch___test_and_set_bit(long nr, volatile unsigned long *addr)
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	bool oldbit;
 
@@ -171,7 +171,7 @@ arch_test_and_clear_bit(long nr, volatile unsigned long *addr)
  * this without also updating arch/x86/kernel/kvm.c
  */
 static __always_inline bool
-arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	bool oldbit;
 
@@ -183,7 +183,7 @@ arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
 }
 
 static __always_inline bool
-arch___test_and_change_bit(long nr, volatile unsigned long *addr)
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	bool oldbit;
 
@@ -219,10 +219,12 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l
 	return oldbit;
 }
 
-#define arch_test_bit(nr, addr)			\
-	(__builtin_constant_p((nr))		\
-	 ? constant_test_bit((nr), (addr))	\
-	 : variable_test_bit((nr), (addr)))
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
+{
+	return __builtin_constant_p(nr) ? constant_test_bit(nr, addr) :
+					  variable_test_bit(nr, addr);
+}
 
 /**
  * __ffs - find first set bit in word
diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
index 808bc4469886..3ce0fa0ab35f 100644
--- a/include/asm-generic/bitops/generic-non-atomic.h
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -23,7 +23,7 @@
  * may be that only one operation succeeds.
  */
 static __always_inline void
-generic___set_bit(unsigned int nr, volatile unsigned long *addr)
+generic___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -32,7 +32,7 @@ generic___set_bit(unsigned int nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
+generic___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -49,8 +49,8 @@ generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __always_inline
-void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline void
+generic___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -67,8 +67,8 @@ void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __always_inline int
-generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+generic___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -87,8 +87,8 @@ generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __always_inline int
-generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+generic___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -99,8 +99,8 @@ generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static __always_inline int
-generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+generic___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -115,8 +115,8 @@ generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static __always_inline int
-generic_test_bit(unsigned int nr, const volatile unsigned long *addr)
+static __always_inline bool
+generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
index 7ab1ecc37782..b019f77ef21c 100644
--- a/include/asm-generic/bitops/instrumented-non-atomic.h
+++ b/include/asm-generic/bitops/instrumented-non-atomic.h
@@ -22,7 +22,8 @@
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___set_bit(nr, addr);
@@ -37,7 +38,8 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___clear_bit(nr, addr);
@@ -52,7 +54,8 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___change_bit(nr, addr);
@@ -90,7 +93,8 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_set_bit(nr, addr);
@@ -104,7 +108,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_clear_bit(nr, addr);
@@ -118,7 +123,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_change_bit(nr, addr);
@@ -129,7 +135,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_bit(nr, addr);
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 7aaed501f768..87087454a288 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -26,12 +26,29 @@ extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
 extern unsigned long __sw_hweight64(__u64 w);
 
+#include <asm-generic/bitops/generic-non-atomic.h>
+
 /*
  * Include this here because some architectures need generic_ffs/fls in
  * scope
  */
 #include <asm/bitops.h>
 
+/* Check that the bitops prototypes are sane */
+#define __check_bitop_pr(name)						\
+	static_assert(__same_type(arch_##name, generic_##name) &&	\
+		      __same_type(name, generic_##name))
+
+__check_bitop_pr(__set_bit);
+__check_bitop_pr(__clear_bit);
+__check_bitop_pr(__change_bit);
+__check_bitop_pr(__test_and_set_bit);
+__check_bitop_pr(__test_and_clear_bit);
+__check_bitop_pr(__test_and_change_bit);
+__check_bitop_pr(test_bit);
+
+#undef __check_bitop_pr
+
 static inline int get_bitmask_order(unsigned int count)
 {
 	int order;
diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
index 7e10c4b50c5d..e5e78e42e57b 100644
--- a/tools/include/asm-generic/bitops/non-atomic.h
+++ b/tools/include/asm-generic/bitops/non-atomic.h
@@ -2,7 +2,7 @@
 #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 
-#include <asm/types.h>
+#include <linux/bits.h>
 
 /**
  * __set_bit - Set a bit in memory
@@ -13,7 +13,8 @@
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -21,7 +22,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
 	*p  |= mask;
 }
 
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -38,7 +40,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -55,7 +58,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -74,7 +78,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -85,8 +90,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
-					    volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -101,7 +106,8 @@ static inline int __test_and_change_bit(int nr,
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
-- 
2.36.1



* [PATCH v2 4/6] bitops: define const_*() versions of the non-atomics
  2022-06-10 11:34 [PATCH v2 0/6] bitops: let optimize out non-atomic bitops on compile-time constants Alexander Lobakin
                   ` (2 preceding siblings ...)
  2022-06-10 11:34 ` [PATCH v2 3/6] bitops: unify non-atomic bitops prototypes across architectures Alexander Lobakin
@ 2022-06-10 11:34 ` Alexander Lobakin
  2022-06-10 13:56   ` Andy Shevchenko
  2022-06-15  2:57   ` Yury Norov
  2022-06-10 11:34 ` [PATCH v2 5/6] bitops: wrap non-atomic bitops with a transparent macro Alexander Lobakin
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 30+ messages in thread
From: Alexander Lobakin @ 2022-06-10 11:34 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

Define const_*() variants of the non-atomic bitops to be used when
the input arguments are compile-time constants, so that the compiler
will always be able to resolve those to compile-time constants as
well. Those are mostly direct aliases for the generic_*() ops, with
one exception for const_test_bit(): the original one is declared
atomic-safe and thus doesn't discard the `volatile` qualifier, so in
order to let the compiler optimize the code, define it separately,
disregarding the qualifier.
Add them to the compile-time type checks as well, just in case.
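
A reduced illustration (assumed, not from the patch) of why discarding
the `volatile` qualifier matters for constant folding:

	/* every volatile access must emit a real load, so the result
	 * can never be folded, even if the pointed-to value is known
	 */
	static bool volatile_read(const volatile unsigned long *p)
	{
		return *p & 1;
	}

	/* without `volatile`, the read may fold into a constant
	 * whenever the compiler can prove what *p contains
	 */
	static bool plain_read(const unsigned long *p)
	{
		return *p & 1;
	}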

Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 .../asm-generic/bitops/generic-non-atomic.h   | 31 +++++++++++++++++++
 include/linux/bitops.h                        |  3 +-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
index 3ce0fa0ab35f..9a77babfff35 100644
--- a/include/asm-generic/bitops/generic-non-atomic.h
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -121,4 +121,35 @@ generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
 
+/*
+ * const_*() definitions provide good compile-time optimizations when
+ * the passed arguments can be resolved at compile time.
+ */
+#define const___set_bit			generic___set_bit
+#define const___clear_bit		generic___clear_bit
+#define const___change_bit		generic___change_bit
+#define const___test_and_set_bit	generic___test_and_set_bit
+#define const___test_and_clear_bit	generic___test_and_clear_bit
+#define const___test_and_change_bit	generic___test_and_change_bit
+
+/**
+ * const_test_bit - Determine whether a bit is set
+ * @nr: bit number to test
+ * @addr: Address to start counting from
+ *
+ * A version of generic_test_bit() which discards the `volatile` qualifier to
+ * allow the compiler to optimize code harder. Non-atomic and to be used only
+ * for testing compile-time constants, e.g. from the corresponding macro, or
+ * when you really know what you are doing.
+ */
+static __always_inline bool
+const_test_bit(unsigned long nr, const volatile unsigned long *addr)
+{
+	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long val = *p;
+
+	return !!(val & mask);
+}
+
 #endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 87087454a288..51c22b8667b4 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -36,7 +36,8 @@ extern unsigned long __sw_hweight64(__u64 w);
 
 /* Check that the bitops prototypes are sane */
 #define __check_bitop_pr(name)						\
-	static_assert(__same_type(arch_##name, generic_##name) &&	\
+	static_assert(__same_type(const_##name, generic_##name) &&	\
+		      __same_type(arch_##name, generic_##name) &&	\
 		      __same_type(name, generic_##name))
 
 __check_bitop_pr(__set_bit);
-- 
2.36.1



* [PATCH v2 5/6] bitops: wrap non-atomic bitops with a transparent macro
  2022-06-10 11:34 [PATCH v2 0/6] bitops: let optimize out non-atomic bitops on compile-time constants Alexander Lobakin
                   ` (3 preceding siblings ...)
  2022-06-10 11:34 ` [PATCH v2 4/6] bitops: define const_*() versions of the non-atomics Alexander Lobakin
@ 2022-06-10 11:34 ` Alexander Lobakin
  2022-06-10 11:34 ` [PATCH v2 6/6] bitops: let optimize out non-atomic bitops on compile-time constants Alexander Lobakin
  2022-06-13  7:35 ` [PATCH v2 0/6] " Geert Uytterhoeven
  6 siblings, 0 replies; 30+ messages in thread
From: Alexander Lobakin @ 2022-06-10 11:34 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

In preparation for altering the non-atomic bitops with a macro, wrap
them in a transparent definition. This requires prepending one more
'_' to their names so that the wrapping can be done seamlessly.
sparc32 already had triple-underscored functions, so those had to be
renamed ('___' -> 'sp32_').
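
A minimal sketch of the pass-through wrapper (illustrative; at this
point the macro does nothing, and the next patch extends it with the
compile-time dispatch):

	#define bitop(op, nr, addr)	op(nr, addr)

	#define __set_bit(nr, addr)	bitop(___set_bit, nr, addr)
	#define test_bit(nr, addr)	bitop(_test_bit, nr, addr)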

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 arch/alpha/include/asm/bitops.h               | 14 +++++-----
 arch/hexagon/include/asm/bitops.h             | 16 +++++------
 arch/ia64/include/asm/bitops.h                | 26 ++++++++---------
 arch/m68k/include/asm/bitops.h                | 14 +++++-----
 arch/sh/include/asm/bitops-op32.h             | 22 +++++++--------
 arch/sparc/include/asm/bitops_32.h            | 18 ++++++------
 arch/sparc/lib/atomic32.c                     | 12 ++++----
 .../bitops/instrumented-non-atomic.h          | 28 +++++++++----------
 include/asm-generic/bitops/non-atomic.h       | 14 +++++-----
 include/linux/bitops.h                        | 18 +++++++++++-
 tools/include/asm-generic/bitops/non-atomic.h | 24 ++++++++--------
 tools/include/linux/bitops.h                  | 16 +++++++++++
 12 files changed, 127 insertions(+), 95 deletions(-)

diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index 381ad2eae4b4..e7d119826062 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -47,7 +47,7 @@ set_bit(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -83,7 +83,7 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -119,7 +119,7 @@ change_bit(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -187,7 +187,7 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
  * WARNING: non atomic version.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -231,7 +231,7 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -273,7 +273,7 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -284,7 +284,7 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
 }
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index a3bfe3a8d4b7..37144f8e0b0c 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -128,44 +128,44 @@ static inline void change_bit(int nr, volatile void *addr)
  *
  */
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_clear_bit(nr, addr);
 }
 
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_set_bit(nr, addr);
 }
 
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_change_bit(nr, addr);
 }
 
 /*  Apparently, at least some of these are allowed to be non-atomic  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_clear_bit(nr, addr);
 }
 
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_set_bit(nr, addr);
 }
 
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_change_bit(nr, addr);
 }
 
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	int retval;
 
@@ -179,7 +179,7 @@ test_bit(unsigned long nr, const volatile unsigned long *addr)
 	return retval;
 }
 
-#define __test_bit(nr, addr) test_bit(nr, addr)
+#define __test_bit(nr, addr) _test_bit(nr, addr)
 
 /*
  * ffz - find first zero in word.
diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
index 4267a217a503..3cb231fb0283 100644
--- a/arch/ia64/include/asm/bitops.h
+++ b/arch/ia64/include/asm/bitops.h
@@ -53,7 +53,7 @@ set_bit (int nr, volatile void *addr)
 }
 
 /**
- * __set_bit - Set a bit in memory
+ * ___set_bit - Set a bit in memory
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
@@ -62,7 +62,7 @@ set_bit (int nr, volatile void *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
 }
@@ -135,7 +135,7 @@ __clear_bit_unlock(int nr, void *addr)
 }
 
 /**
- * __clear_bit - Clears a bit in memory (non-atomic version)
+ * ___clear_bit - Clears a bit in memory (non-atomic version)
  * @nr: the bit to clear
  * @addr: the address to start counting from
  *
@@ -144,7 +144,7 @@ __clear_bit_unlock(int nr, void *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
 }
@@ -175,7 +175,7 @@ change_bit (int nr, volatile void *addr)
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to toggle
  * @addr: the address to start counting from
  *
@@ -184,7 +184,7 @@ change_bit (int nr, volatile void *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
 }
@@ -224,7 +224,7 @@ test_and_set_bit (int nr, volatile void *addr)
 #define test_and_set_bit_lock test_and_set_bit
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -233,7 +233,7 @@ test_and_set_bit (int nr, volatile void *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -269,7 +269,7 @@ test_and_clear_bit (int nr, volatile void *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -278,7 +278,7 @@ test_and_clear_bit (int nr, volatile void *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -314,14 +314,14 @@ test_and_change_bit (int nr, volatile void *addr)
 }
 
 /**
- * __test_and_change_bit - Change a bit and return its old value
+ * ___test_and_change_bit - Change a bit and return its old value
  * @nr: Bit to change
  * @addr: Address to count from
  *
  * This operation is non-atomic and can be reordered.
  */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 old, bit = (1 << (nr & 31));
 	__u32 *m = (__u32 *) addr + (nr >> 5);
@@ -332,7 +332,7 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
 }
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 9d44bd4713cb..ed08c9d9c547 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -66,7 +66,7 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
 #endif
 
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	set_bit(nr, addr);
 }
@@ -109,7 +109,7 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
 #endif
 
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	clear_bit(nr, addr);
 }
@@ -152,13 +152,13 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
 #endif
 
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	change_bit(nr, addr);
 }
 
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
 }
@@ -212,7 +212,7 @@ static inline int bfset_mem_test_and_set_bit(int nr,
 #endif
 
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_set_bit(nr, addr);
 }
@@ -265,7 +265,7 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
 #endif
 
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_clear_bit(nr, addr);
 }
@@ -318,7 +318,7 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
 #endif
 
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_change_bit(nr, addr);
 }
diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
index dcd85866a394..24b8a809c1b9 100644
--- a/arch/sh/include/asm/bitops-op32.h
+++ b/arch/sh/include/asm/bitops-op32.h
@@ -19,7 +19,7 @@
 #endif
 
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -37,7 +37,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -56,7 +56,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to change
  * @addr: the address to start counting from
  *
@@ -65,7 +65,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -84,7 +84,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -93,7 +93,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -104,7 +104,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -113,7 +113,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -125,7 +125,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 
 /* WARNING: non atomic and it can be reordered! */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -136,12 +136,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * test_bit - Determine whether a bit is set
+ * _test_bit - Determine whether a bit is set
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
diff --git a/arch/sparc/include/asm/bitops_32.h b/arch/sparc/include/asm/bitops_32.h
index 889afa9f990f..3448c191b484 100644
--- a/arch/sparc/include/asm/bitops_32.h
+++ b/arch/sparc/include/asm/bitops_32.h
@@ -19,9 +19,9 @@
 #error only <linux/bitops.h> can be included directly
 #endif
 
-unsigned long ___set_bit(unsigned long *addr, unsigned long mask);
-unsigned long ___clear_bit(unsigned long *addr, unsigned long mask);
-unsigned long ___change_bit(unsigned long *addr, unsigned long mask);
+unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask);
+unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask);
+unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask);
 
 /*
  * Set bit 'nr' in 32-bit quantity at address 'addr' where bit '0'
@@ -36,7 +36,7 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *add
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	return ___set_bit(ADDR, mask) != 0;
+	return sp32___set_bit(ADDR, mask) != 0;
 }
 
 static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
@@ -46,7 +46,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	(void) ___set_bit(ADDR, mask);
+	(void) sp32___set_bit(ADDR, mask);
 }
 
 static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
@@ -56,7 +56,7 @@ static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *a
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	return ___clear_bit(ADDR, mask) != 0;
+	return sp32___clear_bit(ADDR, mask) != 0;
 }
 
 static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
@@ -66,7 +66,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	(void) ___clear_bit(ADDR, mask);
+	(void) sp32___clear_bit(ADDR, mask);
 }
 
 static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
@@ -76,7 +76,7 @@ static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	return ___change_bit(ADDR, mask) != 0;
+	return sp32___change_bit(ADDR, mask) != 0;
 }
 
 static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
@@ -86,7 +86,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	(void) ___change_bit(ADDR, mask);
+	(void) sp32___change_bit(ADDR, mask);
 }
 
 #include <asm-generic/bitops/non-atomic.h>
diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index 8b81d0f00c97..cf80d1ae352b 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -120,7 +120,7 @@ void arch_atomic_set(atomic_t *v, int i)
 }
 EXPORT_SYMBOL(arch_atomic_set);
 
-unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
+unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask)
 {
 	unsigned long old, flags;
 
@@ -131,9 +131,9 @@ unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
 
 	return old & mask;
 }
-EXPORT_SYMBOL(___set_bit);
+EXPORT_SYMBOL(sp32___set_bit);
 
-unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
+unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask)
 {
 	unsigned long old, flags;
 
@@ -144,9 +144,9 @@ unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
 
 	return old & mask;
 }
-EXPORT_SYMBOL(___clear_bit);
+EXPORT_SYMBOL(sp32___clear_bit);
 
-unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
+unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
 {
 	unsigned long old, flags;
 
@@ -157,7 +157,7 @@ unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
 
 	return old & mask;
 }
-EXPORT_SYMBOL(___change_bit);
+EXPORT_SYMBOL(sp32___change_bit);
 
 unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
 {
diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
index b019f77ef21c..988a3bbfba34 100644
--- a/include/asm-generic/bitops/instrumented-non-atomic.h
+++ b/include/asm-generic/bitops/instrumented-non-atomic.h
@@ -14,7 +14,7 @@
 #include <linux/instrumented.h>
 
 /**
- * __set_bit - Set a bit in memory
+ * ___set_bit - Set a bit in memory
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
@@ -23,14 +23,14 @@
  * succeeds.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___set_bit(nr, addr);
 }
 
 /**
- * __clear_bit - Clears a bit in memory
+ * ___clear_bit - Clears a bit in memory
  * @nr: the bit to clear
  * @addr: the address to start counting from
  *
@@ -39,14 +39,14 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
  * succeeds.
  */
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___clear_bit(nr, addr);
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to change
  * @addr: the address to start counting from
  *
@@ -55,7 +55,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
  * succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___change_bit(nr, addr);
@@ -86,7 +86,7 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
 }
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -94,14 +94,14 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
  * can appear to succeed but actually fail.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_set_bit(nr, addr);
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -109,14 +109,14 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
  * can appear to succeed but actually fail.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_clear_bit(nr, addr);
 }
 
 /**
- * __test_and_change_bit - Change a bit and return its old value
+ * ___test_and_change_bit - Change a bit and return its old value
  * @nr: Bit to change
  * @addr: Address to count from
  *
@@ -124,19 +124,19 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
  * can appear to succeed but actually fail.
  */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_change_bit(nr, addr);
 }
 
 /**
- * test_bit - Determine whether a bit is set
+ * _test_bit - Determine whether a bit is set
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_bit(nr, addr);
diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index a05bc090a6a3..23c7c16c6e88 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -6,24 +6,24 @@
 #include <asm/types.h>
 
 #define arch___set_bit generic___set_bit
-#define __set_bit arch___set_bit
+#define ___set_bit arch___set_bit
 
 #define arch___clear_bit generic___clear_bit
-#define __clear_bit arch___clear_bit
+#define ___clear_bit arch___clear_bit
 
 #define arch___change_bit generic___change_bit
-#define __change_bit arch___change_bit
+#define ___change_bit arch___change_bit
 
 #define arch___test_and_set_bit generic___test_and_set_bit
-#define __test_and_set_bit arch___test_and_set_bit
+#define ___test_and_set_bit arch___test_and_set_bit
 
 #define arch___test_and_clear_bit generic___test_and_clear_bit
-#define __test_and_clear_bit arch___test_and_clear_bit
+#define ___test_and_clear_bit arch___test_and_clear_bit
 
 #define arch___test_and_change_bit generic___test_and_change_bit
-#define __test_and_change_bit arch___test_and_change_bit
+#define ___test_and_change_bit arch___test_and_change_bit
 
 #define arch_test_bit generic_test_bit
-#define test_bit arch_test_bit
+#define _test_bit arch_test_bit
 
 #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 51c22b8667b4..753f98e0dcf5 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -26,8 +26,24 @@ extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
 extern unsigned long __sw_hweight64(__u64 w);
 
+/*
+ * Defined here because those may be needed by architecture-specific static
+ * inlines.
+ */
+
 #include <asm-generic/bitops/generic-non-atomic.h>
 
+#define bitop(op, nr, addr)						\
+	op(nr, addr)
+
+#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
+#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
+#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
+#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
+#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
+#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
+#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
+
 /*
  * Include this here because some architectures need generic_ffs/fls in
  * scope
@@ -38,7 +54,7 @@ extern unsigned long __sw_hweight64(__u64 w);
 #define __check_bitop_pr(name)						\
 	static_assert(__same_type(const_##name, generic_##name) &&	\
 		      __same_type(arch_##name, generic_##name) &&	\
-		      __same_type(name, generic_##name))
+		      __same_type(_##name, generic_##name))
 
 __check_bitop_pr(__set_bit);
 __check_bitop_pr(__clear_bit);
diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
index e5e78e42e57b..0c472a833408 100644
--- a/tools/include/asm-generic/bitops/non-atomic.h
+++ b/tools/include/asm-generic/bitops/non-atomic.h
@@ -5,7 +5,7 @@
 #include <linux/bits.h>
 
 /**
- * __set_bit - Set a bit in memory
+ * ___set_bit - Set a bit in memory
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
@@ -14,7 +14,7 @@
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -23,7 +23,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -32,7 +32,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to change
  * @addr: the address to start counting from
  *
@@ -41,7 +41,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -50,7 +50,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -59,7 +59,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -70,7 +70,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -79,7 +79,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -91,7 +91,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 
 /* WARNING: non atomic and it can be reordered! */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -102,12 +102,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * test_bit - Determine whether a bit is set
+ * _test_bit - Determine whether a bit is set
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
diff --git a/tools/include/linux/bitops.h b/tools/include/linux/bitops.h
index 5fca38fe1ba8..f18683b95ea6 100644
--- a/tools/include/linux/bitops.h
+++ b/tools/include/linux/bitops.h
@@ -25,6 +25,22 @@ extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
 extern unsigned long __sw_hweight64(__u64 w);
 
+/*
+ * Defined here because those may be needed by architecture-specific static
+ * inlines.
+ */
+
+#define bitop(op, nr, addr)						\
+	op(nr, addr)
+
+#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
+#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
+#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
+#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
+#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
+#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
+#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
+
 /*
  * Include this here because some architectures need generic_ffs/fls in
  * scope
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 6/6] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-10 11:34 [PATCH v2 0/6] bitops: let optimize out non-atomic bitops on compile-time constants Alexander Lobakin
                   ` (4 preceding siblings ...)
  2022-06-10 11:34 ` [PATCH v2 5/6] bitops: wrap non-atomic bitops with a transparent macro Alexander Lobakin
@ 2022-06-10 11:34 ` Alexander Lobakin
  2022-06-15  3:26   ` Yury Norov
  2022-06-13  7:35 ` [PATCH v2 0/6] " Geert Uytterhoeven
  6 siblings, 1 reply; 30+ messages in thread
From: Alexander Lobakin @ 2022-06-10 11:34 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

Currently, many architecture-specific non-atomic bitop
implementations use inline asm or other hacks which are faster or
more robust when working with "real" variables (i.e. fields from
the structures etc.), but the compilers have no clue how to optimize
them out when called on compile-time constants. For example, the
following code:

	DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
	unsigned long bar = BIT(BAR_BIT);
	unsigned long baz = 0;

	__set_bit(FOO_BIT, foo);
	baz |= BIT(BAZ_BIT);

	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo)));
	BUILD_BUG_ON(!__builtin_constant_p(bar & BIT(BAR_BIT)));
	BUILD_BUG_ON(!__builtin_constant_p(baz & BIT(BAZ_BIT)));

triggers the first assertion on x86_64, which means that the
compiler is unable to evaluate it to a compile-time initializer
when the architecture-specific bitop is used even if it's obvious.
In order to let the compiler optimize out such cases, expand the
bitop() macro to use the "constant" C non-atomic bitop
implementations when all of the arguments passed are compile-time
constants, which means that the result will be a compile-time
constant as well, so that it produces more efficient and simpler
code in 100% of cases, compared to the architecture-specific
counterparts.
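
As a minimal sketch of the intended effect (FOO_BIT being an arbitrary
compile-time constant), the following now folds at compile time:

	unsigned long foo = 0;

	/* nr, addr and *addr are all compile-time resolvable here, so
	 * the expanded bitop() picks const___set_bit()
	 */
	__set_bit(FOO_BIT, &foo);

	/* the compiler now treats foo as if it were `= BIT(FOO_BIT)` */
	BUILD_BUG_ON(!__builtin_constant_p(foo & BIT(FOO_BIT)));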

The savings are architecture, compiler and compiler flags dependent,
for example, on x86_64 -O2:

GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)

and ARM64 (courtesy of Mark):

GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)

Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 include/linux/bitops.h | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 753f98e0dcf5..364bdc3606b4 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -33,8 +33,24 @@ extern unsigned long __sw_hweight64(__u64 w);
 
 #include <asm-generic/bitops/generic-non-atomic.h>
 
+/*
+ * Many architecture-specific non-atomic bitops contain inline asm code, due
+ * to which the compiler can't optimize them to compile-time expressions or
+ * constants. In contrast, generic_*() helpers are defined in pure C and
+ * compilers optimize them just fine.
+ * Therefore, to make `unsigned long foo = 0; __set_bit(BAR, &foo)` effectively
+ * equal to `unsigned long foo = BIT(BAR)`, pick the generic C alternative when
+ * the arguments can be resolved at compile time. That expression itself is a
+ * constant and brings no functional changes to the rest of the cases.
+ * The casts to `uintptr_t` are needed to mitigate `-Waddress` warnings when
+ * passing a bitmap from .bss or .data (-> `!!addr` is always true).
+ */
 #define bitop(op, nr, addr)						\
-	op(nr, addr)
+	((__builtin_constant_p(nr) &&					\
+	  __builtin_constant_p((uintptr_t)(addr) != (uintptr_t)NULL) &&	\
+	  (uintptr_t)(addr) != (uintptr_t)NULL &&			\
+	  __builtin_constant_p(*(const unsigned long *)(addr))) ?	\
+	 const##op(nr, addr) : op(nr, addr))
 
 #define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
 #define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* RE: [PATCH v2 1/6] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
  2022-06-10 11:34 ` [PATCH v2 1/6] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr() Alexander Lobakin
@ 2022-06-10 12:18   ` David Laight
  2022-06-10 13:46   ` Andy Shevchenko
  2022-06-15  2:59   ` Yury Norov
  2 siblings, 0 replies; 30+ messages in thread
From: David Laight @ 2022-06-10 12:18 UTC (permalink / raw)
  To: 'Alexander Lobakin', Arnd Bergmann, Yury Norov
  Cc: Andy Shevchenko, Mark Rutland, Matt Turner, Brian Cain,
	Geert Uytterhoeven, Yoshinori Sato, Rich Felker, David S. Miller,
	Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel, stable, kernel test robot

From: Alexander Lobakin
> Sent: 10 June 2022 12:34
> 
> test_bit(), like any other bitmap op, takes `unsigned long *` as
> its second argument (a pointer to the actual bitmap), as any bitmap
> itself is an array of unsigned longs. However, the ia64_get_irr()
> code passes a ref to `u64` as the second argument.
> This works with the ia64 bitops implementation because they take
> `void *` as the second argument and cast it later on.
> This works with the bitmap API itself because `unsigned long`
> has the same size on ia64 as `u64` (`unsigned long long`), but
> from the compiler's PoV those two are different types.
> Define @irr as `unsigned long` to fix that. This implies no
> functional changes. The mismatch has been hidden for 16 years!

Wouldn't it be better to just test the bit?
As in:
	return irr & (1ull << bit);
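
For clarity, a sketch of what that could look like (the bool return
type and the elided switch cases are my assumptions, not from the
patch; returning bool (or using !!) avoids truncating bits 32-63
when the u64 result is converted):

	static inline bool
	ia64_get_irr(unsigned int vector)
	{
		unsigned int reg = vector / 64;
		unsigned int bit = vector % 64;
		u64 irr;

		switch (reg) {
		case 0: irr = ia64_getreg(_IA64_REG_CR_IRR0); break;
		/* cases 1-3 elided */
		}

		return irr & (1ull << bit);
	}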

    David

> 
> Fixes: a58786917ce2 ("[IA64] avoid broken SAL_CACHE_FLUSH implementations")
> Cc: stable@vger.kernel.org # 2.6.16+
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  arch/ia64/include/asm/processor.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
> index 7cbce290f4e5..757c2f6d8d4b 100644
> --- a/arch/ia64/include/asm/processor.h
> +++ b/arch/ia64/include/asm/processor.h
> @@ -538,7 +538,7 @@ ia64_get_irr(unsigned int vector)
>  {
>  	unsigned int reg = vector / 64;
>  	unsigned int bit = vector % 64;
> -	u64 irr;
> +	unsigned long irr;
> 
>  	switch (reg) {
>  	case 0: irr = ia64_getreg(_IA64_REG_CR_IRR0); break;
> --
> 2.36.1

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 1/6] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
  2022-06-10 11:34 ` [PATCH v2 1/6] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr() Alexander Lobakin
  2022-06-10 12:18   ` David Laight
@ 2022-06-10 13:46   ` Andy Shevchenko
  2022-06-15  2:59   ` Yury Norov
  2 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2022-06-10 13:46 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Mark Rutland, Matt Turner, Brian Cain,
	Geert Uytterhoeven, Yoshinori Sato, Rich Felker, David S. Miller,
	Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel, stable, kernel test robot

On Fri, Jun 10, 2022 at 01:34:22PM +0200, Alexander Lobakin wrote:
> test_bit(), like any other bitmap op, takes `unsigned long *` as
> its second argument (a pointer to the actual bitmap), as any bitmap
> itself is an array of unsigned longs. However, the ia64_get_irr()
> code passes a ref to `u64` as the second argument.
> This works with the ia64 bitops implementation because they take
> `void *` as the second argument and cast it later on.
> This works with the bitmap API itself because `unsigned long`
> has the same size on ia64 as `u64` (`unsigned long long`), but
> from the compiler's PoV those two are different types.
> Define @irr as `unsigned long` to fix that. This implies no
> functional changes. The mismatch has been hidden for 16 years!

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

> Fixes: a58786917ce2 ("[IA64] avoid broken SAL_CACHE_FLUSH implementations")
> Cc: stable@vger.kernel.org # 2.6.16+
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  arch/ia64/include/asm/processor.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
> index 7cbce290f4e5..757c2f6d8d4b 100644
> --- a/arch/ia64/include/asm/processor.h
> +++ b/arch/ia64/include/asm/processor.h
> @@ -538,7 +538,7 @@ ia64_get_irr(unsigned int vector)
>  {
>  	unsigned int reg = vector / 64;
>  	unsigned int bit = vector % 64;
> -	u64 irr;
> +	unsigned long irr;
>  
>  	switch (reg) {
>  	case 0: irr = ia64_getreg(_IA64_REG_CR_IRR0); break;
> -- 
> 2.36.1
> 

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 2/6] bitops: always define asm-generic non-atomic bitops
  2022-06-10 11:34 ` [PATCH v2 2/6] bitops: always define asm-generic non-atomic bitops Alexander Lobakin
@ 2022-06-10 13:50   ` Andy Shevchenko
  2022-06-10 16:02     ` Luck, Tony
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Shevchenko @ 2022-06-10 13:50 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Mark Rutland, Matt Turner, Brian Cain,
	Geert Uytterhoeven, Yoshinori Sato, Rich Felker, David S. Miller,
	Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

On Fri, Jun 10, 2022 at 01:34:23PM +0200, Alexander Lobakin wrote:
> Move generic non-atomic bitops from the asm-generic header which
> gets included only when there are no architecture-specific
> alternatives, to a separate independent file to make them always
> available.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

But see below.

> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  .../asm-generic/bitops/generic-non-atomic.h   | 124 ++++++++++++++++++
>  include/asm-generic/bitops/non-atomic.h       | 109 ++-------------
>  2 files changed, 132 insertions(+), 101 deletions(-)
>  create mode 100644 include/asm-generic/bitops/generic-non-atomic.h
> 
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> new file mode 100644
> index 000000000000..808bc4469886
> --- /dev/null
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -0,0 +1,124 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> +#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> +
> +#include <linux/bits.h>
> +
> +#ifndef _LINUX_BITOPS_H
> +#error only <linux/bitops.h> can be included directly
> +#endif
> +
> +/*
> + * Generic definitions for bit operations, should not be used in regular code
> + * directly.
> + */
> +
> +/**
> + * generic___set_bit - Set a bit in memory
> + * @nr: the bit to set
> + * @addr: the address to start counting from
> + *
> + * Unlike set_bit(), this function is non-atomic and may be reordered.
> + * If it's called on the same region of memory simultaneously, the effect
> + * may be that only one operation succeeds.
> + */
> +static __always_inline void
> +generic___set_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +	*p  |= mask;
> +}
> +
> +static __always_inline void
> +generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +	*p &= ~mask;
> +}
> +
> +/**
> + * generic___change_bit - Toggle a bit in memory
> + * @nr: the bit to change
> + * @addr: the address to start counting from
> + *
> + * Unlike change_bit(), this function is non-atomic and may be reordered.
> + * If it's called on the same region of memory simultaneously, the effect
> + * may be that only one operation succeeds.
> + */
> +static __always_inline
> +void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +	*p ^= mask;
> +}
> +
> +/**
> + * generic___test_and_set_bit - Set a bit and return its old value
> + * @nr: Bit to set
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static __always_inline int
> +generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +	unsigned long old = *p;
> +
> +	*p = old | mask;
> +	return (old & mask) != 0;
> +}
> +
> +/**
> + * generic___test_and_clear_bit - Clear a bit and return its old value
> + * @nr: Bit to clear
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static __always_inline int
> +generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +	unsigned long old = *p;
> +
> +	*p = old & ~mask;
> +	return (old & mask) != 0;
> +}
> +
> +/* WARNING: non atomic and it can be reordered! */
> +static __always_inline int
> +generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +	unsigned long old = *p;
> +
> +	*p = old ^ mask;
> +	return (old & mask) != 0;
> +}
> +
> +/**
> + * generic_test_bit - Determine whether a bit is set
> + * @nr: bit number to test
> + * @addr: Address to start counting from
> + */

Shouldn't we add, in this or in a separate patch, a big NOTE to explain
that this is actually atomic and must be kept as such?

> +static __always_inline int
> +generic_test_bit(unsigned int nr, const volatile unsigned long *addr)
> +{
> +	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> +}
> +
> +#endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
> diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
> index 078cc68be2f1..a05bc090a6a3 100644
> --- a/include/asm-generic/bitops/non-atomic.h
> +++ b/include/asm-generic/bitops/non-atomic.h
> @@ -2,121 +2,28 @@
>  #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  
> +#include <asm-generic/bitops/generic-non-atomic.h>
>  #include <asm/types.h>
>  
> -/**
> - * arch___set_bit - Set a bit in memory
> - * @nr: the bit to set
> - * @addr: the address to start counting from
> - *
> - * Unlike set_bit(), this function is non-atomic and may be reordered.
> - * If it's called on the same region of memory simultaneously, the effect
> - * may be that only one operation succeeds.
> - */
> -static __always_inline void
> -arch___set_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -	*p  |= mask;
> -}
> +#define arch___set_bit generic___set_bit
>  #define __set_bit arch___set_bit
>  
> -static __always_inline void
> -arch___clear_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -	*p &= ~mask;
> -}
> +#define arch___clear_bit generic___clear_bit
>  #define __clear_bit arch___clear_bit
>  
> -/**
> - * arch___change_bit - Toggle a bit in memory
> - * @nr: the bit to change
> - * @addr: the address to start counting from
> - *
> - * Unlike change_bit(), this function is non-atomic and may be reordered.
> - * If it's called on the same region of memory simultaneously, the effect
> - * may be that only one operation succeeds.
> - */
> -static __always_inline
> -void arch___change_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -	*p ^= mask;
> -}
> +#define arch___change_bit generic___change_bit
>  #define __change_bit arch___change_bit
>  
> -/**
> - * arch___test_and_set_bit - Set a bit and return its old value
> - * @nr: Bit to set
> - * @addr: Address to count from
> - *
> - * This operation is non-atomic and can be reordered.
> - * If two examples of this operation race, one can appear to succeed
> - * but actually fail.  You must protect multiple accesses with a lock.
> - */
> -static __always_inline int
> -arch___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -	unsigned long old = *p;
> -
> -	*p = old | mask;
> -	return (old & mask) != 0;
> -}
> +#define arch___test_and_set_bit generic___test_and_set_bit
>  #define __test_and_set_bit arch___test_and_set_bit
>  
> -/**
> - * arch___test_and_clear_bit - Clear a bit and return its old value
> - * @nr: Bit to clear
> - * @addr: Address to count from
> - *
> - * This operation is non-atomic and can be reordered.
> - * If two examples of this operation race, one can appear to succeed
> - * but actually fail.  You must protect multiple accesses with a lock.
> - */
> -static __always_inline int
> -arch___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -	unsigned long old = *p;
> -
> -	*p = old & ~mask;
> -	return (old & mask) != 0;
> -}
> +#define arch___test_and_clear_bit generic___test_and_clear_bit
>  #define __test_and_clear_bit arch___test_and_clear_bit
>  
> -/* WARNING: non atomic and it can be reordered! */
> -static __always_inline int
> -arch___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -	unsigned long old = *p;
> -
> -	*p = old ^ mask;
> -	return (old & mask) != 0;
> -}
> +#define arch___test_and_change_bit generic___test_and_change_bit
>  #define __test_and_change_bit arch___test_and_change_bit
>  
> -/**
> - * arch_test_bit - Determine whether a bit is set
> - * @nr: bit number to test
> - * @addr: Address to start counting from
> - */
> -static __always_inline int
> -arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> -{
> -	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> -}
> +#define arch_test_bit generic_test_bit
>  #define test_bit arch_test_bit
>  
>  #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
> -- 
> 2.36.1
> 

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 4/6] bitops: define const_*() versions of the non-atomics
  2022-06-10 11:34 ` [PATCH v2 4/6] bitops: define const_*() versions of the non-atomics Alexander Lobakin
@ 2022-06-10 13:56   ` Andy Shevchenko
  2022-06-13 14:30     ` Alexander Lobakin
  2022-06-15  2:57   ` Yury Norov
  1 sibling, 1 reply; 30+ messages in thread
From: Andy Shevchenko @ 2022-06-10 13:56 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Mark Rutland, Matt Turner, Brian Cain,
	Geert Uytterhoeven, Yoshinori Sato, Rich Felker, David S. Miller,
	Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

On Fri, Jun 10, 2022 at 01:34:25PM +0200, Alexander Lobakin wrote:
> Define const_*() variants of the non-atomic bitops to be used when
> the input arguments are compile-time constants, so that the compiler
> will always be able to resolve those to compile-time constants as well.
> Those are mostly direct aliases for generic_*() with one exception
> for const_test_bit(): the original one is declared atomic-safe and
> thus doesn't discard the `volatile` qualifier, so in order to let
> the compiler optimize the code, define it separately, disregarding
> the qualifier.
> Add them to the compile-time type checks as well just in case.

...

>  /* Check that the bitops prototypes are sane */
>  #define __check_bitop_pr(name)						\
> -	static_assert(__same_type(arch_##name, generic_##name) &&	\
> +	static_assert(__same_type(const_##name, generic_##name) &&	\
> +		      __same_type(arch_##name, generic_##name) &&	\
>  		      __same_type(name, generic_##name))

Can't it be a one-line change that keeps the ordering at the same time?
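
Presumably something like the following (just a sketch of the idea,
keeping arch_* checked first and adding the new check as a single
line):

 #define __check_bitop_pr(name)						\
	static_assert(__same_type(arch_##name, generic_##name) &&	\
		      __same_type(const_##name, generic_##name) &&	\
		      __same_type(name, generic_##name))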

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2 2/6] bitops: always define asm-generic non-atomic bitops
  2022-06-10 13:50   ` Andy Shevchenko
@ 2022-06-10 16:02     ` Luck, Tony
  2022-06-10 16:32       ` Marco Elver
  0 siblings, 1 reply; 30+ messages in thread
From: Luck, Tony @ 2022-06-10 16:02 UTC (permalink / raw)
  To: Andy Shevchenko, Lobakin, Alexandr
  Cc: Arnd Bergmann, Yury Norov, Mark Rutland, Matt Turner, Brian Cain,
	Geert Uytterhoeven, Yoshinori Sato, Rich Felker, David S. Miller,
	Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Greg Kroah-Hartman, linux-alpha,
	linux-hexagon, linux-ia64, linux-m68k, linux-sh, sparclinux,
	linux-arch, linux-kernel

> > +/**
> > + * generic_test_bit - Determine whether a bit is set
> > + * @nr: bit number to test
> > + * @addr: Address to start counting from
> > + */
>
> Shouldn't we add in this or in separate patch a big NOTE to explain that this
> is actually atomic and must be kept as a such?

"atomic" isn't really the right word. The volatile access makes sure that the
compiler does the test at the point that the source code asked, and doesn't
move it before/after other operations.

But there is no such thing as an atomic test_bit() operation:

	if (test_bit(5, addr)) {
		/* some other CPU nukes bit 5 */

		/* I know it was set when I looked, but now, could be anything */

		...
	}

-Tony

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 2/6] bitops: always define asm-generic non-atomic bitops
  2022-06-10 16:02     ` Luck, Tony
@ 2022-06-10 16:32       ` Marco Elver
  2022-06-13 14:19         ` Alexander Lobakin
  0 siblings, 1 reply; 30+ messages in thread
From: Marco Elver @ 2022-06-10 16:32 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Andy Shevchenko, Lobakin, Alexandr, Arnd Bergmann, Yury Norov,
	Mark Rutland, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Borislav Petkov, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

On Fri, 10 Jun 2022 at 18:02, Luck, Tony <tony.luck@intel.com> wrote:
>
> > > +/**
> > > + * generic_test_bit - Determine whether a bit is set
> > > + * @nr: bit number to test
> > > + * @addr: Address to start counting from
> > > + */
> >
> > Shouldn't we add in this or in separate patch a big NOTE to explain that this
> > is actually atomic and must be kept as a such?
>
> "atomic" isn't really the right word. The volatile access makes sure that the
> compiler does the test at the point that the source code asked, and doesn't
> move it before/after other operations.

It's listed in Documentation/atomic_bitops.txt.

It is as "atomic" as READ_ONCE() or atomic_read() is. Though you are
right that the "atomicity" of reading one bit is almost a given,
because we can't really read half a bit.
The main thing is that the compiler keeps it "atomic" and e.g. doesn't
fuse the load with another one or elide it completely, and then
transform the code in concurrency-unfriendly ways.

Like READ_ONCE() and friends, test_bit(), unlike non-atomic bitops,
may also be used to dependency-order some subsequent marked (viz.
atomic) operations.
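
A minimal sketch of the kind of transformation the volatile read rules
out (READY_BIT and `state` are hypothetical names):

	unsigned long state;			/* set by another CPU */

	while (!test_bit(READY_BIT, &state))	/* volatile read: reloaded
						   on every iteration */
		cpu_relax();

Without the volatile access, the compiler would be free to hoist the
load out of the loop and spin forever on a stale value.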

> But there is no such thing as an atomic test_bit() operation:
>
>         if (test_bit(5, addr)) {
>                 /* some other CPU nukes bit 5 */
>
>                 /* I know it was set when I looked, but now, could be anything */

The operation itself is atomic, because reading half a bit is
impossible. Whether or not that bit is modified concurrently is a
different problem.

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-10 11:34 [PATCH v2 0/6] bitops: let optimize out non-atomic bitops on compile-time constants Alexander Lobakin
                   ` (5 preceding siblings ...)
  2022-06-10 11:34 ` [PATCH v2 6/6] bitops: let optimize out non-atomic bitops on compile-time constants Alexander Lobakin
@ 2022-06-13  7:35 ` Geert Uytterhoeven
  2022-06-13 14:26   ` Alexander Lobakin
  6 siblings, 1 reply; 30+ messages in thread
From: Geert Uytterhoeven @ 2022-06-13  7:35 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Mark Rutland,
	Matt Turner, Brian Cain, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	alpha, open list:QUALCOMM HEXAGON...,
	linux-ia64, linux-m68k, Linux-sh list, sparclinux, Linux-Arch,
	Linux Kernel Mailing List

Hi Alexander,

On Fri, Jun 10, 2022 at 1:35 PM Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
> While I was working on converting some structure fields from a fixed
> type to a bitmap, I started observing code size increase not only in
> places where the code works with the converted structure fields, but
> also where the converted vars were on the stack. That said, the
> following code:
>
>         DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
>         unsigned long bar = BIT(BAR_BIT);
>         unsigned long baz = 0;
>
>         __set_bit(FOO_BIT, foo);
>         baz |= BIT(BAZ_BIT);
>
>         BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
>         BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
>         BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
>
> triggers the first assertion on x86_64, which means that the
> compiler is unable to evaluate it to a compile-time initializer
> when the architecture-specific bitop is used even if it's obvious.
> I found that this is due to that many architecture-specific
> non-atomic bitop implementations use inline asm or other hacks which
> are faster or more robust when working with "real" variables (i.e.
> fields from the structures etc.), but the compilers have no clue how
> to optimize them out when called on compile-time constants.
>
> So, in order to let the compiler optimize out such cases, expand the
> test_bit() and __*_bit() definitions with a compile-time condition
> check, so that they will pick the generic C non-atomic bitop
> implementations when all of the arguments passed are compile-time
> constants, which means that the result will be a compile-time
> constant as well and the compiler will produce more efficient and
> simple code in 100% cases (no changes when there's at least one
> non-compile-time-constant argument).
> The condition itself:
>
> if (
> __builtin_constant_p(nr) &&     /* <- bit position is constant */
> __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
>                                       always either NULL or not */
> addr &&                         /* <- bitmap addr is not NULL */
> __builtin_constant_p(*addr)     /* <- compiler knows the value of
>                                       the target bitmap */
> )
>         /* then pick the generic C variant
> else
>         /* old code path, arch-specific
>
> I also tried __is_constexpr() as suggested by Andy, but it was
> always returning 0 ('not a constant') for the 2,3 and 4th
> conditions.
>
> The savings are architecture, compiler and compiler flags dependent,
> for example, on x86_64 -O2:
>
> GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
>
> and ARM64 (courtesy of Mark[0]):
>
> GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
>
> And the following:
>
>         DECLARE_BITMAP(flags, __IP_TUNNEL_FLAG_NUM) = { };
>         __be16 tun_flags;
>
>         __set_bit(IP_TUNNEL_CSUM_BIT, flags);
>
>         tun_flags = cpu_to_be16(*flags & U16_MAX);
>
>         if (test_bit(IP_TUNNEL_VTI_BIT, flags))
>                 tun_flags |= VTI_ISVTI;
>
>         BUILD_BUG_ON(!__builtin_constant_p(tun_flags));
>
> doesn't blow up anymore, so that we now can e.g. use fixed bitmaps
> in compile-time assertions etc.
>
> The series has been in intel-next for a while with no reported issues.
>
> From v1[1]:
> * change 'gen_' prefixes to 'generic_' to disambiguate from
>   'generated' etc. (Mark);
> * define a separate 'const_' set to use in the optimization to keep
>   the generic test_bit() atomic-safe (Marco);
> * unify arch_{test,__*}_bit() as well and include them in the type
>   check;
> * add more relevant and up-to-date bloat-o-meter results, including
>   ARM64 (me, Mark);
> * pick a couple '*-by' tags (Mark, Yury).

Thanks for the update!

On m68k, using gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04), this
blows up immediately with:

  CC      kernel/bounds.s
In file included from include/linux/bits.h:22,
                 from include/linux/ratelimit_types.h:5,
                 from include/linux/printk.h:9,
                 from include/asm-generic/bug.h:22,
                 from arch/m68k/include/asm/bug.h:32,
                 from include/linux/bug.h:5,
                 from include/linux/page-flags.h:10,
                 from kernel/bounds.c:10:
include/linux/bitops.h:72:21: error: ‘arch___set_bit’ undeclared here
(not in a function); did you mean ‘const___set_bit’?
   72 |         __same_type(arch_##name, generic_##name) && \
      |                     ^~~~~
include/linux/build_bug.h:78:56: note: in definition of macro ‘__static_assert’
   78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
      |                                                        ^~~~
include/linux/bitops.h:71:2: note: in expansion of macro ‘static_assert’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |  ^~~~~~~~~~~~~
include/linux/bitops.h:72:9: note: in expansion of macro ‘__same_type’
   72 |         __same_type(arch_##name, generic_##name) && \
      |         ^~~~~~~~~~~
include/linux/bitops.h:75:1: note: in expansion of macro ‘__check_bitop_pr’
   75 | __check_bitop_pr(__set_bit);
      | ^~~~~~~~~~~~~~~~
include/linux/compiler_types.h:293:27: error: expression in static
assertion is not an integer
  293 | #define __same_type(a, b)
__builtin_types_compatible_p(typeof(a), typeof(b))
      |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:78:56: note: in definition of macro ‘__static_assert’
   78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
      |                                                        ^~~~
include/linux/bitops.h:71:2: note: in expansion of macro ‘static_assert’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |  ^~~~~~~~~~~~~
include/linux/bitops.h:71:16: note: in expansion of macro ‘__same_type’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |                ^~~~~~~~~~~
include/linux/bitops.h:75:1: note: in expansion of macro ‘__check_bitop_pr’
   75 | __check_bitop_pr(__set_bit);
      | ^~~~~~~~~~~~~~~~
include/linux/bitops.h:72:21: error: ‘arch___clear_bit’ undeclared
here (not in a function); did you mean ‘const___clear_bit’?
   72 |         __same_type(arch_##name, generic_##name) && \
      |                     ^~~~~
include/linux/build_bug.h:78:56: note: in definition of macro ‘__static_assert’
   78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
      |                                                        ^~~~
include/linux/bitops.h:71:2: note: in expansion of macro ‘static_assert’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |  ^~~~~~~~~~~~~
include/linux/bitops.h:72:9: note: in expansion of macro ‘__same_type’
   72 |         __same_type(arch_##name, generic_##name) && \
      |         ^~~~~~~~~~~
include/linux/bitops.h:76:1: note: in expansion of macro ‘__check_bitop_pr’
   76 | __check_bitop_pr(__clear_bit);
      | ^~~~~~~~~~~~~~~~
include/linux/compiler_types.h:293:27: error: expression in static
assertion is not an integer
  293 | #define __same_type(a, b)
__builtin_types_compatible_p(typeof(a), typeof(b))
      |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:78:56: note: in definition of macro ‘__static_assert’
   78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
      |                                                        ^~~~
include/linux/bitops.h:71:2: note: in expansion of macro ‘static_assert’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |  ^~~~~~~~~~~~~
include/linux/bitops.h:71:16: note: in expansion of macro ‘__same_type’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |                ^~~~~~~~~~~
include/linux/bitops.h:76:1: note: in expansion of macro ‘__check_bitop_pr’
   76 | __check_bitop_pr(__clear_bit);
      | ^~~~~~~~~~~~~~~~
include/linux/bitops.h:72:21: error: ‘arch___change_bit’ undeclared
here (not in a function); did you mean ‘const___change_bit’?
   72 |         __same_type(arch_##name, generic_##name) && \
      |                     ^~~~~
include/linux/build_bug.h:78:56: note: in definition of macro ‘__static_assert’
   78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
      |                                                        ^~~~
include/linux/bitops.h:71:2: note: in expansion of macro ‘static_assert’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |  ^~~~~~~~~~~~~
include/linux/bitops.h:72:9: note: in expansion of macro ‘__same_type’
   72 |         __same_type(arch_##name, generic_##name) && \
      |         ^~~~~~~~~~~
include/linux/bitops.h:77:1: note: in expansion of macro ‘__check_bitop_pr’
   77 | __check_bitop_pr(__change_bit);
      | ^~~~~~~~~~~~~~~~
include/linux/compiler_types.h:293:27: error: expression in static
assertion is not an integer
  293 | #define __same_type(a, b)
__builtin_types_compatible_p(typeof(a), typeof(b))
      |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:78:56: note: in definition of macro ‘__static_assert’
   78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
      |                                                        ^~~~
include/linux/bitops.h:71:2: note: in expansion of macro ‘static_assert’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |  ^~~~~~~~~~~~~
include/linux/bitops.h:71:16: note: in expansion of macro ‘__same_type’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |                ^~~~~~~~~~~
include/linux/bitops.h:77:1: note: in expansion of macro ‘__check_bitop_pr’
   77 | __check_bitop_pr(__change_bit);
      | ^~~~~~~~~~~~~~~~
include/linux/bitops.h:72:21: error: ‘arch___test_and_set_bit’
undeclared here (not in a function); did you mean
‘const___test_and_set_bit’?
   72 |         __same_type(arch_##name, generic_##name) && \
      |                     ^~~~~
include/linux/build_bug.h:78:56: note: in definition of macro ‘__static_assert’
   78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
      |                                                        ^~~~
include/linux/bitops.h:71:2: note: in expansion of macro ‘static_assert’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |  ^~~~~~~~~~~~~
include/linux/bitops.h:72:9: note: in expansion of macro ‘__same_type’
   72 |         __same_type(arch_##name, generic_##name) && \
      |         ^~~~~~~~~~~
include/linux/bitops.h:78:1: note: in expansion of macro ‘__check_bitop_pr’
   78 | __check_bitop_pr(__test_and_set_bit);
      | ^~~~~~~~~~~~~~~~
include/linux/compiler_types.h:293:27: error: expression in static
assertion is not an integer
  293 | #define __same_type(a, b)
__builtin_types_compatible_p(typeof(a), typeof(b))
      |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:78:56: note: in definition of macro ‘__static_assert’
   78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
      |                                                        ^~~~
include/linux/bitops.h:71:2: note: in expansion of macro ‘static_assert’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |  ^~~~~~~~~~~~~
include/linux/bitops.h:71:16: note: in expansion of macro ‘__same_type’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |                ^~~~~~~~~~~
include/linux/bitops.h:78:1: note: in expansion of macro ‘__check_bitop_pr’
   78 | __check_bitop_pr(__test_and_set_bit);
      | ^~~~~~~~~~~~~~~~
include/linux/bitops.h:72:21: error: ‘arch___test_and_clear_bit’
undeclared here (not in a function); did you mean
‘const___test_and_clear_bit’?
   72 |         __same_type(arch_##name, generic_##name) && \
      |                     ^~~~~
include/linux/build_bug.h:78:56: note: in definition of macro ‘__static_assert’
   78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
      |                                                        ^~~~
include/linux/bitops.h:71:2: note: in expansion of macro ‘static_assert’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |  ^~~~~~~~~~~~~
include/linux/bitops.h:72:9: note: in expansion of macro ‘__same_type’
   72 |         __same_type(arch_##name, generic_##name) && \
      |         ^~~~~~~~~~~
include/linux/bitops.h:79:1: note: in expansion of macro ‘__check_bitop_pr’
   79 | __check_bitop_pr(__test_and_clear_bit);
      | ^~~~~~~~~~~~~~~~
include/linux/compiler_types.h:293:27: error: expression in static
assertion is not an integer
  293 | #define __same_type(a, b)
__builtin_types_compatible_p(typeof(a), typeof(b))
      |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:78:56: note: in definition of macro ‘__static_assert’
   78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
      |                                                        ^~~~
include/linux/bitops.h:71:2: note: in expansion of macro ‘static_assert’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |  ^~~~~~~~~~~~~
include/linux/bitops.h:71:16: note: in expansion of macro ‘__same_type’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |                ^~~~~~~~~~~
include/linux/bitops.h:79:1: note: in expansion of macro ‘__check_bitop_pr’
   79 | __check_bitop_pr(__test_and_clear_bit);
      | ^~~~~~~~~~~~~~~~
include/linux/bitops.h:72:21: error: ‘arch___test_and_change_bit’
undeclared here (not in a function); did you mean
‘const___test_and_change_bit’?
   72 |         __same_type(arch_##name, generic_##name) && \
      |                     ^~~~~
include/linux/build_bug.h:78:56: note: in definition of macro ‘__static_assert’
   78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
      |                                                        ^~~~
include/linux/bitops.h:71:2: note: in expansion of macro ‘static_assert’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |  ^~~~~~~~~~~~~
include/linux/bitops.h:72:9: note: in expansion of macro ‘__same_type’
   72 |         __same_type(arch_##name, generic_##name) && \
      |         ^~~~~~~~~~~
include/linux/bitops.h:80:1: note: in expansion of macro ‘__check_bitop_pr’
   80 | __check_bitop_pr(__test_and_change_bit);
      | ^~~~~~~~~~~~~~~~
include/linux/compiler_types.h:293:27: error: expression in static
assertion is not an integer
  293 | #define __same_type(a, b)
__builtin_types_compatible_p(typeof(a), typeof(b))
      |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:78:56: note: in definition of macro ‘__static_assert’
   78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
      |                                                        ^~~~
include/linux/bitops.h:71:2: note: in expansion of macro ‘static_assert’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |  ^~~~~~~~~~~~~
include/linux/bitops.h:71:16: note: in expansion of macro ‘__same_type’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |                ^~~~~~~~~~~
include/linux/bitops.h:80:1: note: in expansion of macro ‘__check_bitop_pr’
   80 | __check_bitop_pr(__test_and_change_bit);
      | ^~~~~~~~~~~~~~~~
include/linux/bitops.h:72:21: error: ‘arch_test_bit’ undeclared here
(not in a function); did you mean ‘_test_bit’?
   72 |         __same_type(arch_##name, generic_##name) && \
      |                     ^~~~~
include/linux/build_bug.h:78:56: note: in definition of macro ‘__static_assert’
   78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
      |                                                        ^~~~
include/linux/bitops.h:71:2: note: in expansion of macro ‘static_assert’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |  ^~~~~~~~~~~~~
include/linux/bitops.h:72:9: note: in expansion of macro ‘__same_type’
   72 |         __same_type(arch_##name, generic_##name) && \
      |         ^~~~~~~~~~~
include/linux/bitops.h:81:1: note: in expansion of macro ‘__check_bitop_pr’
   81 | __check_bitop_pr(test_bit);
      | ^~~~~~~~~~~~~~~~
include/linux/compiler_types.h:293:27: error: expression in static
assertion is not an integer
  293 | #define __same_type(a, b)
__builtin_types_compatible_p(typeof(a), typeof(b))
      |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:78:56: note: in definition of macro ‘__static_assert’
   78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
      |                                                        ^~~~
include/linux/bitops.h:71:2: note: in expansion of macro ‘static_assert’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |  ^~~~~~~~~~~~~
include/linux/bitops.h:71:16: note: in expansion of macro ‘__same_type’
   71 |  static_assert(__same_type(const_##name, generic_##name) && \
      |                ^~~~~~~~~~~
include/linux/bitops.h:81:1: note: in expansion of macro ‘__check_bitop_pr’
   81 | __check_bitop_pr(test_bit);
      | ^~~~~~~~~~~~~~~~

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 2/6] bitops: always define asm-generic non-atomic bitops
  2022-06-10 16:32       ` Marco Elver
@ 2022-06-13 14:19         ` Alexander Lobakin
  2022-06-13 14:33           ` Marco Elver
  2022-06-13 16:26           ` Luck, Tony
  0 siblings, 2 replies; 30+ messages in thread
From: Alexander Lobakin @ 2022-06-13 14:19 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Lobakin, Tony Luck, Andy Shevchenko, Arnd Bergmann,
	Yury Norov, Mark Rutland, Matt Turner, Brian Cain,
	Geert Uytterhoeven, Yoshinori Sato, Rich Felker, David S. Miller,
	Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

From: Marco Elver <elver@google.com>
Date: Fri, 10 Jun 2022 18:32:36 +0200

> On Fri, 10 Jun 2022 at 18:02, Luck, Tony <tony.luck@intel.com> wrote:
> >
> > > > +/**
> > > > + * generic_test_bit - Determine whether a bit is set
> > > > + * @nr: bit number to test
> > > > + * @addr: Address to start counting from
> > > > + */
> > >
> > > Shouldn't we add, in this or in a separate patch, a big NOTE to explain that
> > > this is actually atomic and must be kept as such?
> >
> > "atomic" isn't really the right word. The volatile access makes sure that the
> > compiler does the test at the point that the source code asked, and doesn't
> > move it before/after other operations.
> 
> It's listed in Documentation/atomic_bitops.txt.

Oh, so my memory was actually correct that I saw it in the docs
somewhere.
WDYT: should I also mention here in the code (in a block comment) that
it's atomic and must not lose `volatile`, as Andy suggested, or is it
sufficient to have it in the docs (+ it's not underscored)?

> 
> It is as "atomic" as READ_ONCE() or atomic_read() is. Though you are
> right that the "atomicity" of reading one bit is almost a given,
> because we can't really read half a bit.
> The main thing is that the compiler keeps it "atomic" and e.g. doesn't
> fuse the load with another or elide it completely, and then transform
> the code in concurrency-unfriendly ways.
> 
> Like READ_ONCE() and friends, test_bit(), unlike non-atomic bitops,
> may also be used to dependency-order some subsequent marked (viz.
> atomic) operations.
> 
> > But there is no such thing as an atomic test_bit() operation:
> >
> >         if (test_bit(5, addr)) {
> >                 /* some other CPU nukes bit 5 */
> >
> >                 /* I know it was set when I looked, but now, could be anything */
> 
> The operation itself is atomic, because reading half a bit is
> impossible. Whether or not that bit is modified concurrently is a
> different problem.
> 
> Thanks,
> -- Marco

Thanks,
Olek

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-13  7:35 ` [PATCH v2 0/6] " Geert Uytterhoeven
@ 2022-06-13 14:26   ` Alexander Lobakin
  2022-06-13 15:22     ` Geert Uytterhoeven
  0 siblings, 1 reply; 30+ messages in thread
From: Alexander Lobakin @ 2022-06-13 14:26 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Mark Rutland, Matt Turner, Brian Cain, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	alpha, open list:QUALCOMM HEXAGON...,
	linux-ia64, linux-m68k, Linux-sh list, sparclinux, Linux-Arch,
	Linux Kernel Mailing List

From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Mon, 13 Jun 2022 09:35:16 +0200

> Hi Alexander,
> 
> On Fri, Jun 10, 2022 at 1:35 PM Alexander Lobakin
> <alexandr.lobakin@intel.com> wrote:
> > While I was working on converting some structure fields from a fixed
> > type to a bitmap, I started observing code size increase not only in
> > places where the code works with the converted structure fields, but
> > also where the converted vars were on the stack. That said, the
> > following code:
> >
> >         DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> >         unsigned long bar = BIT(BAR_BIT);
> >         unsigned long baz = 0;
> >
> >         __set_bit(FOO_BIT, foo);
> >         baz |= BIT(BAZ_BIT);
> >
> >         BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> >         BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> >         BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> >
> > triggers the first assertion on x86_64, which means that the
> > compiler is unable to evaluate it to a compile-time initializer
> > when the architecture-specific bitop is used even if it's obvious.
> > I found that this is due to that many architecture-specific
> > non-atomic bitop implementations use inline asm or other hacks which
> > are faster or more robust when working with "real" variables (i.e.
> > fields from the structures etc.), but the compilers have no clue how
> > to optimize them out when called on compile-time constants.
> >
> > So, in order to let the compiler optimize out such cases, expand the
> > test_bit() and __*_bit() definitions with a compile-time condition
> > check, so that they will pick the generic C non-atomic bitop
> > implementations when all of the arguments passed are compile-time
> > constants, which means that the result will be a compile-time
> > constant as well and the compiler will produce more efficient and
> > simple code in 100% cases (no changes when there's at least one
> > non-compile-time-constant argument).
> > The condition itself:
> >
> > if (
> > __builtin_constant_p(nr) &&     /* <- bit position is constant */
> > __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
> >                                       always either NULL or not */
> > addr &&                         /* <- bitmap addr is not NULL */
> > __builtin_constant_p(*addr)     /* <- compiler knows the value of
> >                                       the target bitmap */
> > )
> >         /* then pick the generic C variant
> > else
> >         /* old code path, arch-specific
> >
> > I also tried __is_constexpr() as suggested by Andy, but it was
> > always returning 0 ('not a constant') for the 2,3 and 4th
> > conditions.
> >
> > The savings are architecture, compiler and compiler flags dependent,
> > for example, on x86_64 -O2:
> >
> > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
> >
> > and ARM64 (courtesy of Mark[0]):
> >
> > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
> >
> > And the following:
> >
> >         DECLARE_BITMAP(flags, __IP_TUNNEL_FLAG_NUM) = { };
> >         __be16 tun_flags;
> >
> >         __set_bit(IP_TUNNEL_CSUM_BIT, flags);
> >
> >         tun_flags = cpu_to_be16(*flags & U16_MAX);
> >
> >         if (test_bit(IP_TUNNEL_VTI_BIT, flags))
> >                 tun_flags |= VTI_ISVTI;
> >
> >         BUILD_BUG_ON(!__builtin_constant_p(tun_flags));
> >
> > doesn't blow up anymore, so that we now can e.g. use fixed bitmaps
> > in compile-time assertions etc.
> >
> > The series has been in intel-next for a while with no reported issues.
> >
> > From v1[1]:
> > * change 'gen_' prefixes to 'generic_' to disambiguate from
> >   'generated' etc. (Mark);
> > * define a separate 'const_' set to use in the optimization to keep
> >   the generic test_bit() atomic-safe (Marco);
> > * unify arch_{test,__*}_bit() as well and include them in the type
> >   check;
> > * add more relevant and up-to-date bloat-o-meter results, including
> >   ARM64 (me, Mark);
> > * pick a couple '*-by' tags (Mark, Yury).
> 
> Thanks for the update!
> 
> On m68k, using gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04), this
> blows up immediately with:

Yeah, I saw the kernel bot report already, sorry for that >_< It's
fixed in v3 already; I will send it in 1-2 days.

> 
>   CC      kernel/bounds.s
> In file included from include/linux/bits.h:22,

[...]

>       | ^~~~~~~~~~~~~~~~
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

Thanks,
Olek

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 4/6] bitops: define const_*() versions of the non-atomics
  2022-06-10 13:56   ` Andy Shevchenko
@ 2022-06-13 14:30     ` Alexander Lobakin
  0 siblings, 0 replies; 30+ messages in thread
From: Alexander Lobakin @ 2022-06-13 14:30 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Mark Rutland,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date: Fri, 10 Jun 2022 16:56:28 +0300

> On Fri, Jun 10, 2022 at 01:34:25PM +0200, Alexander Lobakin wrote:
> > Define const_*() variants of the non-atomic bitops to be used when
> > the input arguments are compile-time constants, so that the compiler
> > will be always to resolve those to compile-time constants as well.
> > Those are mostly direct aliases for generic_*() with one exception
> > for const_test_bit(): the original one is declared atomic-safe and
> > thus doesn't discard the `volatile` qualifier, so in order to let
> > optimize the code, define it separately disregarding the qualifier.
> > Add them to the compile-time type checks as well just in case.
> 
> ...
> 
> >  /* Check that the bitops prototypes are sane */
> >  #define __check_bitop_pr(name)						\
> > -	static_assert(__same_type(arch_##name, generic_##name) &&	\
> > +	static_assert(__same_type(const_##name, generic_##name) &&	\
> > +		      __same_type(arch_##name, generic_##name) &&	\
> >  		      __same_type(name, generic_##name))
> 
> Can't it be a one line change and actually keeping ordering at the same time?

Sure. I wanted to sort them "semantically", but it doesn't really make
sense here.

> 
> -- 
> With Best Regards,
> Andy Shevchenko

Thanks,
Olek

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 2/6] bitops: always define asm-generic non-atomic bitops
  2022-06-13 14:19         ` Alexander Lobakin
@ 2022-06-13 14:33           ` Marco Elver
  2022-06-15  2:47             ` Yury Norov
  2022-06-13 16:26           ` Luck, Tony
  1 sibling, 1 reply; 30+ messages in thread
From: Marco Elver @ 2022-06-13 14:33 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Tony Luck, Andy Shevchenko, Arnd Bergmann, Yury Norov,
	Mark Rutland, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Borislav Petkov, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

On Mon, 13 Jun 2022 at 16:21, Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
>
> From: Marco Elver <elver@google.com>
> Date: Fri, 10 Jun 2022 18:32:36 +0200
>
> > On Fri, 10 Jun 2022 at 18:02, Luck, Tony <tony.luck@intel.com> wrote:
> > >
> > > > > +/**
> > > > > + * generic_test_bit - Determine whether a bit is set
> > > > > + * @nr: bit number to test
> > > > > + * @addr: Address to start counting from
> > > > > + */
> > > >
> > > > Shouldn't we add, in this or in a separate patch, a big NOTE to explain
> > > > that this is actually atomic and must be kept as such?
> > >
> > > "atomic" isn't really the right word. The volatile access makes sure that the
> > > compiler does the test at the point that the source code asked, and doesn't
> > > move it before/after other operations.
> >
> > It's listed in Documentation/atomic_bitops.txt.
>
> Oh, so my memory was actually correct that I saw it in the docs
> somewhere.
> WDYT: should I also mention here in the code (in a block comment) that
> it's atomic and must not lose `volatile`, as Andy suggested, or is it
> sufficient to have it in the docs (+ it's not underscored)?

Perhaps a quick comment in the code (not kerneldoc above) will be
sufficient, with reference to Documentation/atomic_bitops.txt.
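
Something along these lines, say (just a sketch):

	/*
	 * Relies on the volatile access: this is as "atomic" as
	 * READ_ONCE(), see Documentation/atomic_bitops.txt. Don't
	 * drop the qualifier.
	 */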

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-13 14:26   ` Alexander Lobakin
@ 2022-06-13 15:22     ` Geert Uytterhoeven
  2022-06-15 14:17       ` Alexander Lobakin
  0 siblings, 1 reply; 30+ messages in thread
From: Geert Uytterhoeven @ 2022-06-13 15:22 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Mark Rutland,
	Matt Turner, Brian Cain, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	alpha, open list:QUALCOMM HEXAGON...,
	linux-ia64, linux-m68k, Linux-sh list, sparclinux, Linux-Arch,
	Linux Kernel Mailing List

Hi Olek,

On Mon, Jun 13, 2022 at 4:28 PM Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
> From: Geert Uytterhoeven <geert@linux-m68k.org>
> > On Fri, Jun 10, 2022 at 1:35 PM Alexander Lobakin
> > <alexandr.lobakin@intel.com> wrote:
> > On m68k, using gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04), this
> > blows up immediately with:
>
> Yeah, I saw the kernel bot report already, sorry for that >_< It's
> fixed in v3 already; I will send it in 1-2 days.

Is it simple to fix?
I might be able to give the fixed v2 a try before that.
Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2 2/6] bitops: always define asm-generic non-atomic bitops
  2022-06-13 14:19         ` Alexander Lobakin
  2022-06-13 14:33           ` Marco Elver
@ 2022-06-13 16:26           ` Luck, Tony
  2022-06-13 21:29             ` David Laight
  1 sibling, 1 reply; 30+ messages in thread
From: Luck, Tony @ 2022-06-13 16:26 UTC (permalink / raw)
  To: Lobakin, Alexandr, Marco Elver
  Cc: Andy Shevchenko, Arnd Bergmann, Yury Norov, Mark Rutland,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

>> It's listed in Documentation/atomic_bitops.txt.
>
> Oh, so my memory was actually correct that I saw it in the docs
> somewhere.
> WDYT: should I also mention here in the code (in a block comment) that
> it's atomic and must not lose `volatile`, as Andy suggested, or is it
> sufficient to have it in the docs (+ it's not underscored)?

I think a comment that the "volatile" is required to prevent re-ordering
is enough.

But maybe others are sufficiently clear on the meaning? I once wasted
time looking for the non-atomic __test_bit() version (to use in some code
that was already protected by a spin lock, so didn't need the overhead
of an "atomic" version) before realizing there wasn't a non-atomic one.

-Tony

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2 2/6] bitops: always define asm-generic non-atomic bitops
  2022-06-13 16:26           ` Luck, Tony
@ 2022-06-13 21:29             ` David Laight
  0 siblings, 0 replies; 30+ messages in thread
From: David Laight @ 2022-06-13 21:29 UTC (permalink / raw)
  To: 'Luck, Tony', Lobakin, Alexandr, Marco Elver
  Cc: Andy Shevchenko, Arnd Bergmann, Yury Norov, Mark Rutland,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

From: Luck, Tony
> Sent: 13 June 2022 17:27
> 
> >> It's listed in Documentation/atomic_bitops.txt.
> >
> > Oh, so my memory was actually correct that I saw it in the docs
> > somewhere.
> > WDYT: should I also mention here in the code (in a block comment) that
> > it's atomic and must not lose `volatile`, as Andy suggested, or is it
> > sufficient to have it in the docs (+ it's not underscored)?
> 
> I think a comment that the "volatile" is required to prevent re-ordering
> is enough.
> 
> But maybe others are sufficiently clear on the meaning? I once wasted
> time looking for the non-atomic __test_bit() version (to use in some code
> that was already protected by a spin lock, so didn't need the overhead
> of an "atomic" version) before realizing there wasn't a non-atomic one.

Does it make any sense for 'test bit' to be atomic?

I'm not even sure it needs any ordering constraints either.
The result is always stale - the value can be changed by
another cpu at any time.

The set/clear atomic bit-ops require a RMW bus cycle - which has
to be locked (or similar) to avoid corruption.

The atomic 'test and set' (etc) are RMW and return a valid state.
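
E.g. the usual try-acquire pattern depends on exactly that (a sketch,
with a made-up bit name):

	/* The locked RMW returns the previous bit value atomically,
	 * so only one CPU can win this race: */
	if (!test_and_set_bit(IN_PROGRESS_BIT, &state))
		do_work();	/* we were first */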

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 2/6] bitops: always define asm-generic non-atomic bitops
  2022-06-13 14:33           ` Marco Elver
@ 2022-06-15  2:47             ` Yury Norov
  2022-06-15  7:46               ` Marco Elver
  0 siblings, 1 reply; 30+ messages in thread
From: Yury Norov @ 2022-06-15  2:47 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Lobakin, Tony Luck, Andy Shevchenko, Arnd Bergmann,
	Mark Rutland, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Borislav Petkov, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

On Mon, Jun 13, 2022 at 04:33:17PM +0200, Marco Elver wrote:
> On Mon, 13 Jun 2022 at 16:21, Alexander Lobakin
> <alexandr.lobakin@intel.com> wrote:
> >
> > From: Marco Elver <elver@google.com>
> > Date: Fri, 10 Jun 2022 18:32:36 +0200
> >
> > > On Fri, 10 Jun 2022 at 18:02, Luck, Tony <tony.luck@intel.com> wrote:
> > > >
> > > > > > +/**
> > > > > > + * generic_test_bit - Determine whether a bit is set
> > > > > > + * @nr: bit number to test
> > > > > > + * @addr: Address to start counting from
> > > > > > + */
> > > > >
> > > > > Shouldn't we add, in this or in a separate patch, a big NOTE to explain
> > > > > that this is actually atomic and must be kept as such?
> > > >
> > > > "atomic" isn't really the right word. The volatile access makes sure that the
> > > > compiler does the test at the point that the source code asked, and doesn't
> > > > move it before/after other operations.
> > >
> > > It's listed in Documentation/atomic_bitops.txt.
> >
> > Oh, so my memory was actually correct that I saw it in the docs
> > somewhere.
> > > WDYT: should I also mention here in the code (in a block comment) that
> > > it's atomic and must not lose `volatile`, as Andy suggested, or is it
> > > sufficient to have it in the docs (+ it's not underscored)?
> 
> Perhaps a quick comment in the code (not kerneldoc above) will be
> sufficient, with reference to Documentation/atomic_bitops.txt.

If it may help, we can do:

/*
 * Bit testing is a naturally atomic operation because a bit is
 * the minimal quantum of information.
 */
#define __test_bit test_bit


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 4/6] bitops: define const_*() versions of the non-atomics
  2022-06-10 11:34 ` [PATCH v2 4/6] bitops: define const_*() versions of the non-atomics Alexander Lobakin
  2022-06-10 13:56   ` Andy Shevchenko
@ 2022-06-15  2:57   ` Yury Norov
  2022-06-15 13:55     ` Alexander Lobakin
  1 sibling, 1 reply; 30+ messages in thread
From: Yury Norov @ 2022-06-15  2:57 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

On Fri, Jun 10, 2022 at 01:34:25PM +0200, Alexander Lobakin wrote:
> Define const_*() variants of the non-atomic bitops to be used when
> the input arguments are compile-time constants, so that the compiler
> will be always to resolve those to compile-time constants as well.

will be always able?

> Those are mostly direct aliases for generic_*() with one exception
> for const_test_bit(): the original one is declared atomic-safe and
> thus doesn't discard the `volatile` qualifier, so in order to let
> optimize the code, define it separately disregarding the qualifier.
> Add them to the compile-time type checks as well just in case.
> 
> Suggested-by: Marco Elver <elver@google.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  .../asm-generic/bitops/generic-non-atomic.h   | 31 +++++++++++++++++++
>  include/linux/bitops.h                        |  3 +-
>  2 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index 3ce0fa0ab35f..9a77babfff35 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -121,4 +121,35 @@ generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
>  
> +/*
> + * const_*() definitions provide good compile-time optimizations when
> + * the passed arguments can be resolved at compile time.
> + */
> +#define const___set_bit			generic___set_bit
> +#define const___clear_bit		generic___clear_bit
> +#define const___change_bit		generic___change_bit
> +#define const___test_and_set_bit	generic___test_and_set_bit
> +#define const___test_and_clear_bit	generic___test_and_clear_bit
> +#define const___test_and_change_bit	generic___test_and_change_bit
> +
> +/**
> + * const_test_bit - Determine whether a bit is set
> + * @nr: bit number to test
> + * @addr: Address to start counting from
> + *
> + * A version of generic_test_bit() which discards the `volatile` qualifier to
> + * allow the compiler to optimize code harder. Non-atomic and to be used only
> + * for testing compile-time constants, e.g. from the corresponding macro, or
> + * when you really know what you are doing.

Not sure I understand the last sentence... Can you please rephrase?

> + */
> +static __always_inline bool
> +const_test_bit(unsigned long nr, const volatile unsigned long *addr)
> +{
> +	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long val = *p;
> +
> +	return !!(val & mask);
> +}
> +
>  #endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 87087454a288..51c22b8667b4 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -36,7 +36,8 @@ extern unsigned long __sw_hweight64(__u64 w);
>  
>  /* Check that the bitops prototypes are sane */
>  #define __check_bitop_pr(name)						\
> -	static_assert(__same_type(arch_##name, generic_##name) &&	\
> +	static_assert(__same_type(const_##name, generic_##name) &&	\
> +		      __same_type(arch_##name, generic_##name) &&	\
>  		      __same_type(name, generic_##name))
>  
>  __check_bitop_pr(__set_bit);
> -- 
> 2.36.1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 1/6] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
  2022-06-10 11:34 ` [PATCH v2 1/6] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr() Alexander Lobakin
  2022-06-10 12:18   ` David Laight
  2022-06-10 13:46   ` Andy Shevchenko
@ 2022-06-15  2:59   ` Yury Norov
  2 siblings, 0 replies; 30+ messages in thread
From: Yury Norov @ 2022-06-15  2:59 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel, stable, kernel test robot

On Fri, Jun 10, 2022 at 01:34:22PM +0200, Alexander Lobakin wrote:
> test_bit(), like any other bitmap op, takes `unsigned long *` as a
> second argument (pointer to the actual bitmap), as any bitmap
> itself is an array of unsigned longs. However, the ia64_get_irr()
> code passes a ref to `u64` as a second argument.
> This works with the ia64 bitops implementation because they have
> `void *` as the second argument and then cast it later on.
> This works with the bitmap API itself because `unsigned long`
> has the same size on ia64 as `u64` (`unsigned long long`), but
> from the compiler PoV those two are different.
> Define @irr as `unsigned long` to fix that. That implies no
> functional changes. Has been hidden for 16 years!
> 
> Fixes: a58786917ce2 ("[IA64] avoid broken SAL_CACHE_FLUSH implementations")
> Cc: stable@vger.kernel.org # 2.6.16+
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>

Reviewed-by: Yury Norov <yury.norov@gmail.com>

> ---
>  arch/ia64/include/asm/processor.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
> index 7cbce290f4e5..757c2f6d8d4b 100644
> --- a/arch/ia64/include/asm/processor.h
> +++ b/arch/ia64/include/asm/processor.h
> @@ -538,7 +538,7 @@ ia64_get_irr(unsigned int vector)
>  {
>  	unsigned int reg = vector / 64;
>  	unsigned int bit = vector % 64;
> -	u64 irr;
> +	unsigned long irr;
>  
>  	switch (reg) {
>  	case 0: irr = ia64_getreg(_IA64_REG_CR_IRR0); break;
> -- 
> 2.36.1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 6/6] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-10 11:34 ` [PATCH v2 6/6] bitops: let optimize out non-atomic bitops on compile-time constants Alexander Lobakin
@ 2022-06-15  3:26   ` Yury Norov
  2022-06-15 14:00     ` Alexander Lobakin
  0 siblings, 1 reply; 30+ messages in thread
From: Yury Norov @ 2022-06-15  3:26 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

Hi Alexander,

On Fri, Jun 10, 2022 at 01:34:27PM +0200, Alexander Lobakin wrote:
> Currently, many architecture-specific non-atomic bitop
> implementations use inline asm or other hacks which are faster or
> more robust when working with "real" variables (i.e. fields from
> the structures etc.), but the compilers have no clue how to optimize
> them out when called on compile-time constants. That said, the
> following code:
> 
> 	DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> 	unsigned long bar = BIT(BAR_BIT);
> 	unsigned long baz = 0;
> 
> 	__set_bit(FOO_BIT, foo);
> 	baz |= BIT(BAZ_BIT);
> 
> 	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> 	BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> 	BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));

Can you put this snippet into lib/test_bitops.c?

Thanks,
Yury

> triggers the first assertion on x86_64, which means that the
> compiler is unable to evaluate it to a compile-time initializer
> when the architecture-specific bitop is used even if it's obvious.
> In order to let the compiler optimize out such cases, expand the
> bitop() macro to use the "constant" C non-atomic bitop
> implementations when all of the arguments passed are compile-time
> constants, which means that the result will be a compile-time
> constant as well, so that it produces more efficient and simple
> code in 100% cases, comparing to the architecture-specific
> counterparts.
> 
> The savings are architecture, compiler and compiler flags dependent,
> for example, on x86_64 -O2:
> 
> GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
> 
> and ARM64 (courtesy of Mark):
> 
> GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  include/linux/bitops.h | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 753f98e0dcf5..364bdc3606b4 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -33,8 +33,24 @@ extern unsigned long __sw_hweight64(__u64 w);
>  
>  #include <asm-generic/bitops/generic-non-atomic.h>
>  
> +/*
> > + * Many architecture-specific non-atomic bitops contain inline asm code and,
> > + * because of that, the compiler can't optimize them into compile-time
> > + * expressions or constants. In contrast, gen_*() helpers are defined in pure
> > + * C and compilers optimize them just fine.
> + * Therefore, to make `unsigned long foo = 0; __set_bit(BAR, &foo)` effectively
> + * equal to `unsigned long foo = BIT(BAR)`, pick the generic C alternative when
> + * the arguments can be resolved at compile time. That expression itself is a
> + * constant and doesn't bring any functional changes to the rest of cases.
> + * The casts to `uintptr_t` are needed to mitigate `-Waddress` warnings when
> + * passing a bitmap from .bss or .data (-> `!!addr` is always true).
> + */
>  #define bitop(op, nr, addr)						\
> -	op(nr, addr)
> +	((__builtin_constant_p(nr) &&					\
> +	  __builtin_constant_p((uintptr_t)(addr) != (uintptr_t)NULL) &&	\
> +	  (uintptr_t)(addr) != (uintptr_t)NULL &&			\
> +	  __builtin_constant_p(*(const unsigned long *)(addr))) ?	\
> +	 const##op(nr, addr) : op(nr, addr))
>  
>  #define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
>  #define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
> -- 
> 2.36.1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 2/6] bitops: always define asm-generic non-atomic bitops
  2022-06-15  2:47             ` Yury Norov
@ 2022-06-15  7:46               ` Marco Elver
  0 siblings, 0 replies; 30+ messages in thread
From: Marco Elver @ 2022-06-15  7:46 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Tony Luck, Andy Shevchenko, Arnd Bergmann,
	Mark Rutland, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Borislav Petkov, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

On Wed, 15 Jun 2022 at 04:47, Yury Norov <yury.norov@gmail.com> wrote:
>
> On Mon, Jun 13, 2022 at 04:33:17PM +0200, Marco Elver wrote:
> > On Mon, 13 Jun 2022 at 16:21, Alexander Lobakin
> > <alexandr.lobakin@intel.com> wrote:
> > >
> > > From: Marco Elver <elver@google.com>
> > > Date: Fri, 10 Jun 2022 18:32:36 +0200
> > >
> > > > On Fri, 10 Jun 2022 at 18:02, Luck, Tony <tony.luck@intel.com> wrote:
> > > > >
> > > > > > > +/**
> > > > > > > + * generic_test_bit - Determine whether a bit is set
> > > > > > > + * @nr: bit number to test
> > > > > > > + * @addr: Address to start counting from
> > > > > > > + */
> > > > > >
> > > > > > Shouldn't we add, in this or in a separate patch, a big NOTE to
> > > > > > explain that this is actually atomic and must be kept as such?
> > > > >
> > > > > "atomic" isn't really the right word. The volatile access makes sure that the
> > > > > compiler does the test at the point that the source code asked, and doesn't
> > > > > move it before/after other operations.
> > > >
> > > > It's listed in Documentation/atomic_bitops.txt.
> > >
> > > Oh, so my memory was actually correct that I saw it in the docs
> > > somewhere.
> > > > WDYT: should I also mention here in the code (in a block comment) that
> > > > it's atomic and must not lose `volatile`, as Andy suggested, or is it
> > > > sufficient to have it in the docs (+ it's not underscored)?
> >
> > Perhaps a quick comment in the code (not kerneldoc above) will be
> > sufficient, with reference to Documentation/atomic_bitops.txt.
>
> If it may help, we can do:
>
> /*
>  * Bit testing is a naturally atomic operation because a bit is
>  * the minimal quantum of information.
>  */
> #define __test_bit test_bit

That's redundant and we'll end up with a random mix of both.

What'd be more interesting is having a __test_bit without the volatile
that allows compilers to optimize things more. But I think that also
becomes mostly redundant with the optimizations that this series sets
out to do.
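
Roughly something like this (only a sketch; the series' const_test_bit()
essentially does this under another name):

	static __always_inline bool
	__test_bit(unsigned long nr, const unsigned long *addr)
	{
		/* Non-volatile: the compiler may fuse, elide or move
		 * this load, so it's only safe without concurrency. */
		return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG - 1)));
	}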

The distinction is ever so subtle, and clever compilers *will* break
concurrent code in ways that are rather hard to imagine:
https://lwn.net/Articles/793253/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 4/6] bitops: define const_*() versions of the non-atomics
  2022-06-15  2:57   ` Yury Norov
@ 2022-06-15 13:55     ` Alexander Lobakin
  2022-06-15 15:52       ` David Laight
  0 siblings, 1 reply; 30+ messages in thread
From: Alexander Lobakin @ 2022-06-15 13:55 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Arnd Bergmann, Andy Shevchenko, Mark Rutland,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

From: Yury Norov <yury.norov@gmail.com>
Date: Tue, 14 Jun 2022 19:57:43 -0700

> On Fri, Jun 10, 2022 at 01:34:25PM +0200, Alexander Lobakin wrote:
> > Define const_*() variants of the non-atomic bitops to be used when
> > the input arguments are compile-time constants, so that the compiler
> > will be always to resolve those to compile-time constants as well.
> 
> will be always able?

Right, ooops.

> 
> > Those are mostly direct aliases for generic_*() with one exception
> > for const_test_bit(): the original one is declared atomic-safe and
> > thus doesn't discard the `volatile` qualifier, so in order to let
> > optimize the code, define it separately disregarding the qualifier.
> > Add them to the compile-time type checks as well just in case.
> > 
> > Suggested-by: Marco Elver <elver@google.com>
> > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > ---
> >  .../asm-generic/bitops/generic-non-atomic.h   | 31 +++++++++++++++++++
> >  include/linux/bitops.h                        |  3 +-
> >  2 files changed, 33 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> > index 3ce0fa0ab35f..9a77babfff35 100644
> > --- a/include/asm-generic/bitops/generic-non-atomic.h
> > +++ b/include/asm-generic/bitops/generic-non-atomic.h
> > @@ -121,4 +121,35 @@ generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
> >  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> >  }
> >  
> > +/*
> > + * const_*() definitions provide good compile-time optimizations when
> > + * the passed arguments can be resolved at compile time.
> > + */
> > +#define const___set_bit			generic___set_bit
> > +#define const___clear_bit		generic___clear_bit
> > +#define const___change_bit		generic___change_bit
> > +#define const___test_and_set_bit	generic___test_and_set_bit
> > +#define const___test_and_clear_bit	generic___test_and_clear_bit
> > +#define const___test_and_change_bit	generic___test_and_change_bit
> > +
> > +/**
> > + * const_test_bit - Determine whether a bit is set
> > + * @nr: bit number to test
> > + * @addr: Address to start counting from
> > + *
> > + * A version of generic_test_bit() which discards the `volatile` qualifier to
> > + * allow the compiler to optimize code harder. Non-atomic and to be used only
> > + * for testing compile-time constants, e.g. from the corresponding macro, or
> > + * when you really know what you are doing.
> 
> Not sure I understand the last sentence... Can you please rephrase?

I basically wanted to say that there might potentially be cases for
using those outside of the actual macros from 6/6, but it might be
entirely redundant to mention this.
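
E.g. a direct use could look like this (hypothetical, just to show the
idea):

	DECLARE_BITMAP(init, BITS_PER_LONG) = { BIT(FOO_BIT) };

	/* All the inputs are compile-time resolvable, so calling the
	 * const_ variant directly folds the check away: */
	BUILD_BUG_ON(!const_test_bit(FOO_BIT, init));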

> 
> > + */
> > +static __always_inline bool
> > +const_test_bit(unsigned long nr, const volatile unsigned long *addr)
> > +{
> > +	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> > +	unsigned long mask = BIT_MASK(nr);
> > +	unsigned long val = *p;
> > +
> > +	return !!(val & mask);
> > +}
> > +
> >  #endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
> > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> > index 87087454a288..51c22b8667b4 100644
> > --- a/include/linux/bitops.h
> > +++ b/include/linux/bitops.h
> > @@ -36,7 +36,8 @@ extern unsigned long __sw_hweight64(__u64 w);
> >  
> >  /* Check that the bitops prototypes are sane */
> >  #define __check_bitop_pr(name)						\
> > -	static_assert(__same_type(arch_##name, generic_##name) &&	\
> > +	static_assert(__same_type(const_##name, generic_##name) &&	\
> > +		      __same_type(arch_##name, generic_##name) &&	\
> >  		      __same_type(name, generic_##name))
> >  
> >  __check_bitop_pr(__set_bit);
> > -- 
> > 2.36.1

Thanks,
Olek

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 6/6] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-15  3:26   ` Yury Norov
@ 2022-06-15 14:00     ` Alexander Lobakin
  0 siblings, 0 replies; 30+ messages in thread
From: Alexander Lobakin @ 2022-06-15 14:00 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Arnd Bergmann, Andy Shevchenko, Mark Rutland,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

From: Yury Norov <yury.norov@gmail.com>
Date: Tue, 14 Jun 2022 20:26:54 -0700

> Hi Alexander,
> 
> On Fri, Jun 10, 2022 at 01:34:27PM +0200, Alexander Lobakin wrote:
> > Currently, many architecture-specific non-atomic bitop
> > implementations use inline asm or other hacks which are faster or
> > more robust when working with "real" variables (i.e. fields from
> > the structures etc.), but the compilers have no clue how to optimize
> > them out when called on compile-time constants. That said, the
> > following code:
> > 
> > 	DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> > 	unsigned long bar = BIT(BAR_BIT);
> > 	unsigned long baz = 0;
> > 
> > 	__set_bit(FOO_BIT, foo);
> > 	baz |= BIT(BAZ_BIT);
> > 
> > 	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo)));
> > 	BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> > 	BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> 
> Can you put this snippet into lib/test_bitops.c?

Great idea, sure!
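Something like this, perhaps (a rough sketch: the function name is
mine, and where exactly it hooks into lib/test_bitops.c would need
adjusting):

	static void __init test_bitops_const_eval(void)
	{
		DECLARE_BITMAP(foo, BITS_PER_LONG) = { };
		unsigned long bar = BIT(10);
		unsigned long baz = 0;

		__set_bit(5, foo);
		baz |= BIT(15);

		/* All three must be compile-time constants once the
		 * series is applied; a build failure here means some
		 * bitop fell back to the opaque arch implementation. */
		BUILD_BUG_ON(!__builtin_constant_p(test_bit(5, foo)));
		BUILD_BUG_ON(!__builtin_constant_p(bar & BIT(10)));
		BUILD_BUG_ON(!__builtin_constant_p(baz & BIT(15)));
	}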

> 
> Thanks,
> Yury
> 
> > triggers the first assertion on x86_64, which means that the
> > compiler is unable to evaluate it to a compile-time initializer
> > when the architecture-specific bitop is used, even when the result
> > is obvious.
> > In order to let the compiler optimize out such cases, expand the
> > bitop() macro to use the "constant" C non-atomic bitop
> > implementations when all of the arguments passed are compile-time
> > constants, which means that the result will be a compile-time
> > constant as well, so that it produces more efficient and simpler
> > code in 100% of cases, compared to the architecture-specific
> > counterparts.
> > 
> > The savings are architecture, compiler and compiler flags dependent,
> > for example, on x86_64 -O2:
> > 
> > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
> > 
> > and ARM64 (courtesy of Mark):
> > 
> > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
> > 
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > ---
> >  include/linux/bitops.h | 18 +++++++++++++++++-
> >  1 file changed, 17 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> > index 753f98e0dcf5..364bdc3606b4 100644
> > --- a/include/linux/bitops.h
> > +++ b/include/linux/bitops.h
> > @@ -33,8 +33,24 @@ extern unsigned long __sw_hweight64(__u64 w);
> >  
> >  #include <asm-generic/bitops/generic-non-atomic.h>
> >  
> > +/*
> > + * Many architecture-specific non-atomic bitops contain inline asm code and due
> > + * to that the compiler can't optimize them to compile-time expressions or
> > + * constants. In contrast, the generic_*() helpers are defined in pure C and the
> > + * compilers optimize them just fine.
> > + * Therefore, to make `unsigned long foo = 0; __set_bit(BAR, &foo)` effectively
> > + * equal to `unsigned long foo = BIT(BAR)`, pick the generic C alternative when
> > + * the arguments can be resolved at compile time. That expression itself is a
> > + * constant and doesn't bring any functional changes to the rest of the cases.
> > + * The casts to `uintptr_t` are needed to mitigate `-Waddress` warnings when
> > + * passing a bitmap from .bss or .data (-> `!!addr` is always true).
> > + */
> >  #define bitop(op, nr, addr)						\
> > -	op(nr, addr)
> > +	((__builtin_constant_p(nr) &&					\
> > +	  __builtin_constant_p((uintptr_t)(addr) != (uintptr_t)NULL) &&	\
> > +	  (uintptr_t)(addr) != (uintptr_t)NULL &&			\
> > +	  __builtin_constant_p(*(const unsigned long *)(addr))) ?	\
> > +	 const##op(nr, addr) : op(nr, addr))
> >  
> >  #define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
> >  #define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
> > -- 
> 2.36.1

Thanks,
Olek

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-13 15:22     ` Geert Uytterhoeven
@ 2022-06-15 14:17       ` Alexander Lobakin
  0 siblings, 0 replies; 30+ messages in thread
From: Alexander Lobakin @ 2022-06-15 14:17 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Mark Rutland, Matt Turner, Brian Cain, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	alpha, open list:QUALCOMM HEXAGON...,
	linux-ia64, linux-m68k, Linux-sh list, sparclinux, Linux-Arch,
	Linux Kernel Mailing List

From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Mon, 13 Jun 2022 17:22:30 +0200

> Hi Olek,

Hi!

> 
> On Mon, Jun 13, 2022 at 4:28 PM Alexander Lobakin
> <alexandr.lobakin@intel.com> wrote:
> > From: Geert Uytterhoeven <geert@linux-m68k.org>
> > > On Fri, Jun 10, 2022 at 1:35 PM Alexander Lobakin
> > > <alexandr.lobakin@intel.com> wrote:
> > > On m68k, using gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04), this
> > > blows up immediately with:
> >
> > Yeah I saw the kernel bot report already, sorry for that >_< Fixed
> > in v3 already, will send in 1-2 days.
> 
> Is it simple to fix?
> I might be able to give the fixed v2 a try before that.
> Thanks!

Oh, sorry for the late reply, was busy with stuff.
It's straightforward: after applying the series, there are some
static inlines in arch/m68k/include/asm/bitops.h (those which were
converted from macros in patch 3/6), 7 ops in total. You just need
to create definitions with the 'arch_' prefix for each of them, e.g.

#define arch_test_bit	test_bit
#define arch___set_bit	__set_bit // will be ___set_bit after 5/6

etc.
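
In full, something like this should do (an untested sketch off the
top of my head; the right-hand sides get renamed accordingly once
5/6 is applied on top, as noted above):

	#define arch_test_bit			test_bit
	#define arch___set_bit			__set_bit
	#define arch___clear_bit		__clear_bit
	#define arch___change_bit		__change_bit
	#define arch___test_and_set_bit		__test_and_set_bit
	#define arch___test_and_clear_bit	__test_and_clear_bit
	#define arch___test_and_change_bit	__test_and_change_bit
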
Hope I explained it clear-ish :)

> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

Thanks,
Olek

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2 4/6] bitops: define const_*() versions of the non-atomics
  2022-06-15 13:55     ` Alexander Lobakin
@ 2022-06-15 15:52       ` David Laight
  0 siblings, 0 replies; 30+ messages in thread
From: David Laight @ 2022-06-15 15:52 UTC (permalink / raw)
  To: 'Alexander Lobakin', Yury Norov
  Cc: Arnd Bergmann, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

From: Alexander Lobakin
> Sent: 15 June 2022 14:55
...
> > > +/**
> > > + * const_test_bit - Determine whether a bit is set
> > > + * @nr: bit number to test
> > > + * @addr: Address to start counting from
> > > + *
> > > + * A version of generic_test_bit() which discards the `volatile` qualifier to
> > > + * allow the compiler to optimize code harder. Non-atomic and to be used only
> > > + * for testing compile-time constants, e.g. from the corresponding macro, or
> > > + * when you really know what you are doing.
> >
> > Not sure I understand the last sentence... Can you please rephrase?
> 
> I basically want to tell that there potentinally might be cases for
> using those outside of the actual macros from 6/6. But it might be
> redundant at all to mention this.

I bet that if a function has:
	unsigned long bitmask;
	...
	if (test_bit(12, &bitmask))
then the 'volatile' forces the compiler to actually write the
value out to memory (stack) instead of doing a register op.

OTOH such code should just be using & directly.
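
To illustrate (a hand-written sketch, not taken from the series):

	static bool via_generic(void)
	{
		unsigned long bitmask = BIT(12);

		/* The volatile read inside generic_test_bit() makes the
		 * compiler spill 'bitmask' to the stack and load it back,
		 * even though the result is knowable. */
		return generic_test_bit(12, &bitmask);
	}

	static bool via_const(void)
	{
		unsigned long bitmask = BIT(12);

		/* const_test_bit() casts the volatile away, so this
		 * typically folds to a plain 'return true'. */
		return const_test_bit(12, &bitmask);
	}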

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-06-15 15:52 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-10 11:34 [PATCH v2 0/6] bitops: let optimize out non-atomic bitops on compile-time constants Alexander Lobakin
2022-06-10 11:34 ` [PATCH v2 1/6] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr() Alexander Lobakin
2022-06-10 12:18   ` David Laight
2022-06-10 13:46   ` Andy Shevchenko
2022-06-15  2:59   ` Yury Norov
2022-06-10 11:34 ` [PATCH v2 2/6] bitops: always define asm-generic non-atomic bitops Alexander Lobakin
2022-06-10 13:50   ` Andy Shevchenko
2022-06-10 16:02     ` Luck, Tony
2022-06-10 16:32       ` Marco Elver
2022-06-13 14:19         ` Alexander Lobakin
2022-06-13 14:33           ` Marco Elver
2022-06-15  2:47             ` Yury Norov
2022-06-15  7:46               ` Marco Elver
2022-06-13 16:26           ` Luck, Tony
2022-06-13 21:29             ` David Laight
2022-06-10 11:34 ` [PATCH v2 3/6] bitops: unify non-atomic bitops prototypes across architectures Alexander Lobakin
2022-06-10 11:34 ` [PATCH v2 4/6] bitops: define const_*() versions of the non-atomics Alexander Lobakin
2022-06-10 13:56   ` Andy Shevchenko
2022-06-13 14:30     ` Alexander Lobakin
2022-06-15  2:57   ` Yury Norov
2022-06-15 13:55     ` Alexander Lobakin
2022-06-15 15:52       ` David Laight
2022-06-10 11:34 ` [PATCH v2 5/6] bitops: wrap non-atomic bitops with a transparent macro Alexander Lobakin
2022-06-10 11:34 ` [PATCH v2 6/6] bitops: let optimize out non-atomic bitops on compile-time constants Alexander Lobakin
2022-06-15  3:26   ` Yury Norov
2022-06-15 14:00     ` Alexander Lobakin
2022-06-13  7:35 ` [PATCH v2 0/6] " Geert Uytterhoeven
2022-06-13 14:26   ` Alexander Lobakin
2022-06-13 15:22     ` Geert Uytterhoeven
2022-06-15 14:17       ` Alexander Lobakin
