All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
@ 2022-06-06 11:49 ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

While I was working on converting some structure fields from a fixed
type to a bitmap, I started observing code size increase not only in
places where the code works with the converted structure fields, but
also where the converted vars were on the stack. That said, the
following code:

	DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
	unsigned long bar = BIT(BAR_BIT);
	unsigned long baz = 0;

	__set_bit(FOO_BIT, foo);
	baz |= BIT(BAZ_BIT);

	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
	BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
	BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));

triggers the first assertion on x86_64, which means that the
compiler is unable to evaluate it to a compile-time initializer
when the architecture-specific bitop is used even if it's obvious.
I found that this is due to that many architecture-specific
non-atomic bitop implementations use inline asm or other hacks which
are faster or more robust when working with "real" variables (i.e.
fields from the structures etc.), but the compilers have no clue how
to optimize them out when called on compile-time constants.

So, in order to let the compiler optimize out such cases, expand the
test_bit() and __*_bit() definitions with a compile-time condition
check, so that they will pick the generic C non-atomic bitop
implementations when all of the arguments passed are compile-time
constants, which means that the result will be a compile-time
constant as well and the compiler will produce more efficient and
simple code in 100% cases (no changes when there's at least one
non-compile-time-constant argument).
The condition itself:

if (
__builtin_constant_p(nr) &&	/* <- bit position is constant */
__builtin_constant_p(!!addr) &&	/* <- compiler knows bitmap addr is
				      always either NULL or not */
addr &&				/* <- bitmap addr is not NULL */
__builtin_constant_p(*addr)	/* <- compiler knows the value of
				      the target bitmap */
)
	/* then pick the generic C variant
else
	/* old code path, arch-specific

I also tried __is_constexpr() as suggested by Andy, but it was
always returning 0 ('not a constant') for the 2,3 and 4th
conditions.

The savings on x86_64 with LLVM are insane (.text):

$ scripts/bloat-o-meter -c vmlinux.{base,test}
add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)

$ scripts/bloat-o-meter -c vmlinux.{base,mod}
add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)

$ scripts/bloat-o-meter -c vmlinux.{base,all}
add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)

And the following:

	DECLARE_BITMAP(flags, __IP_TUNNEL_FLAG_NUM) = { };
	__be16 flags;

	__set_bit(IP_TUNNEL_CSUM_BIT, flags);

	tun_flags = cpu_to_be16(*flags & U16_MAX);

	if (test_bit(IP_TUNNEL_VTI_BIT, flags))
		tun_flags |= VTI_ISVTI;

	BUILD_BUG_ON(!__builtin_constant_p(tun_flags));

doesn't blow up anymore.

The series has been in intel-next for a while with no reported issues.
Also available on my open GH[0].

[0] https://github.com/alobakin/linux/commits/bitops

Alexander Lobakin (6):
  ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
  bitops: always define asm-generic non-atomic bitops
  bitops: define gen_test_bit() the same way as the rest of functions
  bitops: unify non-atomic bitops prototypes across architectures
  bitops: wrap non-atomic bitops with a transparent macro
  bitops: let optimize out non-atomic bitops on compile-time constants

 arch/alpha/include/asm/bitops.h               |  28 ++--
 arch/hexagon/include/asm/bitops.h             |  23 ++--
 arch/ia64/include/asm/bitops.h                |  40 +++---
 arch/ia64/include/asm/processor.h             |   2 +-
 arch/m68k/include/asm/bitops.h                |  47 +++++--
 arch/sh/include/asm/bitops-op32.h             |  32 +++--
 arch/sparc/include/asm/bitops_32.h            |  18 +--
 arch/sparc/lib/atomic32.c                     |  12 +-
 .../asm-generic/bitops/generic-non-atomic.h   | 128 ++++++++++++++++++
 .../bitops/instrumented-non-atomic.h          |  35 +++--
 include/asm-generic/bitops/non-atomic.h       | 123 ++---------------
 include/linux/bitops.h                        |  45 ++++++
 tools/include/asm-generic/bitops/non-atomic.h |  34 +++--
 tools/include/linux/bitops.h                  |  16 +++
 14 files changed, 363 insertions(+), 220 deletions(-)
 create mode 100644 include/asm-generic/bitops/generic-non-atomic.h

base-commit: f2906aa863381afb0015a9eb7fefad885d4e5a56
-- 
2.36.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
@ 2022-06-06 11:49 ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

While I was working on converting some structure fields from a fixed
type to a bitmap, I started observing code size increase not only in
places where the code works with the converted structure fields, but
also where the converted vars were on the stack. That said, the
following code:

	DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
	unsigned long bar = BIT(BAR_BIT);
	unsigned long baz = 0;

	__set_bit(FOO_BIT, foo);
	baz |= BIT(BAZ_BIT);

	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
	BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
	BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));

triggers the first assertion on x86_64, which means that the
compiler is unable to evaluate it to a compile-time initializer
when the architecture-specific bitop is used even if it's obvious.
I found that this is due to that many architecture-specific
non-atomic bitop implementations use inline asm or other hacks which
are faster or more robust when working with "real" variables (i.e.
fields from the structures etc.), but the compilers have no clue how
to optimize them out when called on compile-time constants.

So, in order to let the compiler optimize out such cases, expand the
test_bit() and __*_bit() definitions with a compile-time condition
check, so that they will pick the generic C non-atomic bitop
implementations when all of the arguments passed are compile-time
constants, which means that the result will be a compile-time
constant as well and the compiler will produce more efficient and
simple code in 100% cases (no changes when there's at least one
non-compile-time-constant argument).
The condition itself:

if (
__builtin_constant_p(nr) &&	/* <- bit position is constant */
__builtin_constant_p(!!addr) &&	/* <- compiler knows bitmap addr is
				      always either NULL or not */
addr &&				/* <- bitmap addr is not NULL */
__builtin_constant_p(*addr)	/* <- compiler knows the value of
				      the target bitmap */
)
	/* then pick the generic C variant
else
	/* old code path, arch-specific

I also tried __is_constexpr() as suggested by Andy, but it was
always returning 0 ('not a constant') for the 2,3 and 4th
conditions.

The savings on x86_64 with LLVM are insane (.text):

$ scripts/bloat-o-meter -c vmlinux.{base,test}
add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)

$ scripts/bloat-o-meter -c vmlinux.{base,mod}
add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)

$ scripts/bloat-o-meter -c vmlinux.{base,all}
add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)

And the following:

	DECLARE_BITMAP(flags, __IP_TUNNEL_FLAG_NUM) = { };
	__be16 flags;

	__set_bit(IP_TUNNEL_CSUM_BIT, flags);

	tun_flags = cpu_to_be16(*flags & U16_MAX);

	if (test_bit(IP_TUNNEL_VTI_BIT, flags))
		tun_flags |= VTI_ISVTI;

	BUILD_BUG_ON(!__builtin_constant_p(tun_flags));

doesn't blow up anymore.

The series has been in intel-next for a while with no reported issues.
Also available on my open GH[0].

[0] https://github.com/alobakin/linux/commits/bitops

Alexander Lobakin (6):
  ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
  bitops: always define asm-generic non-atomic bitops
  bitops: define gen_test_bit() the same way as the rest of functions
  bitops: unify non-atomic bitops prototypes across architectures
  bitops: wrap non-atomic bitops with a transparent macro
  bitops: let optimize out non-atomic bitops on compile-time constants

 arch/alpha/include/asm/bitops.h               |  28 ++--
 arch/hexagon/include/asm/bitops.h             |  23 ++--
 arch/ia64/include/asm/bitops.h                |  40 +++---
 arch/ia64/include/asm/processor.h             |   2 +-
 arch/m68k/include/asm/bitops.h                |  47 +++++--
 arch/sh/include/asm/bitops-op32.h             |  32 +++--
 arch/sparc/include/asm/bitops_32.h            |  18 +--
 arch/sparc/lib/atomic32.c                     |  12 +-
 .../asm-generic/bitops/generic-non-atomic.h   | 128 ++++++++++++++++++
 .../bitops/instrumented-non-atomic.h          |  35 +++--
 include/asm-generic/bitops/non-atomic.h       | 123 ++---------------
 include/linux/bitops.h                        |  45 ++++++
 tools/include/asm-generic/bitops/non-atomic.h |  34 +++--
 tools/include/linux/bitops.h                  |  16 +++
 14 files changed, 363 insertions(+), 220 deletions(-)
 create mode 100644 include/asm-generic/bitops/generic-non-atomic.h

base-commit: f2906aa863381afb0015a9eb7fefad885d4e5a56
-- 
2.36.1

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
@ 2022-06-06 11:49 ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux

While I was working on converting some structure fields from a fixed
type to a bitmap, I started observing code size increase not only in
places where the code works with the converted structure fields, but
also where the converted vars were on the stack. That said, the
following code:

	DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
	unsigned long bar = BIT(BAR_BIT);
	unsigned long baz = 0;

	__set_bit(FOO_BIT, foo);
	baz |= BIT(BAZ_BIT);

	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
	BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
	BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));

triggers the first assertion on x86_64, which means that the
compiler is unable to evaluate it to a compile-time initializer
when the architecture-specific bitop is used even if it's obvious.
I found that this is due to that many architecture-specific
non-atomic bitop implementations use inline asm or other hacks which
are faster or more robust when working with "real" variables (i.e.
fields from the structures etc.), but the compilers have no clue how
to optimize them out when called on compile-time constants.

So, in order to let the compiler optimize out such cases, expand the
test_bit() and __*_bit() definitions with a compile-time condition
check, so that they will pick the generic C non-atomic bitop
implementations when all of the arguments passed are compile-time
constants, which means that the result will be a compile-time
constant as well and the compiler will produce more efficient and
simple code in 100% cases (no changes when there's at least one
non-compile-time-constant argument).
The condition itself:

if (
__builtin_constant_p(nr) &&	/* <- bit position is constant */
__builtin_constant_p(!!addr) &&	/* <- compiler knows bitmap addr is
				      always either NULL or not */
addr &&				/* <- bitmap addr is not NULL */
__builtin_constant_p(*addr)	/* <- compiler knows the value of
				      the target bitmap */
)
	/* then pick the generic C variant
else
	/* old code path, arch-specific

I also tried __is_constexpr() as suggested by Andy, but it was
always returning 0 ('not a constant') for the 2,3 and 4th
conditions.

The savings on x86_64 with LLVM are insane (.text):

$ scripts/bloat-o-meter -c vmlinux.{base,test}
add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)

$ scripts/bloat-o-meter -c vmlinux.{base,mod}
add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)

$ scripts/bloat-o-meter -c vmlinux.{base,all}
add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)

And the following:

	DECLARE_BITMAP(flags, __IP_TUNNEL_FLAG_NUM) = { };
	__be16 flags;

	__set_bit(IP_TUNNEL_CSUM_BIT, flags);

	tun_flags = cpu_to_be16(*flags & U16_MAX);

	if (test_bit(IP_TUNNEL_VTI_BIT, flags))
		tun_flags |= VTI_ISVTI;

	BUILD_BUG_ON(!__builtin_constant_p(tun_flags));

doesn't blow up anymore.

The series has been in intel-next for a while with no reported issues.
Also available on my open GH[0].

[0] https://github.com/alobakin/linux/commits/bitops

Alexander Lobakin (6):
  ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
  bitops: always define asm-generic non-atomic bitops
  bitops: define gen_test_bit() the same way as the rest of functions
  bitops: unify non-atomic bitops prototypes across architectures
  bitops: wrap non-atomic bitops with a transparent macro
  bitops: let optimize out non-atomic bitops on compile-time constants

 arch/alpha/include/asm/bitops.h               |  28 ++--
 arch/hexagon/include/asm/bitops.h             |  23 ++--
 arch/ia64/include/asm/bitops.h                |  40 +++---
 arch/ia64/include/asm/processor.h             |   2 +-
 arch/m68k/include/asm/bitops.h                |  47 +++++--
 arch/sh/include/asm/bitops-op32.h             |  32 +++--
 arch/sparc/include/asm/bitops_32.h            |  18 +--
 arch/sparc/lib/atomic32.c                     |  12 +-
 .../asm-generic/bitops/generic-non-atomic.h   | 128 ++++++++++++++++++
 .../bitops/instrumented-non-atomic.h          |  35 +++--
 include/asm-generic/bitops/non-atomic.h       | 123 ++---------------
 include/linux/bitops.h                        |  45 ++++++
 tools/include/asm-generic/bitops/non-atomic.h |  34 +++--
 tools/include/linux/bitops.h                  |  16 +++
 14 files changed, 363 insertions(+), 220 deletions(-)
 create mode 100644 include/asm-generic/bitops/generic-non-atomic.h

base-commit: f2906aa863381afb0015a9eb7fefad885d4e5a56
-- 
2.36.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH 1/6] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
  2022-06-06 11:49 ` Alexander Lobakin
  (?)
@ 2022-06-06 11:49   ` Alexander Lobakin
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel, stable, kernel test robot

test_bit(), as any other bitmap op, takes `unsigned long *` as a
second argument (pointer to the actual bitmap), as any bitmap
itself is an array of unsigned longs. However, the ia64_get_irr()
code passes a ref to `u64` as a second argument.
This works with the ia64 bitops implementation due to that they
have `void *` as the second argument and then cast it later on.
This works with the bitmap API itself due to that `unsigned long`
has the same size on ia64 as `u64` (`unsigned long long`), but
from the compiler PoV those two are different.
Define @irr as `unsigned long` to fix that. That implies no
functional changes. Has been hidden for 16 years!

Fixes: a58786917ce2 ("[IA64] avoid broken SAL_CACHE_FLUSH implementations")
Cc: stable@vger.kernel.org # 2.6.16+
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 arch/ia64/include/asm/processor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 7cbce290f4e5..757c2f6d8d4b 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -538,7 +538,7 @@ ia64_get_irr(unsigned int vector)
 {
 	unsigned int reg = vector / 64;
 	unsigned int bit = vector % 64;
-	u64 irr;
+	unsigned long irr;
 
 	switch (reg) {
 	case 0: irr = ia64_getreg(_IA64_REG_CR_IRR0); break;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 1/6] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
@ 2022-06-06 11:49   ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel, stable, kernel test robot

test_bit(), as any other bitmap op, takes `unsigned long *` as a
second argument (pointer to the actual bitmap), as any bitmap
itself is an array of unsigned longs. However, the ia64_get_irr()
code passes a ref to `u64` as a second argument.
This works with the ia64 bitops implementation due to that they
have `void *` as the second argument and then cast it later on.
This works with the bitmap API itself due to that `unsigned long`
has the same size on ia64 as `u64` (`unsigned long long`), but
from the compiler PoV those two are different.
Define @irr as `unsigned long` to fix that. That implies no
functional changes. Has been hidden for 16 years!

Fixes: a58786917ce2 ("[IA64] avoid broken SAL_CACHE_FLUSH implementations")
Cc: stable@vger.kernel.org # 2.6.16+
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 arch/ia64/include/asm/processor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 7cbce290f4e5..757c2f6d8d4b 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -538,7 +538,7 @@ ia64_get_irr(unsigned int vector)
 {
 	unsigned int reg = vector / 64;
 	unsigned int bit = vector % 64;
-	u64 irr;
+	unsigned long irr;
 
 	switch (reg) {
 	case 0: irr = ia64_getreg(_IA64_REG_CR_IRR0); break;
-- 
2.36.1

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 1/6] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
@ 2022-06-06 11:49   ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux

test_bit(), as any other bitmap op, takes `unsigned long *` as a
second argument (pointer to the actual bitmap), as any bitmap
itself is an array of unsigned longs. However, the ia64_get_irr()
code passes a ref to `u64` as a second argument.
This works with the ia64 bitops implementation due to that they
have `void *` as the second argument and then cast it later on.
This works with the bitmap API itself due to that `unsigned long`
has the same size on ia64 as `u64` (`unsigned long long`), but
from the compiler PoV those two are different.
Define @irr as `unsigned long` to fix that. That implies no
functional changes. Has been hidden for 16 years!

Fixes: a58786917ce2 ("[IA64] avoid broken SAL_CACHE_FLUSH implementations")
Cc: stable@vger.kernel.org # 2.6.16+
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 arch/ia64/include/asm/processor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 7cbce290f4e5..757c2f6d8d4b 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -538,7 +538,7 @@ ia64_get_irr(unsigned int vector)
 {
 	unsigned int reg = vector / 64;
 	unsigned int bit = vector % 64;
-	u64 irr;
+	unsigned long irr;
 
 	switch (reg) {
 	case 0: irr = ia64_getreg(_IA64_REG_CR_IRR0); break;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 2/6] bitops: always define asm-generic non-atomic bitops
  2022-06-06 11:49 ` Alexander Lobakin
  (?)
@ 2022-06-06 11:49   ` Alexander Lobakin
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

Move generic non-atomic bitops from the asm-generic header which
gets included only when there are no architecture-specific
alternatives, to a separate independent file to make them always
available.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 .../asm-generic/bitops/generic-non-atomic.h   | 124 ++++++++++++++++++
 include/asm-generic/bitops/non-atomic.h       | 109 ++-------------
 2 files changed, 132 insertions(+), 101 deletions(-)
 create mode 100644 include/asm-generic/bitops/generic-non-atomic.h

diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
new file mode 100644
index 000000000000..7a60adfa6e7d
--- /dev/null
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -0,0 +1,124 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
+#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
+
+#include <linux/bits.h>
+
+#ifndef _LINUX_BITOPS_H
+#error only <linux/bitops.h> can be included directly
+#endif
+
+/*
+ * Generic definitions for bit operations, should not be used in regular code
+ * directly.
+ */
+
+/**
+ * gen___set_bit - Set a bit in memory
+ * @nr: the bit to set
+ * @addr: the address to start counting from
+ *
+ * Unlike set_bit(), this function is non-atomic and may be reordered.
+ * If it's called on the same region of memory simultaneously, the effect
+ * may be that only one operation succeeds.
+ */
+static __always_inline void
+gen___set_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p  |= mask;
+}
+
+static __always_inline void
+gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p &= ~mask;
+}
+
+/**
+ * gen___change_bit - Toggle a bit in memory
+ * @nr: the bit to change
+ * @addr: the address to start counting from
+ *
+ * Unlike change_bit(), this function is non-atomic and may be reordered.
+ * If it's called on the same region of memory simultaneously, the effect
+ * may be that only one operation succeeds.
+ */
+static __always_inline
+void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p ^= mask;
+}
+
+/**
+ * gen___test_and_set_bit - Set a bit and return its old value
+ * @nr: Bit to set
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */
+static __always_inline int
+gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old | mask;
+	return (old & mask) != 0;
+}
+
+/**
+ * gen___test_and_clear_bit - Clear a bit and return its old value
+ * @nr: Bit to clear
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */
+static __always_inline int
+gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old & ~mask;
+	return (old & mask) != 0;
+}
+
+/* WARNING: non atomic and it can be reordered! */
+static __always_inline int
+gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old ^ mask;
+	return (old & mask) != 0;
+}
+
+/**
+ * gen_test_bit - Determine whether a bit is set
+ * @nr: bit number to test
+ * @addr: Address to start counting from
+ */
+static __always_inline int
+gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
+{
+	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
+}
+
+#endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index 078cc68be2f1..7ce0ed22fb5f 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -2,121 +2,28 @@
 #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 
+#include <asm-generic/bitops/generic-non-atomic.h>
 #include <asm/types.h>
 
-/**
- * arch___set_bit - Set a bit in memory
- * @nr: the bit to set
- * @addr: the address to start counting from
- *
- * Unlike set_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
- */
-static __always_inline void
-arch___set_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
-	*p  |= mask;
-}
+#define arch___set_bit gen___set_bit
 #define __set_bit arch___set_bit
 
-static __always_inline void
-arch___clear_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
-	*p &= ~mask;
-}
+#define arch___clear_bit gen___clear_bit
 #define __clear_bit arch___clear_bit
 
-/**
- * arch___change_bit - Toggle a bit in memory
- * @nr: the bit to change
- * @addr: the address to start counting from
- *
- * Unlike change_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
- */
-static __always_inline
-void arch___change_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
-	*p ^= mask;
-}
+#define arch___change_bit gen___change_bit
 #define __change_bit arch___change_bit
 
-/**
- * arch___test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static __always_inline int
-arch___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-	unsigned long old = *p;
-
-	*p = old | mask;
-	return (old & mask) != 0;
-}
+#define arch___test_and_set_bit gen___test_and_set_bit
 #define __test_and_set_bit arch___test_and_set_bit
 
-/**
- * arch___test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to clear
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static __always_inline int
-arch___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-	unsigned long old = *p;
-
-	*p = old & ~mask;
-	return (old & mask) != 0;
-}
+#define arch___test_and_clear_bit gen___test_and_clear_bit
 #define __test_and_clear_bit arch___test_and_clear_bit
 
-/* WARNING: non atomic and it can be reordered! */
-static __always_inline int
-arch___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-	unsigned long old = *p;
-
-	*p = old ^ mask;
-	return (old & mask) != 0;
-}
+#define arch___test_and_change_bit gen___test_and_change_bit
 #define __test_and_change_bit arch___test_and_change_bit
 
-/**
- * arch_test_bit - Determine whether a bit is set
- * @nr: bit number to test
- * @addr: Address to start counting from
- */
-static __always_inline int
-arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
-{
-	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
-}
+#define arch_test_bit gen_test_bit
 #define test_bit arch_test_bit
 
 #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 2/6] bitops: always define asm-generic non-atomic bitops
@ 2022-06-06 11:49   ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

Move generic non-atomic bitops from the asm-generic header which
gets included only when there are no architecture-specific
alternatives, to a separate independent file to make them always
available.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 .../asm-generic/bitops/generic-non-atomic.h   | 124 ++++++++++++++++++
 include/asm-generic/bitops/non-atomic.h       | 109 ++-------------
 2 files changed, 132 insertions(+), 101 deletions(-)
 create mode 100644 include/asm-generic/bitops/generic-non-atomic.h

diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
new file mode 100644
index 000000000000..7a60adfa6e7d
--- /dev/null
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -0,0 +1,124 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
+#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
+
+#include <linux/bits.h>
+
+#ifndef _LINUX_BITOPS_H
+#error only <linux/bitops.h> can be included directly
+#endif
+
+/*
+ * Generic definitions for bit operations, should not be used in regular code
+ * directly.
+ */
+
+/**
+ * gen___set_bit - Set a bit in memory
+ * @nr: the bit to set
+ * @addr: the address to start counting from
+ *
+ * Unlike set_bit(), this function is non-atomic and may be reordered.
+ * If it's called on the same region of memory simultaneously, the effect
+ * may be that only one operation succeeds.
+ */
+static __always_inline void
+gen___set_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p  |= mask;
+}
+
+static __always_inline void
+gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p &= ~mask;
+}
+
+/**
+ * gen___change_bit - Toggle a bit in memory
+ * @nr: the bit to change
+ * @addr: the address to start counting from
+ *
+ * Unlike change_bit(), this function is non-atomic and may be reordered.
+ * If it's called on the same region of memory simultaneously, the effect
+ * may be that only one operation succeeds.
+ */
+static __always_inline
+void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p ^= mask;
+}
+
+/**
+ * gen___test_and_set_bit - Set a bit and return its old value
+ * @nr: Bit to set
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */
+static __always_inline int
+gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old | mask;
+	return (old & mask) != 0;
+}
+
+/**
+ * gen___test_and_clear_bit - Clear a bit and return its old value
+ * @nr: Bit to clear
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */
+static __always_inline int
+gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old & ~mask;
+	return (old & mask) != 0;
+}
+
+/* WARNING: non atomic and it can be reordered! */
+static __always_inline int
+gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old ^ mask;
+	return (old & mask) != 0;
+}
+
+/**
+ * gen_test_bit - Determine whether a bit is set
+ * @nr: bit number to test
+ * @addr: Address to start counting from
+ */
+static __always_inline int
+gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
+{
+	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
+}
+
+#endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index 078cc68be2f1..7ce0ed22fb5f 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -2,121 +2,28 @@
 #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 
+#include <asm-generic/bitops/generic-non-atomic.h>
 #include <asm/types.h>
 
-/**
- * arch___set_bit - Set a bit in memory
- * @nr: the bit to set
- * @addr: the address to start counting from
- *
- * Unlike set_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
- */
-static __always_inline void
-arch___set_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
-	*p  |= mask;
-}
+#define arch___set_bit gen___set_bit
 #define __set_bit arch___set_bit
 
-static __always_inline void
-arch___clear_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
-	*p &= ~mask;
-}
+#define arch___clear_bit gen___clear_bit
 #define __clear_bit arch___clear_bit
 
-/**
- * arch___change_bit - Toggle a bit in memory
- * @nr: the bit to change
- * @addr: the address to start counting from
- *
- * Unlike change_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
- */
-static __always_inline
-void arch___change_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
-	*p ^= mask;
-}
+#define arch___change_bit gen___change_bit
 #define __change_bit arch___change_bit
 
-/**
- * arch___test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static __always_inline int
-arch___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-	unsigned long old = *p;
-
-	*p = old | mask;
-	return (old & mask) != 0;
-}
+#define arch___test_and_set_bit gen___test_and_set_bit
 #define __test_and_set_bit arch___test_and_set_bit
 
-/**
- * arch___test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to clear
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static __always_inline int
-arch___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-	unsigned long old = *p;
-
-	*p = old & ~mask;
-	return (old & mask) != 0;
-}
+#define arch___test_and_clear_bit gen___test_and_clear_bit
 #define __test_and_clear_bit arch___test_and_clear_bit
 
-/* WARNING: non atomic and it can be reordered! */
-static __always_inline int
-arch___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-	unsigned long old = *p;
-
-	*p = old ^ mask;
-	return (old & mask) != 0;
-}
+#define arch___test_and_change_bit gen___test_and_change_bit
 #define __test_and_change_bit arch___test_and_change_bit
 
-/**
- * arch_test_bit - Determine whether a bit is set
- * @nr: bit number to test
- * @addr: Address to start counting from
- */
-static __always_inline int
-arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
-{
-	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
-}
+#define arch_test_bit gen_test_bit
 #define test_bit arch_test_bit
 
 #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
-- 
2.36.1

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 2/6] bitops: always define asm-generic non-atomic bitops
@ 2022-06-06 11:49   ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux

Move generic non-atomic bitops from the asm-generic header which
gets included only when there are no architecture-specific
alternatives, to a separate independent file to make them always
available.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 .../asm-generic/bitops/generic-non-atomic.h   | 124 ++++++++++++++++++
 include/asm-generic/bitops/non-atomic.h       | 109 ++-------------
 2 files changed, 132 insertions(+), 101 deletions(-)
 create mode 100644 include/asm-generic/bitops/generic-non-atomic.h

diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
new file mode 100644
index 000000000000..7a60adfa6e7d
--- /dev/null
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -0,0 +1,124 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
+#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
+
+#include <linux/bits.h>
+
+#ifndef _LINUX_BITOPS_H
+#error only <linux/bitops.h> can be included directly
+#endif
+
+/*
+ * Generic definitions for bit operations, should not be used in regular code
+ * directly.
+ */
+
+/**
+ * gen___set_bit - Set a bit in memory
+ * @nr: the bit to set
+ * @addr: the address to start counting from
+ *
+ * Unlike set_bit(), this function is non-atomic and may be reordered.
+ * If it's called on the same region of memory simultaneously, the effect
+ * may be that only one operation succeeds.
+ */
+static __always_inline void
+gen___set_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p  |= mask;
+}
+
+static __always_inline void
+gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p &= ~mask;
+}
+
+/**
+ * gen___change_bit - Toggle a bit in memory
+ * @nr: the bit to change
+ * @addr: the address to start counting from
+ *
+ * Unlike change_bit(), this function is non-atomic and may be reordered.
+ * If it's called on the same region of memory simultaneously, the effect
+ * may be that only one operation succeeds.
+ */
+static __always_inline
+void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p ^= mask;
+}
+
+/**
+ * gen___test_and_set_bit - Set a bit and return its old value
+ * @nr: Bit to set
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */
+static __always_inline int
+gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old | mask;
+	return (old & mask) != 0;
+}
+
+/**
+ * gen___test_and_clear_bit - Clear a bit and return its old value
+ * @nr: Bit to clear
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */
+static __always_inline int
+gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old & ~mask;
+	return (old & mask) != 0;
+}
+
+/* WARNING: non atomic and it can be reordered! */
+static __always_inline int
+gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old ^ mask;
+	return (old & mask) != 0;
+}
+
+/**
+ * gen_test_bit - Determine whether a bit is set
+ * @nr: bit number to test
+ * @addr: Address to start counting from
+ */
+static __always_inline int
+gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
+{
+	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
+}
+
+#endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index 078cc68be2f1..7ce0ed22fb5f 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -2,121 +2,28 @@
 #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 
+#include <asm-generic/bitops/generic-non-atomic.h>
 #include <asm/types.h>
 
-/**
- * arch___set_bit - Set a bit in memory
- * @nr: the bit to set
- * @addr: the address to start counting from
- *
- * Unlike set_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
- */
-static __always_inline void
-arch___set_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
-	*p  |= mask;
-}
+#define arch___set_bit gen___set_bit
 #define __set_bit arch___set_bit
 
-static __always_inline void
-arch___clear_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
-	*p &= ~mask;
-}
+#define arch___clear_bit gen___clear_bit
 #define __clear_bit arch___clear_bit
 
-/**
- * arch___change_bit - Toggle a bit in memory
- * @nr: the bit to change
- * @addr: the address to start counting from
- *
- * Unlike change_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
- */
-static __always_inline
-void arch___change_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
-	*p ^= mask;
-}
+#define arch___change_bit gen___change_bit
 #define __change_bit arch___change_bit
 
-/**
- * arch___test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static __always_inline int
-arch___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-	unsigned long old = *p;
-
-	*p = old | mask;
-	return (old & mask) != 0;
-}
+#define arch___test_and_set_bit gen___test_and_set_bit
 #define __test_and_set_bit arch___test_and_set_bit
 
-/**
- * arch___test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to clear
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static __always_inline int
-arch___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-	unsigned long old = *p;
-
-	*p = old & ~mask;
-	return (old & mask) != 0;
-}
+#define arch___test_and_clear_bit gen___test_and_clear_bit
 #define __test_and_clear_bit arch___test_and_clear_bit
 
-/* WARNING: non atomic and it can be reordered! */
-static __always_inline int
-arch___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-	unsigned long old = *p;
-
-	*p = old ^ mask;
-	return (old & mask) != 0;
-}
+#define arch___test_and_change_bit gen___test_and_change_bit
 #define __test_and_change_bit arch___test_and_change_bit
 
-/**
- * arch_test_bit - Determine whether a bit is set
- * @nr: bit number to test
- * @addr: Address to start counting from
- */
-static __always_inline int
-arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
-{
-	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
-}
+#define arch_test_bit gen_test_bit
 #define test_bit arch_test_bit
 
 #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
  2022-06-06 11:49 ` Alexander Lobakin
  (?)
@ 2022-06-06 11:49   ` Alexander Lobakin
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

Currently, the generic test_bit() function is defined as a one-liner
and in case with constant bitmaps the compiler is unable to optimize
it to a constant. At the same time, gen_test_and_*_bit() are being
optimized pretty good.
Define gen_test_bit() the same way as they are defined.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 include/asm-generic/bitops/generic-non-atomic.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
index 7a60adfa6e7d..202d8a3b40e1 100644
--- a/include/asm-generic/bitops/generic-non-atomic.h
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -118,7 +118,11 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
 static __always_inline int
 gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
 {
-	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
+	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long val = *p;
+
+	return !!(val & mask);
 }
 
 #endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
@ 2022-06-06 11:49   ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

Currently, the generic test_bit() function is defined as a one-liner
and in case with constant bitmaps the compiler is unable to optimize
it to a constant. At the same time, gen_test_and_*_bit() are being
optimized pretty good.
Define gen_test_bit() the same way as they are defined.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 include/asm-generic/bitops/generic-non-atomic.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
index 7a60adfa6e7d..202d8a3b40e1 100644
--- a/include/asm-generic/bitops/generic-non-atomic.h
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -118,7 +118,11 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
 static __always_inline int
 gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
 {
-	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
+	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long val = *p;
+
+	return !!(val & mask);
 }
 
 #endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
-- 
2.36.1

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
@ 2022-06-06 11:49   ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux

Currently, the generic test_bit() function is defined as a one-liner
and in case with constant bitmaps the compiler is unable to optimize
it to a constant. At the same time, gen_test_and_*_bit() are being
optimized pretty good.
Define gen_test_bit() the same way as they are defined.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 include/asm-generic/bitops/generic-non-atomic.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
index 7a60adfa6e7d..202d8a3b40e1 100644
--- a/include/asm-generic/bitops/generic-non-atomic.h
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -118,7 +118,11 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
 static __always_inline int
 gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
 {
-	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
+	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long val = *p;
+
+	return !!(val & mask);
 }
 
 #endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 4/6] bitops: unify non-atomic bitops prototypes across architectures
  2022-06-06 11:49 ` Alexander Lobakin
  (?)
@ 2022-06-06 11:49   ` Alexander Lobakin
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

Currently, there is a mess with the prototypes of the non-atomic
bitops across the different architectures:

ret	bool, int, unsigned long
nr	int, long, unsigned int, unsigned long
addr	volatile unsigned long *, volatile void *

Thankfully, it doesn't provoke any bugs, but can sometimes make
the compiler angry when it's not handy at all.
Adjust all the prototypes to the following standard:

ret	bool				retval can be only 0 or 1
nr	unsigned long			native; signed makes no sense
addr	volatile unsigned long *	bitmaps are arrays of ulongs

Finally, add some static assertions in order to prevent people from
making a mess in this room again.
I also used the %__always_inline attribute consistently they always
get resolved to the actual operations.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 arch/alpha/include/asm/bitops.h               | 28 +++++------
 arch/hexagon/include/asm/bitops.h             | 23 +++++----
 arch/ia64/include/asm/bitops.h                | 28 +++++------
 arch/m68k/include/asm/bitops.h                | 47 +++++++++++++------
 arch/sh/include/asm/bitops-op32.h             | 24 ++++++----
 .../asm-generic/bitops/generic-non-atomic.h   | 24 +++++-----
 .../bitops/instrumented-non-atomic.h          | 21 ++++++---
 include/linux/bitops.h                        | 13 +++++
 tools/include/asm-generic/bitops/non-atomic.h | 24 ++++++----
 9 files changed, 146 insertions(+), 86 deletions(-)

diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index e1d8483a45f2..381ad2eae4b4 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -46,8 +46,8 @@ set_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static inline void
-__set_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -82,8 +82,8 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ void
-__clear_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -118,8 +118,8 @@ change_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ void
-__change_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -186,8 +186,8 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
 /*
  * WARNING: non atomic version.
  */
-static inline int
-__test_and_set_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -230,8 +230,8 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static inline int
-__test_and_clear_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -272,8 +272,8 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ int
-__test_and_change_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -283,8 +283,8 @@ __test_and_change_bit(unsigned long nr, volatile void * addr)
 	return (old & mask) != 0;
 }
 
-static inline int
-test_bit(int nr, const volatile void * addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
 }
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index 75d6ba3643b8..a3bfe3a8d4b7 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -127,38 +127,45 @@ static inline void change_bit(int nr, volatile void *addr)
  * be atomic, particularly for things like slab_lock and slab_unlock.
  *
  */
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_clear_bit(nr, addr);
 }
 
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_set_bit(nr, addr);
 }
 
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_change_bit(nr, addr);
 }
 
 /*  Apparently, at least some of these are allowed to be non-atomic  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_clear_bit(nr, addr);
 }
 
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_set_bit(nr, addr);
 }
 
-static inline int __test_and_change_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_change_bit(nr, addr);
 }
 
-static inline int __test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	int retval;
 
@@ -172,7 +179,7 @@ static inline int __test_bit(int nr, const volatile unsigned long *addr)
 	return retval;
 }
 
-#define test_bit(nr, addr) __test_bit(nr, addr)
+#define __test_bit(nr, addr) test_bit(nr, addr)
 
 /*
  * ffz - find first zero in word.
diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
index 577be93c0818..4267a217a503 100644
--- a/arch/ia64/include/asm/bitops.h
+++ b/arch/ia64/include/asm/bitops.h
@@ -61,8 +61,8 @@ set_bit (int nr, volatile void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__set_bit (int nr, volatile void *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
 }
@@ -143,8 +143,8 @@ __clear_bit_unlock(int nr, void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__clear_bit (int nr, volatile void *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
 }
@@ -183,8 +183,8 @@ change_bit (int nr, volatile void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__change_bit (int nr, volatile void *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
 }
@@ -232,8 +232,8 @@ test_and_set_bit (int nr, volatile void *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __inline__ int
-__test_and_set_bit (int nr, volatile void *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -277,8 +277,8 @@ test_and_clear_bit (int nr, volatile void *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __inline__ int
-__test_and_clear_bit(int nr, volatile void * addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -320,8 +320,8 @@ test_and_change_bit (int nr, volatile void *addr)
  *
  * This operation is non-atomic and can be reordered.
  */
-static __inline__ int
-__test_and_change_bit (int nr, void *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 old, bit = (1 << (nr & 31));
 	__u32 *m = (__u32 *) addr + (nr >> 5);
@@ -331,8 +331,8 @@ __test_and_change_bit (int nr, void *addr)
 	return (old & bit) != 0;
 }
 
-static __inline__ int
-test_bit (int nr, const volatile void *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
 }
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 51283db53667..9d44bd4713cb 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -65,8 +65,11 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
 				bfset_mem_set_bit(nr, vaddr))
 #endif
 
-#define __set_bit(nr, vaddr)	set_bit(nr, vaddr)
-
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	set_bit(nr, addr);
+}
 
 static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
 {
@@ -105,8 +108,11 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
 				bfclr_mem_clear_bit(nr, vaddr))
 #endif
 
-#define __clear_bit(nr, vaddr)	clear_bit(nr, vaddr)
-
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	clear_bit(nr, addr);
+}
 
 static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
 {
@@ -145,12 +151,16 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
 				bfchg_mem_change_bit(nr, vaddr))
 #endif
 
-#define __change_bit(nr, vaddr)	change_bit(nr, vaddr)
-
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	change_bit(nr, addr);
+}
 
-static inline int test_bit(int nr, const volatile unsigned long *vaddr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
-	return (vaddr[nr >> 5] & (1UL << (nr & 31))) != 0;
+	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
 }
 
 
@@ -201,8 +211,11 @@ static inline int bfset_mem_test_and_set_bit(int nr,
 					bfset_mem_test_and_set_bit(nr, vaddr))
 #endif
 
-#define __test_and_set_bit(nr, vaddr)	test_and_set_bit(nr, vaddr)
-
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_set_bit(nr, addr);
+}
 
 static inline int bclr_reg_test_and_clear_bit(int nr,
 					      volatile unsigned long *vaddr)
@@ -251,8 +264,11 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
 					bfclr_mem_test_and_clear_bit(nr, vaddr))
 #endif
 
-#define __test_and_clear_bit(nr, vaddr)	test_and_clear_bit(nr, vaddr)
-
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_clear_bit(nr, addr);
+}
 
 static inline int bchg_reg_test_and_change_bit(int nr,
 					       volatile unsigned long *vaddr)
@@ -301,8 +317,11 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
 					bfchg_mem_test_and_change_bit(nr, vaddr))
 #endif
 
-#define __test_and_change_bit(nr, vaddr) test_and_change_bit(nr, vaddr)
-
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_change_bit(nr, addr);
+}
 
 /*
  *	The true 68020 and more advanced processors support the "bfffo"
diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
index cfe5465acce7..dcd85866a394 100644
--- a/arch/sh/include/asm/bitops-op32.h
+++ b/arch/sh/include/asm/bitops-op32.h
@@ -2,6 +2,8 @@
 #ifndef __ASM_SH_BITOPS_OP32_H
 #define __ASM_SH_BITOPS_OP32_H
 
+#include <linux/bits.h>
+
 /*
  * The bit modifying instructions on SH-2A are only capable of working
  * with a 3-bit immediate, which signifies the shift position for the bit
@@ -16,7 +18,8 @@
 #define BYTE_OFFSET(nr)		((nr) % BITS_PER_BYTE)
 #endif
 
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -33,7 +36,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
 	}
 }
 
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -60,7 +64,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -87,7 +92,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -106,7 +112,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -117,8 +124,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
-					    volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -133,7 +140,8 @@ static inline int __test_and_change_bit(int nr,
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
index 202d8a3b40e1..249b2a91c174 100644
--- a/include/asm-generic/bitops/generic-non-atomic.h
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -23,7 +23,7 @@
  * may be that only one operation succeeds.
  */
 static __always_inline void
-gen___set_bit(unsigned int nr, volatile unsigned long *addr)
+gen___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -32,7 +32,7 @@ gen___set_bit(unsigned int nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
+gen___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -49,8 +49,8 @@ gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __always_inline
-void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline void
+gen___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -67,8 +67,8 @@ void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __always_inline int
-gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+gen___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -87,8 +87,8 @@ gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __always_inline int
-gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+gen___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -99,8 +99,8 @@ gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static __always_inline int
-gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+gen___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -115,8 +115,8 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static __always_inline int
-gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
+static __always_inline bool
+gen_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
 	unsigned long mask = BIT_MASK(nr);
diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
index 7ab1ecc37782..b019f77ef21c 100644
--- a/include/asm-generic/bitops/instrumented-non-atomic.h
+++ b/include/asm-generic/bitops/instrumented-non-atomic.h
@@ -22,7 +22,8 @@
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___set_bit(nr, addr);
@@ -37,7 +38,8 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___clear_bit(nr, addr);
@@ -52,7 +54,8 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___change_bit(nr, addr);
@@ -90,7 +93,8 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_set_bit(nr, addr);
@@ -104,7 +108,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_clear_bit(nr, addr);
@@ -118,7 +123,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_change_bit(nr, addr);
@@ -129,7 +135,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_bit(nr, addr);
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 7aaed501f768..5520ac9b1c24 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -26,12 +26,25 @@ extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
 extern unsigned long __sw_hweight64(__u64 w);
 
+#include <asm-generic/bitops/generic-non-atomic.h>
+
 /*
  * Include this here because some architectures need generic_ffs/fls in
  * scope
  */
 #include <asm/bitops.h>
 
+/* Check that the bitops prototypes are sane */
+#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
+__check_bitop_pr(__set_bit);
+__check_bitop_pr(__clear_bit);
+__check_bitop_pr(__change_bit);
+__check_bitop_pr(__test_and_set_bit);
+__check_bitop_pr(__test_and_clear_bit);
+__check_bitop_pr(__test_and_change_bit);
+__check_bitop_pr(test_bit);
+#undef __check_bitop_pr
+
 static inline int get_bitmask_order(unsigned int count)
 {
 	int order;
diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
index 7e10c4b50c5d..e5e78e42e57b 100644
--- a/tools/include/asm-generic/bitops/non-atomic.h
+++ b/tools/include/asm-generic/bitops/non-atomic.h
@@ -2,7 +2,7 @@
 #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 
-#include <asm/types.h>
+#include <linux/bits.h>
 
 /**
  * __set_bit - Set a bit in memory
@@ -13,7 +13,8 @@
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -21,7 +22,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
 	*p  |= mask;
 }
 
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -38,7 +40,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -55,7 +58,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -74,7 +78,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -85,8 +90,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
-					    volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -101,7 +106,8 @@ static inline int __test_and_change_bit(int nr,
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 4/6] bitops: unify non-atomic bitops prototypes across architectures
@ 2022-06-06 11:49   ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

Currently, there is a mess with the prototypes of the non-atomic
bitops across the different architectures:

ret	bool, int, unsigned long
nr	int, long, unsigned int, unsigned long
addr	volatile unsigned long *, volatile void *

Thankfully, it doesn't provoke any bugs, but can sometimes make
the compiler angry when it's not handy at all.
Adjust all the prototypes to the following standard:

ret	bool				retval can be only 0 or 1
nr	unsigned long			native; signed makes no sense
addr	volatile unsigned long *	bitmaps are arrays of ulongs

Finally, add some static assertions in order to prevent people from
making a mess in this room again.
I also used the %__always_inline attribute consistently they always
get resolved to the actual operations.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 arch/alpha/include/asm/bitops.h               | 28 +++++------
 arch/hexagon/include/asm/bitops.h             | 23 +++++----
 arch/ia64/include/asm/bitops.h                | 28 +++++------
 arch/m68k/include/asm/bitops.h                | 47 +++++++++++++------
 arch/sh/include/asm/bitops-op32.h             | 24 ++++++----
 .../asm-generic/bitops/generic-non-atomic.h   | 24 +++++-----
 .../bitops/instrumented-non-atomic.h          | 21 ++++++---
 include/linux/bitops.h                        | 13 +++++
 tools/include/asm-generic/bitops/non-atomic.h | 24 ++++++----
 9 files changed, 146 insertions(+), 86 deletions(-)

diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index e1d8483a45f2..381ad2eae4b4 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -46,8 +46,8 @@ set_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static inline void
-__set_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -82,8 +82,8 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ void
-__clear_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -118,8 +118,8 @@ change_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ void
-__change_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -186,8 +186,8 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
 /*
  * WARNING: non atomic version.
  */
-static inline int
-__test_and_set_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -230,8 +230,8 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static inline int
-__test_and_clear_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -272,8 +272,8 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ int
-__test_and_change_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -283,8 +283,8 @@ __test_and_change_bit(unsigned long nr, volatile void * addr)
 	return (old & mask) != 0;
 }
 
-static inline int
-test_bit(int nr, const volatile void * addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
 }
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index 75d6ba3643b8..a3bfe3a8d4b7 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -127,38 +127,45 @@ static inline void change_bit(int nr, volatile void *addr)
  * be atomic, particularly for things like slab_lock and slab_unlock.
  *
  */
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_clear_bit(nr, addr);
 }
 
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_set_bit(nr, addr);
 }
 
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_change_bit(nr, addr);
 }
 
 /*  Apparently, at least some of these are allowed to be non-atomic  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_clear_bit(nr, addr);
 }
 
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_set_bit(nr, addr);
 }
 
-static inline int __test_and_change_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_change_bit(nr, addr);
 }
 
-static inline int __test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	int retval;
 
@@ -172,7 +179,7 @@ static inline int __test_bit(int nr, const volatile unsigned long *addr)
 	return retval;
 }
 
-#define test_bit(nr, addr) __test_bit(nr, addr)
+#define __test_bit(nr, addr) test_bit(nr, addr)
 
 /*
  * ffz - find first zero in word.
diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
index 577be93c0818..4267a217a503 100644
--- a/arch/ia64/include/asm/bitops.h
+++ b/arch/ia64/include/asm/bitops.h
@@ -61,8 +61,8 @@ set_bit (int nr, volatile void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__set_bit (int nr, volatile void *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
 }
@@ -143,8 +143,8 @@ __clear_bit_unlock(int nr, void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__clear_bit (int nr, volatile void *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
 }
@@ -183,8 +183,8 @@ change_bit (int nr, volatile void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__change_bit (int nr, volatile void *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
 }
@@ -232,8 +232,8 @@ test_and_set_bit (int nr, volatile void *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __inline__ int
-__test_and_set_bit (int nr, volatile void *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -277,8 +277,8 @@ test_and_clear_bit (int nr, volatile void *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __inline__ int
-__test_and_clear_bit(int nr, volatile void * addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -320,8 +320,8 @@ test_and_change_bit (int nr, volatile void *addr)
  *
  * This operation is non-atomic and can be reordered.
  */
-static __inline__ int
-__test_and_change_bit (int nr, void *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 old, bit = (1 << (nr & 31));
 	__u32 *m = (__u32 *) addr + (nr >> 5);
@@ -331,8 +331,8 @@ __test_and_change_bit (int nr, void *addr)
 	return (old & bit) != 0;
 }
 
-static __inline__ int
-test_bit (int nr, const volatile void *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
 }
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 51283db53667..9d44bd4713cb 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -65,8 +65,11 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
 				bfset_mem_set_bit(nr, vaddr))
 #endif
 
-#define __set_bit(nr, vaddr)	set_bit(nr, vaddr)
-
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	set_bit(nr, addr);
+}
 
 static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
 {
@@ -105,8 +108,11 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
 				bfclr_mem_clear_bit(nr, vaddr))
 #endif
 
-#define __clear_bit(nr, vaddr)	clear_bit(nr, vaddr)
-
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	clear_bit(nr, addr);
+}
 
 static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
 {
@@ -145,12 +151,16 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
 				bfchg_mem_change_bit(nr, vaddr))
 #endif
 
-#define __change_bit(nr, vaddr)	change_bit(nr, vaddr)
-
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	change_bit(nr, addr);
+}
 
-static inline int test_bit(int nr, const volatile unsigned long *vaddr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
-	return (vaddr[nr >> 5] & (1UL << (nr & 31))) != 0;
+	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
 }
 
 
@@ -201,8 +211,11 @@ static inline int bfset_mem_test_and_set_bit(int nr,
 					bfset_mem_test_and_set_bit(nr, vaddr))
 #endif
 
-#define __test_and_set_bit(nr, vaddr)	test_and_set_bit(nr, vaddr)
-
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_set_bit(nr, addr);
+}
 
 static inline int bclr_reg_test_and_clear_bit(int nr,
 					      volatile unsigned long *vaddr)
@@ -251,8 +264,11 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
 					bfclr_mem_test_and_clear_bit(nr, vaddr))
 #endif
 
-#define __test_and_clear_bit(nr, vaddr)	test_and_clear_bit(nr, vaddr)
-
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_clear_bit(nr, addr);
+}
 
 static inline int bchg_reg_test_and_change_bit(int nr,
 					       volatile unsigned long *vaddr)
@@ -301,8 +317,11 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
 					bfchg_mem_test_and_change_bit(nr, vaddr))
 #endif
 
-#define __test_and_change_bit(nr, vaddr) test_and_change_bit(nr, vaddr)
-
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_change_bit(nr, addr);
+}
 
 /*
  *	The true 68020 and more advanced processors support the "bfffo"
diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
index cfe5465acce7..dcd85866a394 100644
--- a/arch/sh/include/asm/bitops-op32.h
+++ b/arch/sh/include/asm/bitops-op32.h
@@ -2,6 +2,8 @@
 #ifndef __ASM_SH_BITOPS_OP32_H
 #define __ASM_SH_BITOPS_OP32_H
 
+#include <linux/bits.h>
+
 /*
  * The bit modifying instructions on SH-2A are only capable of working
  * with a 3-bit immediate, which signifies the shift position for the bit
@@ -16,7 +18,8 @@
 #define BYTE_OFFSET(nr)		((nr) % BITS_PER_BYTE)
 #endif
 
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -33,7 +36,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
 	}
 }
 
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -60,7 +64,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -87,7 +92,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -106,7 +112,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -117,8 +124,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
-					    volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -133,7 +140,8 @@ static inline int __test_and_change_bit(int nr,
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
index 202d8a3b40e1..249b2a91c174 100644
--- a/include/asm-generic/bitops/generic-non-atomic.h
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -23,7 +23,7 @@
  * may be that only one operation succeeds.
  */
 static __always_inline void
-gen___set_bit(unsigned int nr, volatile unsigned long *addr)
+gen___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -32,7 +32,7 @@ gen___set_bit(unsigned int nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
+gen___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -49,8 +49,8 @@ gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __always_inline
-void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline void
+gen___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -67,8 +67,8 @@ void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __always_inline int
-gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+gen___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -87,8 +87,8 @@ gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __always_inline int
-gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+gen___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -99,8 +99,8 @@ gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static __always_inline int
-gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+gen___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -115,8 +115,8 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static __always_inline int
-gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
+static __always_inline bool
+gen_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
 	unsigned long mask = BIT_MASK(nr);
diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
index 7ab1ecc37782..b019f77ef21c 100644
--- a/include/asm-generic/bitops/instrumented-non-atomic.h
+++ b/include/asm-generic/bitops/instrumented-non-atomic.h
@@ -22,7 +22,8 @@
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___set_bit(nr, addr);
@@ -37,7 +38,8 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___clear_bit(nr, addr);
@@ -52,7 +54,8 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___change_bit(nr, addr);
@@ -90,7 +93,8 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_set_bit(nr, addr);
@@ -104,7 +108,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_clear_bit(nr, addr);
@@ -118,7 +123,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_change_bit(nr, addr);
@@ -129,7 +135,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_bit(nr, addr);
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 7aaed501f768..5520ac9b1c24 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -26,12 +26,25 @@ extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
 extern unsigned long __sw_hweight64(__u64 w);
 
+#include <asm-generic/bitops/generic-non-atomic.h>
+
 /*
  * Include this here because some architectures need generic_ffs/fls in
  * scope
  */
 #include <asm/bitops.h>
 
+/* Check that the bitops prototypes are sane */
+#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
+__check_bitop_pr(__set_bit);
+__check_bitop_pr(__clear_bit);
+__check_bitop_pr(__change_bit);
+__check_bitop_pr(__test_and_set_bit);
+__check_bitop_pr(__test_and_clear_bit);
+__check_bitop_pr(__test_and_change_bit);
+__check_bitop_pr(test_bit);
+#undef __check_bitop_pr
+
 static inline int get_bitmask_order(unsigned int count)
 {
 	int order;
diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
index 7e10c4b50c5d..e5e78e42e57b 100644
--- a/tools/include/asm-generic/bitops/non-atomic.h
+++ b/tools/include/asm-generic/bitops/non-atomic.h
@@ -2,7 +2,7 @@
 #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 
-#include <asm/types.h>
+#include <linux/bits.h>
 
 /**
  * __set_bit - Set a bit in memory
@@ -13,7 +13,8 @@
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -21,7 +22,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
 	*p  |= mask;
 }
 
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -38,7 +40,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -55,7 +58,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -74,7 +78,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -85,8 +90,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
-					    volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -101,7 +106,8 @@ static inline int __test_and_change_bit(int nr,
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
-- 
2.36.1

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 4/6] bitops: unify non-atomic bitops prototypes across architectures
@ 2022-06-06 11:49   ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux

Currently, there is a mess with the prototypes of the non-atomic
bitops across the different architectures:

ret	bool, int, unsigned long
nr	int, long, unsigned int, unsigned long
addr	volatile unsigned long *, volatile void *

Thankfully, it doesn't provoke any bugs, but can sometimes make
the compiler angry when it's not handy at all.
Adjust all the prototypes to the following standard:

ret	bool				retval can be only 0 or 1
nr	unsigned long			native; signed makes no sense
addr	volatile unsigned long *	bitmaps are arrays of ulongs

Finally, add some static assertions in order to prevent people from
making a mess in this room again.
I also used the %__always_inline attribute consistently they always
get resolved to the actual operations.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 arch/alpha/include/asm/bitops.h               | 28 +++++------
 arch/hexagon/include/asm/bitops.h             | 23 +++++----
 arch/ia64/include/asm/bitops.h                | 28 +++++------
 arch/m68k/include/asm/bitops.h                | 47 +++++++++++++------
 arch/sh/include/asm/bitops-op32.h             | 24 ++++++----
 .../asm-generic/bitops/generic-non-atomic.h   | 24 +++++-----
 .../bitops/instrumented-non-atomic.h          | 21 ++++++---
 include/linux/bitops.h                        | 13 +++++
 tools/include/asm-generic/bitops/non-atomic.h | 24 ++++++----
 9 files changed, 146 insertions(+), 86 deletions(-)

diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index e1d8483a45f2..381ad2eae4b4 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -46,8 +46,8 @@ set_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static inline void
-__set_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -82,8 +82,8 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ void
-__clear_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -118,8 +118,8 @@ change_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ void
-__change_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -186,8 +186,8 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
 /*
  * WARNING: non atomic version.
  */
-static inline int
-__test_and_set_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -230,8 +230,8 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static inline int
-__test_and_clear_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -272,8 +272,8 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ int
-__test_and_change_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -283,8 +283,8 @@ __test_and_change_bit(unsigned long nr, volatile void * addr)
 	return (old & mask) != 0;
 }
 
-static inline int
-test_bit(int nr, const volatile void * addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
 }
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index 75d6ba3643b8..a3bfe3a8d4b7 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -127,38 +127,45 @@ static inline void change_bit(int nr, volatile void *addr)
  * be atomic, particularly for things like slab_lock and slab_unlock.
  *
  */
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_clear_bit(nr, addr);
 }
 
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_set_bit(nr, addr);
 }
 
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_change_bit(nr, addr);
 }
 
 /*  Apparently, at least some of these are allowed to be non-atomic  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_clear_bit(nr, addr);
 }
 
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_set_bit(nr, addr);
 }
 
-static inline int __test_and_change_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_change_bit(nr, addr);
 }
 
-static inline int __test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	int retval;
 
@@ -172,7 +179,7 @@ static inline int __test_bit(int nr, const volatile unsigned long *addr)
 	return retval;
 }
 
-#define test_bit(nr, addr) __test_bit(nr, addr)
+#define __test_bit(nr, addr) test_bit(nr, addr)
 
 /*
  * ffz - find first zero in word.
diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
index 577be93c0818..4267a217a503 100644
--- a/arch/ia64/include/asm/bitops.h
+++ b/arch/ia64/include/asm/bitops.h
@@ -61,8 +61,8 @@ set_bit (int nr, volatile void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__set_bit (int nr, volatile void *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
 }
@@ -143,8 +143,8 @@ __clear_bit_unlock(int nr, void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__clear_bit (int nr, volatile void *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
 }
@@ -183,8 +183,8 @@ change_bit (int nr, volatile void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__change_bit (int nr, volatile void *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
 }
@@ -232,8 +232,8 @@ test_and_set_bit (int nr, volatile void *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __inline__ int
-__test_and_set_bit (int nr, volatile void *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -277,8 +277,8 @@ test_and_clear_bit (int nr, volatile void *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __inline__ int
-__test_and_clear_bit(int nr, volatile void * addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -320,8 +320,8 @@ test_and_change_bit (int nr, volatile void *addr)
  *
  * This operation is non-atomic and can be reordered.
  */
-static __inline__ int
-__test_and_change_bit (int nr, void *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 old, bit = (1 << (nr & 31));
 	__u32 *m = (__u32 *) addr + (nr >> 5);
@@ -331,8 +331,8 @@ __test_and_change_bit (int nr, void *addr)
 	return (old & bit) != 0;
 }
 
-static __inline__ int
-test_bit (int nr, const volatile void *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
 }
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 51283db53667..9d44bd4713cb 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -65,8 +65,11 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
 				bfset_mem_set_bit(nr, vaddr))
 #endif
 
-#define __set_bit(nr, vaddr)	set_bit(nr, vaddr)
-
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	set_bit(nr, addr);
+}
 
 static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
 {
@@ -105,8 +108,11 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
 				bfclr_mem_clear_bit(nr, vaddr))
 #endif
 
-#define __clear_bit(nr, vaddr)	clear_bit(nr, vaddr)
-
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	clear_bit(nr, addr);
+}
 
 static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
 {
@@ -145,12 +151,16 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
 				bfchg_mem_change_bit(nr, vaddr))
 #endif
 
-#define __change_bit(nr, vaddr)	change_bit(nr, vaddr)
-
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	change_bit(nr, addr);
+}
 
-static inline int test_bit(int nr, const volatile unsigned long *vaddr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
-	return (vaddr[nr >> 5] & (1UL << (nr & 31))) != 0;
+	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
 }
 
 
@@ -201,8 +211,11 @@ static inline int bfset_mem_test_and_set_bit(int nr,
 					bfset_mem_test_and_set_bit(nr, vaddr))
 #endif
 
-#define __test_and_set_bit(nr, vaddr)	test_and_set_bit(nr, vaddr)
-
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_set_bit(nr, addr);
+}
 
 static inline int bclr_reg_test_and_clear_bit(int nr,
 					      volatile unsigned long *vaddr)
@@ -251,8 +264,11 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
 					bfclr_mem_test_and_clear_bit(nr, vaddr))
 #endif
 
-#define __test_and_clear_bit(nr, vaddr)	test_and_clear_bit(nr, vaddr)
-
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_clear_bit(nr, addr);
+}
 
 static inline int bchg_reg_test_and_change_bit(int nr,
 					       volatile unsigned long *vaddr)
@@ -301,8 +317,11 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
 					bfchg_mem_test_and_change_bit(nr, vaddr))
 #endif
 
-#define __test_and_change_bit(nr, vaddr) test_and_change_bit(nr, vaddr)
-
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_change_bit(nr, addr);
+}
 
 /*
  *	The true 68020 and more advanced processors support the "bfffo"
diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
index cfe5465acce7..dcd85866a394 100644
--- a/arch/sh/include/asm/bitops-op32.h
+++ b/arch/sh/include/asm/bitops-op32.h
@@ -2,6 +2,8 @@
 #ifndef __ASM_SH_BITOPS_OP32_H
 #define __ASM_SH_BITOPS_OP32_H
 
+#include <linux/bits.h>
+
 /*
  * The bit modifying instructions on SH-2A are only capable of working
  * with a 3-bit immediate, which signifies the shift position for the bit
@@ -16,7 +18,8 @@
 #define BYTE_OFFSET(nr)		((nr) % BITS_PER_BYTE)
 #endif
 
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -33,7 +36,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
 	}
 }
 
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -60,7 +64,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -87,7 +92,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -106,7 +112,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -117,8 +124,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
-					    volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -133,7 +140,8 @@ static inline int __test_and_change_bit(int nr,
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
index 202d8a3b40e1..249b2a91c174 100644
--- a/include/asm-generic/bitops/generic-non-atomic.h
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -23,7 +23,7 @@
  * may be that only one operation succeeds.
  */
 static __always_inline void
-gen___set_bit(unsigned int nr, volatile unsigned long *addr)
+gen___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -32,7 +32,7 @@ gen___set_bit(unsigned int nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
+gen___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -49,8 +49,8 @@ gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __always_inline
-void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline void
+gen___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -67,8 +67,8 @@ void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __always_inline int
-gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+gen___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -87,8 +87,8 @@ gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __always_inline int
-gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+gen___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -99,8 +99,8 @@ gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static __always_inline int
-gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+gen___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -115,8 +115,8 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static __always_inline int
-gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
+static __always_inline bool
+gen_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
 	unsigned long mask = BIT_MASK(nr);
diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
index 7ab1ecc37782..b019f77ef21c 100644
--- a/include/asm-generic/bitops/instrumented-non-atomic.h
+++ b/include/asm-generic/bitops/instrumented-non-atomic.h
@@ -22,7 +22,8 @@
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___set_bit(nr, addr);
@@ -37,7 +38,8 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___clear_bit(nr, addr);
@@ -52,7 +54,8 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___change_bit(nr, addr);
@@ -90,7 +93,8 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_set_bit(nr, addr);
@@ -104,7 +108,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_clear_bit(nr, addr);
@@ -118,7 +123,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_change_bit(nr, addr);
@@ -129,7 +135,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_bit(nr, addr);
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 7aaed501f768..5520ac9b1c24 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -26,12 +26,25 @@ extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
 extern unsigned long __sw_hweight64(__u64 w);
 
+#include <asm-generic/bitops/generic-non-atomic.h>
+
 /*
  * Include this here because some architectures need generic_ffs/fls in
  * scope
  */
 #include <asm/bitops.h>
 
+/* Check that the bitops prototypes are sane */
+#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
+__check_bitop_pr(__set_bit);
+__check_bitop_pr(__clear_bit);
+__check_bitop_pr(__change_bit);
+__check_bitop_pr(__test_and_set_bit);
+__check_bitop_pr(__test_and_clear_bit);
+__check_bitop_pr(__test_and_change_bit);
+__check_bitop_pr(test_bit);
+#undef __check_bitop_pr
+
 static inline int get_bitmask_order(unsigned int count)
 {
 	int order;
diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
index 7e10c4b50c5d..e5e78e42e57b 100644
--- a/tools/include/asm-generic/bitops/non-atomic.h
+++ b/tools/include/asm-generic/bitops/non-atomic.h
@@ -2,7 +2,7 @@
 #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 
-#include <asm/types.h>
+#include <linux/bits.h>
 
 /**
  * __set_bit - Set a bit in memory
@@ -13,7 +13,8 @@
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -21,7 +22,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
 	*p  |= mask;
 }
 
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -38,7 +40,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -55,7 +58,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -74,7 +78,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -85,8 +90,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
-					    volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -101,7 +106,8 @@ static inline int __test_and_change_bit(int nr,
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 5/6] bitops: wrap non-atomic bitops with a transparent macro
  2022-06-06 11:49 ` Alexander Lobakin
  (?)
@ 2022-06-06 11:49   ` Alexander Lobakin
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

In preparation for altering the non-atomic bitops with a macro, wrap
them in a transparent definition. This requires prepending one more
'_' to their names in order to be able to do that seamlessly.
sparc32 already has the triple-underscored functions, so I had to
rename them ('___' -> 'sp32_').

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 arch/alpha/include/asm/bitops.h               | 14 ++++----
 arch/hexagon/include/asm/bitops.h             | 16 +++++-----
 arch/ia64/include/asm/bitops.h                | 26 +++++++--------
 arch/m68k/include/asm/bitops.h                | 14 ++++----
 arch/sh/include/asm/bitops-op32.h             | 22 ++++++-------
 arch/sparc/include/asm/bitops_32.h            | 18 +++++------
 arch/sparc/lib/atomic32.c                     | 12 +++----
 .../bitops/instrumented-non-atomic.h          | 28 ++++++++--------
 include/asm-generic/bitops/non-atomic.h       | 14 ++++----
 include/linux/bitops.h                        | 32 ++++++++++++++-----
 tools/include/asm-generic/bitops/non-atomic.h | 24 +++++++-------
 tools/include/linux/bitops.h                  | 16 ++++++++++
 12 files changed, 134 insertions(+), 102 deletions(-)

diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index 381ad2eae4b4..e7d119826062 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -47,7 +47,7 @@ set_bit(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -83,7 +83,7 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -119,7 +119,7 @@ change_bit(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -187,7 +187,7 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
  * WARNING: non atomic version.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -231,7 +231,7 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -273,7 +273,7 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -284,7 +284,7 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
 }
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index a3bfe3a8d4b7..37144f8e0b0c 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -128,44 +128,44 @@ static inline void change_bit(int nr, volatile void *addr)
  *
  */
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_clear_bit(nr, addr);
 }
 
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_set_bit(nr, addr);
 }
 
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_change_bit(nr, addr);
 }
 
 /*  Apparently, at least some of these are allowed to be non-atomic  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_clear_bit(nr, addr);
 }
 
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_set_bit(nr, addr);
 }
 
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_change_bit(nr, addr);
 }
 
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	int retval;
 
@@ -179,7 +179,7 @@ test_bit(unsigned long nr, const volatile unsigned long *addr)
 	return retval;
 }
 
-#define __test_bit(nr, addr) test_bit(nr, addr)
+#define __test_bit(nr, addr) _test_bit(nr, addr)
 
 /*
  * ffz - find first zero in word.
diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
index 4267a217a503..3cb231fb0283 100644
--- a/arch/ia64/include/asm/bitops.h
+++ b/arch/ia64/include/asm/bitops.h
@@ -53,7 +53,7 @@ set_bit (int nr, volatile void *addr)
 }
 
 /**
- * __set_bit - Set a bit in memory
+ * ___set_bit - Set a bit in memory
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
@@ -62,7 +62,7 @@ set_bit (int nr, volatile void *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
 }
@@ -135,7 +135,7 @@ __clear_bit_unlock(int nr, void *addr)
 }
 
 /**
- * __clear_bit - Clears a bit in memory (non-atomic version)
+ * ___clear_bit - Clears a bit in memory (non-atomic version)
  * @nr: the bit to clear
  * @addr: the address to start counting from
  *
@@ -144,7 +144,7 @@ __clear_bit_unlock(int nr, void *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
 }
@@ -175,7 +175,7 @@ change_bit (int nr, volatile void *addr)
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to toggle
  * @addr: the address to start counting from
  *
@@ -184,7 +184,7 @@ change_bit (int nr, volatile void *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
 }
@@ -224,7 +224,7 @@ test_and_set_bit (int nr, volatile void *addr)
 #define test_and_set_bit_lock test_and_set_bit
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -233,7 +233,7 @@ test_and_set_bit (int nr, volatile void *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -269,7 +269,7 @@ test_and_clear_bit (int nr, volatile void *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -278,7 +278,7 @@ test_and_clear_bit (int nr, volatile void *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -314,14 +314,14 @@ test_and_change_bit (int nr, volatile void *addr)
 }
 
 /**
- * __test_and_change_bit - Change a bit and return its old value
+ * ___test_and_change_bit - Change a bit and return its old value
  * @nr: Bit to change
  * @addr: Address to count from
  *
  * This operation is non-atomic and can be reordered.
  */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 old, bit = (1 << (nr & 31));
 	__u32 *m = (__u32 *) addr + (nr >> 5);
@@ -332,7 +332,7 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
 }
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 9d44bd4713cb..ed08c9d9c547 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -66,7 +66,7 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
 #endif
 
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	set_bit(nr, addr);
 }
@@ -109,7 +109,7 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
 #endif
 
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	clear_bit(nr, addr);
 }
@@ -152,13 +152,13 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
 #endif
 
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	change_bit(nr, addr);
 }
 
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
 }
@@ -212,7 +212,7 @@ static inline int bfset_mem_test_and_set_bit(int nr,
 #endif
 
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_set_bit(nr, addr);
 }
@@ -265,7 +265,7 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
 #endif
 
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_clear_bit(nr, addr);
 }
@@ -318,7 +318,7 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
 #endif
 
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_change_bit(nr, addr);
 }
diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
index dcd85866a394..24b8a809c1b9 100644
--- a/arch/sh/include/asm/bitops-op32.h
+++ b/arch/sh/include/asm/bitops-op32.h
@@ -19,7 +19,7 @@
 #endif
 
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -37,7 +37,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -56,7 +56,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to change
  * @addr: the address to start counting from
  *
@@ -65,7 +65,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -84,7 +84,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -93,7 +93,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -104,7 +104,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -113,7 +113,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -125,7 +125,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 
 /* WARNING: non atomic and it can be reordered! */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -136,12 +136,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * test_bit - Determine whether a bit is set
+ * _test_bit - Determine whether a bit is set
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
diff --git a/arch/sparc/include/asm/bitops_32.h b/arch/sparc/include/asm/bitops_32.h
index 889afa9f990f..3448c191b484 100644
--- a/arch/sparc/include/asm/bitops_32.h
+++ b/arch/sparc/include/asm/bitops_32.h
@@ -19,9 +19,9 @@
 #error only <linux/bitops.h> can be included directly
 #endif
 
-unsigned long ___set_bit(unsigned long *addr, unsigned long mask);
-unsigned long ___clear_bit(unsigned long *addr, unsigned long mask);
-unsigned long ___change_bit(unsigned long *addr, unsigned long mask);
+unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask);
+unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask);
+unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask);
 
 /*
  * Set bit 'nr' in 32-bit quantity at address 'addr' where bit '0'
@@ -36,7 +36,7 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *add
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	return ___set_bit(ADDR, mask) != 0;
+	return sp32___set_bit(ADDR, mask) != 0;
 }
 
 static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
@@ -46,7 +46,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	(void) ___set_bit(ADDR, mask);
+	(void) sp32___set_bit(ADDR, mask);
 }
 
 static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
@@ -56,7 +56,7 @@ static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *a
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	return ___clear_bit(ADDR, mask) != 0;
+	return sp32___clear_bit(ADDR, mask) != 0;
 }
 
 static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
@@ -66,7 +66,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	(void) ___clear_bit(ADDR, mask);
+	(void) sp32___clear_bit(ADDR, mask);
 }
 
 static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
@@ -76,7 +76,7 @@ static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	return ___change_bit(ADDR, mask) != 0;
+	return sp32___change_bit(ADDR, mask) != 0;
 }
 
 static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
@@ -86,7 +86,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	(void) ___change_bit(ADDR, mask);
+	(void) sp32___change_bit(ADDR, mask);
 }
 
 #include <asm-generic/bitops/non-atomic.h>
diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index 8b81d0f00c97..cf80d1ae352b 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -120,7 +120,7 @@ void arch_atomic_set(atomic_t *v, int i)
 }
 EXPORT_SYMBOL(arch_atomic_set);
 
-unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
+unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask)
 {
 	unsigned long old, flags;
 
@@ -131,9 +131,9 @@ unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
 
 	return old & mask;
 }
-EXPORT_SYMBOL(___set_bit);
+EXPORT_SYMBOL(sp32___set_bit);
 
-unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
+unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask)
 {
 	unsigned long old, flags;
 
@@ -144,9 +144,9 @@ unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
 
 	return old & mask;
 }
-EXPORT_SYMBOL(___clear_bit);
+EXPORT_SYMBOL(sp32___clear_bit);
 
-unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
+unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
 {
 	unsigned long old, flags;
 
@@ -157,7 +157,7 @@ unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
 
 	return old & mask;
 }
-EXPORT_SYMBOL(___change_bit);
+EXPORT_SYMBOL(sp32___change_bit);
 
 unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
 {
diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
index b019f77ef21c..988a3bbfba34 100644
--- a/include/asm-generic/bitops/instrumented-non-atomic.h
+++ b/include/asm-generic/bitops/instrumented-non-atomic.h
@@ -14,7 +14,7 @@
 #include <linux/instrumented.h>
 
 /**
- * __set_bit - Set a bit in memory
+ * ___set_bit - Set a bit in memory
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
@@ -23,14 +23,14 @@
  * succeeds.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___set_bit(nr, addr);
 }
 
 /**
- * __clear_bit - Clears a bit in memory
+ * ___clear_bit - Clears a bit in memory
  * @nr: the bit to clear
  * @addr: the address to start counting from
  *
@@ -39,14 +39,14 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
  * succeeds.
  */
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___clear_bit(nr, addr);
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to change
  * @addr: the address to start counting from
  *
@@ -55,7 +55,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
  * succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___change_bit(nr, addr);
@@ -86,7 +86,7 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
 }
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -94,14 +94,14 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
  * can appear to succeed but actually fail.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_set_bit(nr, addr);
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -109,14 +109,14 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
  * can appear to succeed but actually fail.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_clear_bit(nr, addr);
 }
 
 /**
- * __test_and_change_bit - Change a bit and return its old value
+ * ___test_and_change_bit - Change a bit and return its old value
  * @nr: Bit to change
  * @addr: Address to count from
  *
@@ -124,19 +124,19 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
  * can appear to succeed but actually fail.
  */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_change_bit(nr, addr);
 }
 
 /**
- * test_bit - Determine whether a bit is set
+ * _test_bit - Determine whether a bit is set
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_bit(nr, addr);
diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index 7ce0ed22fb5f..0c488331c9e7 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -6,24 +6,24 @@
 #include <asm/types.h>
 
 #define arch___set_bit gen___set_bit
-#define __set_bit arch___set_bit
+#define ___set_bit arch___set_bit
 
 #define arch___clear_bit gen___clear_bit
-#define __clear_bit arch___clear_bit
+#define ___clear_bit arch___clear_bit
 
 #define arch___change_bit gen___change_bit
-#define __change_bit arch___change_bit
+#define ___change_bit arch___change_bit
 
 #define arch___test_and_set_bit gen___test_and_set_bit
-#define __test_and_set_bit arch___test_and_set_bit
+#define ___test_and_set_bit arch___test_and_set_bit
 
 #define arch___test_and_clear_bit gen___test_and_clear_bit
-#define __test_and_clear_bit arch___test_and_clear_bit
+#define ___test_and_clear_bit arch___test_and_clear_bit
 
 #define arch___test_and_change_bit gen___test_and_change_bit
-#define __test_and_change_bit arch___test_and_change_bit
+#define ___test_and_change_bit arch___test_and_change_bit
 
 #define arch_test_bit gen_test_bit
-#define test_bit arch_test_bit
+#define _test_bit arch_test_bit
 
 #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 5520ac9b1c24..33cfc836a115 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -26,8 +26,24 @@ extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
 extern unsigned long __sw_hweight64(__u64 w);
 
+/*
+ * Defined here because those may be needed by architecture-specific static
+ * inlines.
+ */
+
 #include <asm-generic/bitops/generic-non-atomic.h>
 
+#define bitop(op, nr, addr)						\
+	op(nr, addr)
+
+#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
+#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
+#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
+#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
+#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
+#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
+#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
+
 /*
  * Include this here because some architectures need generic_ffs/fls in
  * scope
@@ -35,14 +51,14 @@ extern unsigned long __sw_hweight64(__u64 w);
 #include <asm/bitops.h>
 
 /* Check that the bitops prototypes are sane */
-#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
-__check_bitop_pr(__set_bit);
-__check_bitop_pr(__clear_bit);
-__check_bitop_pr(__change_bit);
-__check_bitop_pr(__test_and_set_bit);
-__check_bitop_pr(__test_and_clear_bit);
-__check_bitop_pr(__test_and_change_bit);
-__check_bitop_pr(test_bit);
+#define __check_bitop_pr(name)	static_assert(__same_type(name, gen##name))
+__check_bitop_pr(___set_bit);
+__check_bitop_pr(___clear_bit);
+__check_bitop_pr(___change_bit);
+__check_bitop_pr(___test_and_set_bit);
+__check_bitop_pr(___test_and_clear_bit);
+__check_bitop_pr(___test_and_change_bit);
+__check_bitop_pr(_test_bit);
 #undef __check_bitop_pr
 
 static inline int get_bitmask_order(unsigned int count)
diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
index e5e78e42e57b..0c472a833408 100644
--- a/tools/include/asm-generic/bitops/non-atomic.h
+++ b/tools/include/asm-generic/bitops/non-atomic.h
@@ -5,7 +5,7 @@
 #include <linux/bits.h>
 
 /**
- * __set_bit - Set a bit in memory
+ * ___set_bit - Set a bit in memory
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
@@ -14,7 +14,7 @@
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -23,7 +23,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -32,7 +32,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to change
  * @addr: the address to start counting from
  *
@@ -41,7 +41,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -50,7 +50,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -59,7 +59,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -70,7 +70,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -79,7 +79,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -91,7 +91,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 
 /* WARNING: non atomic and it can be reordered! */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -102,12 +102,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * test_bit - Determine whether a bit is set
+ * _test_bit - Determine whether a bit is set
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
diff --git a/tools/include/linux/bitops.h b/tools/include/linux/bitops.h
index 5fca38fe1ba8..f18683b95ea6 100644
--- a/tools/include/linux/bitops.h
+++ b/tools/include/linux/bitops.h
@@ -25,6 +25,22 @@ extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
 extern unsigned long __sw_hweight64(__u64 w);
 
+/*
+ * Defined here because those may be needed by architecture-specific static
+ * inlines.
+ */
+
+#define bitop(op, nr, addr)						\
+	op(nr, addr)
+
+#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
+#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
+#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
+#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
+#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
+#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
+#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
+
 /*
  * Include this here because some architectures need generic_ffs/fls in
  * scope
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 5/6] bitops: wrap non-atomic bitops with a transparent macro
@ 2022-06-06 11:49   ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

In preparation for altering the non-atomic bitops with a macro, wrap
them in a transparent definition. This requires prepending one more
'_' to their names in order to be able to do that seamlessly.
sparc32 already has the triple-underscored functions, so I had to
rename them ('___' -> 'sp32_').

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 arch/alpha/include/asm/bitops.h               | 14 ++++----
 arch/hexagon/include/asm/bitops.h             | 16 +++++-----
 arch/ia64/include/asm/bitops.h                | 26 +++++++--------
 arch/m68k/include/asm/bitops.h                | 14 ++++----
 arch/sh/include/asm/bitops-op32.h             | 22 ++++++-------
 arch/sparc/include/asm/bitops_32.h            | 18 +++++------
 arch/sparc/lib/atomic32.c                     | 12 +++----
 .../bitops/instrumented-non-atomic.h          | 28 ++++++++--------
 include/asm-generic/bitops/non-atomic.h       | 14 ++++----
 include/linux/bitops.h                        | 32 ++++++++++++++-----
 tools/include/asm-generic/bitops/non-atomic.h | 24 +++++++-------
 tools/include/linux/bitops.h                  | 16 ++++++++++
 12 files changed, 134 insertions(+), 102 deletions(-)

diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index 381ad2eae4b4..e7d119826062 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -47,7 +47,7 @@ set_bit(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -83,7 +83,7 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -119,7 +119,7 @@ change_bit(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -187,7 +187,7 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
  * WARNING: non atomic version.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -231,7 +231,7 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -273,7 +273,7 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -284,7 +284,7 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
 }
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index a3bfe3a8d4b7..37144f8e0b0c 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -128,44 +128,44 @@ static inline void change_bit(int nr, volatile void *addr)
  *
  */
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_clear_bit(nr, addr);
 }
 
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_set_bit(nr, addr);
 }
 
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_change_bit(nr, addr);
 }
 
 /*  Apparently, at least some of these are allowed to be non-atomic  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_clear_bit(nr, addr);
 }
 
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_set_bit(nr, addr);
 }
 
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_change_bit(nr, addr);
 }
 
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	int retval;
 
@@ -179,7 +179,7 @@ test_bit(unsigned long nr, const volatile unsigned long *addr)
 	return retval;
 }
 
-#define __test_bit(nr, addr) test_bit(nr, addr)
+#define __test_bit(nr, addr) _test_bit(nr, addr)
 
 /*
  * ffz - find first zero in word.
diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
index 4267a217a503..3cb231fb0283 100644
--- a/arch/ia64/include/asm/bitops.h
+++ b/arch/ia64/include/asm/bitops.h
@@ -53,7 +53,7 @@ set_bit (int nr, volatile void *addr)
 }
 
 /**
- * __set_bit - Set a bit in memory
+ * ___set_bit - Set a bit in memory
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
@@ -62,7 +62,7 @@ set_bit (int nr, volatile void *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
 }
@@ -135,7 +135,7 @@ __clear_bit_unlock(int nr, void *addr)
 }
 
 /**
- * __clear_bit - Clears a bit in memory (non-atomic version)
+ * ___clear_bit - Clears a bit in memory (non-atomic version)
  * @nr: the bit to clear
  * @addr: the address to start counting from
  *
@@ -144,7 +144,7 @@ __clear_bit_unlock(int nr, void *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
 }
@@ -175,7 +175,7 @@ change_bit (int nr, volatile void *addr)
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to toggle
  * @addr: the address to start counting from
  *
@@ -184,7 +184,7 @@ change_bit (int nr, volatile void *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
 }
@@ -224,7 +224,7 @@ test_and_set_bit (int nr, volatile void *addr)
 #define test_and_set_bit_lock test_and_set_bit
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -233,7 +233,7 @@ test_and_set_bit (int nr, volatile void *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -269,7 +269,7 @@ test_and_clear_bit (int nr, volatile void *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -278,7 +278,7 @@ test_and_clear_bit (int nr, volatile void *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -314,14 +314,14 @@ test_and_change_bit (int nr, volatile void *addr)
 }
 
 /**
- * __test_and_change_bit - Change a bit and return its old value
+ * ___test_and_change_bit - Change a bit and return its old value
  * @nr: Bit to change
  * @addr: Address to count from
  *
  * This operation is non-atomic and can be reordered.
  */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 old, bit = (1 << (nr & 31));
 	__u32 *m = (__u32 *) addr + (nr >> 5);
@@ -332,7 +332,7 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
 }
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 9d44bd4713cb..ed08c9d9c547 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -66,7 +66,7 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
 #endif
 
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	set_bit(nr, addr);
 }
@@ -109,7 +109,7 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
 #endif
 
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	clear_bit(nr, addr);
 }
@@ -152,13 +152,13 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
 #endif
 
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	change_bit(nr, addr);
 }
 
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
 }
@@ -212,7 +212,7 @@ static inline int bfset_mem_test_and_set_bit(int nr,
 #endif
 
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_set_bit(nr, addr);
 }
@@ -265,7 +265,7 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
 #endif
 
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_clear_bit(nr, addr);
 }
@@ -318,7 +318,7 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
 #endif
 
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_change_bit(nr, addr);
 }
diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
index dcd85866a394..24b8a809c1b9 100644
--- a/arch/sh/include/asm/bitops-op32.h
+++ b/arch/sh/include/asm/bitops-op32.h
@@ -19,7 +19,7 @@
 #endif
 
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -37,7 +37,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -56,7 +56,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to change
  * @addr: the address to start counting from
  *
@@ -65,7 +65,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -84,7 +84,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -93,7 +93,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -104,7 +104,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -113,7 +113,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -125,7 +125,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 
 /* WARNING: non atomic and it can be reordered! */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -136,12 +136,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * test_bit - Determine whether a bit is set
+ * _test_bit - Determine whether a bit is set
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
diff --git a/arch/sparc/include/asm/bitops_32.h b/arch/sparc/include/asm/bitops_32.h
index 889afa9f990f..3448c191b484 100644
--- a/arch/sparc/include/asm/bitops_32.h
+++ b/arch/sparc/include/asm/bitops_32.h
@@ -19,9 +19,9 @@
 #error only <linux/bitops.h> can be included directly
 #endif
 
-unsigned long ___set_bit(unsigned long *addr, unsigned long mask);
-unsigned long ___clear_bit(unsigned long *addr, unsigned long mask);
-unsigned long ___change_bit(unsigned long *addr, unsigned long mask);
+unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask);
+unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask);
+unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask);
 
 /*
  * Set bit 'nr' in 32-bit quantity at address 'addr' where bit '0'
@@ -36,7 +36,7 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *add
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	return ___set_bit(ADDR, mask) != 0;
+	return sp32___set_bit(ADDR, mask) != 0;
 }
 
 static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
@@ -46,7 +46,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	(void) ___set_bit(ADDR, mask);
+	(void) sp32___set_bit(ADDR, mask);
 }
 
 static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
@@ -56,7 +56,7 @@ static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *a
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	return ___clear_bit(ADDR, mask) != 0;
+	return sp32___clear_bit(ADDR, mask) != 0;
 }
 
 static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
@@ -66,7 +66,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	(void) ___clear_bit(ADDR, mask);
+	(void) sp32___clear_bit(ADDR, mask);
 }
 
 static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
@@ -76,7 +76,7 @@ static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	return ___change_bit(ADDR, mask) != 0;
+	return sp32___change_bit(ADDR, mask) != 0;
 }
 
 static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
@@ -86,7 +86,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	(void) ___change_bit(ADDR, mask);
+	(void) sp32___change_bit(ADDR, mask);
 }
 
 #include <asm-generic/bitops/non-atomic.h>
diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index 8b81d0f00c97..cf80d1ae352b 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -120,7 +120,7 @@ void arch_atomic_set(atomic_t *v, int i)
 }
 EXPORT_SYMBOL(arch_atomic_set);
 
-unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
+unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask)
 {
 	unsigned long old, flags;
 
@@ -131,9 +131,9 @@ unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
 
 	return old & mask;
 }
-EXPORT_SYMBOL(___set_bit);
+EXPORT_SYMBOL(sp32___set_bit);
 
-unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
+unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask)
 {
 	unsigned long old, flags;
 
@@ -144,9 +144,9 @@ unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
 
 	return old & mask;
 }
-EXPORT_SYMBOL(___clear_bit);
+EXPORT_SYMBOL(sp32___clear_bit);
 
-unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
+unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
 {
 	unsigned long old, flags;
 
@@ -157,7 +157,7 @@ unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
 
 	return old & mask;
 }
-EXPORT_SYMBOL(___change_bit);
+EXPORT_SYMBOL(sp32___change_bit);
 
 unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
 {
diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
index b019f77ef21c..988a3bbfba34 100644
--- a/include/asm-generic/bitops/instrumented-non-atomic.h
+++ b/include/asm-generic/bitops/instrumented-non-atomic.h
@@ -14,7 +14,7 @@
 #include <linux/instrumented.h>
 
 /**
- * __set_bit - Set a bit in memory
+ * ___set_bit - Set a bit in memory
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
@@ -23,14 +23,14 @@
  * succeeds.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___set_bit(nr, addr);
 }
 
 /**
- * __clear_bit - Clears a bit in memory
+ * ___clear_bit - Clears a bit in memory
  * @nr: the bit to clear
  * @addr: the address to start counting from
  *
@@ -39,14 +39,14 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
  * succeeds.
  */
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___clear_bit(nr, addr);
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to change
  * @addr: the address to start counting from
  *
@@ -55,7 +55,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
  * succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___change_bit(nr, addr);
@@ -86,7 +86,7 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
 }
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -94,14 +94,14 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
  * can appear to succeed but actually fail.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_set_bit(nr, addr);
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -109,14 +109,14 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
  * can appear to succeed but actually fail.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_clear_bit(nr, addr);
 }
 
 /**
- * __test_and_change_bit - Change a bit and return its old value
+ * ___test_and_change_bit - Change a bit and return its old value
  * @nr: Bit to change
  * @addr: Address to count from
  *
@@ -124,19 +124,19 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
  * can appear to succeed but actually fail.
  */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_change_bit(nr, addr);
 }
 
 /**
- * test_bit - Determine whether a bit is set
+ * _test_bit - Determine whether a bit is set
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_bit(nr, addr);
diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index 7ce0ed22fb5f..0c488331c9e7 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -6,24 +6,24 @@
 #include <asm/types.h>
 
 #define arch___set_bit gen___set_bit
-#define __set_bit arch___set_bit
+#define ___set_bit arch___set_bit
 
 #define arch___clear_bit gen___clear_bit
-#define __clear_bit arch___clear_bit
+#define ___clear_bit arch___clear_bit
 
 #define arch___change_bit gen___change_bit
-#define __change_bit arch___change_bit
+#define ___change_bit arch___change_bit
 
 #define arch___test_and_set_bit gen___test_and_set_bit
-#define __test_and_set_bit arch___test_and_set_bit
+#define ___test_and_set_bit arch___test_and_set_bit
 
 #define arch___test_and_clear_bit gen___test_and_clear_bit
-#define __test_and_clear_bit arch___test_and_clear_bit
+#define ___test_and_clear_bit arch___test_and_clear_bit
 
 #define arch___test_and_change_bit gen___test_and_change_bit
-#define __test_and_change_bit arch___test_and_change_bit
+#define ___test_and_change_bit arch___test_and_change_bit
 
 #define arch_test_bit gen_test_bit
-#define test_bit arch_test_bit
+#define _test_bit arch_test_bit
 
 #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 5520ac9b1c24..33cfc836a115 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -26,8 +26,24 @@ extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
 extern unsigned long __sw_hweight64(__u64 w);
 
+/*
+ * Defined here because those may be needed by architecture-specific static
+ * inlines.
+ */
+
 #include <asm-generic/bitops/generic-non-atomic.h>
 
+#define bitop(op, nr, addr)						\
+	op(nr, addr)
+
+#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
+#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
+#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
+#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
+#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
+#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
+#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
+
 /*
  * Include this here because some architectures need generic_ffs/fls in
  * scope
@@ -35,14 +51,14 @@ extern unsigned long __sw_hweight64(__u64 w);
 #include <asm/bitops.h>
 
 /* Check that the bitops prototypes are sane */
-#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
-__check_bitop_pr(__set_bit);
-__check_bitop_pr(__clear_bit);
-__check_bitop_pr(__change_bit);
-__check_bitop_pr(__test_and_set_bit);
-__check_bitop_pr(__test_and_clear_bit);
-__check_bitop_pr(__test_and_change_bit);
-__check_bitop_pr(test_bit);
+#define __check_bitop_pr(name)	static_assert(__same_type(name, gen##name))
+__check_bitop_pr(___set_bit);
+__check_bitop_pr(___clear_bit);
+__check_bitop_pr(___change_bit);
+__check_bitop_pr(___test_and_set_bit);
+__check_bitop_pr(___test_and_clear_bit);
+__check_bitop_pr(___test_and_change_bit);
+__check_bitop_pr(_test_bit);
 #undef __check_bitop_pr
 
 static inline int get_bitmask_order(unsigned int count)
diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
index e5e78e42e57b..0c472a833408 100644
--- a/tools/include/asm-generic/bitops/non-atomic.h
+++ b/tools/include/asm-generic/bitops/non-atomic.h
@@ -5,7 +5,7 @@
 #include <linux/bits.h>
 
 /**
- * __set_bit - Set a bit in memory
+ * ___set_bit - Set a bit in memory
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
@@ -14,7 +14,7 @@
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -23,7 +23,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -32,7 +32,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to change
  * @addr: the address to start counting from
  *
@@ -41,7 +41,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -50,7 +50,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -59,7 +59,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -70,7 +70,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -79,7 +79,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -91,7 +91,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 
 /* WARNING: non atomic and it can be reordered! */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -102,12 +102,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * test_bit - Determine whether a bit is set
+ * _test_bit - Determine whether a bit is set
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
diff --git a/tools/include/linux/bitops.h b/tools/include/linux/bitops.h
index 5fca38fe1ba8..f18683b95ea6 100644
--- a/tools/include/linux/bitops.h
+++ b/tools/include/linux/bitops.h
@@ -25,6 +25,22 @@ extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
 extern unsigned long __sw_hweight64(__u64 w);
 
+/*
+ * Defined here because those may be needed by architecture-specific static
+ * inlines.
+ */
+
+#define bitop(op, nr, addr)						\
+	op(nr, addr)
+
+#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
+#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
+#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
+#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
+#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
+#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
+#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
+
 /*
  * Include this here because some architectures need generic_ffs/fls in
  * scope
-- 
2.36.1

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 5/6] bitops: wrap non-atomic bitops with a transparent macro
@ 2022-06-06 11:49   ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux

In preparation for altering the non-atomic bitops with a macro, wrap
them in a transparent definition. This requires prepending one more
'_' to their names in order to be able to do that seamlessly.
sparc32 already has the triple-underscored functions, so I had to
rename them ('___' -> 'sp32_').

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 arch/alpha/include/asm/bitops.h               | 14 ++++----
 arch/hexagon/include/asm/bitops.h             | 16 +++++-----
 arch/ia64/include/asm/bitops.h                | 26 +++++++--------
 arch/m68k/include/asm/bitops.h                | 14 ++++----
 arch/sh/include/asm/bitops-op32.h             | 22 ++++++-------
 arch/sparc/include/asm/bitops_32.h            | 18 +++++------
 arch/sparc/lib/atomic32.c                     | 12 +++----
 .../bitops/instrumented-non-atomic.h          | 28 ++++++++--------
 include/asm-generic/bitops/non-atomic.h       | 14 ++++----
 include/linux/bitops.h                        | 32 ++++++++++++++-----
 tools/include/asm-generic/bitops/non-atomic.h | 24 +++++++-------
 tools/include/linux/bitops.h                  | 16 ++++++++++
 12 files changed, 134 insertions(+), 102 deletions(-)

diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index 381ad2eae4b4..e7d119826062 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -47,7 +47,7 @@ set_bit(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -83,7 +83,7 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -119,7 +119,7 @@ change_bit(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -187,7 +187,7 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
  * WARNING: non atomic version.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -231,7 +231,7 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -273,7 +273,7 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
  * WARNING: non atomic version.
  */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -284,7 +284,7 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
 }
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index a3bfe3a8d4b7..37144f8e0b0c 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -128,44 +128,44 @@ static inline void change_bit(int nr, volatile void *addr)
  *
  */
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_clear_bit(nr, addr);
 }
 
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_set_bit(nr, addr);
 }
 
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_change_bit(nr, addr);
 }
 
 /*  Apparently, at least some of these are allowed to be non-atomic  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_clear_bit(nr, addr);
 }
 
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_set_bit(nr, addr);
 }
 
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_change_bit(nr, addr);
 }
 
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	int retval;
 
@@ -179,7 +179,7 @@ test_bit(unsigned long nr, const volatile unsigned long *addr)
 	return retval;
 }
 
-#define __test_bit(nr, addr) test_bit(nr, addr)
+#define __test_bit(nr, addr) _test_bit(nr, addr)
 
 /*
  * ffz - find first zero in word.
diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
index 4267a217a503..3cb231fb0283 100644
--- a/arch/ia64/include/asm/bitops.h
+++ b/arch/ia64/include/asm/bitops.h
@@ -53,7 +53,7 @@ set_bit (int nr, volatile void *addr)
 }
 
 /**
- * __set_bit - Set a bit in memory
+ * ___set_bit - Set a bit in memory
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
@@ -62,7 +62,7 @@ set_bit (int nr, volatile void *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
 }
@@ -135,7 +135,7 @@ __clear_bit_unlock(int nr, void *addr)
 }
 
 /**
- * __clear_bit - Clears a bit in memory (non-atomic version)
+ * ___clear_bit - Clears a bit in memory (non-atomic version)
  * @nr: the bit to clear
  * @addr: the address to start counting from
  *
@@ -144,7 +144,7 @@ __clear_bit_unlock(int nr, void *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
 }
@@ -175,7 +175,7 @@ change_bit (int nr, volatile void *addr)
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to toggle
  * @addr: the address to start counting from
  *
@@ -184,7 +184,7 @@ change_bit (int nr, volatile void *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
 }
@@ -224,7 +224,7 @@ test_and_set_bit (int nr, volatile void *addr)
 #define test_and_set_bit_lock test_and_set_bit
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -233,7 +233,7 @@ test_and_set_bit (int nr, volatile void *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -269,7 +269,7 @@ test_and_clear_bit (int nr, volatile void *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -278,7 +278,7 @@ test_and_clear_bit (int nr, volatile void *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -314,14 +314,14 @@ test_and_change_bit (int nr, volatile void *addr)
 }
 
 /**
- * __test_and_change_bit - Change a bit and return its old value
+ * ___test_and_change_bit - Change a bit and return its old value
  * @nr: Bit to change
  * @addr: Address to count from
  *
  * This operation is non-atomic and can be reordered.
  */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 old, bit = (1 << (nr & 31));
 	__u32 *m = (__u32 *) addr + (nr >> 5);
@@ -332,7 +332,7 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
 }
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 9d44bd4713cb..ed08c9d9c547 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -66,7 +66,7 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
 #endif
 
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	set_bit(nr, addr);
 }
@@ -109,7 +109,7 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
 #endif
 
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	clear_bit(nr, addr);
 }
@@ -152,13 +152,13 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
 #endif
 
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	change_bit(nr, addr);
 }
 
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
 }
@@ -212,7 +212,7 @@ static inline int bfset_mem_test_and_set_bit(int nr,
 #endif
 
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_set_bit(nr, addr);
 }
@@ -265,7 +265,7 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
 #endif
 
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_clear_bit(nr, addr);
 }
@@ -318,7 +318,7 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
 #endif
 
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_change_bit(nr, addr);
 }
diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
index dcd85866a394..24b8a809c1b9 100644
--- a/arch/sh/include/asm/bitops-op32.h
+++ b/arch/sh/include/asm/bitops-op32.h
@@ -19,7 +19,7 @@
 #endif
 
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -37,7 +37,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -56,7 +56,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to change
  * @addr: the address to start counting from
  *
@@ -65,7 +65,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -84,7 +84,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -93,7 +93,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -104,7 +104,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -113,7 +113,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -125,7 +125,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 
 /* WARNING: non atomic and it can be reordered! */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -136,12 +136,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * test_bit - Determine whether a bit is set
+ * _test_bit - Determine whether a bit is set
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
diff --git a/arch/sparc/include/asm/bitops_32.h b/arch/sparc/include/asm/bitops_32.h
index 889afa9f990f..3448c191b484 100644
--- a/arch/sparc/include/asm/bitops_32.h
+++ b/arch/sparc/include/asm/bitops_32.h
@@ -19,9 +19,9 @@
 #error only <linux/bitops.h> can be included directly
 #endif
 
-unsigned long ___set_bit(unsigned long *addr, unsigned long mask);
-unsigned long ___clear_bit(unsigned long *addr, unsigned long mask);
-unsigned long ___change_bit(unsigned long *addr, unsigned long mask);
+unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask);
+unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask);
+unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask);
 
 /*
  * Set bit 'nr' in 32-bit quantity at address 'addr' where bit '0'
@@ -36,7 +36,7 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *add
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	return ___set_bit(ADDR, mask) != 0;
+	return sp32___set_bit(ADDR, mask) != 0;
 }
 
 static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
@@ -46,7 +46,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	(void) ___set_bit(ADDR, mask);
+	(void) sp32___set_bit(ADDR, mask);
 }
 
 static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
@@ -56,7 +56,7 @@ static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *a
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	return ___clear_bit(ADDR, mask) != 0;
+	return sp32___clear_bit(ADDR, mask) != 0;
 }
 
 static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
@@ -66,7 +66,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	(void) ___clear_bit(ADDR, mask);
+	(void) sp32___clear_bit(ADDR, mask);
 }
 
 static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
@@ -76,7 +76,7 @@ static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	return ___change_bit(ADDR, mask) != 0;
+	return sp32___change_bit(ADDR, mask) != 0;
 }
 
 static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
@@ -86,7 +86,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	(void) ___change_bit(ADDR, mask);
+	(void) sp32___change_bit(ADDR, mask);
 }
 
 #include <asm-generic/bitops/non-atomic.h>
diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index 8b81d0f00c97..cf80d1ae352b 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -120,7 +120,7 @@ void arch_atomic_set(atomic_t *v, int i)
 }
 EXPORT_SYMBOL(arch_atomic_set);
 
-unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
+unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask)
 {
 	unsigned long old, flags;
 
@@ -131,9 +131,9 @@ unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
 
 	return old & mask;
 }
-EXPORT_SYMBOL(___set_bit);
+EXPORT_SYMBOL(sp32___set_bit);
 
-unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
+unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask)
 {
 	unsigned long old, flags;
 
@@ -144,9 +144,9 @@ unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
 
 	return old & mask;
 }
-EXPORT_SYMBOL(___clear_bit);
+EXPORT_SYMBOL(sp32___clear_bit);
 
-unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
+unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
 {
 	unsigned long old, flags;
 
@@ -157,7 +157,7 @@ unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
 
 	return old & mask;
 }
-EXPORT_SYMBOL(___change_bit);
+EXPORT_SYMBOL(sp32___change_bit);
 
 unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
 {
diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
index b019f77ef21c..988a3bbfba34 100644
--- a/include/asm-generic/bitops/instrumented-non-atomic.h
+++ b/include/asm-generic/bitops/instrumented-non-atomic.h
@@ -14,7 +14,7 @@
 #include <linux/instrumented.h>
 
 /**
- * __set_bit - Set a bit in memory
+ * ___set_bit - Set a bit in memory
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
@@ -23,14 +23,14 @@
  * succeeds.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___set_bit(nr, addr);
 }
 
 /**
- * __clear_bit - Clears a bit in memory
+ * ___clear_bit - Clears a bit in memory
  * @nr: the bit to clear
  * @addr: the address to start counting from
  *
@@ -39,14 +39,14 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
  * succeeds.
  */
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___clear_bit(nr, addr);
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to change
  * @addr: the address to start counting from
  *
@@ -55,7 +55,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
  * succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___change_bit(nr, addr);
@@ -86,7 +86,7 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
 }
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -94,14 +94,14 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
  * can appear to succeed but actually fail.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_set_bit(nr, addr);
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -109,14 +109,14 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
  * can appear to succeed but actually fail.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_clear_bit(nr, addr);
 }
 
 /**
- * __test_and_change_bit - Change a bit and return its old value
+ * ___test_and_change_bit - Change a bit and return its old value
  * @nr: Bit to change
  * @addr: Address to count from
  *
@@ -124,19 +124,19 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
  * can appear to succeed but actually fail.
  */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_change_bit(nr, addr);
 }
 
 /**
- * test_bit - Determine whether a bit is set
+ * _test_bit - Determine whether a bit is set
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_bit(nr, addr);
diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index 7ce0ed22fb5f..0c488331c9e7 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -6,24 +6,24 @@
 #include <asm/types.h>
 
 #define arch___set_bit gen___set_bit
-#define __set_bit arch___set_bit
+#define ___set_bit arch___set_bit
 
 #define arch___clear_bit gen___clear_bit
-#define __clear_bit arch___clear_bit
+#define ___clear_bit arch___clear_bit
 
 #define arch___change_bit gen___change_bit
-#define __change_bit arch___change_bit
+#define ___change_bit arch___change_bit
 
 #define arch___test_and_set_bit gen___test_and_set_bit
-#define __test_and_set_bit arch___test_and_set_bit
+#define ___test_and_set_bit arch___test_and_set_bit
 
 #define arch___test_and_clear_bit gen___test_and_clear_bit
-#define __test_and_clear_bit arch___test_and_clear_bit
+#define ___test_and_clear_bit arch___test_and_clear_bit
 
 #define arch___test_and_change_bit gen___test_and_change_bit
-#define __test_and_change_bit arch___test_and_change_bit
+#define ___test_and_change_bit arch___test_and_change_bit
 
 #define arch_test_bit gen_test_bit
-#define test_bit arch_test_bit
+#define _test_bit arch_test_bit
 
 #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 5520ac9b1c24..33cfc836a115 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -26,8 +26,24 @@ extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
 extern unsigned long __sw_hweight64(__u64 w);
 
+/*
+ * Defined here because those may be needed by architecture-specific static
+ * inlines.
+ */
+
 #include <asm-generic/bitops/generic-non-atomic.h>
 
+#define bitop(op, nr, addr)						\
+	op(nr, addr)
+
+#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
+#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
+#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
+#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
+#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
+#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
+#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
+
 /*
  * Include this here because some architectures need generic_ffs/fls in
  * scope
@@ -35,14 +51,14 @@ extern unsigned long __sw_hweight64(__u64 w);
 #include <asm/bitops.h>
 
 /* Check that the bitops prototypes are sane */
-#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
-__check_bitop_pr(__set_bit);
-__check_bitop_pr(__clear_bit);
-__check_bitop_pr(__change_bit);
-__check_bitop_pr(__test_and_set_bit);
-__check_bitop_pr(__test_and_clear_bit);
-__check_bitop_pr(__test_and_change_bit);
-__check_bitop_pr(test_bit);
+#define __check_bitop_pr(name)	static_assert(__same_type(name, gen##name))
+__check_bitop_pr(___set_bit);
+__check_bitop_pr(___clear_bit);
+__check_bitop_pr(___change_bit);
+__check_bitop_pr(___test_and_set_bit);
+__check_bitop_pr(___test_and_clear_bit);
+__check_bitop_pr(___test_and_change_bit);
+__check_bitop_pr(_test_bit);
 #undef __check_bitop_pr
 
 static inline int get_bitmask_order(unsigned int count)
diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
index e5e78e42e57b..0c472a833408 100644
--- a/tools/include/asm-generic/bitops/non-atomic.h
+++ b/tools/include/asm-generic/bitops/non-atomic.h
@@ -5,7 +5,7 @@
 #include <linux/bits.h>
 
 /**
- * __set_bit - Set a bit in memory
+ * ___set_bit - Set a bit in memory
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
@@ -14,7 +14,7 @@
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -23,7 +23,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -32,7 +32,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to change
  * @addr: the address to start counting from
  *
@@ -41,7 +41,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -50,7 +50,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -59,7 +59,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -70,7 +70,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -79,7 +79,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -91,7 +91,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 
 /* WARNING: non atomic and it can be reordered! */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -102,12 +102,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * test_bit - Determine whether a bit is set
+ * _test_bit - Determine whether a bit is set
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
diff --git a/tools/include/linux/bitops.h b/tools/include/linux/bitops.h
index 5fca38fe1ba8..f18683b95ea6 100644
--- a/tools/include/linux/bitops.h
+++ b/tools/include/linux/bitops.h
@@ -25,6 +25,22 @@ extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
 extern unsigned long __sw_hweight64(__u64 w);
 
+/*
+ * Defined here because those may be needed by architecture-specific static
+ * inlines.
+ */
+
+#define bitop(op, nr, addr)						\
+	op(nr, addr)
+
+#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
+#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
+#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
+#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
+#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
+#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
+#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
+
 /*
  * Include this here because some architectures need generic_ffs/fls in
  * scope
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 6/6] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-06 11:49 ` Alexander Lobakin
  (?)
@ 2022-06-06 11:49   ` Alexander Lobakin
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

Currently, many architecture-specific non-atomic bitop
implementations use inline asm or other hacks which are faster or
more robust when working with "real" variables (i.e. fields from
the structures etc.), but the compilers have no clue how to optimize
them out when called on compile-time constants. That said, the
following code:

	DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
	unsigned long bar = BIT(BAR_BIT);
	unsigned long baz = 0;

	__set_bit(FOO_BIT, foo);
	baz |= BIT(BAZ_BIT);

	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
	BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
	BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));

triggers the first assertion on x86_64, which means that the
compiler is unable to evaluate it to a compile-time initializer
when the architecture-specific bitop is used even if it's obvious.
In order to let the compiler optimize out such cases, expand the
bitop() macro to use the generic C non-atomic bitop implementations
when all of the arguments passed are compile-time constants, which
means that the result will be a compile-time constant as well, so
that it produces more efficient and simple code in 100% cases,
comparing to the architecture-specific counterparts.
The savings on x86_64 with LLVM are insane (.text):

$ scripts/bloat-o-meter -c vmlinux.{base,test}
add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)

$ scripts/bloat-o-meter -c vmlinux.{base,mod}
add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)

$ scripts/bloat-o-meter -c vmlinux.{base,all}
add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 include/linux/bitops.h | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 33cfc836a115..5788784b2f65 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -33,8 +33,24 @@ extern unsigned long __sw_hweight64(__u64 w);
 
 #include <asm-generic/bitops/generic-non-atomic.h>
 
+/*
+ * Many architecture-specific non-atomic bitops contain inline asm code and due
+ * to that the compiler can't optimize them to compile-time expressions or
+ * constants. In contrary, gen_*() helpers are defined in pure C and compilers
+ * optimize them just well.
+ * Therefore, to make `unsigned long foo = 0; __set_bit(BAR, &foo)` effectively
+ * equal to `unsigned long foo = BIT(BAR)`, pick the generic C alternative when
+ * the arguments can be resolved at compile time. That expression itself is a
+ * constant and doesn't bring any functional changes to the rest of cases.
+ * The casts to `uintptr_t` are needed to mitigate `-Waddress` warnings when
+ * passing a bitmap from .bss or .data (-> `!!addr` is always true).
+ */
 #define bitop(op, nr, addr)						\
-	op(nr, addr)
+	((__builtin_constant_p(nr) &&					\
+	  __builtin_constant_p((uintptr_t)(addr) != (uintptr_t)NULL) &&	\
+	  (uintptr_t)(addr) != (uintptr_t)NULL &&			\
+	  __builtin_constant_p(*(const unsigned long *)(addr))) ?	\
+	 gen##op(nr, addr) : op(nr, addr))
 
 #define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
 #define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 6/6] bitops: let optimize out non-atomic bitops on compile-time constants
@ 2022-06-06 11:49   ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

Currently, many architecture-specific non-atomic bitop
implementations use inline asm or other hacks which are faster or
more robust when working with "real" variables (i.e. fields from
the structures etc.), but the compilers have no clue how to optimize
them out when called on compile-time constants. That said, the
following code:

	DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
	unsigned long bar = BIT(BAR_BIT);
	unsigned long baz = 0;

	__set_bit(FOO_BIT, foo);
	baz |= BIT(BAZ_BIT);

	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
	BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
	BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));

triggers the first assertion on x86_64, which means that the
compiler is unable to evaluate it to a compile-time initializer
when the architecture-specific bitop is used even if it's obvious.
In order to let the compiler optimize out such cases, expand the
bitop() macro to use the generic C non-atomic bitop implementations
when all of the arguments passed are compile-time constants, which
means that the result will be a compile-time constant as well, so
that it produces more efficient and simple code in 100% cases,
comparing to the architecture-specific counterparts.
The savings on x86_64 with LLVM are insane (.text):

$ scripts/bloat-o-meter -c vmlinux.{base,test}
add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)

$ scripts/bloat-o-meter -c vmlinux.{base,mod}
add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)

$ scripts/bloat-o-meter -c vmlinux.{base,all}
add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 include/linux/bitops.h | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 33cfc836a115..5788784b2f65 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -33,8 +33,24 @@ extern unsigned long __sw_hweight64(__u64 w);
 
 #include <asm-generic/bitops/generic-non-atomic.h>
 
+/*
+ * Many architecture-specific non-atomic bitops contain inline asm code and due
+ * to that the compiler can't optimize them to compile-time expressions or
+ * constants. In contrary, gen_*() helpers are defined in pure C and compilers
+ * optimize them just well.
+ * Therefore, to make `unsigned long foo = 0; __set_bit(BAR, &foo)` effectively
+ * equal to `unsigned long foo = BIT(BAR)`, pick the generic C alternative when
+ * the arguments can be resolved at compile time. That expression itself is a
+ * constant and doesn't bring any functional changes to the rest of cases.
+ * The casts to `uintptr_t` are needed to mitigate `-Waddress` warnings when
+ * passing a bitmap from .bss or .data (-> `!!addr` is always true).
+ */
 #define bitop(op, nr, addr)						\
-	op(nr, addr)
+	((__builtin_constant_p(nr) &&					\
+	  __builtin_constant_p((uintptr_t)(addr) != (uintptr_t)NULL) &&	\
+	  (uintptr_t)(addr) != (uintptr_t)NULL &&			\
+	  __builtin_constant_p(*(const unsigned long *)(addr))) ?	\
+	 gen##op(nr, addr) : op(nr, addr))
 
 #define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
 #define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
-- 
2.36.1

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 6/6] bitops: let optimize out non-atomic bitops on compile-time constants
@ 2022-06-06 11:49   ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 11:49 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux

Currently, many architecture-specific non-atomic bitop
implementations use inline asm or other hacks which are faster or
more robust when working with "real" variables (i.e. fields from
the structures etc.), but the compilers have no clue how to optimize
them out when called on compile-time constants. That said, the
following code:

	DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
	unsigned long bar = BIT(BAR_BIT);
	unsigned long baz = 0;

	__set_bit(FOO_BIT, foo);
	baz |= BIT(BAZ_BIT);

	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
	BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
	BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));

triggers the first assertion on x86_64, which means that the
compiler is unable to evaluate it to a compile-time initializer
when the architecture-specific bitop is used even if it's obvious.
In order to let the compiler optimize out such cases, expand the
bitop() macro to use the generic C non-atomic bitop implementations
when all of the arguments passed are compile-time constants, which
means that the result will be a compile-time constant as well, so
that it produces more efficient and simple code in 100% cases,
comparing to the architecture-specific counterparts.
The savings on x86_64 with LLVM are insane (.text):

$ scripts/bloat-o-meter -c vmlinux.{base,test}
add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)

$ scripts/bloat-o-meter -c vmlinux.{base,mod}
add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)

$ scripts/bloat-o-meter -c vmlinux.{base,all}
add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 include/linux/bitops.h | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 33cfc836a115..5788784b2f65 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -33,8 +33,24 @@ extern unsigned long __sw_hweight64(__u64 w);
 
 #include <asm-generic/bitops/generic-non-atomic.h>
 
+/*
+ * Many architecture-specific non-atomic bitops contain inline asm code and due
+ * to that the compiler can't optimize them to compile-time expressions or
+ * constants. In contrary, gen_*() helpers are defined in pure C and compilers
+ * optimize them just well.
+ * Therefore, to make `unsigned long foo = 0; __set_bit(BAR, &foo)` effectively
+ * equal to `unsigned long foo = BIT(BAR)`, pick the generic C alternative when
+ * the arguments can be resolved at compile time. That expression itself is a
+ * constant and doesn't bring any functional changes to the rest of cases.
+ * The casts to `uintptr_t` are needed to mitigate `-Waddress` warnings when
+ * passing a bitmap from .bss or .data (-> `!!addr` is always true).
+ */
 #define bitop(op, nr, addr)						\
-	op(nr, addr)
+	((__builtin_constant_p(nr) &&					\
+	  __builtin_constant_p((uintptr_t)(addr) != (uintptr_t)NULL) &&	\
+	  (uintptr_t)(addr) != (uintptr_t)NULL &&			\
+	  __builtin_constant_p(*(const unsigned long *)(addr))) ?	\
+	 gen##op(nr, addr) : op(nr, addr))
 
 #define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
 #define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH 2/6] bitops: always define asm-generic non-atomic bitops
  2022-06-06 11:49   ` Alexander Lobakin
  (?)
@ 2022-06-06 12:44     ` Mark Rutland
  -1 siblings, 0 replies; 75+ messages in thread
From: Mark Rutland @ 2022-06-06 12:44 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

On Mon, Jun 06, 2022 at 01:49:03PM +0200, Alexander Lobakin wrote:
> Move generic non-atomic bitops from the asm-generic header which
> gets included only when there are no architecture-specific
> alternatives, to a separate independent file to make them always
> available.
> 
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  .../asm-generic/bitops/generic-non-atomic.h   | 124 ++++++++++++++++++
>  include/asm-generic/bitops/non-atomic.h       | 109 ++-------------
>  2 files changed, 132 insertions(+), 101 deletions(-)
>  create mode 100644 include/asm-generic/bitops/generic-non-atomic.h
> 
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> new file mode 100644
> index 000000000000..7a60adfa6e7d
> --- /dev/null
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -0,0 +1,124 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> +#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> +
> +#include <linux/bits.h>
> +
> +#ifndef _LINUX_BITOPS_H
> +#error only <linux/bitops.h> can be included directly
> +#endif
> +
> +/*
> + * Generic definitions for bit operations, should not be used in regular code
> + * directly.
> + */
> +
> +/**
> + * gen___set_bit - Set a bit in memory
> + * @nr: the bit to set
> + * @addr: the address to start counting from
> + *
> + * Unlike set_bit(), this function is non-atomic and may be reordered.
> + * If it's called on the same region of memory simultaneously, the effect
> + * may be that only one operation succeeds.
> + */
> +static __always_inline void
> +gen___set_bit(unsigned int nr, volatile unsigned long *addr)

Could we please use 'generic' rather than 'gen' as the prefix?

That'd match what we did for the generic atomic_*() and atomic64_*() functions
in commits

* f8b6455a9d381fc5 ("locking/atomic: atomic: support ARCH_ATOMIC")
* 1bdadf46eff6804a ("locking/atomic: atomic64: support ARCH_ATOMIC")

... and it avoids any potential confusion with 'gen' meaning 'generated' or
similar.

Thanks,
Mark.

> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +	*p  |= mask;
> +}
> +
> +static __always_inline void
> +gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +	*p &= ~mask;
> +}
> +
> +/**
> + * gen___change_bit - Toggle a bit in memory
> + * @nr: the bit to change
> + * @addr: the address to start counting from
> + *
> + * Unlike change_bit(), this function is non-atomic and may be reordered.
> + * If it's called on the same region of memory simultaneously, the effect
> + * may be that only one operation succeeds.
> + */
> +static __always_inline
> +void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +	*p ^= mask;
> +}
> +
> +/**
> + * gen___test_and_set_bit - Set a bit and return its old value
> + * @nr: Bit to set
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static __always_inline int
> +gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +	unsigned long old = *p;
> +
> +	*p = old | mask;
> +	return (old & mask) != 0;
> +}
> +
> +/**
> + * gen___test_and_clear_bit - Clear a bit and return its old value
> + * @nr: Bit to clear
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static __always_inline int
> +gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +	unsigned long old = *p;
> +
> +	*p = old & ~mask;
> +	return (old & mask) != 0;
> +}
> +
> +/* WARNING: non atomic and it can be reordered! */
> +static __always_inline int
> +gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +	unsigned long old = *p;
> +
> +	*p = old ^ mask;
> +	return (old & mask) != 0;
> +}
> +
> +/**
> + * gen_test_bit - Determine whether a bit is set
> + * @nr: bit number to test
> + * @addr: Address to start counting from
> + */
> +static __always_inline int
> +gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
> +{
> +	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> +}
> +
> +#endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
> diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
> index 078cc68be2f1..7ce0ed22fb5f 100644
> --- a/include/asm-generic/bitops/non-atomic.h
> +++ b/include/asm-generic/bitops/non-atomic.h
> @@ -2,121 +2,28 @@
>  #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  
> +#include <asm-generic/bitops/generic-non-atomic.h>
>  #include <asm/types.h>
>  
> -/**
> - * arch___set_bit - Set a bit in memory
> - * @nr: the bit to set
> - * @addr: the address to start counting from
> - *
> - * Unlike set_bit(), this function is non-atomic and may be reordered.
> - * If it's called on the same region of memory simultaneously, the effect
> - * may be that only one operation succeeds.
> - */
> -static __always_inline void
> -arch___set_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -	*p  |= mask;
> -}
> +#define arch___set_bit gen___set_bit
>  #define __set_bit arch___set_bit
>  
> -static __always_inline void
> -arch___clear_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -	*p &= ~mask;
> -}
> +#define arch___clear_bit gen___clear_bit
>  #define __clear_bit arch___clear_bit
>  
> -/**
> - * arch___change_bit - Toggle a bit in memory
> - * @nr: the bit to change
> - * @addr: the address to start counting from
> - *
> - * Unlike change_bit(), this function is non-atomic and may be reordered.
> - * If it's called on the same region of memory simultaneously, the effect
> - * may be that only one operation succeeds.
> - */
> -static __always_inline
> -void arch___change_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -	*p ^= mask;
> -}
> +#define arch___change_bit gen___change_bit
>  #define __change_bit arch___change_bit
>  
> -/**
> - * arch___test_and_set_bit - Set a bit and return its old value
> - * @nr: Bit to set
> - * @addr: Address to count from
> - *
> - * This operation is non-atomic and can be reordered.
> - * If two examples of this operation race, one can appear to succeed
> - * but actually fail.  You must protect multiple accesses with a lock.
> - */
> -static __always_inline int
> -arch___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -	unsigned long old = *p;
> -
> -	*p = old | mask;
> -	return (old & mask) != 0;
> -}
> +#define arch___test_and_set_bit gen___test_and_set_bit
>  #define __test_and_set_bit arch___test_and_set_bit
>  
> -/**
> - * arch___test_and_clear_bit - Clear a bit and return its old value
> - * @nr: Bit to clear
> - * @addr: Address to count from
> - *
> - * This operation is non-atomic and can be reordered.
> - * If two examples of this operation race, one can appear to succeed
> - * but actually fail.  You must protect multiple accesses with a lock.
> - */
> -static __always_inline int
> -arch___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -	unsigned long old = *p;
> -
> -	*p = old & ~mask;
> -	return (old & mask) != 0;
> -}
> +#define arch___test_and_clear_bit gen___test_and_clear_bit
>  #define __test_and_clear_bit arch___test_and_clear_bit
>  
> -/* WARNING: non atomic and it can be reordered! */
> -static __always_inline int
> -arch___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -	unsigned long old = *p;
> -
> -	*p = old ^ mask;
> -	return (old & mask) != 0;
> -}
> +#define arch___test_and_change_bit gen___test_and_change_bit
>  #define __test_and_change_bit arch___test_and_change_bit
>  
> -/**
> - * arch_test_bit - Determine whether a bit is set
> - * @nr: bit number to test
> - * @addr: Address to start counting from
> - */
> -static __always_inline int
> -arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> -{
> -	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> -}
> +#define arch_test_bit gen_test_bit
>  #define test_bit arch_test_bit
>  
>  #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 2/6] bitops: always define asm-generic non-atomic bitops
@ 2022-06-06 12:44     ` Mark Rutland
  0 siblings, 0 replies; 75+ messages in thread
From: Mark Rutland @ 2022-06-06 12:44 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

On Mon, Jun 06, 2022 at 01:49:03PM +0200, Alexander Lobakin wrote:
> Move generic non-atomic bitops from the asm-generic header which
> gets included only when there are no architecture-specific
> alternatives, to a separate independent file to make them always
> available.
> 
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  .../asm-generic/bitops/generic-non-atomic.h   | 124 ++++++++++++++++++
>  include/asm-generic/bitops/non-atomic.h       | 109 ++-------------
>  2 files changed, 132 insertions(+), 101 deletions(-)
>  create mode 100644 include/asm-generic/bitops/generic-non-atomic.h
> 
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> new file mode 100644
> index 000000000000..7a60adfa6e7d
> --- /dev/null
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -0,0 +1,124 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> +#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> +
> +#include <linux/bits.h>
> +
> +#ifndef _LINUX_BITOPS_H
> +#error only <linux/bitops.h> can be included directly
> +#endif
> +
> +/*
> + * Generic definitions for bit operations, should not be used in regular code
> + * directly.
> + */
> +
> +/**
> + * gen___set_bit - Set a bit in memory
> + * @nr: the bit to set
> + * @addr: the address to start counting from
> + *
> + * Unlike set_bit(), this function is non-atomic and may be reordered.
> + * If it's called on the same region of memory simultaneously, the effect
> + * may be that only one operation succeeds.
> + */
> +static __always_inline void
> +gen___set_bit(unsigned int nr, volatile unsigned long *addr)

Could we please use 'generic' rather than 'gen' as the prefix?

That'd match what we did for the generic atomic_*() and atomic64_*() functions
in commits

* f8b6455a9d381fc5 ("locking/atomic: atomic: support ARCH_ATOMIC")
* 1bdadf46eff6804a ("locking/atomic: atomic64: support ARCH_ATOMIC")

... and it avoids any potential confusion with 'gen' meaning 'generated' or
similar.

Thanks,
Mark.

> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +	*p  |= mask;
> +}
> +
> +static __always_inline void
> +gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +	*p &= ~mask;
> +}
> +
> +/**
> + * gen___change_bit - Toggle a bit in memory
> + * @nr: the bit to change
> + * @addr: the address to start counting from
> + *
> + * Unlike change_bit(), this function is non-atomic and may be reordered.
> + * If it's called on the same region of memory simultaneously, the effect
> + * may be that only one operation succeeds.
> + */
> +static __always_inline
> +void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +	*p ^= mask;
> +}
> +
> +/**
> + * gen___test_and_set_bit - Set a bit and return its old value
> + * @nr: Bit to set
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static __always_inline int
> +gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +	unsigned long old = *p;
> +
> +	*p = old | mask;
> +	return (old & mask) != 0;
> +}
> +
> +/**
> + * gen___test_and_clear_bit - Clear a bit and return its old value
> + * @nr: Bit to clear
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static __always_inline int
> +gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +	unsigned long old = *p;
> +
> +	*p = old & ~mask;
> +	return (old & mask) != 0;
> +}
> +
> +/* WARNING: non atomic and it can be reordered! */
> +static __always_inline int
> +gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +	unsigned long old = *p;
> +
> +	*p = old ^ mask;
> +	return (old & mask) != 0;
> +}
> +
> +/**
> + * gen_test_bit - Determine whether a bit is set
> + * @nr: bit number to test
> + * @addr: Address to start counting from
> + */
> +static __always_inline int
> +gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
> +{
> +	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> +}
> +
> +#endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
> diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
> index 078cc68be2f1..7ce0ed22fb5f 100644
> --- a/include/asm-generic/bitops/non-atomic.h
> +++ b/include/asm-generic/bitops/non-atomic.h
> @@ -2,121 +2,28 @@
>  #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  
> +#include <asm-generic/bitops/generic-non-atomic.h>
>  #include <asm/types.h>
>  
> -/**
> - * arch___set_bit - Set a bit in memory
> - * @nr: the bit to set
> - * @addr: the address to start counting from
> - *
> - * Unlike set_bit(), this function is non-atomic and may be reordered.
> - * If it's called on the same region of memory simultaneously, the effect
> - * may be that only one operation succeeds.
> - */
> -static __always_inline void
> -arch___set_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -	*p  |= mask;
> -}
> +#define arch___set_bit gen___set_bit
>  #define __set_bit arch___set_bit
>  
> -static __always_inline void
> -arch___clear_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -	*p &= ~mask;
> -}
> +#define arch___clear_bit gen___clear_bit
>  #define __clear_bit arch___clear_bit
>  
> -/**
> - * arch___change_bit - Toggle a bit in memory
> - * @nr: the bit to change
> - * @addr: the address to start counting from
> - *
> - * Unlike change_bit(), this function is non-atomic and may be reordered.
> - * If it's called on the same region of memory simultaneously, the effect
> - * may be that only one operation succeeds.
> - */
> -static __always_inline
> -void arch___change_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -	*p ^= mask;
> -}
> +#define arch___change_bit gen___change_bit
>  #define __change_bit arch___change_bit
>  
> -/**
> - * arch___test_and_set_bit - Set a bit and return its old value
> - * @nr: Bit to set
> - * @addr: Address to count from
> - *
> - * This operation is non-atomic and can be reordered.
> - * If two examples of this operation race, one can appear to succeed
> - * but actually fail.  You must protect multiple accesses with a lock.
> - */
> -static __always_inline int
> -arch___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -	unsigned long old = *p;
> -
> -	*p = old | mask;
> -	return (old & mask) != 0;
> -}
> +#define arch___test_and_set_bit gen___test_and_set_bit
>  #define __test_and_set_bit arch___test_and_set_bit
>  
> -/**
> - * arch___test_and_clear_bit - Clear a bit and return its old value
> - * @nr: Bit to clear
> - * @addr: Address to count from
> - *
> - * This operation is non-atomic and can be reordered.
> - * If two examples of this operation race, one can appear to succeed
> - * but actually fail.  You must protect multiple accesses with a lock.
> - */
> -static __always_inline int
> -arch___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -	unsigned long old = *p;
> -
> -	*p = old & ~mask;
> -	return (old & mask) != 0;
> -}
> +#define arch___test_and_clear_bit gen___test_and_clear_bit
>  #define __test_and_clear_bit arch___test_and_clear_bit
>  
> -/* WARNING: non atomic and it can be reordered! */
> -static __always_inline int
> -arch___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -	unsigned long old = *p;
> -
> -	*p = old ^ mask;
> -	return (old & mask) != 0;
> -}
> +#define arch___test_and_change_bit gen___test_and_change_bit
>  #define __test_and_change_bit arch___test_and_change_bit
>  
> -/**
> - * arch_test_bit - Determine whether a bit is set
> - * @nr: bit number to test
> - * @addr: Address to start counting from
> - */
> -static __always_inline int
> -arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> -{
> -	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> -}
> +#define arch_test_bit gen_test_bit
>  #define test_bit arch_test_bit
>  
>  #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 2/6] bitops: always define asm-generic non-atomic bitops
@ 2022-06-06 12:44     ` Mark Rutland
  0 siblings, 0 replies; 75+ messages in thread
From: Mark Rutland @ 2022-06-06 12:44 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh

On Mon, Jun 06, 2022 at 01:49:03PM +0200, Alexander Lobakin wrote:
> Move generic non-atomic bitops from the asm-generic header which
> gets included only when there are no architecture-specific
> alternatives, to a separate independent file to make them always
> available.
> 
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  .../asm-generic/bitops/generic-non-atomic.h   | 124 ++++++++++++++++++
>  include/asm-generic/bitops/non-atomic.h       | 109 ++-------------
>  2 files changed, 132 insertions(+), 101 deletions(-)
>  create mode 100644 include/asm-generic/bitops/generic-non-atomic.h
> 
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> new file mode 100644
> index 000000000000..7a60adfa6e7d
> --- /dev/null
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -0,0 +1,124 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> +#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> +
> +#include <linux/bits.h>
> +
> +#ifndef _LINUX_BITOPS_H
> +#error only <linux/bitops.h> can be included directly
> +#endif
> +
> +/*
> + * Generic definitions for bit operations, should not be used in regular code
> + * directly.
> + */
> +
> +/**
> + * gen___set_bit - Set a bit in memory
> + * @nr: the bit to set
> + * @addr: the address to start counting from
> + *
> + * Unlike set_bit(), this function is non-atomic and may be reordered.
> + * If it's called on the same region of memory simultaneously, the effect
> + * may be that only one operation succeeds.
> + */
> +static __always_inline void
> +gen___set_bit(unsigned int nr, volatile unsigned long *addr)

Could we please use 'generic' rather than 'gen' as the prefix?

That'd match what we did for the generic atomic_*() and atomic64_*() functions
in commits

* f8b6455a9d381fc5 ("locking/atomic: atomic: support ARCH_ATOMIC")
* 1bdadf46eff6804a ("locking/atomic: atomic64: support ARCH_ATOMIC")

... and it avoids any potential confusion with 'gen' meaning 'generated' or
similar.

Thanks,
Mark.

> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +	*p  |= mask;
> +}
> +
> +static __always_inline void
> +gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +	*p &= ~mask;
> +}
> +
> +/**
> + * gen___change_bit - Toggle a bit in memory
> + * @nr: the bit to change
> + * @addr: the address to start counting from
> + *
> + * Unlike change_bit(), this function is non-atomic and may be reordered.
> + * If it's called on the same region of memory simultaneously, the effect
> + * may be that only one operation succeeds.
> + */
> +static __always_inline
> +void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +	*p ^= mask;
> +}
> +
> +/**
> + * gen___test_and_set_bit - Set a bit and return its old value
> + * @nr: Bit to set
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static __always_inline int
> +gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +	unsigned long old = *p;
> +
> +	*p = old | mask;
> +	return (old & mask) != 0;
> +}
> +
> +/**
> + * gen___test_and_clear_bit - Clear a bit and return its old value
> + * @nr: Bit to clear
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static __always_inline int
> +gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +	unsigned long old = *p;
> +
> +	*p = old & ~mask;
> +	return (old & mask) != 0;
> +}
> +
> +/* WARNING: non atomic and it can be reordered! */
> +static __always_inline int
> +gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +	unsigned long old = *p;
> +
> +	*p = old ^ mask;
> +	return (old & mask) != 0;
> +}
> +
> +/**
> + * gen_test_bit - Determine whether a bit is set
> + * @nr: bit number to test
> + * @addr: Address to start counting from
> + */
> +static __always_inline int
> +gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
> +{
> +	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> +}
> +
> +#endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
> diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
> index 078cc68be2f1..7ce0ed22fb5f 100644
> --- a/include/asm-generic/bitops/non-atomic.h
> +++ b/include/asm-generic/bitops/non-atomic.h
> @@ -2,121 +2,28 @@
>  #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  
> +#include <asm-generic/bitops/generic-non-atomic.h>
>  #include <asm/types.h>
>  
> -/**
> - * arch___set_bit - Set a bit in memory
> - * @nr: the bit to set
> - * @addr: the address to start counting from
> - *
> - * Unlike set_bit(), this function is non-atomic and may be reordered.
> - * If it's called on the same region of memory simultaneously, the effect
> - * may be that only one operation succeeds.
> - */
> -static __always_inline void
> -arch___set_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -	*p  |= mask;
> -}
> +#define arch___set_bit gen___set_bit
>  #define __set_bit arch___set_bit
>  
> -static __always_inline void
> -arch___clear_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -	*p &= ~mask;
> -}
> +#define arch___clear_bit gen___clear_bit
>  #define __clear_bit arch___clear_bit
>  
> -/**
> - * arch___change_bit - Toggle a bit in memory
> - * @nr: the bit to change
> - * @addr: the address to start counting from
> - *
> - * Unlike change_bit(), this function is non-atomic and may be reordered.
> - * If it's called on the same region of memory simultaneously, the effect
> - * may be that only one operation succeeds.
> - */
> -static __always_inline
> -void arch___change_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -	*p ^= mask;
> -}
> +#define arch___change_bit gen___change_bit
>  #define __change_bit arch___change_bit
>  
> -/**
> - * arch___test_and_set_bit - Set a bit and return its old value
> - * @nr: Bit to set
> - * @addr: Address to count from
> - *
> - * This operation is non-atomic and can be reordered.
> - * If two examples of this operation race, one can appear to succeed
> - * but actually fail.  You must protect multiple accesses with a lock.
> - */
> -static __always_inline int
> -arch___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -	unsigned long old = *p;
> -
> -	*p = old | mask;
> -	return (old & mask) != 0;
> -}
> +#define arch___test_and_set_bit gen___test_and_set_bit
>  #define __test_and_set_bit arch___test_and_set_bit
>  
> -/**
> - * arch___test_and_clear_bit - Clear a bit and return its old value
> - * @nr: Bit to clear
> - * @addr: Address to count from
> - *
> - * This operation is non-atomic and can be reordered.
> - * If two examples of this operation race, one can appear to succeed
> - * but actually fail.  You must protect multiple accesses with a lock.
> - */
> -static __always_inline int
> -arch___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -	unsigned long old = *p;
> -
> -	*p = old & ~mask;
> -	return (old & mask) != 0;
> -}
> +#define arch___test_and_clear_bit gen___test_and_clear_bit
>  #define __test_and_clear_bit arch___test_and_clear_bit
>  
> -/* WARNING: non atomic and it can be reordered! */
> -static __always_inline int
> -arch___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -	unsigned long mask = BIT_MASK(nr);
> -	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -	unsigned long old = *p;
> -
> -	*p = old ^ mask;
> -	return (old & mask) != 0;
> -}
> +#define arch___test_and_change_bit gen___test_and_change_bit
>  #define __test_and_change_bit arch___test_and_change_bit
>  
> -/**
> - * arch_test_bit - Determine whether a bit is set
> - * @nr: bit number to test
> - * @addr: Address to start counting from
> - */
> -static __always_inline int
> -arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> -{
> -	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> -}
> +#define arch_test_bit gen_test_bit
>  #define test_bit arch_test_bit
>  
>  #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-06 11:49 ` Alexander Lobakin
  (?)
@ 2022-06-06 13:50   ` Mark Rutland
  -1 siblings, 0 replies; 75+ messages in thread
From: Mark Rutland @ 2022-06-06 13:50 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

On Mon, Jun 06, 2022 at 01:49:01PM +0200, Alexander Lobakin wrote:
> While I was working on converting some structure fields from a fixed
> type to a bitmap, I started observing code size increase not only in
> places where the code works with the converted structure fields, but
> also where the converted vars were on the stack. That said, the
> following code:
> 
> 	DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> 	unsigned long bar = BIT(BAR_BIT);
> 	unsigned long baz = 0;
> 
> 	__set_bit(FOO_BIT, foo);
> 	baz |= BIT(BAZ_BIT);
> 
> 	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> 	BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> 	BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> 
> triggers the first assertion on x86_64, which means that the
> compiler is unable to evaluate it to a compile-time initializer
> when the architecture-specific bitop is used even if it's obvious.
> I found that this is due to that many architecture-specific
> non-atomic bitop implementations use inline asm or other hacks which
> are faster or more robust when working with "real" variables (i.e.
> fields from the structures etc.), but the compilers have no clue how
> to optimize them out when called on compile-time constants.
> 
> So, in order to let the compiler optimize out such cases, expand the
> test_bit() and __*_bit() definitions with a compile-time condition
> check, so that they will pick the generic C non-atomic bitop
> implementations when all of the arguments passed are compile-time
> constants, which means that the result will be a compile-time
> constant as well and the compiler will produce more efficient and
> simple code in 100% cases (no changes when there's at least one
> non-compile-time-constant argument).
> The condition itself:
> 
> if (
> __builtin_constant_p(nr) &&	/* <- bit position is constant */
> __builtin_constant_p(!!addr) &&	/* <- compiler knows bitmap addr is
> 				      always either NULL or not */
> addr &&				/* <- bitmap addr is not NULL */
> __builtin_constant_p(*addr)	/* <- compiler knows the value of
> 				      the target bitmap */
> )
> 	/* then pick the generic C variant
> else
> 	/* old code path, arch-specific
> 
> I also tried __is_constexpr() as suggested by Andy, but it was
> always returning 0 ('not a constant') for the 2,3 and 4th
> conditions.
> 
> The savings on x86_64 with LLVM are insane (.text):
> 
> $ scripts/bloat-o-meter -c vmlinux.{base,test}
> add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)

FWIW, I gave this series a spin on arm64 with GCC 11.1.0 and LLVM 14.0.0. Using
defconfig and v5.19-rc1 as a baseline, and we get about half that difference
with LLVM (and ~1/20th when using GCC):

  % ./scripts/bloat-o-meter -t vmlinux-llvm-baseline vmlinux-llvm-bitops
  add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
  ...
  Total: Before=16874812, After=16871048, chg -0.02%

  % ./scripts/bloat-o-meter -t vmlinux-gcc-baseline vmlinux-gcc-bitops
  add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
  ...
  Total: Before=16337480, After=16294240, chg -0.26%

The vmlinux files are correspondingly smaller:

  % ls -al vmlinux-*
  -rwxr-xr-x 1 mark mark 44262216 Jun  6 13:58 vmlinux-gcc-baseline
  -rwxr-xr-x 1 mark mark 44265240 Jun  6 14:01 vmlinux-gcc-bitops
  -rwxr-xr-x 1 mark mark 43573760 Jun  6 14:07 vmlinux-llvm-baseline
  -rwxr-xr-x 1 mark mark 43572560 Jun  6 14:11 vmlinux-llvm-bitops

... though due to padding and alignment, the resulting Image is the same size:

  % ls -al Image* 
  -rw-r--r-- 1 mark mark 36311552 Jun  6 13:58 Image-gcc-baseline
  -rw-r--r-- 1 mark mark 36311552 Jun  6 14:01 Image-gcc-bitops
  -rwxr-xr-x 1 mark mark 31218176 Jun  6 14:07 Image-llvm-baseline
  -rwxr-xr-x 1 mark mark 31218176 Jun  6 14:11 Image-llvm-bitops 

(as an aside, I need to go look at why the GCC Image is 5MB larger!).

Overall this looks like a nice cleanup/rework; I'll take a look over the
remaining patches in a bit.

Thanks,
Mark.

> 
> $ scripts/bloat-o-meter -c vmlinux.{base,mod}
> add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)
> 
> $ scripts/bloat-o-meter -c vmlinux.{base,all}
> add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> 
> And the following:
> 
> 	DECLARE_BITMAP(flags, __IP_TUNNEL_FLAG_NUM) = { };
> 	__be16 flags;
> 
> 	__set_bit(IP_TUNNEL_CSUM_BIT, flags);
> 
> 	tun_flags = cpu_to_be16(*flags & U16_MAX);
> 
> 	if (test_bit(IP_TUNNEL_VTI_BIT, flags))
> 		tun_flags |= VTI_ISVTI;
> 
> 	BUILD_BUG_ON(!__builtin_constant_p(tun_flags));
> 
> doesn't blow up anymore.
> 
> The series has been in intel-next for a while with no reported issues.
> Also available on my open GH[0].
> 
> [0] https://github.com/alobakin/linux/commits/bitops
> 
> Alexander Lobakin (6):
>   ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
>   bitops: always define asm-generic non-atomic bitops
>   bitops: define gen_test_bit() the same way as the rest of functions
>   bitops: unify non-atomic bitops prototypes across architectures
>   bitops: wrap non-atomic bitops with a transparent macro
>   bitops: let optimize out non-atomic bitops on compile-time constants
> 
>  arch/alpha/include/asm/bitops.h               |  28 ++--
>  arch/hexagon/include/asm/bitops.h             |  23 ++--
>  arch/ia64/include/asm/bitops.h                |  40 +++---
>  arch/ia64/include/asm/processor.h             |   2 +-
>  arch/m68k/include/asm/bitops.h                |  47 +++++--
>  arch/sh/include/asm/bitops-op32.h             |  32 +++--
>  arch/sparc/include/asm/bitops_32.h            |  18 +--
>  arch/sparc/lib/atomic32.c                     |  12 +-
>  .../asm-generic/bitops/generic-non-atomic.h   | 128 ++++++++++++++++++
>  .../bitops/instrumented-non-atomic.h          |  35 +++--
>  include/asm-generic/bitops/non-atomic.h       | 123 ++---------------
>  include/linux/bitops.h                        |  45 ++++++
>  tools/include/asm-generic/bitops/non-atomic.h |  34 +++--
>  tools/include/linux/bitops.h                  |  16 +++
>  14 files changed, 363 insertions(+), 220 deletions(-)
>  create mode 100644 include/asm-generic/bitops/generic-non-atomic.h
> 
> base-commit: f2906aa863381afb0015a9eb7fefad885d4e5a56
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
@ 2022-06-06 13:50   ` Mark Rutland
  0 siblings, 0 replies; 75+ messages in thread
From: Mark Rutland @ 2022-06-06 13:50 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

On Mon, Jun 06, 2022 at 01:49:01PM +0200, Alexander Lobakin wrote:
> While I was working on converting some structure fields from a fixed
> type to a bitmap, I started observing code size increase not only in
> places where the code works with the converted structure fields, but
> also where the converted vars were on the stack. That said, the
> following code:
> 
> 	DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> 	unsigned long bar = BIT(BAR_BIT);
> 	unsigned long baz = 0;
> 
> 	__set_bit(FOO_BIT, foo);
> 	baz |= BIT(BAZ_BIT);
> 
> 	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> 	BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> 	BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> 
> triggers the first assertion on x86_64, which means that the
> compiler is unable to evaluate it to a compile-time initializer
> when the architecture-specific bitop is used even if it's obvious.
> I found that this is due to that many architecture-specific
> non-atomic bitop implementations use inline asm or other hacks which
> are faster or more robust when working with "real" variables (i.e.
> fields from the structures etc.), but the compilers have no clue how
> to optimize them out when called on compile-time constants.
> 
> So, in order to let the compiler optimize out such cases, expand the
> test_bit() and __*_bit() definitions with a compile-time condition
> check, so that they will pick the generic C non-atomic bitop
> implementations when all of the arguments passed are compile-time
> constants, which means that the result will be a compile-time
> constant as well and the compiler will produce more efficient and
> simple code in 100% cases (no changes when there's at least one
> non-compile-time-constant argument).
> The condition itself:
> 
> if (
> __builtin_constant_p(nr) &&	/* <- bit position is constant */
> __builtin_constant_p(!!addr) &&	/* <- compiler knows bitmap addr is
> 				      always either NULL or not */
> addr &&				/* <- bitmap addr is not NULL */
> __builtin_constant_p(*addr)	/* <- compiler knows the value of
> 				      the target bitmap */
> )
> 	/* then pick the generic C variant
> else
> 	/* old code path, arch-specific
> 
> I also tried __is_constexpr() as suggested by Andy, but it was
> always returning 0 ('not a constant') for the 2,3 and 4th
> conditions.
> 
> The savings on x86_64 with LLVM are insane (.text):
> 
> $ scripts/bloat-o-meter -c vmlinux.{base,test}
> add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)

FWIW, I gave this series a spin on arm64 with GCC 11.1.0 and LLVM 14.0.0. Using
defconfig and v5.19-rc1 as a baseline, and we get about half that difference
with LLVM (and ~1/20th when using GCC):

  % ./scripts/bloat-o-meter -t vmlinux-llvm-baseline vmlinux-llvm-bitops
  add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
  ...
  Total: Before\x16874812, After\x16871048, chg -0.02%

  % ./scripts/bloat-o-meter -t vmlinux-gcc-baseline vmlinux-gcc-bitops
  add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
  ...
  Total: Before\x16337480, After\x16294240, chg -0.26%

The vmlinux files are correspondingly smaller:

  % ls -al vmlinux-*
  -rwxr-xr-x 1 mark mark 44262216 Jun  6 13:58 vmlinux-gcc-baseline
  -rwxr-xr-x 1 mark mark 44265240 Jun  6 14:01 vmlinux-gcc-bitops
  -rwxr-xr-x 1 mark mark 43573760 Jun  6 14:07 vmlinux-llvm-baseline
  -rwxr-xr-x 1 mark mark 43572560 Jun  6 14:11 vmlinux-llvm-bitops

... though due to padding and alignment, the resulting Image is the same size:

  % ls -al Image* 
  -rw-r--r-- 1 mark mark 36311552 Jun  6 13:58 Image-gcc-baseline
  -rw-r--r-- 1 mark mark 36311552 Jun  6 14:01 Image-gcc-bitops
  -rwxr-xr-x 1 mark mark 31218176 Jun  6 14:07 Image-llvm-baseline
  -rwxr-xr-x 1 mark mark 31218176 Jun  6 14:11 Image-llvm-bitops 

(as an aside, I need to go look at why the GCC Image is 5MB larger!).

Overall this looks like a nice cleanup/rework; I'll take a look over the
remaining patches in a bit.

Thanks,
Mark.

> 
> $ scripts/bloat-o-meter -c vmlinux.{base,mod}
> add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)
> 
> $ scripts/bloat-o-meter -c vmlinux.{base,all}
> add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> 
> And the following:
> 
> 	DECLARE_BITMAP(flags, __IP_TUNNEL_FLAG_NUM) = { };
> 	__be16 flags;
> 
> 	__set_bit(IP_TUNNEL_CSUM_BIT, flags);
> 
> 	tun_flags = cpu_to_be16(*flags & U16_MAX);
> 
> 	if (test_bit(IP_TUNNEL_VTI_BIT, flags))
> 		tun_flags |= VTI_ISVTI;
> 
> 	BUILD_BUG_ON(!__builtin_constant_p(tun_flags));
> 
> doesn't blow up anymore.
> 
> The series has been in intel-next for a while with no reported issues.
> Also available on my open GH[0].
> 
> [0] https://github.com/alobakin/linux/commits/bitops
> 
> Alexander Lobakin (6):
>   ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
>   bitops: always define asm-generic non-atomic bitops
>   bitops: define gen_test_bit() the same way as the rest of functions
>   bitops: unify non-atomic bitops prototypes across architectures
>   bitops: wrap non-atomic bitops with a transparent macro
>   bitops: let optimize out non-atomic bitops on compile-time constants
> 
>  arch/alpha/include/asm/bitops.h               |  28 ++--
>  arch/hexagon/include/asm/bitops.h             |  23 ++--
>  arch/ia64/include/asm/bitops.h                |  40 +++---
>  arch/ia64/include/asm/processor.h             |   2 +-
>  arch/m68k/include/asm/bitops.h                |  47 +++++--
>  arch/sh/include/asm/bitops-op32.h             |  32 +++--
>  arch/sparc/include/asm/bitops_32.h            |  18 +--
>  arch/sparc/lib/atomic32.c                     |  12 +-
>  .../asm-generic/bitops/generic-non-atomic.h   | 128 ++++++++++++++++++
>  .../bitops/instrumented-non-atomic.h          |  35 +++--
>  include/asm-generic/bitops/non-atomic.h       | 123 ++---------------
>  include/linux/bitops.h                        |  45 ++++++
>  tools/include/asm-generic/bitops/non-atomic.h |  34 +++--
>  tools/include/linux/bitops.h                  |  16 +++
>  14 files changed, 363 insertions(+), 220 deletions(-)
>  create mode 100644 include/asm-generic/bitops/generic-non-atomic.h
> 
> base-commit: f2906aa863381afb0015a9eb7fefad885d4e5a56
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
@ 2022-06-06 13:50   ` Mark Rutland
  0 siblings, 0 replies; 75+ messages in thread
From: Mark Rutland @ 2022-06-06 13:50 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh

On Mon, Jun 06, 2022 at 01:49:01PM +0200, Alexander Lobakin wrote:
> While I was working on converting some structure fields from a fixed
> type to a bitmap, I started observing code size increase not only in
> places where the code works with the converted structure fields, but
> also where the converted vars were on the stack. That said, the
> following code:
> 
> 	DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> 	unsigned long bar = BIT(BAR_BIT);
> 	unsigned long baz = 0;
> 
> 	__set_bit(FOO_BIT, foo);
> 	baz |= BIT(BAZ_BIT);
> 
> 	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> 	BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> 	BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> 
> triggers the first assertion on x86_64, which means that the
> compiler is unable to evaluate it to a compile-time initializer
> when the architecture-specific bitop is used even if it's obvious.
> I found that this is due to that many architecture-specific
> non-atomic bitop implementations use inline asm or other hacks which
> are faster or more robust when working with "real" variables (i.e.
> fields from the structures etc.), but the compilers have no clue how
> to optimize them out when called on compile-time constants.
> 
> So, in order to let the compiler optimize out such cases, expand the
> test_bit() and __*_bit() definitions with a compile-time condition
> check, so that they will pick the generic C non-atomic bitop
> implementations when all of the arguments passed are compile-time
> constants, which means that the result will be a compile-time
> constant as well and the compiler will produce more efficient and
> simple code in 100% cases (no changes when there's at least one
> non-compile-time-constant argument).
> The condition itself:
> 
> if (
> __builtin_constant_p(nr) &&	/* <- bit position is constant */
> __builtin_constant_p(!!addr) &&	/* <- compiler knows bitmap addr is
> 				      always either NULL or not */
> addr &&				/* <- bitmap addr is not NULL */
> __builtin_constant_p(*addr)	/* <- compiler knows the value of
> 				      the target bitmap */
> )
> 	/* then pick the generic C variant
> else
> 	/* old code path, arch-specific
> 
> I also tried __is_constexpr() as suggested by Andy, but it was
> always returning 0 ('not a constant') for the 2,3 and 4th
> conditions.
> 
> The savings on x86_64 with LLVM are insane (.text):
> 
> $ scripts/bloat-o-meter -c vmlinux.{base,test}
> add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)

FWIW, I gave this series a spin on arm64 with GCC 11.1.0 and LLVM 14.0.0. Using
defconfig and v5.19-rc1 as a baseline, and we get about half that difference
with LLVM (and ~1/20th when using GCC):

  % ./scripts/bloat-o-meter -t vmlinux-llvm-baseline vmlinux-llvm-bitops
  add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
  ...
  Total: Before=16874812, After=16871048, chg -0.02%

  % ./scripts/bloat-o-meter -t vmlinux-gcc-baseline vmlinux-gcc-bitops
  add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
  ...
  Total: Before=16337480, After=16294240, chg -0.26%

The vmlinux files are correspondingly smaller:

  % ls -al vmlinux-*
  -rwxr-xr-x 1 mark mark 44262216 Jun  6 13:58 vmlinux-gcc-baseline
  -rwxr-xr-x 1 mark mark 44265240 Jun  6 14:01 vmlinux-gcc-bitops
  -rwxr-xr-x 1 mark mark 43573760 Jun  6 14:07 vmlinux-llvm-baseline
  -rwxr-xr-x 1 mark mark 43572560 Jun  6 14:11 vmlinux-llvm-bitops

... though due to padding and alignment, the resulting Image is the same size:

  % ls -al Image* 
  -rw-r--r-- 1 mark mark 36311552 Jun  6 13:58 Image-gcc-baseline
  -rw-r--r-- 1 mark mark 36311552 Jun  6 14:01 Image-gcc-bitops
  -rwxr-xr-x 1 mark mark 31218176 Jun  6 14:07 Image-llvm-baseline
  -rwxr-xr-x 1 mark mark 31218176 Jun  6 14:11 Image-llvm-bitops 

(as an aside, I need to go look at why the GCC Image is 5MB larger!).

Overall this looks like a nice cleanup/rework; I'll take a look over the
remaining patches in a bit.

Thanks,
Mark.

> 
> $ scripts/bloat-o-meter -c vmlinux.{base,mod}
> add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)
> 
> $ scripts/bloat-o-meter -c vmlinux.{base,all}
> add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> 
> And the following:
> 
> 	DECLARE_BITMAP(flags, __IP_TUNNEL_FLAG_NUM) = { };
> 	__be16 flags;
> 
> 	__set_bit(IP_TUNNEL_CSUM_BIT, flags);
> 
> 	tun_flags = cpu_to_be16(*flags & U16_MAX);
> 
> 	if (test_bit(IP_TUNNEL_VTI_BIT, flags))
> 		tun_flags |= VTI_ISVTI;
> 
> 	BUILD_BUG_ON(!__builtin_constant_p(tun_flags));
> 
> doesn't blow up anymore.
> 
> The series has been in intel-next for a while with no reported issues.
> Also available on my open GH[0].
> 
> [0] https://github.com/alobakin/linux/commits/bitops
> 
> Alexander Lobakin (6):
>   ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
>   bitops: always define asm-generic non-atomic bitops
>   bitops: define gen_test_bit() the same way as the rest of functions
>   bitops: unify non-atomic bitops prototypes across architectures
>   bitops: wrap non-atomic bitops with a transparent macro
>   bitops: let optimize out non-atomic bitops on compile-time constants
> 
>  arch/alpha/include/asm/bitops.h               |  28 ++--
>  arch/hexagon/include/asm/bitops.h             |  23 ++--
>  arch/ia64/include/asm/bitops.h                |  40 +++---
>  arch/ia64/include/asm/processor.h             |   2 +-
>  arch/m68k/include/asm/bitops.h                |  47 +++++--
>  arch/sh/include/asm/bitops-op32.h             |  32 +++--
>  arch/sparc/include/asm/bitops_32.h            |  18 +--
>  arch/sparc/lib/atomic32.c                     |  12 +-
>  .../asm-generic/bitops/generic-non-atomic.h   | 128 ++++++++++++++++++
>  .../bitops/instrumented-non-atomic.h          |  35 +++--
>  include/asm-generic/bitops/non-atomic.h       | 123 ++---------------
>  include/linux/bitops.h                        |  45 ++++++
>  tools/include/asm-generic/bitops/non-atomic.h |  34 +++--
>  tools/include/linux/bitops.h                  |  16 +++
>  14 files changed, 363 insertions(+), 220 deletions(-)
>  create mode 100644 include/asm-generic/bitops/generic-non-atomic.h
> 
> base-commit: f2906aa863381afb0015a9eb7fefad885d4e5a56
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 2/6] bitops: always define asm-generic non-atomic bitops
  2022-06-06 12:44     ` Mark Rutland
  (?)
@ 2022-06-06 14:21       ` Alexander Lobakin
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 14:21 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

From: Mark Rutland <mark.rutland@arm.com>
Date: Mon, 6 Jun 2022 13:44:52 +0100

> On Mon, Jun 06, 2022 at 01:49:03PM +0200, Alexander Lobakin wrote:
> > Move generic non-atomic bitops from the asm-generic header which
> > gets included only when there are no architecture-specific
> > alternatives, to a separate independent file to make them always
> > available.
> > 
> > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > ---
> >  .../asm-generic/bitops/generic-non-atomic.h   | 124 ++++++++++++++++++
> >  include/asm-generic/bitops/non-atomic.h       | 109 ++-------------
> >  2 files changed, 132 insertions(+), 101 deletions(-)
> >  create mode 100644 include/asm-generic/bitops/generic-non-atomic.h
> > 
> > diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> > new file mode 100644
> > index 000000000000..7a60adfa6e7d
> > --- /dev/null
> > +++ b/include/asm-generic/bitops/generic-non-atomic.h
> > @@ -0,0 +1,124 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> > +#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> > +
> > +#include <linux/bits.h>
> > +
> > +#ifndef _LINUX_BITOPS_H
> > +#error only <linux/bitops.h> can be included directly
> > +#endif
> > +
> > +/*
> > + * Generic definitions for bit operations, should not be used in regular code
> > + * directly.
> > + */
> > +
> > +/**
> > + * gen___set_bit - Set a bit in memory
> > + * @nr: the bit to set
> > + * @addr: the address to start counting from
> > + *
> > + * Unlike set_bit(), this function is non-atomic and may be reordered.
> > + * If it's called on the same region of memory simultaneously, the effect
> > + * may be that only one operation succeeds.
> > + */
> > +static __always_inline void
> > +gen___set_bit(unsigned int nr, volatile unsigned long *addr)
> 
> Could we please use 'generic' rather than 'gen' as the prefix?
> 
> That'd match what we did for the generic atomic_*() and atomic64_*() functions
> in commits
> 
> * f8b6455a9d381fc5 ("locking/atomic: atomic: support ARCH_ATOMIC")
> * 1bdadf46eff6804a ("locking/atomic: atomic64: support ARCH_ATOMIC")
> 
> ... and it avoids any potential confusion with 'gen' meaning 'generated' or
> similar.

Sure! Thanks for giving the hint, I do agree it would look better
and will rename in v2.

> 
> Thanks,
> Mark.
> 
> > +{
> > +	unsigned long mask = BIT_MASK(nr);
> > +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> > +
> > +	*p  |= mask;
> > +}

[...]

> > -- 
> > 2.36.1

Thanks,
Al

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 2/6] bitops: always define asm-generic non-atomic bitops
@ 2022-06-06 14:21       ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 14:21 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

From: Mark Rutland <mark.rutland@arm.com>
Date: Mon, 6 Jun 2022 13:44:52 +0100

> On Mon, Jun 06, 2022 at 01:49:03PM +0200, Alexander Lobakin wrote:
> > Move generic non-atomic bitops from the asm-generic header which
> > gets included only when there are no architecture-specific
> > alternatives, to a separate independent file to make them always
> > available.
> > 
> > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > ---
> >  .../asm-generic/bitops/generic-non-atomic.h   | 124 ++++++++++++++++++
> >  include/asm-generic/bitops/non-atomic.h       | 109 ++-------------
> >  2 files changed, 132 insertions(+), 101 deletions(-)
> >  create mode 100644 include/asm-generic/bitops/generic-non-atomic.h
> > 
> > diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> > new file mode 100644
> > index 000000000000..7a60adfa6e7d
> > --- /dev/null
> > +++ b/include/asm-generic/bitops/generic-non-atomic.h
> > @@ -0,0 +1,124 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> > +#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> > +
> > +#include <linux/bits.h>
> > +
> > +#ifndef _LINUX_BITOPS_H
> > +#error only <linux/bitops.h> can be included directly
> > +#endif
> > +
> > +/*
> > + * Generic definitions for bit operations, should not be used in regular code
> > + * directly.
> > + */
> > +
> > +/**
> > + * gen___set_bit - Set a bit in memory
> > + * @nr: the bit to set
> > + * @addr: the address to start counting from
> > + *
> > + * Unlike set_bit(), this function is non-atomic and may be reordered.
> > + * If it's called on the same region of memory simultaneously, the effect
> > + * may be that only one operation succeeds.
> > + */
> > +static __always_inline void
> > +gen___set_bit(unsigned int nr, volatile unsigned long *addr)
> 
> Could we please use 'generic' rather than 'gen' as the prefix?
> 
> That'd match what we did for the generic atomic_*() and atomic64_*() functions
> in commits
> 
> * f8b6455a9d381fc5 ("locking/atomic: atomic: support ARCH_ATOMIC")
> * 1bdadf46eff6804a ("locking/atomic: atomic64: support ARCH_ATOMIC")
> 
> ... and it avoids any potential confusion with 'gen' meaning 'generated' or
> similar.

Sure! Thanks for giving the hint, I do agree it would look better
and will rename in v2.

> 
> Thanks,
> Mark.
> 
> > +{
> > +	unsigned long mask = BIT_MASK(nr);
> > +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> > +
> > +	*p  |= mask;
> > +}

[...]

> > -- 
> > 2.36.1

Thanks,
Al

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 2/6] bitops: always define asm-generic non-atomic bitops
@ 2022-06-06 14:21       ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-06 14:21 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m6

From: Mark Rutland <mark.rutland@arm.com>
Date: Mon, 6 Jun 2022 13:44:52 +0100

> On Mon, Jun 06, 2022 at 01:49:03PM +0200, Alexander Lobakin wrote:
> > Move generic non-atomic bitops from the asm-generic header which
> > gets included only when there are no architecture-specific
> > alternatives, to a separate independent file to make them always
> > available.
> > 
> > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > ---
> >  .../asm-generic/bitops/generic-non-atomic.h   | 124 ++++++++++++++++++
> >  include/asm-generic/bitops/non-atomic.h       | 109 ++-------------
> >  2 files changed, 132 insertions(+), 101 deletions(-)
> >  create mode 100644 include/asm-generic/bitops/generic-non-atomic.h
> > 
> > diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> > new file mode 100644
> > index 000000000000..7a60adfa6e7d
> > --- /dev/null
> > +++ b/include/asm-generic/bitops/generic-non-atomic.h
> > @@ -0,0 +1,124 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> > +#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> > +
> > +#include <linux/bits.h>
> > +
> > +#ifndef _LINUX_BITOPS_H
> > +#error only <linux/bitops.h> can be included directly
> > +#endif
> > +
> > +/*
> > + * Generic definitions for bit operations, should not be used in regular code
> > + * directly.
> > + */
> > +
> > +/**
> > + * gen___set_bit - Set a bit in memory
> > + * @nr: the bit to set
> > + * @addr: the address to start counting from
> > + *
> > + * Unlike set_bit(), this function is non-atomic and may be reordered.
> > + * If it's called on the same region of memory simultaneously, the effect
> > + * may be that only one operation succeeds.
> > + */
> > +static __always_inline void
> > +gen___set_bit(unsigned int nr, volatile unsigned long *addr)
> 
> Could we please use 'generic' rather than 'gen' as the prefix?
> 
> That'd match what we did for the generic atomic_*() and atomic64_*() functions
> in commits
> 
> * f8b6455a9d381fc5 ("locking/atomic: atomic: support ARCH_ATOMIC")
> * 1bdadf46eff6804a ("locking/atomic: atomic64: support ARCH_ATOMIC")
> 
> ... and it avoids any potential confusion with 'gen' meaning 'generated' or
> similar.

Sure! Thanks for giving the hint, I do agree it would look better
and will rename in v2.

> 
> Thanks,
> Mark.
> 
> > +{
> > +	unsigned long mask = BIT_MASK(nr);
> > +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> > +
> > +	*p  |= mask;
> > +}

[...]

> > -- 
> > 2.36.1

Thanks,
Al

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
  2022-06-06 11:49   ` Alexander Lobakin
  (?)
@ 2022-06-06 16:19     ` Mark Rutland
  -1 siblings, 0 replies; 75+ messages in thread
From: Mark Rutland @ 2022-06-06 16:19 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

On Mon, Jun 06, 2022 at 01:49:04PM +0200, Alexander Lobakin wrote:
> Currently, the generic test_bit() function is defined as a one-liner
> and in case with constant bitmaps the compiler is unable to optimize
> it to a constant. At the same time, gen_test_and_*_bit() are being
> optimized pretty good.
> Define gen_test_bit() the same way as they are defined.
> 
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>

Regardless of whether compilers prefer this, I think it's nicer to have the
structure consistent with the rest of the functions, so FWIW:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  include/asm-generic/bitops/generic-non-atomic.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index 7a60adfa6e7d..202d8a3b40e1 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -118,7 +118,11 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
>  static __always_inline int
>  gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
>  {
> -	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> +	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long val = *p;
> +
> +	return !!(val & mask);
>  }
>  
>  #endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
@ 2022-06-06 16:19     ` Mark Rutland
  0 siblings, 0 replies; 75+ messages in thread
From: Mark Rutland @ 2022-06-06 16:19 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

On Mon, Jun 06, 2022 at 01:49:04PM +0200, Alexander Lobakin wrote:
> Currently, the generic test_bit() function is defined as a one-liner
> and in case with constant bitmaps the compiler is unable to optimize
> it to a constant. At the same time, gen_test_and_*_bit() are being
> optimized pretty good.
> Define gen_test_bit() the same way as they are defined.
> 
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>

Regardless of whether compilers prefer this, I think it's nicer to have the
structure consistent with the rest of the functions, so FWIW:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  include/asm-generic/bitops/generic-non-atomic.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index 7a60adfa6e7d..202d8a3b40e1 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -118,7 +118,11 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
>  static __always_inline int
>  gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
>  {
> -	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> +	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long val = *p;
> +
> +	return !!(val & mask);
>  }
>  
>  #endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
@ 2022-06-06 16:19     ` Mark Rutland
  0 siblings, 0 replies; 75+ messages in thread
From: Mark Rutland @ 2022-06-06 16:19 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh

On Mon, Jun 06, 2022 at 01:49:04PM +0200, Alexander Lobakin wrote:
> Currently, the generic test_bit() function is defined as a one-liner
> and in case with constant bitmaps the compiler is unable to optimize
> it to a constant. At the same time, gen_test_and_*_bit() are being
> optimized pretty good.
> Define gen_test_bit() the same way as they are defined.
> 
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>

Regardless of whether compilers prefer this, I think it's nicer to have the
structure consistent with the rest of the functions, so FWIW:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  include/asm-generic/bitops/generic-non-atomic.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index 7a60adfa6e7d..202d8a3b40e1 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -118,7 +118,11 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
>  static __always_inline int
>  gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
>  {
> -	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> +	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long val = *p;
> +
> +	return !!(val & mask);
>  }
>  
>  #endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 4/6] bitops: unify non-atomic bitops prototypes across architectures
  2022-06-06 11:49   ` Alexander Lobakin
  (?)
@ 2022-06-06 16:25     ` Mark Rutland
  -1 siblings, 0 replies; 75+ messages in thread
From: Mark Rutland @ 2022-06-06 16:25 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

On Mon, Jun 06, 2022 at 01:49:05PM +0200, Alexander Lobakin wrote:
> Currently, there is a mess with the prototypes of the non-atomic
> bitops across the different architectures:
> 
> ret	bool, int, unsigned long
> nr	int, long, unsigned int, unsigned long
> addr	volatile unsigned long *, volatile void *
> 
> Thankfully, it doesn't provoke any bugs, but can sometimes make
> the compiler angry when it's not handy at all.
> Adjust all the prototypes to the following standard:
> 
> ret	bool				retval can be only 0 or 1
> nr	unsigned long			native; signed makes no sense
> addr	volatile unsigned long *	bitmaps are arrays of ulongs
> 
> Finally, add some static assertions in order to prevent people from
> making a mess in this room again.

This looks like a nice cleanup!

Using `bool` makes the semantic clearer from the prototype.

> I also used the %__always_inline attribute consistently they always
> get resolved to the actual operations.

Since these will get wrapped in the next patch (and architectures may want to
use these in noinstr code), this makes sense to me.

> 
> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>

FWIW:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  arch/alpha/include/asm/bitops.h               | 28 +++++------
>  arch/hexagon/include/asm/bitops.h             | 23 +++++----
>  arch/ia64/include/asm/bitops.h                | 28 +++++------
>  arch/m68k/include/asm/bitops.h                | 47 +++++++++++++------
>  arch/sh/include/asm/bitops-op32.h             | 24 ++++++----
>  .../asm-generic/bitops/generic-non-atomic.h   | 24 +++++-----
>  .../bitops/instrumented-non-atomic.h          | 21 ++++++---
>  include/linux/bitops.h                        | 13 +++++
>  tools/include/asm-generic/bitops/non-atomic.h | 24 ++++++----
>  9 files changed, 146 insertions(+), 86 deletions(-)
> 
> diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
> index e1d8483a45f2..381ad2eae4b4 100644
> --- a/arch/alpha/include/asm/bitops.h
> +++ b/arch/alpha/include/asm/bitops.h
> @@ -46,8 +46,8 @@ set_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline void
> -__set_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -82,8 +82,8 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ void
> -__clear_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -118,8 +118,8 @@ change_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ void
> -__change_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -186,8 +186,8 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline int
> -__test_and_set_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -230,8 +230,8 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline int
> -__test_and_clear_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -272,8 +272,8 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ int
> -__test_and_change_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -283,8 +283,8 @@ __test_and_change_bit(unsigned long nr, volatile void * addr)
>  	return (old & mask) != 0;
>  }
>  
> -static inline int
> -test_bit(int nr, const volatile void * addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
>  }
> diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
> index 75d6ba3643b8..a3bfe3a8d4b7 100644
> --- a/arch/hexagon/include/asm/bitops.h
> +++ b/arch/hexagon/include/asm/bitops.h
> @@ -127,38 +127,45 @@ static inline void change_bit(int nr, volatile void *addr)
>   * be atomic, particularly for things like slab_lock and slab_unlock.
>   *
>   */
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_clear_bit(nr, addr);
>  }
>  
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_set_bit(nr, addr);
>  }
>  
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_change_bit(nr, addr);
>  }
>  
>  /*  Apparently, at least some of these are allowed to be non-atomic  */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_clear_bit(nr, addr);
>  }
>  
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_set_bit(nr, addr);
>  }
>  
> -static inline int __test_and_change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_change_bit(nr, addr);
>  }
>  
> -static inline int __test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	int retval;
>  
> @@ -172,7 +179,7 @@ static inline int __test_bit(int nr, const volatile unsigned long *addr)
>  	return retval;
>  }
>  
> -#define test_bit(nr, addr) __test_bit(nr, addr)
> +#define __test_bit(nr, addr) test_bit(nr, addr)
>  
>  /*
>   * ffz - find first zero in word.
> diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
> index 577be93c0818..4267a217a503 100644
> --- a/arch/ia64/include/asm/bitops.h
> +++ b/arch/ia64/include/asm/bitops.h
> @@ -61,8 +61,8 @@ set_bit (int nr, volatile void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__set_bit (int nr, volatile void *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
>  }
> @@ -143,8 +143,8 @@ __clear_bit_unlock(int nr, void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__clear_bit (int nr, volatile void *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
>  }
> @@ -183,8 +183,8 @@ change_bit (int nr, volatile void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__change_bit (int nr, volatile void *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
>  }
> @@ -232,8 +232,8 @@ test_and_set_bit (int nr, volatile void *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __inline__ int
> -__test_and_set_bit (int nr, volatile void *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -277,8 +277,8 @@ test_and_clear_bit (int nr, volatile void *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __inline__ int
> -__test_and_clear_bit(int nr, volatile void * addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -320,8 +320,8 @@ test_and_change_bit (int nr, volatile void *addr)
>   *
>   * This operation is non-atomic and can be reordered.
>   */
> -static __inline__ int
> -__test_and_change_bit (int nr, void *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 old, bit = (1 << (nr & 31));
>  	__u32 *m = (__u32 *) addr + (nr >> 5);
> @@ -331,8 +331,8 @@ __test_and_change_bit (int nr, void *addr)
>  	return (old & bit) != 0;
>  }
>  
> -static __inline__ int
> -test_bit (int nr, const volatile void *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
>  }
> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> index 51283db53667..9d44bd4713cb 100644
> --- a/arch/m68k/include/asm/bitops.h
> +++ b/arch/m68k/include/asm/bitops.h
> @@ -65,8 +65,11 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
>  				bfset_mem_set_bit(nr, vaddr))
>  #endif
>  
> -#define __set_bit(nr, vaddr)	set_bit(nr, vaddr)
> -
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	set_bit(nr, addr);
> +}
>  
>  static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
>  {
> @@ -105,8 +108,11 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
>  				bfclr_mem_clear_bit(nr, vaddr))
>  #endif
>  
> -#define __clear_bit(nr, vaddr)	clear_bit(nr, vaddr)
> -
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	clear_bit(nr, addr);
> +}
>  
>  static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
>  {
> @@ -145,12 +151,16 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
>  				bfchg_mem_change_bit(nr, vaddr))
>  #endif
>  
> -#define __change_bit(nr, vaddr)	change_bit(nr, vaddr)
> -
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	change_bit(nr, addr);
> +}
>  
> -static inline int test_bit(int nr, const volatile unsigned long *vaddr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
> -	return (vaddr[nr >> 5] & (1UL << (nr & 31))) != 0;
> +	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
>  }
>  
>  
> @@ -201,8 +211,11 @@ static inline int bfset_mem_test_and_set_bit(int nr,
>  					bfset_mem_test_and_set_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_set_bit(nr, vaddr)	test_and_set_bit(nr, vaddr)
> -
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_set_bit(nr, addr);
> +}
>  
>  static inline int bclr_reg_test_and_clear_bit(int nr,
>  					      volatile unsigned long *vaddr)
> @@ -251,8 +264,11 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
>  					bfclr_mem_test_and_clear_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_clear_bit(nr, vaddr)	test_and_clear_bit(nr, vaddr)
> -
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_clear_bit(nr, addr);
> +}
>  
>  static inline int bchg_reg_test_and_change_bit(int nr,
>  					       volatile unsigned long *vaddr)
> @@ -301,8 +317,11 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
>  					bfchg_mem_test_and_change_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_change_bit(nr, vaddr) test_and_change_bit(nr, vaddr)
> -
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_change_bit(nr, addr);
> +}
>  
>  /*
>   *	The true 68020 and more advanced processors support the "bfffo"
> diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
> index cfe5465acce7..dcd85866a394 100644
> --- a/arch/sh/include/asm/bitops-op32.h
> +++ b/arch/sh/include/asm/bitops-op32.h
> @@ -2,6 +2,8 @@
>  #ifndef __ASM_SH_BITOPS_OP32_H
>  #define __ASM_SH_BITOPS_OP32_H
>  
> +#include <linux/bits.h>
> +
>  /*
>   * The bit modifying instructions on SH-2A are only capable of working
>   * with a 3-bit immediate, which signifies the shift position for the bit
> @@ -16,7 +18,8 @@
>  #define BYTE_OFFSET(nr)		((nr) % BITS_PER_BYTE)
>  #endif
>  
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -33,7 +36,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
>  	}
>  }
>  
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -60,7 +64,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -87,7 +92,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -106,7 +112,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -117,8 +124,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static inline int __test_and_change_bit(int nr,
> -					    volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -133,7 +140,8 @@ static inline int __test_and_change_bit(int nr,
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static inline int test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index 202d8a3b40e1..249b2a91c174 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -23,7 +23,7 @@
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -gen___set_bit(unsigned int nr, volatile unsigned long *addr)
> +gen___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -32,7 +32,7 @@ gen___set_bit(unsigned int nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
> +gen___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -49,8 +49,8 @@ gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __always_inline
> -void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline void
> +gen___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -67,8 +67,8 @@ void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __always_inline int
> -gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +gen___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -87,8 +87,8 @@ gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __always_inline int
> -gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +gen___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -99,8 +99,8 @@ gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static __always_inline int
> -gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +gen___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -115,8 +115,8 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static __always_inline int
> -gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +gen_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
>  	unsigned long mask = BIT_MASK(nr);
> diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
> index 7ab1ecc37782..b019f77ef21c 100644
> --- a/include/asm-generic/bitops/instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/instrumented-non-atomic.h
> @@ -22,7 +22,8 @@
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___set_bit(nr, addr);
> @@ -37,7 +38,8 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___clear_bit(nr, addr);
> @@ -52,7 +54,8 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___change_bit(nr, addr);
> @@ -90,7 +93,8 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_set_bit(nr, addr);
> @@ -104,7 +108,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_clear_bit(nr, addr);
> @@ -118,7 +123,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_change_bit(nr, addr);
> @@ -129,7 +135,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
>  	return arch_test_bit(nr, addr);
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 7aaed501f768..5520ac9b1c24 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,12 +26,25 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +#include <asm-generic/bitops/generic-non-atomic.h>
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
>   */
>  #include <asm/bitops.h>
>  
> +/* Check that the bitops prototypes are sane */
> +#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
> +__check_bitop_pr(__set_bit);
> +__check_bitop_pr(__clear_bit);
> +__check_bitop_pr(__change_bit);
> +__check_bitop_pr(__test_and_set_bit);
> +__check_bitop_pr(__test_and_clear_bit);
> +__check_bitop_pr(__test_and_change_bit);
> +__check_bitop_pr(test_bit);
> +#undef __check_bitop_pr
> +
>  static inline int get_bitmask_order(unsigned int count)
>  {
>  	int order;
> diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
> index 7e10c4b50c5d..e5e78e42e57b 100644
> --- a/tools/include/asm-generic/bitops/non-atomic.h
> +++ b/tools/include/asm-generic/bitops/non-atomic.h
> @@ -2,7 +2,7 @@
>  #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  
> -#include <asm/types.h>
> +#include <linux/bits.h>
>  
>  /**
>   * __set_bit - Set a bit in memory
> @@ -13,7 +13,8 @@
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -21,7 +22,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
>  	*p  |= mask;
>  }
>  
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -38,7 +40,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -55,7 +58,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -74,7 +78,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -85,8 +90,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static inline int __test_and_change_bit(int nr,
> -					    volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -101,7 +106,8 @@ static inline int __test_and_change_bit(int nr,
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static inline int test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 4/6] bitops: unify non-atomic bitops prototypes across architectures
@ 2022-06-06 16:25     ` Mark Rutland
  0 siblings, 0 replies; 75+ messages in thread
From: Mark Rutland @ 2022-06-06 16:25 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

On Mon, Jun 06, 2022 at 01:49:05PM +0200, Alexander Lobakin wrote:
> Currently, there is a mess with the prototypes of the non-atomic
> bitops across the different architectures:
> 
> ret	bool, int, unsigned long
> nr	int, long, unsigned int, unsigned long
> addr	volatile unsigned long *, volatile void *
> 
> Thankfully, it doesn't provoke any bugs, but can sometimes make
> the compiler angry when it's not handy at all.
> Adjust all the prototypes to the following standard:
> 
> ret	bool				retval can be only 0 or 1
> nr	unsigned long			native; signed makes no sense
> addr	volatile unsigned long *	bitmaps are arrays of ulongs
> 
> Finally, add some static assertions in order to prevent people from
> making a mess in this room again.

This looks like a nice cleanup!

Using `bool` makes the semantic clearer from the prototype.

> I also used the %__always_inline attribute consistently they always
> get resolved to the actual operations.

Since these will get wrapped in the next patch (and architectures may want to
use these in noinstr code), this makes sense to me.

> 
> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>

FWIW:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  arch/alpha/include/asm/bitops.h               | 28 +++++------
>  arch/hexagon/include/asm/bitops.h             | 23 +++++----
>  arch/ia64/include/asm/bitops.h                | 28 +++++------
>  arch/m68k/include/asm/bitops.h                | 47 +++++++++++++------
>  arch/sh/include/asm/bitops-op32.h             | 24 ++++++----
>  .../asm-generic/bitops/generic-non-atomic.h   | 24 +++++-----
>  .../bitops/instrumented-non-atomic.h          | 21 ++++++---
>  include/linux/bitops.h                        | 13 +++++
>  tools/include/asm-generic/bitops/non-atomic.h | 24 ++++++----
>  9 files changed, 146 insertions(+), 86 deletions(-)
> 
> diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
> index e1d8483a45f2..381ad2eae4b4 100644
> --- a/arch/alpha/include/asm/bitops.h
> +++ b/arch/alpha/include/asm/bitops.h
> @@ -46,8 +46,8 @@ set_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline void
> -__set_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -82,8 +82,8 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ void
> -__clear_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -118,8 +118,8 @@ change_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ void
> -__change_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -186,8 +186,8 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline int
> -__test_and_set_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -230,8 +230,8 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline int
> -__test_and_clear_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -272,8 +272,8 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ int
> -__test_and_change_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -283,8 +283,8 @@ __test_and_change_bit(unsigned long nr, volatile void * addr)
>  	return (old & mask) != 0;
>  }
>  
> -static inline int
> -test_bit(int nr, const volatile void * addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
>  }
> diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
> index 75d6ba3643b8..a3bfe3a8d4b7 100644
> --- a/arch/hexagon/include/asm/bitops.h
> +++ b/arch/hexagon/include/asm/bitops.h
> @@ -127,38 +127,45 @@ static inline void change_bit(int nr, volatile void *addr)
>   * be atomic, particularly for things like slab_lock and slab_unlock.
>   *
>   */
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_clear_bit(nr, addr);
>  }
>  
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_set_bit(nr, addr);
>  }
>  
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_change_bit(nr, addr);
>  }
>  
>  /*  Apparently, at least some of these are allowed to be non-atomic  */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_clear_bit(nr, addr);
>  }
>  
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_set_bit(nr, addr);
>  }
>  
> -static inline int __test_and_change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_change_bit(nr, addr);
>  }
>  
> -static inline int __test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	int retval;
>  
> @@ -172,7 +179,7 @@ static inline int __test_bit(int nr, const volatile unsigned long *addr)
>  	return retval;
>  }
>  
> -#define test_bit(nr, addr) __test_bit(nr, addr)
> +#define __test_bit(nr, addr) test_bit(nr, addr)
>  
>  /*
>   * ffz - find first zero in word.
> diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
> index 577be93c0818..4267a217a503 100644
> --- a/arch/ia64/include/asm/bitops.h
> +++ b/arch/ia64/include/asm/bitops.h
> @@ -61,8 +61,8 @@ set_bit (int nr, volatile void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__set_bit (int nr, volatile void *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
>  }
> @@ -143,8 +143,8 @@ __clear_bit_unlock(int nr, void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__clear_bit (int nr, volatile void *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
>  }
> @@ -183,8 +183,8 @@ change_bit (int nr, volatile void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__change_bit (int nr, volatile void *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
>  }
> @@ -232,8 +232,8 @@ test_and_set_bit (int nr, volatile void *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __inline__ int
> -__test_and_set_bit (int nr, volatile void *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -277,8 +277,8 @@ test_and_clear_bit (int nr, volatile void *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __inline__ int
> -__test_and_clear_bit(int nr, volatile void * addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -320,8 +320,8 @@ test_and_change_bit (int nr, volatile void *addr)
>   *
>   * This operation is non-atomic and can be reordered.
>   */
> -static __inline__ int
> -__test_and_change_bit (int nr, void *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 old, bit = (1 << (nr & 31));
>  	__u32 *m = (__u32 *) addr + (nr >> 5);
> @@ -331,8 +331,8 @@ __test_and_change_bit (int nr, void *addr)
>  	return (old & bit) != 0;
>  }
>  
> -static __inline__ int
> -test_bit (int nr, const volatile void *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
>  }
> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> index 51283db53667..9d44bd4713cb 100644
> --- a/arch/m68k/include/asm/bitops.h
> +++ b/arch/m68k/include/asm/bitops.h
> @@ -65,8 +65,11 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
>  				bfset_mem_set_bit(nr, vaddr))
>  #endif
>  
> -#define __set_bit(nr, vaddr)	set_bit(nr, vaddr)
> -
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	set_bit(nr, addr);
> +}
>  
>  static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
>  {
> @@ -105,8 +108,11 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
>  				bfclr_mem_clear_bit(nr, vaddr))
>  #endif
>  
> -#define __clear_bit(nr, vaddr)	clear_bit(nr, vaddr)
> -
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	clear_bit(nr, addr);
> +}
>  
>  static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
>  {
> @@ -145,12 +151,16 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
>  				bfchg_mem_change_bit(nr, vaddr))
>  #endif
>  
> -#define __change_bit(nr, vaddr)	change_bit(nr, vaddr)
> -
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	change_bit(nr, addr);
> +}
>  
> -static inline int test_bit(int nr, const volatile unsigned long *vaddr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
> -	return (vaddr[nr >> 5] & (1UL << (nr & 31))) != 0;
> +	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
>  }
>  
>  
> @@ -201,8 +211,11 @@ static inline int bfset_mem_test_and_set_bit(int nr,
>  					bfset_mem_test_and_set_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_set_bit(nr, vaddr)	test_and_set_bit(nr, vaddr)
> -
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_set_bit(nr, addr);
> +}
>  
>  static inline int bclr_reg_test_and_clear_bit(int nr,
>  					      volatile unsigned long *vaddr)
> @@ -251,8 +264,11 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
>  					bfclr_mem_test_and_clear_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_clear_bit(nr, vaddr)	test_and_clear_bit(nr, vaddr)
> -
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_clear_bit(nr, addr);
> +}
>  
>  static inline int bchg_reg_test_and_change_bit(int nr,
>  					       volatile unsigned long *vaddr)
> @@ -301,8 +317,11 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
>  					bfchg_mem_test_and_change_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_change_bit(nr, vaddr) test_and_change_bit(nr, vaddr)
> -
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_change_bit(nr, addr);
> +}
>  
>  /*
>   *	The true 68020 and more advanced processors support the "bfffo"
> diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
> index cfe5465acce7..dcd85866a394 100644
> --- a/arch/sh/include/asm/bitops-op32.h
> +++ b/arch/sh/include/asm/bitops-op32.h
> @@ -2,6 +2,8 @@
>  #ifndef __ASM_SH_BITOPS_OP32_H
>  #define __ASM_SH_BITOPS_OP32_H
>  
> +#include <linux/bits.h>
> +
>  /*
>   * The bit modifying instructions on SH-2A are only capable of working
>   * with a 3-bit immediate, which signifies the shift position for the bit
> @@ -16,7 +18,8 @@
>  #define BYTE_OFFSET(nr)		((nr) % BITS_PER_BYTE)
>  #endif
>  
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -33,7 +36,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
>  	}
>  }
>  
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -60,7 +64,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -87,7 +92,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -106,7 +112,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -117,8 +124,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static inline int __test_and_change_bit(int nr,
> -					    volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -133,7 +140,8 @@ static inline int __test_and_change_bit(int nr,
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static inline int test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index 202d8a3b40e1..249b2a91c174 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -23,7 +23,7 @@
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -gen___set_bit(unsigned int nr, volatile unsigned long *addr)
> +gen___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -32,7 +32,7 @@ gen___set_bit(unsigned int nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
> +gen___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -49,8 +49,8 @@ gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __always_inline
> -void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline void
> +gen___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -67,8 +67,8 @@ void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __always_inline int
> -gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +gen___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -87,8 +87,8 @@ gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __always_inline int
> -gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +gen___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -99,8 +99,8 @@ gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static __always_inline int
> -gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +gen___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -115,8 +115,8 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static __always_inline int
> -gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +gen_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
>  	unsigned long mask = BIT_MASK(nr);
> diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
> index 7ab1ecc37782..b019f77ef21c 100644
> --- a/include/asm-generic/bitops/instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/instrumented-non-atomic.h
> @@ -22,7 +22,8 @@
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___set_bit(nr, addr);
> @@ -37,7 +38,8 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___clear_bit(nr, addr);
> @@ -52,7 +54,8 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___change_bit(nr, addr);
> @@ -90,7 +93,8 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_set_bit(nr, addr);
> @@ -104,7 +108,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_clear_bit(nr, addr);
> @@ -118,7 +123,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_change_bit(nr, addr);
> @@ -129,7 +135,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
>  	return arch_test_bit(nr, addr);
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 7aaed501f768..5520ac9b1c24 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,12 +26,25 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +#include <asm-generic/bitops/generic-non-atomic.h>
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
>   */
>  #include <asm/bitops.h>
>  
> +/* Check that the bitops prototypes are sane */
> +#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
> +__check_bitop_pr(__set_bit);
> +__check_bitop_pr(__clear_bit);
> +__check_bitop_pr(__change_bit);
> +__check_bitop_pr(__test_and_set_bit);
> +__check_bitop_pr(__test_and_clear_bit);
> +__check_bitop_pr(__test_and_change_bit);
> +__check_bitop_pr(test_bit);
> +#undef __check_bitop_pr
> +
>  static inline int get_bitmask_order(unsigned int count)
>  {
>  	int order;
> diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
> index 7e10c4b50c5d..e5e78e42e57b 100644
> --- a/tools/include/asm-generic/bitops/non-atomic.h
> +++ b/tools/include/asm-generic/bitops/non-atomic.h
> @@ -2,7 +2,7 @@
>  #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  
> -#include <asm/types.h>
> +#include <linux/bits.h>
>  
>  /**
>   * __set_bit - Set a bit in memory
> @@ -13,7 +13,8 @@
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -21,7 +22,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
>  	*p  |= mask;
>  }
>  
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -38,7 +40,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -55,7 +58,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -74,7 +78,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -85,8 +90,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static inline int __test_and_change_bit(int nr,
> -					    volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -101,7 +106,8 @@ static inline int __test_and_change_bit(int nr,
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static inline int test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 4/6] bitops: unify non-atomic bitops prototypes across architectures
@ 2022-06-06 16:25     ` Mark Rutland
  0 siblings, 0 replies; 75+ messages in thread
From: Mark Rutland @ 2022-06-06 16:25 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh

On Mon, Jun 06, 2022 at 01:49:05PM +0200, Alexander Lobakin wrote:
> Currently, there is a mess with the prototypes of the non-atomic
> bitops across the different architectures:
> 
> ret	bool, int, unsigned long
> nr	int, long, unsigned int, unsigned long
> addr	volatile unsigned long *, volatile void *
> 
> Thankfully, it doesn't provoke any bugs, but can sometimes make
> the compiler angry when it's not handy at all.
> Adjust all the prototypes to the following standard:
> 
> ret	bool				retval can be only 0 or 1
> nr	unsigned long			native; signed makes no sense
> addr	volatile unsigned long *	bitmaps are arrays of ulongs
> 
> Finally, add some static assertions in order to prevent people from
> making a mess in this room again.

This looks like a nice cleanup!

Using `bool` makes the semantic clearer from the prototype.

> I also used the %__always_inline attribute consistently they always
> get resolved to the actual operations.

Since these will get wrapped in the next patch (and architectures may want to
use these in noinstr code), this makes sense to me.

> 
> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>

FWIW:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  arch/alpha/include/asm/bitops.h               | 28 +++++------
>  arch/hexagon/include/asm/bitops.h             | 23 +++++----
>  arch/ia64/include/asm/bitops.h                | 28 +++++------
>  arch/m68k/include/asm/bitops.h                | 47 +++++++++++++------
>  arch/sh/include/asm/bitops-op32.h             | 24 ++++++----
>  .../asm-generic/bitops/generic-non-atomic.h   | 24 +++++-----
>  .../bitops/instrumented-non-atomic.h          | 21 ++++++---
>  include/linux/bitops.h                        | 13 +++++
>  tools/include/asm-generic/bitops/non-atomic.h | 24 ++++++----
>  9 files changed, 146 insertions(+), 86 deletions(-)
> 
> diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
> index e1d8483a45f2..381ad2eae4b4 100644
> --- a/arch/alpha/include/asm/bitops.h
> +++ b/arch/alpha/include/asm/bitops.h
> @@ -46,8 +46,8 @@ set_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline void
> -__set_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -82,8 +82,8 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ void
> -__clear_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -118,8 +118,8 @@ change_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ void
> -__change_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -186,8 +186,8 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline int
> -__test_and_set_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -230,8 +230,8 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline int
> -__test_and_clear_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -272,8 +272,8 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ int
> -__test_and_change_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -283,8 +283,8 @@ __test_and_change_bit(unsigned long nr, volatile void * addr)
>  	return (old & mask) != 0;
>  }
>  
> -static inline int
> -test_bit(int nr, const volatile void * addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
>  }
> diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
> index 75d6ba3643b8..a3bfe3a8d4b7 100644
> --- a/arch/hexagon/include/asm/bitops.h
> +++ b/arch/hexagon/include/asm/bitops.h
> @@ -127,38 +127,45 @@ static inline void change_bit(int nr, volatile void *addr)
>   * be atomic, particularly for things like slab_lock and slab_unlock.
>   *
>   */
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_clear_bit(nr, addr);
>  }
>  
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_set_bit(nr, addr);
>  }
>  
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_change_bit(nr, addr);
>  }
>  
>  /*  Apparently, at least some of these are allowed to be non-atomic  */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_clear_bit(nr, addr);
>  }
>  
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_set_bit(nr, addr);
>  }
>  
> -static inline int __test_and_change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_change_bit(nr, addr);
>  }
>  
> -static inline int __test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	int retval;
>  
> @@ -172,7 +179,7 @@ static inline int __test_bit(int nr, const volatile unsigned long *addr)
>  	return retval;
>  }
>  
> -#define test_bit(nr, addr) __test_bit(nr, addr)
> +#define __test_bit(nr, addr) test_bit(nr, addr)
>  
>  /*
>   * ffz - find first zero in word.
> diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
> index 577be93c0818..4267a217a503 100644
> --- a/arch/ia64/include/asm/bitops.h
> +++ b/arch/ia64/include/asm/bitops.h
> @@ -61,8 +61,8 @@ set_bit (int nr, volatile void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__set_bit (int nr, volatile void *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
>  }
> @@ -143,8 +143,8 @@ __clear_bit_unlock(int nr, void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__clear_bit (int nr, volatile void *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
>  }
> @@ -183,8 +183,8 @@ change_bit (int nr, volatile void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__change_bit (int nr, volatile void *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
>  }
> @@ -232,8 +232,8 @@ test_and_set_bit (int nr, volatile void *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __inline__ int
> -__test_and_set_bit (int nr, volatile void *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -277,8 +277,8 @@ test_and_clear_bit (int nr, volatile void *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __inline__ int
> -__test_and_clear_bit(int nr, volatile void * addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -320,8 +320,8 @@ test_and_change_bit (int nr, volatile void *addr)
>   *
>   * This operation is non-atomic and can be reordered.
>   */
> -static __inline__ int
> -__test_and_change_bit (int nr, void *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 old, bit = (1 << (nr & 31));
>  	__u32 *m = (__u32 *) addr + (nr >> 5);
> @@ -331,8 +331,8 @@ __test_and_change_bit (int nr, void *addr)
>  	return (old & bit) != 0;
>  }
>  
> -static __inline__ int
> -test_bit (int nr, const volatile void *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
>  }
> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> index 51283db53667..9d44bd4713cb 100644
> --- a/arch/m68k/include/asm/bitops.h
> +++ b/arch/m68k/include/asm/bitops.h
> @@ -65,8 +65,11 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
>  				bfset_mem_set_bit(nr, vaddr))
>  #endif
>  
> -#define __set_bit(nr, vaddr)	set_bit(nr, vaddr)
> -
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	set_bit(nr, addr);
> +}
>  
>  static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
>  {
> @@ -105,8 +108,11 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
>  				bfclr_mem_clear_bit(nr, vaddr))
>  #endif
>  
> -#define __clear_bit(nr, vaddr)	clear_bit(nr, vaddr)
> -
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	clear_bit(nr, addr);
> +}
>  
>  static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
>  {
> @@ -145,12 +151,16 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
>  				bfchg_mem_change_bit(nr, vaddr))
>  #endif
>  
> -#define __change_bit(nr, vaddr)	change_bit(nr, vaddr)
> -
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	change_bit(nr, addr);
> +}
>  
> -static inline int test_bit(int nr, const volatile unsigned long *vaddr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
> -	return (vaddr[nr >> 5] & (1UL << (nr & 31))) != 0;
> +	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
>  }
>  
>  
> @@ -201,8 +211,11 @@ static inline int bfset_mem_test_and_set_bit(int nr,
>  					bfset_mem_test_and_set_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_set_bit(nr, vaddr)	test_and_set_bit(nr, vaddr)
> -
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_set_bit(nr, addr);
> +}
>  
>  static inline int bclr_reg_test_and_clear_bit(int nr,
>  					      volatile unsigned long *vaddr)
> @@ -251,8 +264,11 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
>  					bfclr_mem_test_and_clear_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_clear_bit(nr, vaddr)	test_and_clear_bit(nr, vaddr)
> -
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_clear_bit(nr, addr);
> +}
>  
>  static inline int bchg_reg_test_and_change_bit(int nr,
>  					       volatile unsigned long *vaddr)
> @@ -301,8 +317,11 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
>  					bfchg_mem_test_and_change_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_change_bit(nr, vaddr) test_and_change_bit(nr, vaddr)
> -
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_change_bit(nr, addr);
> +}
>  
>  /*
>   *	The true 68020 and more advanced processors support the "bfffo"
> diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
> index cfe5465acce7..dcd85866a394 100644
> --- a/arch/sh/include/asm/bitops-op32.h
> +++ b/arch/sh/include/asm/bitops-op32.h
> @@ -2,6 +2,8 @@
>  #ifndef __ASM_SH_BITOPS_OP32_H
>  #define __ASM_SH_BITOPS_OP32_H
>  
> +#include <linux/bits.h>
> +
>  /*
>   * The bit modifying instructions on SH-2A are only capable of working
>   * with a 3-bit immediate, which signifies the shift position for the bit
> @@ -16,7 +18,8 @@
>  #define BYTE_OFFSET(nr)		((nr) % BITS_PER_BYTE)
>  #endif
>  
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -33,7 +36,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
>  	}
>  }
>  
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -60,7 +64,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -87,7 +92,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -106,7 +112,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -117,8 +124,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static inline int __test_and_change_bit(int nr,
> -					    volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -133,7 +140,8 @@ static inline int __test_and_change_bit(int nr,
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static inline int test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index 202d8a3b40e1..249b2a91c174 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -23,7 +23,7 @@
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -gen___set_bit(unsigned int nr, volatile unsigned long *addr)
> +gen___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -32,7 +32,7 @@ gen___set_bit(unsigned int nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
> +gen___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -49,8 +49,8 @@ gen___clear_bit(unsigned int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __always_inline
> -void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline void
> +gen___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -67,8 +67,8 @@ void gen___change_bit(unsigned int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __always_inline int
> -gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +gen___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -87,8 +87,8 @@ gen___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __always_inline int
> -gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +gen___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -99,8 +99,8 @@ gen___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static __always_inline int
> -gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +gen___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -115,8 +115,8 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static __always_inline int
> -gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +gen_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
>  	unsigned long mask = BIT_MASK(nr);
> diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
> index 7ab1ecc37782..b019f77ef21c 100644
> --- a/include/asm-generic/bitops/instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/instrumented-non-atomic.h
> @@ -22,7 +22,8 @@
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___set_bit(nr, addr);
> @@ -37,7 +38,8 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___clear_bit(nr, addr);
> @@ -52,7 +54,8 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___change_bit(nr, addr);
> @@ -90,7 +93,8 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_set_bit(nr, addr);
> @@ -104,7 +108,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_clear_bit(nr, addr);
> @@ -118,7 +123,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_change_bit(nr, addr);
> @@ -129,7 +135,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
>  	return arch_test_bit(nr, addr);
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 7aaed501f768..5520ac9b1c24 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,12 +26,25 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +#include <asm-generic/bitops/generic-non-atomic.h>
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
>   */
>  #include <asm/bitops.h>
>  
> +/* Check that the bitops prototypes are sane */
> +#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
> +__check_bitop_pr(__set_bit);
> +__check_bitop_pr(__clear_bit);
> +__check_bitop_pr(__change_bit);
> +__check_bitop_pr(__test_and_set_bit);
> +__check_bitop_pr(__test_and_clear_bit);
> +__check_bitop_pr(__test_and_change_bit);
> +__check_bitop_pr(test_bit);
> +#undef __check_bitop_pr
> +
>  static inline int get_bitmask_order(unsigned int count)
>  {
>  	int order;
> diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
> index 7e10c4b50c5d..e5e78e42e57b 100644
> --- a/tools/include/asm-generic/bitops/non-atomic.h
> +++ b/tools/include/asm-generic/bitops/non-atomic.h
> @@ -2,7 +2,7 @@
>  #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  
> -#include <asm/types.h>
> +#include <linux/bits.h>
>  
>  /**
>   * __set_bit - Set a bit in memory
> @@ -13,7 +13,8 @@
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -21,7 +22,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
>  	*p  |= mask;
>  }
>  
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -38,7 +40,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -55,7 +58,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -74,7 +78,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -85,8 +90,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static inline int __test_and_change_bit(int nr,
> -					    volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -101,7 +106,8 @@ static inline int __test_and_change_bit(int nr,
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static inline int test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 5/6] bitops: wrap non-atomic bitops with a transparent macro
  2022-06-06 11:49   ` Alexander Lobakin
  (?)
@ 2022-06-06 16:27     ` Mark Rutland
  -1 siblings, 0 replies; 75+ messages in thread
From: Mark Rutland @ 2022-06-06 16:27 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

On Mon, Jun 06, 2022 at 01:49:06PM +0200, Alexander Lobakin wrote:
> In preparation for altering the non-atomic bitops with a macro, wrap
> them in a transparent definition. This requires prepending one more
> '_' to their names in order to be able to do that seamlessly.
> sparc32 already has the triple-underscored functions, so I had to
> rename them ('___' -> 'sp32_').

Could we use an 'arch_' prefix here, like we do for the atomics, or is that
already overloaded?

Thanks,
Mark.

> 
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  arch/alpha/include/asm/bitops.h               | 14 ++++----
>  arch/hexagon/include/asm/bitops.h             | 16 +++++-----
>  arch/ia64/include/asm/bitops.h                | 26 +++++++--------
>  arch/m68k/include/asm/bitops.h                | 14 ++++----
>  arch/sh/include/asm/bitops-op32.h             | 22 ++++++-------
>  arch/sparc/include/asm/bitops_32.h            | 18 +++++------
>  arch/sparc/lib/atomic32.c                     | 12 +++----
>  .../bitops/instrumented-non-atomic.h          | 28 ++++++++--------
>  include/asm-generic/bitops/non-atomic.h       | 14 ++++----
>  include/linux/bitops.h                        | 32 ++++++++++++++-----
>  tools/include/asm-generic/bitops/non-atomic.h | 24 +++++++-------
>  tools/include/linux/bitops.h                  | 16 ++++++++++
>  12 files changed, 134 insertions(+), 102 deletions(-)
> 
> diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
> index 381ad2eae4b4..e7d119826062 100644
> --- a/arch/alpha/include/asm/bitops.h
> +++ b/arch/alpha/include/asm/bitops.h
> @@ -47,7 +47,7 @@ set_bit(unsigned long nr, volatile void * addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -83,7 +83,7 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -119,7 +119,7 @@ change_bit(unsigned long nr, volatile void * addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -187,7 +187,7 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -231,7 +231,7 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -273,7 +273,7 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -284,7 +284,7 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
>  }
> diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
> index a3bfe3a8d4b7..37144f8e0b0c 100644
> --- a/arch/hexagon/include/asm/bitops.h
> +++ b/arch/hexagon/include/asm/bitops.h
> @@ -128,44 +128,44 @@ static inline void change_bit(int nr, volatile void *addr)
>   *
>   */
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_clear_bit(nr, addr);
>  }
>  
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_set_bit(nr, addr);
>  }
>  
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_change_bit(nr, addr);
>  }
>  
>  /*  Apparently, at least some of these are allowed to be non-atomic  */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_clear_bit(nr, addr);
>  }
>  
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_set_bit(nr, addr);
>  }
>  
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_change_bit(nr, addr);
>  }
>  
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	int retval;
>  
> @@ -179,7 +179,7 @@ test_bit(unsigned long nr, const volatile unsigned long *addr)
>  	return retval;
>  }
>  
> -#define __test_bit(nr, addr) test_bit(nr, addr)
> +#define __test_bit(nr, addr) _test_bit(nr, addr)
>  
>  /*
>   * ffz - find first zero in word.
> diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
> index 4267a217a503..3cb231fb0283 100644
> --- a/arch/ia64/include/asm/bitops.h
> +++ b/arch/ia64/include/asm/bitops.h
> @@ -53,7 +53,7 @@ set_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -62,7 +62,7 @@ set_bit (int nr, volatile void *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
>  }
> @@ -135,7 +135,7 @@ __clear_bit_unlock(int nr, void *addr)
>  }
>  
>  /**
> - * __clear_bit - Clears a bit in memory (non-atomic version)
> + * ___clear_bit - Clears a bit in memory (non-atomic version)
>   * @nr: the bit to clear
>   * @addr: the address to start counting from
>   *
> @@ -144,7 +144,7 @@ __clear_bit_unlock(int nr, void *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
>  }
> @@ -175,7 +175,7 @@ change_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to toggle
>   * @addr: the address to start counting from
>   *
> @@ -184,7 +184,7 @@ change_bit (int nr, volatile void *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
>  }
> @@ -224,7 +224,7 @@ test_and_set_bit (int nr, volatile void *addr)
>  #define test_and_set_bit_lock test_and_set_bit
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -233,7 +233,7 @@ test_and_set_bit (int nr, volatile void *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -269,7 +269,7 @@ test_and_clear_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -278,7 +278,7 @@ test_and_clear_bit (int nr, volatile void *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -314,14 +314,14 @@ test_and_change_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __test_and_change_bit - Change a bit and return its old value
> + * ___test_and_change_bit - Change a bit and return its old value
>   * @nr: Bit to change
>   * @addr: Address to count from
>   *
>   * This operation is non-atomic and can be reordered.
>   */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 old, bit = (1 << (nr & 31));
>  	__u32 *m = (__u32 *) addr + (nr >> 5);
> @@ -332,7 +332,7 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
>  }
> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> index 9d44bd4713cb..ed08c9d9c547 100644
> --- a/arch/m68k/include/asm/bitops.h
> +++ b/arch/m68k/include/asm/bitops.h
> @@ -66,7 +66,7 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
>  #endif
>  
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	set_bit(nr, addr);
>  }
> @@ -109,7 +109,7 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
>  #endif
>  
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	clear_bit(nr, addr);
>  }
> @@ -152,13 +152,13 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
>  #endif
>  
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	change_bit(nr, addr);
>  }
>  
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
>  }
> @@ -212,7 +212,7 @@ static inline int bfset_mem_test_and_set_bit(int nr,
>  #endif
>  
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_set_bit(nr, addr);
>  }
> @@ -265,7 +265,7 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
>  #endif
>  
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_clear_bit(nr, addr);
>  }
> @@ -318,7 +318,7 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
>  #endif
>  
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_change_bit(nr, addr);
>  }
> diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
> index dcd85866a394..24b8a809c1b9 100644
> --- a/arch/sh/include/asm/bitops-op32.h
> +++ b/arch/sh/include/asm/bitops-op32.h
> @@ -19,7 +19,7 @@
>  #endif
>  
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -37,7 +37,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -56,7 +56,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -65,7 +65,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -84,7 +84,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -93,7 +93,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -104,7 +104,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -113,7 +113,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -125,7 +125,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  
>  /* WARNING: non atomic and it can be reordered! */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -136,12 +136,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> diff --git a/arch/sparc/include/asm/bitops_32.h b/arch/sparc/include/asm/bitops_32.h
> index 889afa9f990f..3448c191b484 100644
> --- a/arch/sparc/include/asm/bitops_32.h
> +++ b/arch/sparc/include/asm/bitops_32.h
> @@ -19,9 +19,9 @@
>  #error only <linux/bitops.h> can be included directly
>  #endif
>  
> -unsigned long ___set_bit(unsigned long *addr, unsigned long mask);
> -unsigned long ___clear_bit(unsigned long *addr, unsigned long mask);
> -unsigned long ___change_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask);
>  
>  /*
>   * Set bit 'nr' in 32-bit quantity at address 'addr' where bit '0'
> @@ -36,7 +36,7 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *add
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	return ___set_bit(ADDR, mask) != 0;
> +	return sp32___set_bit(ADDR, mask) != 0;
>  }
>  
>  static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -46,7 +46,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	(void) ___set_bit(ADDR, mask);
> +	(void) sp32___set_bit(ADDR, mask);
>  }
>  
>  static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -56,7 +56,7 @@ static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *a
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	return ___clear_bit(ADDR, mask) != 0;
> +	return sp32___clear_bit(ADDR, mask) != 0;
>  }
>  
>  static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -66,7 +66,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	(void) ___clear_bit(ADDR, mask);
> +	(void) sp32___clear_bit(ADDR, mask);
>  }
>  
>  static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -76,7 +76,7 @@ static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	return ___change_bit(ADDR, mask) != 0;
> +	return sp32___change_bit(ADDR, mask) != 0;
>  }
>  
>  static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -86,7 +86,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	(void) ___change_bit(ADDR, mask);
> +	(void) sp32___change_bit(ADDR, mask);
>  }
>  
>  #include <asm-generic/bitops/non-atomic.h>
> diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
> index 8b81d0f00c97..cf80d1ae352b 100644
> --- a/arch/sparc/lib/atomic32.c
> +++ b/arch/sparc/lib/atomic32.c
> @@ -120,7 +120,7 @@ void arch_atomic_set(atomic_t *v, int i)
>  }
>  EXPORT_SYMBOL(arch_atomic_set);
>  
> -unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask)
>  {
>  	unsigned long old, flags;
>  
> @@ -131,9 +131,9 @@ unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
>  
>  	return old & mask;
>  }
> -EXPORT_SYMBOL(___set_bit);
> +EXPORT_SYMBOL(sp32___set_bit);
>  
> -unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask)
>  {
>  	unsigned long old, flags;
>  
> @@ -144,9 +144,9 @@ unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
>  
>  	return old & mask;
>  }
> -EXPORT_SYMBOL(___clear_bit);
> +EXPORT_SYMBOL(sp32___clear_bit);
>  
> -unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
>  {
>  	unsigned long old, flags;
>  
> @@ -157,7 +157,7 @@ unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
>  
>  	return old & mask;
>  }
> -EXPORT_SYMBOL(___change_bit);
> +EXPORT_SYMBOL(sp32___change_bit);
>  
>  unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
>  {
> diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
> index b019f77ef21c..988a3bbfba34 100644
> --- a/include/asm-generic/bitops/instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/instrumented-non-atomic.h
> @@ -14,7 +14,7 @@
>  #include <linux/instrumented.h>
>  
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -23,14 +23,14 @@
>   * succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___set_bit(nr, addr);
>  }
>  
>  /**
> - * __clear_bit - Clears a bit in memory
> + * ___clear_bit - Clears a bit in memory
>   * @nr: the bit to clear
>   * @addr: the address to start counting from
>   *
> @@ -39,14 +39,14 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>   * succeeds.
>   */
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___clear_bit(nr, addr);
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -55,7 +55,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___change_bit(nr, addr);
> @@ -86,7 +86,7 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>  }
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -94,14 +94,14 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_set_bit(nr, addr);
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -109,14 +109,14 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_clear_bit(nr, addr);
>  }
>  
>  /**
> - * __test_and_change_bit - Change a bit and return its old value
> + * ___test_and_change_bit - Change a bit and return its old value
>   * @nr: Bit to change
>   * @addr: Address to count from
>   *
> @@ -124,19 +124,19 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_change_bit(nr, addr);
>  }
>  
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
>  	return arch_test_bit(nr, addr);
> diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
> index 7ce0ed22fb5f..0c488331c9e7 100644
> --- a/include/asm-generic/bitops/non-atomic.h
> +++ b/include/asm-generic/bitops/non-atomic.h
> @@ -6,24 +6,24 @@
>  #include <asm/types.h>
>  
>  #define arch___set_bit gen___set_bit
> -#define __set_bit arch___set_bit
> +#define ___set_bit arch___set_bit
>  
>  #define arch___clear_bit gen___clear_bit
> -#define __clear_bit arch___clear_bit
> +#define ___clear_bit arch___clear_bit
>  
>  #define arch___change_bit gen___change_bit
> -#define __change_bit arch___change_bit
> +#define ___change_bit arch___change_bit
>  
>  #define arch___test_and_set_bit gen___test_and_set_bit
> -#define __test_and_set_bit arch___test_and_set_bit
> +#define ___test_and_set_bit arch___test_and_set_bit
>  
>  #define arch___test_and_clear_bit gen___test_and_clear_bit
> -#define __test_and_clear_bit arch___test_and_clear_bit
> +#define ___test_and_clear_bit arch___test_and_clear_bit
>  
>  #define arch___test_and_change_bit gen___test_and_change_bit
> -#define __test_and_change_bit arch___test_and_change_bit
> +#define ___test_and_change_bit arch___test_and_change_bit
>  
>  #define arch_test_bit gen_test_bit
> -#define test_bit arch_test_bit
> +#define _test_bit arch_test_bit
>  
>  #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 5520ac9b1c24..33cfc836a115 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,8 +26,24 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +/*
> + * Defined here because those may be needed by architecture-specific static
> + * inlines.
> + */
> +
>  #include <asm-generic/bitops/generic-non-atomic.h>
>  
> +#define bitop(op, nr, addr)						\
> +	op(nr, addr)
> +
> +#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
> +#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
> +#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
> +#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
> +#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
> +#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
> +#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
> @@ -35,14 +51,14 @@ extern unsigned long __sw_hweight64(__u64 w);
>  #include <asm/bitops.h>
>  
>  /* Check that the bitops prototypes are sane */
> -#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
> -__check_bitop_pr(__set_bit);
> -__check_bitop_pr(__clear_bit);
> -__check_bitop_pr(__change_bit);
> -__check_bitop_pr(__test_and_set_bit);
> -__check_bitop_pr(__test_and_clear_bit);
> -__check_bitop_pr(__test_and_change_bit);
> -__check_bitop_pr(test_bit);
> +#define __check_bitop_pr(name)	static_assert(__same_type(name, gen##name))
> +__check_bitop_pr(___set_bit);
> +__check_bitop_pr(___clear_bit);
> +__check_bitop_pr(___change_bit);
> +__check_bitop_pr(___test_and_set_bit);
> +__check_bitop_pr(___test_and_clear_bit);
> +__check_bitop_pr(___test_and_change_bit);
> +__check_bitop_pr(_test_bit);
>  #undef __check_bitop_pr
>  
>  static inline int get_bitmask_order(unsigned int count)
> diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
> index e5e78e42e57b..0c472a833408 100644
> --- a/tools/include/asm-generic/bitops/non-atomic.h
> +++ b/tools/include/asm-generic/bitops/non-atomic.h
> @@ -5,7 +5,7 @@
>  #include <linux/bits.h>
>  
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -14,7 +14,7 @@
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -23,7 +23,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -32,7 +32,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -41,7 +41,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -50,7 +50,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -59,7 +59,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -70,7 +70,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -79,7 +79,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -91,7 +91,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  
>  /* WARNING: non atomic and it can be reordered! */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -102,12 +102,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> diff --git a/tools/include/linux/bitops.h b/tools/include/linux/bitops.h
> index 5fca38fe1ba8..f18683b95ea6 100644
> --- a/tools/include/linux/bitops.h
> +++ b/tools/include/linux/bitops.h
> @@ -25,6 +25,22 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +/*
> + * Defined here because those may be needed by architecture-specific static
> + * inlines.
> + */
> +
> +#define bitop(op, nr, addr)						\
> +	op(nr, addr)
> +
> +#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
> +#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
> +#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
> +#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
> +#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
> +#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
> +#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 5/6] bitops: wrap non-atomic bitops with a transparent macro
@ 2022-06-06 16:27     ` Mark Rutland
  0 siblings, 0 replies; 75+ messages in thread
From: Mark Rutland @ 2022-06-06 16:27 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

On Mon, Jun 06, 2022 at 01:49:06PM +0200, Alexander Lobakin wrote:
> In preparation for altering the non-atomic bitops with a macro, wrap
> them in a transparent definition. This requires prepending one more
> '_' to their names in order to be able to do that seamlessly.
> sparc32 already has the triple-underscored functions, so I had to
> rename them ('___' -> 'sp32_').

Could we use an 'arch_' prefix here, like we do for the atomics, or is that
already overloaded?

Thanks,
Mark.

> 
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  arch/alpha/include/asm/bitops.h               | 14 ++++----
>  arch/hexagon/include/asm/bitops.h             | 16 +++++-----
>  arch/ia64/include/asm/bitops.h                | 26 +++++++--------
>  arch/m68k/include/asm/bitops.h                | 14 ++++----
>  arch/sh/include/asm/bitops-op32.h             | 22 ++++++-------
>  arch/sparc/include/asm/bitops_32.h            | 18 +++++------
>  arch/sparc/lib/atomic32.c                     | 12 +++----
>  .../bitops/instrumented-non-atomic.h          | 28 ++++++++--------
>  include/asm-generic/bitops/non-atomic.h       | 14 ++++----
>  include/linux/bitops.h                        | 32 ++++++++++++++-----
>  tools/include/asm-generic/bitops/non-atomic.h | 24 +++++++-------
>  tools/include/linux/bitops.h                  | 16 ++++++++++
>  12 files changed, 134 insertions(+), 102 deletions(-)
> 
> diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
> index 381ad2eae4b4..e7d119826062 100644
> --- a/arch/alpha/include/asm/bitops.h
> +++ b/arch/alpha/include/asm/bitops.h
> @@ -47,7 +47,7 @@ set_bit(unsigned long nr, volatile void * addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -83,7 +83,7 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -119,7 +119,7 @@ change_bit(unsigned long nr, volatile void * addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -187,7 +187,7 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -231,7 +231,7 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -273,7 +273,7 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -284,7 +284,7 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
>  }
> diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
> index a3bfe3a8d4b7..37144f8e0b0c 100644
> --- a/arch/hexagon/include/asm/bitops.h
> +++ b/arch/hexagon/include/asm/bitops.h
> @@ -128,44 +128,44 @@ static inline void change_bit(int nr, volatile void *addr)
>   *
>   */
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_clear_bit(nr, addr);
>  }
>  
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_set_bit(nr, addr);
>  }
>  
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_change_bit(nr, addr);
>  }
>  
>  /*  Apparently, at least some of these are allowed to be non-atomic  */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_clear_bit(nr, addr);
>  }
>  
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_set_bit(nr, addr);
>  }
>  
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_change_bit(nr, addr);
>  }
>  
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	int retval;
>  
> @@ -179,7 +179,7 @@ test_bit(unsigned long nr, const volatile unsigned long *addr)
>  	return retval;
>  }
>  
> -#define __test_bit(nr, addr) test_bit(nr, addr)
> +#define __test_bit(nr, addr) _test_bit(nr, addr)
>  
>  /*
>   * ffz - find first zero in word.
> diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
> index 4267a217a503..3cb231fb0283 100644
> --- a/arch/ia64/include/asm/bitops.h
> +++ b/arch/ia64/include/asm/bitops.h
> @@ -53,7 +53,7 @@ set_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -62,7 +62,7 @@ set_bit (int nr, volatile void *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
>  }
> @@ -135,7 +135,7 @@ __clear_bit_unlock(int nr, void *addr)
>  }
>  
>  /**
> - * __clear_bit - Clears a bit in memory (non-atomic version)
> + * ___clear_bit - Clears a bit in memory (non-atomic version)
>   * @nr: the bit to clear
>   * @addr: the address to start counting from
>   *
> @@ -144,7 +144,7 @@ __clear_bit_unlock(int nr, void *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
>  }
> @@ -175,7 +175,7 @@ change_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to toggle
>   * @addr: the address to start counting from
>   *
> @@ -184,7 +184,7 @@ change_bit (int nr, volatile void *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
>  }
> @@ -224,7 +224,7 @@ test_and_set_bit (int nr, volatile void *addr)
>  #define test_and_set_bit_lock test_and_set_bit
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -233,7 +233,7 @@ test_and_set_bit (int nr, volatile void *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -269,7 +269,7 @@ test_and_clear_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -278,7 +278,7 @@ test_and_clear_bit (int nr, volatile void *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -314,14 +314,14 @@ test_and_change_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __test_and_change_bit - Change a bit and return its old value
> + * ___test_and_change_bit - Change a bit and return its old value
>   * @nr: Bit to change
>   * @addr: Address to count from
>   *
>   * This operation is non-atomic and can be reordered.
>   */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 old, bit = (1 << (nr & 31));
>  	__u32 *m = (__u32 *) addr + (nr >> 5);
> @@ -332,7 +332,7 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
>  }
> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> index 9d44bd4713cb..ed08c9d9c547 100644
> --- a/arch/m68k/include/asm/bitops.h
> +++ b/arch/m68k/include/asm/bitops.h
> @@ -66,7 +66,7 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
>  #endif
>  
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	set_bit(nr, addr);
>  }
> @@ -109,7 +109,7 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
>  #endif
>  
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	clear_bit(nr, addr);
>  }
> @@ -152,13 +152,13 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
>  #endif
>  
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	change_bit(nr, addr);
>  }
>  
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
>  }
> @@ -212,7 +212,7 @@ static inline int bfset_mem_test_and_set_bit(int nr,
>  #endif
>  
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_set_bit(nr, addr);
>  }
> @@ -265,7 +265,7 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
>  #endif
>  
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_clear_bit(nr, addr);
>  }
> @@ -318,7 +318,7 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
>  #endif
>  
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_change_bit(nr, addr);
>  }
> diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
> index dcd85866a394..24b8a809c1b9 100644
> --- a/arch/sh/include/asm/bitops-op32.h
> +++ b/arch/sh/include/asm/bitops-op32.h
> @@ -19,7 +19,7 @@
>  #endif
>  
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -37,7 +37,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -56,7 +56,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -65,7 +65,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -84,7 +84,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -93,7 +93,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -104,7 +104,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -113,7 +113,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -125,7 +125,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  
>  /* WARNING: non atomic and it can be reordered! */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -136,12 +136,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> diff --git a/arch/sparc/include/asm/bitops_32.h b/arch/sparc/include/asm/bitops_32.h
> index 889afa9f990f..3448c191b484 100644
> --- a/arch/sparc/include/asm/bitops_32.h
> +++ b/arch/sparc/include/asm/bitops_32.h
> @@ -19,9 +19,9 @@
>  #error only <linux/bitops.h> can be included directly
>  #endif
>  
> -unsigned long ___set_bit(unsigned long *addr, unsigned long mask);
> -unsigned long ___clear_bit(unsigned long *addr, unsigned long mask);
> -unsigned long ___change_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask);
>  
>  /*
>   * Set bit 'nr' in 32-bit quantity at address 'addr' where bit '0'
> @@ -36,7 +36,7 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *add
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	return ___set_bit(ADDR, mask) != 0;
> +	return sp32___set_bit(ADDR, mask) != 0;
>  }
>  
>  static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -46,7 +46,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	(void) ___set_bit(ADDR, mask);
> +	(void) sp32___set_bit(ADDR, mask);
>  }
>  
>  static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -56,7 +56,7 @@ static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *a
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	return ___clear_bit(ADDR, mask) != 0;
> +	return sp32___clear_bit(ADDR, mask) != 0;
>  }
>  
>  static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -66,7 +66,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	(void) ___clear_bit(ADDR, mask);
> +	(void) sp32___clear_bit(ADDR, mask);
>  }
>  
>  static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -76,7 +76,7 @@ static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	return ___change_bit(ADDR, mask) != 0;
> +	return sp32___change_bit(ADDR, mask) != 0;
>  }
>  
>  static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -86,7 +86,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	(void) ___change_bit(ADDR, mask);
> +	(void) sp32___change_bit(ADDR, mask);
>  }
>  
>  #include <asm-generic/bitops/non-atomic.h>
> diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
> index 8b81d0f00c97..cf80d1ae352b 100644
> --- a/arch/sparc/lib/atomic32.c
> +++ b/arch/sparc/lib/atomic32.c
> @@ -120,7 +120,7 @@ void arch_atomic_set(atomic_t *v, int i)
>  }
>  EXPORT_SYMBOL(arch_atomic_set);
>  
> -unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask)
>  {
>  	unsigned long old, flags;
>  
> @@ -131,9 +131,9 @@ unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
>  
>  	return old & mask;
>  }
> -EXPORT_SYMBOL(___set_bit);
> +EXPORT_SYMBOL(sp32___set_bit);
>  
> -unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask)
>  {
>  	unsigned long old, flags;
>  
> @@ -144,9 +144,9 @@ unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
>  
>  	return old & mask;
>  }
> -EXPORT_SYMBOL(___clear_bit);
> +EXPORT_SYMBOL(sp32___clear_bit);
>  
> -unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
>  {
>  	unsigned long old, flags;
>  
> @@ -157,7 +157,7 @@ unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
>  
>  	return old & mask;
>  }
> -EXPORT_SYMBOL(___change_bit);
> +EXPORT_SYMBOL(sp32___change_bit);
>  
>  unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
>  {
> diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
> index b019f77ef21c..988a3bbfba34 100644
> --- a/include/asm-generic/bitops/instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/instrumented-non-atomic.h
> @@ -14,7 +14,7 @@
>  #include <linux/instrumented.h>
>  
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -23,14 +23,14 @@
>   * succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___set_bit(nr, addr);
>  }
>  
>  /**
> - * __clear_bit - Clears a bit in memory
> + * ___clear_bit - Clears a bit in memory
>   * @nr: the bit to clear
>   * @addr: the address to start counting from
>   *
> @@ -39,14 +39,14 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>   * succeeds.
>   */
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___clear_bit(nr, addr);
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -55,7 +55,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___change_bit(nr, addr);
> @@ -86,7 +86,7 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>  }
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -94,14 +94,14 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_set_bit(nr, addr);
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -109,14 +109,14 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_clear_bit(nr, addr);
>  }
>  
>  /**
> - * __test_and_change_bit - Change a bit and return its old value
> + * ___test_and_change_bit - Change a bit and return its old value
>   * @nr: Bit to change
>   * @addr: Address to count from
>   *
> @@ -124,19 +124,19 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_change_bit(nr, addr);
>  }
>  
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
>  	return arch_test_bit(nr, addr);
> diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
> index 7ce0ed22fb5f..0c488331c9e7 100644
> --- a/include/asm-generic/bitops/non-atomic.h
> +++ b/include/asm-generic/bitops/non-atomic.h
> @@ -6,24 +6,24 @@
>  #include <asm/types.h>
>  
>  #define arch___set_bit gen___set_bit
> -#define __set_bit arch___set_bit
> +#define ___set_bit arch___set_bit
>  
>  #define arch___clear_bit gen___clear_bit
> -#define __clear_bit arch___clear_bit
> +#define ___clear_bit arch___clear_bit
>  
>  #define arch___change_bit gen___change_bit
> -#define __change_bit arch___change_bit
> +#define ___change_bit arch___change_bit
>  
>  #define arch___test_and_set_bit gen___test_and_set_bit
> -#define __test_and_set_bit arch___test_and_set_bit
> +#define ___test_and_set_bit arch___test_and_set_bit
>  
>  #define arch___test_and_clear_bit gen___test_and_clear_bit
> -#define __test_and_clear_bit arch___test_and_clear_bit
> +#define ___test_and_clear_bit arch___test_and_clear_bit
>  
>  #define arch___test_and_change_bit gen___test_and_change_bit
> -#define __test_and_change_bit arch___test_and_change_bit
> +#define ___test_and_change_bit arch___test_and_change_bit
>  
>  #define arch_test_bit gen_test_bit
> -#define test_bit arch_test_bit
> +#define _test_bit arch_test_bit
>  
>  #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 5520ac9b1c24..33cfc836a115 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,8 +26,24 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +/*
> + * Defined here because those may be needed by architecture-specific static
> + * inlines.
> + */
> +
>  #include <asm-generic/bitops/generic-non-atomic.h>
>  
> +#define bitop(op, nr, addr)						\
> +	op(nr, addr)
> +
> +#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
> +#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
> +#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
> +#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
> +#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
> +#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
> +#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
> @@ -35,14 +51,14 @@ extern unsigned long __sw_hweight64(__u64 w);
>  #include <asm/bitops.h>
>  
>  /* Check that the bitops prototypes are sane */
> -#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
> -__check_bitop_pr(__set_bit);
> -__check_bitop_pr(__clear_bit);
> -__check_bitop_pr(__change_bit);
> -__check_bitop_pr(__test_and_set_bit);
> -__check_bitop_pr(__test_and_clear_bit);
> -__check_bitop_pr(__test_and_change_bit);
> -__check_bitop_pr(test_bit);
> +#define __check_bitop_pr(name)	static_assert(__same_type(name, gen##name))
> +__check_bitop_pr(___set_bit);
> +__check_bitop_pr(___clear_bit);
> +__check_bitop_pr(___change_bit);
> +__check_bitop_pr(___test_and_set_bit);
> +__check_bitop_pr(___test_and_clear_bit);
> +__check_bitop_pr(___test_and_change_bit);
> +__check_bitop_pr(_test_bit);
>  #undef __check_bitop_pr
>  
>  static inline int get_bitmask_order(unsigned int count)
> diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
> index e5e78e42e57b..0c472a833408 100644
> --- a/tools/include/asm-generic/bitops/non-atomic.h
> +++ b/tools/include/asm-generic/bitops/non-atomic.h
> @@ -5,7 +5,7 @@
>  #include <linux/bits.h>
>  
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -14,7 +14,7 @@
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -23,7 +23,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -32,7 +32,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -41,7 +41,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -50,7 +50,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -59,7 +59,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -70,7 +70,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -79,7 +79,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -91,7 +91,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  
>  /* WARNING: non atomic and it can be reordered! */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -102,12 +102,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> diff --git a/tools/include/linux/bitops.h b/tools/include/linux/bitops.h
> index 5fca38fe1ba8..f18683b95ea6 100644
> --- a/tools/include/linux/bitops.h
> +++ b/tools/include/linux/bitops.h
> @@ -25,6 +25,22 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +/*
> + * Defined here because those may be needed by architecture-specific static
> + * inlines.
> + */
> +
> +#define bitop(op, nr, addr)						\
> +	op(nr, addr)
> +
> +#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
> +#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
> +#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
> +#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
> +#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
> +#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
> +#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 5/6] bitops: wrap non-atomic bitops with a transparent macro
@ 2022-06-06 16:27     ` Mark Rutland
  0 siblings, 0 replies; 75+ messages in thread
From: Mark Rutland @ 2022-06-06 16:27 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh

On Mon, Jun 06, 2022 at 01:49:06PM +0200, Alexander Lobakin wrote:
> In preparation for altering the non-atomic bitops with a macro, wrap
> them in a transparent definition. This requires prepending one more
> '_' to their names in order to be able to do that seamlessly.
> sparc32 already has the triple-underscored functions, so I had to
> rename them ('___' -> 'sp32_').

Could we use an 'arch_' prefix here, like we do for the atomics, or is that
already overloaded?

Thanks,
Mark.

> 
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  arch/alpha/include/asm/bitops.h               | 14 ++++----
>  arch/hexagon/include/asm/bitops.h             | 16 +++++-----
>  arch/ia64/include/asm/bitops.h                | 26 +++++++--------
>  arch/m68k/include/asm/bitops.h                | 14 ++++----
>  arch/sh/include/asm/bitops-op32.h             | 22 ++++++-------
>  arch/sparc/include/asm/bitops_32.h            | 18 +++++------
>  arch/sparc/lib/atomic32.c                     | 12 +++----
>  .../bitops/instrumented-non-atomic.h          | 28 ++++++++--------
>  include/asm-generic/bitops/non-atomic.h       | 14 ++++----
>  include/linux/bitops.h                        | 32 ++++++++++++++-----
>  tools/include/asm-generic/bitops/non-atomic.h | 24 +++++++-------
>  tools/include/linux/bitops.h                  | 16 ++++++++++
>  12 files changed, 134 insertions(+), 102 deletions(-)
> 
> diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
> index 381ad2eae4b4..e7d119826062 100644
> --- a/arch/alpha/include/asm/bitops.h
> +++ b/arch/alpha/include/asm/bitops.h
> @@ -47,7 +47,7 @@ set_bit(unsigned long nr, volatile void * addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -83,7 +83,7 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -119,7 +119,7 @@ change_bit(unsigned long nr, volatile void * addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -187,7 +187,7 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -231,7 +231,7 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -273,7 +273,7 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
>   * WARNING: non atomic version.
>   */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -284,7 +284,7 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
>  }
> diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
> index a3bfe3a8d4b7..37144f8e0b0c 100644
> --- a/arch/hexagon/include/asm/bitops.h
> +++ b/arch/hexagon/include/asm/bitops.h
> @@ -128,44 +128,44 @@ static inline void change_bit(int nr, volatile void *addr)
>   *
>   */
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_clear_bit(nr, addr);
>  }
>  
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_set_bit(nr, addr);
>  }
>  
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_change_bit(nr, addr);
>  }
>  
>  /*  Apparently, at least some of these are allowed to be non-atomic  */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_clear_bit(nr, addr);
>  }
>  
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_set_bit(nr, addr);
>  }
>  
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_change_bit(nr, addr);
>  }
>  
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	int retval;
>  
> @@ -179,7 +179,7 @@ test_bit(unsigned long nr, const volatile unsigned long *addr)
>  	return retval;
>  }
>  
> -#define __test_bit(nr, addr) test_bit(nr, addr)
> +#define __test_bit(nr, addr) _test_bit(nr, addr)
>  
>  /*
>   * ffz - find first zero in word.
> diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
> index 4267a217a503..3cb231fb0283 100644
> --- a/arch/ia64/include/asm/bitops.h
> +++ b/arch/ia64/include/asm/bitops.h
> @@ -53,7 +53,7 @@ set_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -62,7 +62,7 @@ set_bit (int nr, volatile void *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
>  }
> @@ -135,7 +135,7 @@ __clear_bit_unlock(int nr, void *addr)
>  }
>  
>  /**
> - * __clear_bit - Clears a bit in memory (non-atomic version)
> + * ___clear_bit - Clears a bit in memory (non-atomic version)
>   * @nr: the bit to clear
>   * @addr: the address to start counting from
>   *
> @@ -144,7 +144,7 @@ __clear_bit_unlock(int nr, void *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
>  }
> @@ -175,7 +175,7 @@ change_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to toggle
>   * @addr: the address to start counting from
>   *
> @@ -184,7 +184,7 @@ change_bit (int nr, volatile void *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
>  }
> @@ -224,7 +224,7 @@ test_and_set_bit (int nr, volatile void *addr)
>  #define test_and_set_bit_lock test_and_set_bit
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -233,7 +233,7 @@ test_and_set_bit (int nr, volatile void *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -269,7 +269,7 @@ test_and_clear_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -278,7 +278,7 @@ test_and_clear_bit (int nr, volatile void *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -314,14 +314,14 @@ test_and_change_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __test_and_change_bit - Change a bit and return its old value
> + * ___test_and_change_bit - Change a bit and return its old value
>   * @nr: Bit to change
>   * @addr: Address to count from
>   *
>   * This operation is non-atomic and can be reordered.
>   */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 old, bit = (1 << (nr & 31));
>  	__u32 *m = (__u32 *) addr + (nr >> 5);
> @@ -332,7 +332,7 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
>  }
> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> index 9d44bd4713cb..ed08c9d9c547 100644
> --- a/arch/m68k/include/asm/bitops.h
> +++ b/arch/m68k/include/asm/bitops.h
> @@ -66,7 +66,7 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
>  #endif
>  
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	set_bit(nr, addr);
>  }
> @@ -109,7 +109,7 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
>  #endif
>  
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	clear_bit(nr, addr);
>  }
> @@ -152,13 +152,13 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
>  #endif
>  
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	change_bit(nr, addr);
>  }
>  
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
>  }
> @@ -212,7 +212,7 @@ static inline int bfset_mem_test_and_set_bit(int nr,
>  #endif
>  
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_set_bit(nr, addr);
>  }
> @@ -265,7 +265,7 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
>  #endif
>  
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_clear_bit(nr, addr);
>  }
> @@ -318,7 +318,7 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
>  #endif
>  
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_change_bit(nr, addr);
>  }
> diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
> index dcd85866a394..24b8a809c1b9 100644
> --- a/arch/sh/include/asm/bitops-op32.h
> +++ b/arch/sh/include/asm/bitops-op32.h
> @@ -19,7 +19,7 @@
>  #endif
>  
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -37,7 +37,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -56,7 +56,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -65,7 +65,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -84,7 +84,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -93,7 +93,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -104,7 +104,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -113,7 +113,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -125,7 +125,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  
>  /* WARNING: non atomic and it can be reordered! */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -136,12 +136,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> diff --git a/arch/sparc/include/asm/bitops_32.h b/arch/sparc/include/asm/bitops_32.h
> index 889afa9f990f..3448c191b484 100644
> --- a/arch/sparc/include/asm/bitops_32.h
> +++ b/arch/sparc/include/asm/bitops_32.h
> @@ -19,9 +19,9 @@
>  #error only <linux/bitops.h> can be included directly
>  #endif
>  
> -unsigned long ___set_bit(unsigned long *addr, unsigned long mask);
> -unsigned long ___clear_bit(unsigned long *addr, unsigned long mask);
> -unsigned long ___change_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask);
>  
>  /*
>   * Set bit 'nr' in 32-bit quantity at address 'addr' where bit '0'
> @@ -36,7 +36,7 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *add
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	return ___set_bit(ADDR, mask) != 0;
> +	return sp32___set_bit(ADDR, mask) != 0;
>  }
>  
>  static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -46,7 +46,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	(void) ___set_bit(ADDR, mask);
> +	(void) sp32___set_bit(ADDR, mask);
>  }
>  
>  static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -56,7 +56,7 @@ static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *a
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	return ___clear_bit(ADDR, mask) != 0;
> +	return sp32___clear_bit(ADDR, mask) != 0;
>  }
>  
>  static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -66,7 +66,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	(void) ___clear_bit(ADDR, mask);
> +	(void) sp32___clear_bit(ADDR, mask);
>  }
>  
>  static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -76,7 +76,7 @@ static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	return ___change_bit(ADDR, mask) != 0;
> +	return sp32___change_bit(ADDR, mask) != 0;
>  }
>  
>  static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -86,7 +86,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	(void) ___change_bit(ADDR, mask);
> +	(void) sp32___change_bit(ADDR, mask);
>  }
>  
>  #include <asm-generic/bitops/non-atomic.h>
> diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
> index 8b81d0f00c97..cf80d1ae352b 100644
> --- a/arch/sparc/lib/atomic32.c
> +++ b/arch/sparc/lib/atomic32.c
> @@ -120,7 +120,7 @@ void arch_atomic_set(atomic_t *v, int i)
>  }
>  EXPORT_SYMBOL(arch_atomic_set);
>  
> -unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask)
>  {
>  	unsigned long old, flags;
>  
> @@ -131,9 +131,9 @@ unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
>  
>  	return old & mask;
>  }
> -EXPORT_SYMBOL(___set_bit);
> +EXPORT_SYMBOL(sp32___set_bit);
>  
> -unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask)
>  {
>  	unsigned long old, flags;
>  
> @@ -144,9 +144,9 @@ unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
>  
>  	return old & mask;
>  }
> -EXPORT_SYMBOL(___clear_bit);
> +EXPORT_SYMBOL(sp32___clear_bit);
>  
> -unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
>  {
>  	unsigned long old, flags;
>  
> @@ -157,7 +157,7 @@ unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
>  
>  	return old & mask;
>  }
> -EXPORT_SYMBOL(___change_bit);
> +EXPORT_SYMBOL(sp32___change_bit);
>  
>  unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
>  {
> diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
> index b019f77ef21c..988a3bbfba34 100644
> --- a/include/asm-generic/bitops/instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/instrumented-non-atomic.h
> @@ -14,7 +14,7 @@
>  #include <linux/instrumented.h>
>  
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -23,14 +23,14 @@
>   * succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___set_bit(nr, addr);
>  }
>  
>  /**
> - * __clear_bit - Clears a bit in memory
> + * ___clear_bit - Clears a bit in memory
>   * @nr: the bit to clear
>   * @addr: the address to start counting from
>   *
> @@ -39,14 +39,14 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>   * succeeds.
>   */
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___clear_bit(nr, addr);
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -55,7 +55,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___change_bit(nr, addr);
> @@ -86,7 +86,7 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>  }
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -94,14 +94,14 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_set_bit(nr, addr);
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -109,14 +109,14 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_clear_bit(nr, addr);
>  }
>  
>  /**
> - * __test_and_change_bit - Change a bit and return its old value
> + * ___test_and_change_bit - Change a bit and return its old value
>   * @nr: Bit to change
>   * @addr: Address to count from
>   *
> @@ -124,19 +124,19 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_change_bit(nr, addr);
>  }
>  
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
>  	return arch_test_bit(nr, addr);
> diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
> index 7ce0ed22fb5f..0c488331c9e7 100644
> --- a/include/asm-generic/bitops/non-atomic.h
> +++ b/include/asm-generic/bitops/non-atomic.h
> @@ -6,24 +6,24 @@
>  #include <asm/types.h>
>  
>  #define arch___set_bit gen___set_bit
> -#define __set_bit arch___set_bit
> +#define ___set_bit arch___set_bit
>  
>  #define arch___clear_bit gen___clear_bit
> -#define __clear_bit arch___clear_bit
> +#define ___clear_bit arch___clear_bit
>  
>  #define arch___change_bit gen___change_bit
> -#define __change_bit arch___change_bit
> +#define ___change_bit arch___change_bit
>  
>  #define arch___test_and_set_bit gen___test_and_set_bit
> -#define __test_and_set_bit arch___test_and_set_bit
> +#define ___test_and_set_bit arch___test_and_set_bit
>  
>  #define arch___test_and_clear_bit gen___test_and_clear_bit
> -#define __test_and_clear_bit arch___test_and_clear_bit
> +#define ___test_and_clear_bit arch___test_and_clear_bit
>  
>  #define arch___test_and_change_bit gen___test_and_change_bit
> -#define __test_and_change_bit arch___test_and_change_bit
> +#define ___test_and_change_bit arch___test_and_change_bit
>  
>  #define arch_test_bit gen_test_bit
> -#define test_bit arch_test_bit
> +#define _test_bit arch_test_bit
>  
>  #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 5520ac9b1c24..33cfc836a115 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,8 +26,24 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +/*
> + * Defined here because those may be needed by architecture-specific static
> + * inlines.
> + */
> +
>  #include <asm-generic/bitops/generic-non-atomic.h>
>  
> +#define bitop(op, nr, addr)						\
> +	op(nr, addr)
> +
> +#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
> +#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
> +#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
> +#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
> +#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
> +#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
> +#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
> @@ -35,14 +51,14 @@ extern unsigned long __sw_hweight64(__u64 w);
>  #include <asm/bitops.h>
>  
>  /* Check that the bitops prototypes are sane */
> -#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
> -__check_bitop_pr(__set_bit);
> -__check_bitop_pr(__clear_bit);
> -__check_bitop_pr(__change_bit);
> -__check_bitop_pr(__test_and_set_bit);
> -__check_bitop_pr(__test_and_clear_bit);
> -__check_bitop_pr(__test_and_change_bit);
> -__check_bitop_pr(test_bit);
> +#define __check_bitop_pr(name)	static_assert(__same_type(name, gen##name))
> +__check_bitop_pr(___set_bit);
> +__check_bitop_pr(___clear_bit);
> +__check_bitop_pr(___change_bit);
> +__check_bitop_pr(___test_and_set_bit);
> +__check_bitop_pr(___test_and_clear_bit);
> +__check_bitop_pr(___test_and_change_bit);
> +__check_bitop_pr(_test_bit);
>  #undef __check_bitop_pr
>  
>  static inline int get_bitmask_order(unsigned int count)
> diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
> index e5e78e42e57b..0c472a833408 100644
> --- a/tools/include/asm-generic/bitops/non-atomic.h
> +++ b/tools/include/asm-generic/bitops/non-atomic.h
> @@ -5,7 +5,7 @@
>  #include <linux/bits.h>
>  
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -14,7 +14,7 @@
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -23,7 +23,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -32,7 +32,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -41,7 +41,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -50,7 +50,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -59,7 +59,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -70,7 +70,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -79,7 +79,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -91,7 +91,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  
>  /* WARNING: non atomic and it can be reordered! */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -102,12 +102,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> diff --git a/tools/include/linux/bitops.h b/tools/include/linux/bitops.h
> index 5fca38fe1ba8..f18683b95ea6 100644
> --- a/tools/include/linux/bitops.h
> +++ b/tools/include/linux/bitops.h
> @@ -25,6 +25,22 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +/*
> + * Defined here because those may be needed by architecture-specific static
> + * inlines.
> + */
> +
> +#define bitop(op, nr, addr)						\
> +	op(nr, addr)
> +
> +#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
> +#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
> +#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
> +#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
> +#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
> +#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
> +#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 4/6] bitops: unify non-atomic bitops prototypes across architectures
  2022-06-06 11:49   ` Alexander Lobakin
  (?)
@ 2022-06-06 20:48     ` Yury Norov
  -1 siblings, 0 replies; 75+ messages in thread
From: Yury Norov @ 2022-06-06 20:48 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Andy Shevchenko, Richard Henderson, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

On Mon, Jun 06, 2022 at 01:49:05PM +0200, Alexander Lobakin wrote:
> Currently, there is a mess with the prototypes of the non-atomic
> bitops across the different architectures:
> 
> ret	bool, int, unsigned long
> nr	int, long, unsigned int, unsigned long
> addr	volatile unsigned long *, volatile void *
> 
> Thankfully, it doesn't provoke any bugs, but can sometimes make
> the compiler angry when it's not handy at all.
> Adjust all the prototypes to the following standard:
> 
> ret	bool				retval can be only 0 or 1
> nr	unsigned long			native; signed makes no sense
> addr	volatile unsigned long *	bitmaps are arrays of ulongs
> 
> Finally, add some static assertions in order to prevent people from
> making a mess in this room again.
> I also used the %__always_inline attribute consistently they always
> get resolved to the actual operations.
> 
> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---

Reviewed-by: Yury Norov <yury.norov@gmail.com>

[...]

> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 7aaed501f768..5520ac9b1c24 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,12 +26,25 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +#include <asm-generic/bitops/generic-non-atomic.h>
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
>   */
>  #include <asm/bitops.h>
>  
> +/* Check that the bitops prototypes are sane */
> +#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
> +__check_bitop_pr(__set_bit);
> +__check_bitop_pr(__clear_bit);
> +__check_bitop_pr(__change_bit);
> +__check_bitop_pr(__test_and_set_bit);
> +__check_bitop_pr(__test_and_clear_bit);
> +__check_bitop_pr(__test_and_change_bit);
> +__check_bitop_pr(test_bit);
> +#undef __check_bitop_pr

This one is amazing trick! And the series is good overall. Do you want me to
take it in bitmap tree, when it's ready, or you'll move it somehow else?

Thanks,
Yury

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 4/6] bitops: unify non-atomic bitops prototypes across architectures
@ 2022-06-06 20:48     ` Yury Norov
  0 siblings, 0 replies; 75+ messages in thread
From: Yury Norov @ 2022-06-06 20:48 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Andy Shevchenko, Richard Henderson, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

On Mon, Jun 06, 2022 at 01:49:05PM +0200, Alexander Lobakin wrote:
> Currently, there is a mess with the prototypes of the non-atomic
> bitops across the different architectures:
> 
> ret	bool, int, unsigned long
> nr	int, long, unsigned int, unsigned long
> addr	volatile unsigned long *, volatile void *
> 
> Thankfully, it doesn't provoke any bugs, but can sometimes make
> the compiler angry when it's not handy at all.
> Adjust all the prototypes to the following standard:
> 
> ret	bool				retval can be only 0 or 1
> nr	unsigned long			native; signed makes no sense
> addr	volatile unsigned long *	bitmaps are arrays of ulongs
> 
> Finally, add some static assertions in order to prevent people from
> making a mess in this room again.
> I also used the %__always_inline attribute consistently they always
> get resolved to the actual operations.
> 
> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---

Reviewed-by: Yury Norov <yury.norov@gmail.com>

[...]

> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 7aaed501f768..5520ac9b1c24 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,12 +26,25 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +#include <asm-generic/bitops/generic-non-atomic.h>
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
>   */
>  #include <asm/bitops.h>
>  
> +/* Check that the bitops prototypes are sane */
> +#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
> +__check_bitop_pr(__set_bit);
> +__check_bitop_pr(__clear_bit);
> +__check_bitop_pr(__change_bit);
> +__check_bitop_pr(__test_and_set_bit);
> +__check_bitop_pr(__test_and_clear_bit);
> +__check_bitop_pr(__test_and_change_bit);
> +__check_bitop_pr(test_bit);
> +#undef __check_bitop_pr

This one is amazing trick! And the series is good overall. Do you want me to
take it in bitmap tree, when it's ready, or you'll move it somehow else?

Thanks,
Yury

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 4/6] bitops: unify non-atomic bitops prototypes across architectures
@ 2022-06-06 20:48     ` Yury Norov
  0 siblings, 0 replies; 75+ messages in thread
From: Yury Norov @ 2022-06-06 20:48 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Andy Shevchenko, Richard Henderson, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch

On Mon, Jun 06, 2022 at 01:49:05PM +0200, Alexander Lobakin wrote:
> Currently, there is a mess with the prototypes of the non-atomic
> bitops across the different architectures:
> 
> ret	bool, int, unsigned long
> nr	int, long, unsigned int, unsigned long
> addr	volatile unsigned long *, volatile void *
> 
> Thankfully, it doesn't provoke any bugs, but can sometimes make
> the compiler angry when it's not handy at all.
> Adjust all the prototypes to the following standard:
> 
> ret	bool				retval can be only 0 or 1
> nr	unsigned long			native; signed makes no sense
> addr	volatile unsigned long *	bitmaps are arrays of ulongs
> 
> Finally, add some static assertions in order to prevent people from
> making a mess in this room again.
> I also used the %__always_inline attribute consistently they always
> get resolved to the actual operations.
> 
> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---

Reviewed-by: Yury Norov <yury.norov@gmail.com>

[...]

> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 7aaed501f768..5520ac9b1c24 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,12 +26,25 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +#include <asm-generic/bitops/generic-non-atomic.h>
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
>   */
>  #include <asm/bitops.h>
>  
> +/* Check that the bitops prototypes are sane */
> +#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
> +__check_bitop_pr(__set_bit);
> +__check_bitop_pr(__clear_bit);
> +__check_bitop_pr(__change_bit);
> +__check_bitop_pr(__test_and_set_bit);
> +__check_bitop_pr(__test_and_clear_bit);
> +__check_bitop_pr(__test_and_change_bit);
> +__check_bitop_pr(test_bit);
> +#undef __check_bitop_pr

This one is amazing trick! And the series is good overall. Do you want me to
take it in bitmap tree, when it's ready, or you'll move it somehow else?

Thanks,
Yury

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 5/6] bitops: wrap non-atomic bitops with a transparent macro
  2022-06-06 16:27     ` Mark Rutland
  (?)
@ 2022-06-07 10:57       ` Alexander Lobakin
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-07 10:57 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

From: Mark Rutland <mark.rutland@arm.com>
Date: Mon, 6 Jun 2022 17:27:16 +0100

> On Mon, Jun 06, 2022 at 01:49:06PM +0200, Alexander Lobakin wrote:
> > In preparation for altering the non-atomic bitops with a macro, wrap
> > them in a transparent definition. This requires prepending one more
> > '_' to their names in order to be able to do that seamlessly.
> > sparc32 already has the triple-underscored functions, so I had to
> > rename them ('___' -> 'sp32_').
> 
> Could we use an 'arch_' prefix here, like we do for the atomics, or is that
> already overloaded?

Yeah it is, for example, x86 has 'arch_' functions defined in its
architecture headers[0] and at the same time uses generic
instrumented '__' helpers[1], so on x86 both underscored and 'arch_'
are defined and they are not the same.
Same with those sparc32 triple-underscored, sparc32 at the same time
uses generic non-instrumented, so it has underscored, 'arch_' and
triple-underscored.

In general, bitops are overloaded with tons of prefixes already :)
I'm not really glad that I introduced one more level, but not that
we have many options here.

> 
> Thanks,
> Mark.
> 
> > 
> > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > ---

[...]

> > -- 
> > 2.36.1

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 5/6] bitops: wrap non-atomic bitops with a transparent macro
@ 2022-06-07 10:57       ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-07 10:57 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

From: Mark Rutland <mark.rutland@arm.com>
Date: Mon, 6 Jun 2022 17:27:16 +0100

> On Mon, Jun 06, 2022 at 01:49:06PM +0200, Alexander Lobakin wrote:
> > In preparation for altering the non-atomic bitops with a macro, wrap
> > them in a transparent definition. This requires prepending one more
> > '_' to their names in order to be able to do that seamlessly.
> > sparc32 already has the triple-underscored functions, so I had to
> > rename them ('___' -> 'sp32_').
> 
> Could we use an 'arch_' prefix here, like we do for the atomics, or is that
> already overloaded?

Yeah it is, for example, x86 has 'arch_' functions defined in its
architecture headers[0] and at the same time uses generic
instrumented '__' helpers[1], so on x86 both underscored and 'arch_'
are defined and they are not the same.
Same with those sparc32 triple-underscored, sparc32 at the same time
uses generic non-instrumented, so it has underscored, 'arch_' and
triple-underscored.

In general, bitops are overloaded with tons of prefixes already :)
I'm not really glad that I introduced one more level, but not that
we have many options here.

> 
> Thanks,
> Mark.
> 
> > 
> > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > ---

[...]

> > -- 
> > 2.36.1

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 5/6] bitops: wrap non-atomic bitops with a transparent macro
@ 2022-06-07 10:57       ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-07 10:57 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m6

From: Mark Rutland <mark.rutland@arm.com>
Date: Mon, 6 Jun 2022 17:27:16 +0100

> On Mon, Jun 06, 2022 at 01:49:06PM +0200, Alexander Lobakin wrote:
> > In preparation for altering the non-atomic bitops with a macro, wrap
> > them in a transparent definition. This requires prepending one more
> > '_' to their names in order to be able to do that seamlessly.
> > sparc32 already has the triple-underscored functions, so I had to
> > rename them ('___' -> 'sp32_').
> 
> Could we use an 'arch_' prefix here, like we do for the atomics, or is that
> already overloaded?

Yeah it is, for example, x86 has 'arch_' functions defined in its
architecture headers[0] and at the same time uses generic
instrumented '__' helpers[1], so on x86 both underscored and 'arch_'
are defined and they are not the same.
Same with those sparc32 triple-underscored, sparc32 at the same time
uses generic non-instrumented, so it has underscored, 'arch_' and
triple-underscored.

In general, bitops are overloaded with tons of prefixes already :)
I'm not really glad that I introduced one more level, but not that
we have many options here.

> 
> Thanks,
> Mark.
> 
> > 
> > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > ---

[...]

> > -- 
> > 2.36.1

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 4/6] bitops: unify non-atomic bitops prototypes across architectures
  2022-06-06 20:48     ` Yury Norov
  (?)
@ 2022-06-07 11:03       ` Alexander Lobakin
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-07 11:03 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Arnd Bergmann, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

From: Yury Norov <yury.norov@gmail.com>
Date: Mon, 6 Jun 2022 13:48:50 -0700

> On Mon, Jun 06, 2022 at 01:49:05PM +0200, Alexander Lobakin wrote:
> > Currently, there is a mess with the prototypes of the non-atomic
> > bitops across the different architectures:
> > 
> > ret	bool, int, unsigned long
> > nr	int, long, unsigned int, unsigned long
> > addr	volatile unsigned long *, volatile void *
> > 
> > Thankfully, it doesn't provoke any bugs, but can sometimes make
> > the compiler angry when it's not handy at all.
> > Adjust all the prototypes to the following standard:
> > 
> > ret	bool				retval can be only 0 or 1
> > nr	unsigned long			native; signed makes no sense
> > addr	volatile unsigned long *	bitmaps are arrays of ulongs
> > 
> > Finally, add some static assertions in order to prevent people from
> > making a mess in this room again.
> > I also used the %__always_inline attribute consistently they always
> > get resolved to the actual operations.
> > 
> > Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > ---
> 
> Reviewed-by: Yury Norov <yury.norov@gmail.com>
> 
> [...]
> 
> > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> > index 7aaed501f768..5520ac9b1c24 100644
> > --- a/include/linux/bitops.h
> > +++ b/include/linux/bitops.h
> > @@ -26,12 +26,25 @@ extern unsigned int __sw_hweight16(unsigned int w);
> >  extern unsigned int __sw_hweight32(unsigned int w);
> >  extern unsigned long __sw_hweight64(__u64 w);
> >  
> > +#include <asm-generic/bitops/generic-non-atomic.h>
> > +
> >  /*
> >   * Include this here because some architectures need generic_ffs/fls in
> >   * scope
> >   */
> >  #include <asm/bitops.h>
> >  
> > +/* Check that the bitops prototypes are sane */
> > +#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
> > +__check_bitop_pr(__set_bit);
> > +__check_bitop_pr(__clear_bit);
> > +__check_bitop_pr(__change_bit);
> > +__check_bitop_pr(__test_and_set_bit);
> > +__check_bitop_pr(__test_and_clear_bit);
> > +__check_bitop_pr(__test_and_change_bit);
> > +__check_bitop_pr(test_bit);
> > +#undef __check_bitop_pr
> 
> This one is amazing trick! And the series is good overall. Do you want me to
> take it in bitmap tree, when it's ready, or you'll move it somehow else?

Thanks :) Yeah I'm glad we can use __same_type() (->
__builtin_types_compatible_p()) for functions as well, it simplifies
keeping the prototypes unified a lot.

I'm fine with either your bitmap tree or Arnd's asm-generic tree, so
it was my question which I happily forgot to ask: which of those two
is preferred for the series.

> 
> Thanks,
> Yury

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 4/6] bitops: unify non-atomic bitops prototypes across architectures
@ 2022-06-07 11:03       ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-07 11:03 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Arnd Bergmann, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

From: Yury Norov <yury.norov@gmail.com>
Date: Mon, 6 Jun 2022 13:48:50 -0700

> On Mon, Jun 06, 2022 at 01:49:05PM +0200, Alexander Lobakin wrote:
> > Currently, there is a mess with the prototypes of the non-atomic
> > bitops across the different architectures:
> > 
> > ret	bool, int, unsigned long
> > nr	int, long, unsigned int, unsigned long
> > addr	volatile unsigned long *, volatile void *
> > 
> > Thankfully, it doesn't provoke any bugs, but can sometimes make
> > the compiler angry when it's not handy at all.
> > Adjust all the prototypes to the following standard:
> > 
> > ret	bool				retval can be only 0 or 1
> > nr	unsigned long			native; signed makes no sense
> > addr	volatile unsigned long *	bitmaps are arrays of ulongs
> > 
> > Finally, add some static assertions in order to prevent people from
> > making a mess in this room again.
> > I also used the %__always_inline attribute consistently they always
> > get resolved to the actual operations.
> > 
> > Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > ---
> 
> Reviewed-by: Yury Norov <yury.norov@gmail.com>
> 
> [...]
> 
> > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> > index 7aaed501f768..5520ac9b1c24 100644
> > --- a/include/linux/bitops.h
> > +++ b/include/linux/bitops.h
> > @@ -26,12 +26,25 @@ extern unsigned int __sw_hweight16(unsigned int w);
> >  extern unsigned int __sw_hweight32(unsigned int w);
> >  extern unsigned long __sw_hweight64(__u64 w);
> >  
> > +#include <asm-generic/bitops/generic-non-atomic.h>
> > +
> >  /*
> >   * Include this here because some architectures need generic_ffs/fls in
> >   * scope
> >   */
> >  #include <asm/bitops.h>
> >  
> > +/* Check that the bitops prototypes are sane */
> > +#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
> > +__check_bitop_pr(__set_bit);
> > +__check_bitop_pr(__clear_bit);
> > +__check_bitop_pr(__change_bit);
> > +__check_bitop_pr(__test_and_set_bit);
> > +__check_bitop_pr(__test_and_clear_bit);
> > +__check_bitop_pr(__test_and_change_bit);
> > +__check_bitop_pr(test_bit);
> > +#undef __check_bitop_pr
> 
> This one is amazing trick! And the series is good overall. Do you want me to
> take it in bitmap tree, when it's ready, or you'll move it somehow else?

Thanks :) Yeah I'm glad we can use __same_type() (->
__builtin_types_compatible_p()) for functions as well, it simplifies
keeping the prototypes unified a lot.

I'm fine with either your bitmap tree or Arnd's asm-generic tree, so
it was my question which I happily forgot to ask: which of those two
is preferred for the series.

> 
> Thanks,
> Yury

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 4/6] bitops: unify non-atomic bitops prototypes across architectures
@ 2022-06-07 11:03       ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-07 11:03 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Arnd Bergmann, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh

From: Yury Norov <yury.norov@gmail.com>
Date: Mon, 6 Jun 2022 13:48:50 -0700

> On Mon, Jun 06, 2022 at 01:49:05PM +0200, Alexander Lobakin wrote:
> > Currently, there is a mess with the prototypes of the non-atomic
> > bitops across the different architectures:
> > 
> > ret	bool, int, unsigned long
> > nr	int, long, unsigned int, unsigned long
> > addr	volatile unsigned long *, volatile void *
> > 
> > Thankfully, it doesn't provoke any bugs, but can sometimes make
> > the compiler angry when it's not handy at all.
> > Adjust all the prototypes to the following standard:
> > 
> > ret	bool				retval can be only 0 or 1
> > nr	unsigned long			native; signed makes no sense
> > addr	volatile unsigned long *	bitmaps are arrays of ulongs
> > 
> > Finally, add some static assertions in order to prevent people from
> > making a mess in this room again.
> > I also used the %__always_inline attribute consistently they always
> > get resolved to the actual operations.
> > 
> > Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > ---
> 
> Reviewed-by: Yury Norov <yury.norov@gmail.com>
> 
> [...]
> 
> > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> > index 7aaed501f768..5520ac9b1c24 100644
> > --- a/include/linux/bitops.h
> > +++ b/include/linux/bitops.h
> > @@ -26,12 +26,25 @@ extern unsigned int __sw_hweight16(unsigned int w);
> >  extern unsigned int __sw_hweight32(unsigned int w);
> >  extern unsigned long __sw_hweight64(__u64 w);
> >  
> > +#include <asm-generic/bitops/generic-non-atomic.h>
> > +
> >  /*
> >   * Include this here because some architectures need generic_ffs/fls in
> >   * scope
> >   */
> >  #include <asm/bitops.h>
> >  
> > +/* Check that the bitops prototypes are sane */
> > +#define __check_bitop_pr(name)	static_assert(__same_type(name, gen_##name))
> > +__check_bitop_pr(__set_bit);
> > +__check_bitop_pr(__clear_bit);
> > +__check_bitop_pr(__change_bit);
> > +__check_bitop_pr(__test_and_set_bit);
> > +__check_bitop_pr(__test_and_clear_bit);
> > +__check_bitop_pr(__test_and_change_bit);
> > +__check_bitop_pr(test_bit);
> > +#undef __check_bitop_pr
> 
> This one is amazing trick! And the series is good overall. Do you want me to
> take it in bitmap tree, when it's ready, or you'll move it somehow else?

Thanks :) Yeah I'm glad we can use __same_type() (->
__builtin_types_compatible_p()) for functions as well, it simplifies
keeping the prototypes unified a lot.

I'm fine with either your bitmap tree or Arnd's asm-generic tree, so
it was my question which I happily forgot to ask: which of those two
is preferred for the series.

> 
> Thanks,
> Yury

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 5/6] bitops: wrap non-atomic bitops with a transparent macro
  2022-06-07 10:57       ` Alexander Lobakin
  (?)
@ 2022-06-07 11:07         ` Alexander Lobakin
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-07 11:07 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

From: Alexander Lobakin <alexandr.lobakin@intel.com>
Date: Tue, 7 Jun 2022 12:57:18 +0200

> From: Mark Rutland <mark.rutland@arm.com>
> Date: Mon, 6 Jun 2022 17:27:16 +0100
> 
> > On Mon, Jun 06, 2022 at 01:49:06PM +0200, Alexander Lobakin wrote:
> > > In preparation for altering the non-atomic bitops with a macro, wrap
> > > them in a transparent definition. This requires prepending one more
> > > '_' to their names in order to be able to do that seamlessly.
> > > sparc32 already has the triple-underscored functions, so I had to
> > > rename them ('___' -> 'sp32_').
> > 
> > Could we use an 'arch_' prefix here, like we do for the atomics, or is that
> > already overloaded?
> 
> Yeah it is, for example, x86 has 'arch_' functions defined in its
> architecture headers[0] and at the same time uses generic
> instrumented '__' helpers[1], so on x86 both underscored and 'arch_'
> are defined and they are not the same.

Oh well, forgot to attach the links. Can be found at the bottom of
this mail.

> Same with those sparc32 triple-underscored, sparc32 at the same time
> uses generic non-instrumented, so it has underscored, 'arch_' and
> triple-underscored.
> 
> In general, bitops are overloaded with tons of prefixes already :)
> I'm not really glad that I introduced one more level, but not that
> we have many options here.
> 
> > 
> > Thanks,
> > Mark.
> > 
> > > 
> > > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > > ---
> 
> [...]
> 
> > > -- 
> > > 2.36.1
> 
> Thanks,
> Olek

[0] https://elixir.bootlin.com/linux/v5.19-rc1/source/arch/x86/include/asm/bitops.h#L136
[1] https://elixir.bootlin.com/linux/v5.19-rc1/source/include/asm-generic/bitops/instrumented-non-atomic.h#L93

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 5/6] bitops: wrap non-atomic bitops with a transparent macro
@ 2022-06-07 11:07         ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-07 11:07 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k, linux-sh,
	sparclinux, linux-arch, linux-kernel

From: Alexander Lobakin <alexandr.lobakin@intel.com>
Date: Tue, 7 Jun 2022 12:57:18 +0200

> From: Mark Rutland <mark.rutland@arm.com>
> Date: Mon, 6 Jun 2022 17:27:16 +0100
> 
> > On Mon, Jun 06, 2022 at 01:49:06PM +0200, Alexander Lobakin wrote:
> > > In preparation for altering the non-atomic bitops with a macro, wrap
> > > them in a transparent definition. This requires prepending one more
> > > '_' to their names in order to be able to do that seamlessly.
> > > sparc32 already has the triple-underscored functions, so I had to
> > > rename them ('___' -> 'sp32_').
> > 
> > Could we use an 'arch_' prefix here, like we do for the atomics, or is that
> > already overloaded?
> 
> Yeah it is, for example, x86 has 'arch_' functions defined in its
> architecture headers[0] and at the same time uses generic
> instrumented '__' helpers[1], so on x86 both underscored and 'arch_'
> are defined and they are not the same.

Oh well, forgot to attach the links. Can be found at the bottom of
this mail.

> Same with those sparc32 triple-underscored, sparc32 at the same time
> uses generic non-instrumented, so it has underscored, 'arch_' and
> triple-underscored.
> 
> In general, bitops are overloaded with tons of prefixes already :)
> I'm not really glad that I introduced one more level, but not that
> we have many options here.
> 
> > 
> > Thanks,
> > Mark.
> > 
> > > 
> > > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > > ---
> 
> [...]
> 
> > > -- 
> > > 2.36.1
> 
> Thanks,
> Olek

[0] https://elixir.bootlin.com/linux/v5.19-rc1/source/arch/x86/include/asm/bitops.h#L136
[1] https://elixir.bootlin.com/linux/v5.19-rc1/source/include/asm-generic/bitops/instrumented-non-atomic.h#L93

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 5/6] bitops: wrap non-atomic bitops with a transparent macro
@ 2022-06-07 11:07         ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-07 11:07 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	linux-alpha, linux-hexagon, linux-ia64, linux-m68k

From: Alexander Lobakin <alexandr.lobakin@intel.com>
Date: Tue, 7 Jun 2022 12:57:18 +0200

> From: Mark Rutland <mark.rutland@arm.com>
> Date: Mon, 6 Jun 2022 17:27:16 +0100
> 
> > On Mon, Jun 06, 2022 at 01:49:06PM +0200, Alexander Lobakin wrote:
> > > In preparation for altering the non-atomic bitops with a macro, wrap
> > > them in a transparent definition. This requires prepending one more
> > > '_' to their names in order to be able to do that seamlessly.
> > > sparc32 already has the triple-underscored functions, so I had to
> > > rename them ('___' -> 'sp32_').
> > 
> > Could we use an 'arch_' prefix here, like we do for the atomics, or is that
> > already overloaded?
> 
> Yeah it is, for example, x86 has 'arch_' functions defined in its
> architecture headers[0] and at the same time uses generic
> instrumented '__' helpers[1], so on x86 both underscored and 'arch_'
> are defined and they are not the same.

Oh well, forgot to attach the links. Can be found at the bottom of
this mail.

> Same with those sparc32 triple-underscored, sparc32 at the same time
> uses generic non-instrumented, so it has underscored, 'arch_' and
> triple-underscored.
> 
> In general, bitops are overloaded with tons of prefixes already :)
> I'm not really glad that I introduced one more level, but not that
> we have many options here.
> 
> > 
> > Thanks,
> > Mark.
> > 
> > > 
> > > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > > ---
> 
> [...]
> 
> > > -- 
> > > 2.36.1
> 
> Thanks,
> Olek

[0] https://elixir.bootlin.com/linux/v5.19-rc1/source/arch/x86/include/asm/bitops.h#L136
[1] https://elixir.bootlin.com/linux/v5.19-rc1/source/include/asm-generic/bitops/instrumented-non-atomic.h#L93

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-06 11:49 ` Alexander Lobakin
  (?)
@ 2022-06-07 12:45   ` Geert Uytterhoeven
  -1 siblings, 0 replies; 75+ messages in thread
From: Geert Uytterhoeven @ 2022-06-07 12:45 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	alpha, open list:QUALCOMM HEXAGON...,
	linux-ia64, linux-m68k, Linux-sh list, sparclinux, Linux-Arch,
	Linux Kernel Mailing List

Hi Alexander,

On Mon, Jun 6, 2022 at 1:50 PM Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
> While I was working on converting some structure fields from a fixed
> type to a bitmap, I started observing code size increase not only in
> places where the code works with the converted structure fields, but
> also where the converted vars were on the stack. That said, the
> following code:
>
>         DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
>         unsigned long bar = BIT(BAR_BIT);
>         unsigned long baz = 0;
>
>         __set_bit(FOO_BIT, foo);
>         baz |= BIT(BAZ_BIT);
>
>         BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
>         BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
>         BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
>
> triggers the first assertion on x86_64, which means that the
> compiler is unable to evaluate it to a compile-time initializer
> when the architecture-specific bitop is used even if it's obvious.
> I found that this is due to that many architecture-specific
> non-atomic bitop implementations use inline asm or other hacks which
> are faster or more robust when working with "real" variables (i.e.
> fields from the structures etc.), but the compilers have no clue how
> to optimize them out when called on compile-time constants.
>
> So, in order to let the compiler optimize out such cases, expand the
> test_bit() and __*_bit() definitions with a compile-time condition
> check, so that they will pick the generic C non-atomic bitop
> implementations when all of the arguments passed are compile-time
> constants, which means that the result will be a compile-time
> constant as well and the compiler will produce more efficient and
> simple code in 100% cases (no changes when there's at least one
> non-compile-time-constant argument).
> The condition itself:
>
> if (
> __builtin_constant_p(nr) &&     /* <- bit position is constant */
> __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
>                                       always either NULL or not */
> addr &&                         /* <- bitmap addr is not NULL */
> __builtin_constant_p(*addr)     /* <- compiler knows the value of
>                                       the target bitmap */
> )
>         /* then pick the generic C variant
> else
>         /* old code path, arch-specific
>
> I also tried __is_constexpr() as suggested by Andy, but it was
> always returning 0 ('not a constant') for the 2,3 and 4th
> conditions.
>
> The savings on x86_64 with LLVM are insane (.text):
>
> $ scripts/bloat-o-meter -c vmlinux.{base,test}
> add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)
>
> $ scripts/bloat-o-meter -c vmlinux.{base,mod}
> add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)
>
> $ scripts/bloat-o-meter -c vmlinux.{base,all}
> add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)

Thank you!

I gave it a try on m68k, and am a bit disappointed seeing an increase
in code size:

    add/remove: 49/13 grow/shrink: 279/138 up/down: 6434/-3342 (3092)

This is atari_defconfig on a tree based on v5.19-rc1, with
m68k-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0, GNU ld (GNU
Binutils for Ubuntu) 2.34).

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
@ 2022-06-07 12:45   ` Geert Uytterhoeven
  0 siblings, 0 replies; 75+ messages in thread
From: Geert Uytterhoeven @ 2022-06-07 12:45 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	alpha, open list:QUALCOMM HEXAGON...,
	linux-ia64, linux-m68k, Linux-sh list, sparclinux, Linux-Arch,
	Linux Kernel Mailing List

Hi Alexander,

On Mon, Jun 6, 2022 at 1:50 PM Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
> While I was working on converting some structure fields from a fixed
> type to a bitmap, I started observing code size increase not only in
> places where the code works with the converted structure fields, but
> also where the converted vars were on the stack. That said, the
> following code:
>
>         DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
>         unsigned long bar = BIT(BAR_BIT);
>         unsigned long baz = 0;
>
>         __set_bit(FOO_BIT, foo);
>         baz |= BIT(BAZ_BIT);
>
>         BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
>         BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
>         BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
>
> triggers the first assertion on x86_64, which means that the
> compiler is unable to evaluate it to a compile-time initializer
> when the architecture-specific bitop is used even if it's obvious.
> I found that this is due to that many architecture-specific
> non-atomic bitop implementations use inline asm or other hacks which
> are faster or more robust when working with "real" variables (i.e.
> fields from the structures etc.), but the compilers have no clue how
> to optimize them out when called on compile-time constants.
>
> So, in order to let the compiler optimize out such cases, expand the
> test_bit() and __*_bit() definitions with a compile-time condition
> check, so that they will pick the generic C non-atomic bitop
> implementations when all of the arguments passed are compile-time
> constants, which means that the result will be a compile-time
> constant as well and the compiler will produce more efficient and
> simple code in 100% cases (no changes when there's at least one
> non-compile-time-constant argument).
> The condition itself:
>
> if (
> __builtin_constant_p(nr) &&     /* <- bit position is constant */
> __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
>                                       always either NULL or not */
> addr &&                         /* <- bitmap addr is not NULL */
> __builtin_constant_p(*addr)     /* <- compiler knows the value of
>                                       the target bitmap */
> )
>         /* then pick the generic C variant
> else
>         /* old code path, arch-specific
>
> I also tried __is_constexpr() as suggested by Andy, but it was
> always returning 0 ('not a constant') for the 2,3 and 4th
> conditions.
>
> The savings on x86_64 with LLVM are insane (.text):
>
> $ scripts/bloat-o-meter -c vmlinux.{base,test}
> add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)
>
> $ scripts/bloat-o-meter -c vmlinux.{base,mod}
> add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)
>
> $ scripts/bloat-o-meter -c vmlinux.{base,all}
> add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)

Thank you!

I gave it a try on m68k, and am a bit disappointed seeing an increase
in code size:

    add/remove: 49/13 grow/shrink: 279/138 up/down: 6434/-3342 (3092)

This is atari_defconfig on a tree based on v5.19-rc1, with
m68k-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0, GNU ld (GNU
Binutils for Ubuntu) 2.34).

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
@ 2022-06-07 12:45   ` Geert Uytterhoeven
  0 siblings, 0 replies; 75+ messages in thread
From: Geert Uytterhoeven @ 2022-06-07 12:45 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	alpha, open list:QUALCOMM HEXAGON...,
	linux-ia64, linux-m68k

Hi Alexander,

On Mon, Jun 6, 2022 at 1:50 PM Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
> While I was working on converting some structure fields from a fixed
> type to a bitmap, I started observing code size increase not only in
> places where the code works with the converted structure fields, but
> also where the converted vars were on the stack. That said, the
> following code:
>
>         DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
>         unsigned long bar = BIT(BAR_BIT);
>         unsigned long baz = 0;
>
>         __set_bit(FOO_BIT, foo);
>         baz |= BIT(BAZ_BIT);
>
>         BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
>         BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
>         BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
>
> triggers the first assertion on x86_64, which means that the
> compiler is unable to evaluate it to a compile-time initializer
> when the architecture-specific bitop is used even if it's obvious.
> I found that this is due to that many architecture-specific
> non-atomic bitop implementations use inline asm or other hacks which
> are faster or more robust when working with "real" variables (i.e.
> fields from the structures etc.), but the compilers have no clue how
> to optimize them out when called on compile-time constants.
>
> So, in order to let the compiler optimize out such cases, expand the
> test_bit() and __*_bit() definitions with a compile-time condition
> check, so that they will pick the generic C non-atomic bitop
> implementations when all of the arguments passed are compile-time
> constants, which means that the result will be a compile-time
> constant as well and the compiler will produce more efficient and
> simple code in 100% cases (no changes when there's at least one
> non-compile-time-constant argument).
> The condition itself:
>
> if (
> __builtin_constant_p(nr) &&     /* <- bit position is constant */
> __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
>                                       always either NULL or not */
> addr &&                         /* <- bitmap addr is not NULL */
> __builtin_constant_p(*addr)     /* <- compiler knows the value of
>                                       the target bitmap */
> )
>         /* then pick the generic C variant
> else
>         /* old code path, arch-specific
>
> I also tried __is_constexpr() as suggested by Andy, but it was
> always returning 0 ('not a constant') for the 2,3 and 4th
> conditions.
>
> The savings on x86_64 with LLVM are insane (.text):
>
> $ scripts/bloat-o-meter -c vmlinux.{base,test}
> add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)
>
> $ scripts/bloat-o-meter -c vmlinux.{base,mod}
> add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)
>
> $ scripts/bloat-o-meter -c vmlinux.{base,all}
> add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)

Thank you!

I gave it a try on m68k, and am a bit disappointed seeing an increase
in code size:

    add/remove: 49/13 grow/shrink: 279/138 up/down: 6434/-3342 (3092)

This is atari_defconfig on a tree based on v5.19-rc1, with
m68k-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0, GNU ld (GNU
Binutils for Ubuntu) 2.34).

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
  2022-06-06 11:49   ` Alexander Lobakin
  (?)
@ 2022-06-07 13:43     ` Marco Elver
  -1 siblings, 0 replies; 75+ messages in thread
From: Marco Elver @ 2022-06-07 13:43 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Greg Kroah-Hartman, linux-alpha,
	linux-hexagon, linux-ia64, linux-m68k, linux-sh, sparclinux,
	linux-arch, linux-kernel

On Mon, Jun 06, 2022 at 01:49PM +0200, Alexander Lobakin wrote:
> Currently, the generic test_bit() function is defined as a one-liner
> and in case with constant bitmaps the compiler is unable to optimize
> it to a constant. At the same time, gen_test_and_*_bit() are being
> optimized pretty good.
> Define gen_test_bit() the same way as they are defined.
> 
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  include/asm-generic/bitops/generic-non-atomic.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index 7a60adfa6e7d..202d8a3b40e1 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -118,7 +118,11 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
>  static __always_inline int
>  gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
>  {
> -	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> +	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long val = *p;
> +
> +	return !!(val & mask);

Unfortunately this makes the dereference of 'addr' non-volatile, and
effectively weakens test_bit() to the point where I'd no longer consider
it atomic. Per atomic_bitops.txt, test_bit() is atomic.

The generic version has been using a volatile access to make it atomic
(akin to generic READ_ONCE() casting to volatile). The volatile is also
the reason the compiler can't optimize much, because volatile forces a
real memory access.

Yes, confusingly, test_bit() lives in non-atomic.h, and this had caused
confusion before, but the decision was made that moving it will cause
headaches for ppc so it was left alone:
https://lore.kernel.org/all/87a78xgu8o.fsf@dja-thinkpad.axtens.net/T/#u

As for how to make test_bit() more compiler-optimization friendly, I'm
guessing that test_bit() needs some special casing where even the
generic arch_test_bit() is different from the gen_test_bit().
gen_test_bit() should probably assert that whatever it is called with
can actually be evaluated at compile-time so it is never accidentally
used otherwise.

I would also propose adding a comment close to the deref that test_bit()
is atomic and the deref needs to remain volatile, so future people will
not try to do the same optimization.

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
@ 2022-06-07 13:43     ` Marco Elver
  0 siblings, 0 replies; 75+ messages in thread
From: Marco Elver @ 2022-06-07 13:43 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Greg Kroah-Hartman, linux-alpha,
	linux-hexagon, linux-ia64, linux-m68k, linux-sh, sparclinux,
	linux-arch, linux-kernel

On Mon, Jun 06, 2022 at 01:49PM +0200, Alexander Lobakin wrote:
> Currently, the generic test_bit() function is defined as a one-liner
> and in case with constant bitmaps the compiler is unable to optimize
> it to a constant. At the same time, gen_test_and_*_bit() are being
> optimized pretty good.
> Define gen_test_bit() the same way as they are defined.
> 
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  include/asm-generic/bitops/generic-non-atomic.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index 7a60adfa6e7d..202d8a3b40e1 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -118,7 +118,11 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
>  static __always_inline int
>  gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
>  {
> -	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> +	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long val = *p;
> +
> +	return !!(val & mask);

Unfortunately this makes the dereference of 'addr' non-volatile, and
effectively weakens test_bit() to the point where I'd no longer consider
it atomic. Per atomic_bitops.txt, test_bit() is atomic.

The generic version has been using a volatile access to make it atomic
(akin to generic READ_ONCE() casting to volatile). The volatile is also
the reason the compiler can't optimize much, because volatile forces a
real memory access.

Yes, confusingly, test_bit() lives in non-atomic.h, and this had caused
confusion before, but the decision was made that moving it will cause
headaches for ppc so it was left alone:
https://lore.kernel.org/all/87a78xgu8o.fsf@dja-thinkpad.axtens.net/T/#u

As for how to make test_bit() more compiler-optimization friendly, I'm
guessing that test_bit() needs some special casing where even the
generic arch_test_bit() is different from the gen_test_bit().
gen_test_bit() should probably assert that whatever it is called with
can actually be evaluated at compile-time so it is never accidentally
used otherwise.

I would also propose adding a comment close to the deref that test_bit()
is atomic and the deref needs to remain volatile, so future people will
not try to do the same optimization.

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
@ 2022-06-07 13:43     ` Marco Elver
  0 siblings, 0 replies; 75+ messages in thread
From: Marco Elver @ 2022-06-07 13:43 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Greg Kroah-Hartman, linux-alpha,
	linux-hexagon, linux-ia64, linux-m68k, linux-sh, sparclinux,
	linux-arch

On Mon, Jun 06, 2022 at 01:49PM +0200, Alexander Lobakin wrote:
> Currently, the generic test_bit() function is defined as a one-liner
> and in case with constant bitmaps the compiler is unable to optimize
> it to a constant. At the same time, gen_test_and_*_bit() are being
> optimized pretty good.
> Define gen_test_bit() the same way as they are defined.
> 
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  include/asm-generic/bitops/generic-non-atomic.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index 7a60adfa6e7d..202d8a3b40e1 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -118,7 +118,11 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
>  static __always_inline int
>  gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
>  {
> -	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> +	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long val = *p;
> +
> +	return !!(val & mask);

Unfortunately this makes the dereference of 'addr' non-volatile, and
effectively weakens test_bit() to the point where I'd no longer consider
it atomic. Per atomic_bitops.txt, test_bit() is atomic.

The generic version has been using a volatile access to make it atomic
(akin to generic READ_ONCE() casting to volatile). The volatile is also
the reason the compiler can't optimize much, because volatile forces a
real memory access.

Yes, confusingly, test_bit() lives in non-atomic.h, and this had caused
confusion before, but the decision was made that moving it will cause
headaches for ppc so it was left alone:
https://lore.kernel.org/all/87a78xgu8o.fsf@dja-thinkpad.axtens.net/T/#u

As for how to make test_bit() more compiler-optimization friendly, I'm
guessing that test_bit() needs some special casing where even the
generic arch_test_bit() is different from the gen_test_bit().
gen_test_bit() should probably assert that whatever it is called with
can actually be evaluated at compile-time so it is never accidentally
used otherwise.

I would also propose adding a comment close to the deref that test_bit()
is atomic and the deref needs to remain volatile, so future people will
not try to do the same optimization.

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-07 12:45   ` Geert Uytterhoeven
  (?)
@ 2022-06-07 15:47     ` Alexander Lobakin
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-07 15:47 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	alpha, open list:QUALCOMM HEXAGON...,
	linux-ia64, linux-m68k, Linux-sh list, sparclinux, Linux-Arch,
	Linux Kernel Mailing List

From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Tue, 7 Jun 2022 14:45:41 +0200

> Hi Alexander,

Hi!

> 
> On Mon, Jun 6, 2022 at 1:50 PM Alexander Lobakin
> <alexandr.lobakin@intel.com> wrote:
> > While I was working on converting some structure fields from a fixed
> > type to a bitmap, I started observing code size increase not only in
> > places where the code works with the converted structure fields, but
> > also where the converted vars were on the stack. That said, the
> > following code:
> >
> >         DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> >         unsigned long bar = BIT(BAR_BIT);
> >         unsigned long baz = 0;
> >
> >         __set_bit(FOO_BIT, foo);
> >         baz |= BIT(BAZ_BIT);
> >
> >         BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> >         BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> >         BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> >
> > triggers the first assertion on x86_64, which means that the
> > compiler is unable to evaluate it to a compile-time initializer
> > when the architecture-specific bitop is used even if it's obvious.
> > I found that this is due to that many architecture-specific
> > non-atomic bitop implementations use inline asm or other hacks which
> > are faster or more robust when working with "real" variables (i.e.
> > fields from the structures etc.), but the compilers have no clue how
> > to optimize them out when called on compile-time constants.
> >
> > So, in order to let the compiler optimize out such cases, expand the
> > test_bit() and __*_bit() definitions with a compile-time condition
> > check, so that they will pick the generic C non-atomic bitop
> > implementations when all of the arguments passed are compile-time
> > constants, which means that the result will be a compile-time
> > constant as well and the compiler will produce more efficient and
> > simple code in 100% cases (no changes when there's at least one
> > non-compile-time-constant argument).
> > The condition itself:
> >
> > if (
> > __builtin_constant_p(nr) &&     /* <- bit position is constant */
> > __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
> >                                       always either NULL or not */
> > addr &&                         /* <- bitmap addr is not NULL */
> > __builtin_constant_p(*addr)     /* <- compiler knows the value of
> >                                       the target bitmap */
> > )
> >         /* then pick the generic C variant
> > else
> >         /* old code path, arch-specific
> >
> > I also tried __is_constexpr() as suggested by Andy, but it was
> > always returning 0 ('not a constant') for the 2,3 and 4th
> > conditions.
> >
> > The savings on x86_64 with LLVM are insane (.text):
> >
> > $ scripts/bloat-o-meter -c vmlinux.{base,test}
> > add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)
> >
> > $ scripts/bloat-o-meter -c vmlinux.{base,mod}
> > add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)
> >
> > $ scripts/bloat-o-meter -c vmlinux.{base,all}
> > add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> 
> Thank you!
> 
> I gave it a try on m68k, and am a bit disappointed seeing an increase
> in code size:
> 
>     add/remove: 49/13 grow/shrink: 279/138 up/down: 6434/-3342 (3092)

Ufff, that sucks =\
Could you please try to compile the following code snippet (with the
series applied)?

	unsigned long map;

	bitmap_zero(&map, BITS_PER_LONG);
	__set_bit(1, &map);
	BUILD_BUG_ON(!__builtin_constant_p(map));

If it fails during the vmlinux linkage, it will mean that on your
architecture/setup the compiler is unable to optimize the generic
implementations to compile-time constants and I'll need to debug
this more (probably via some compiler explorer).
You could also check the vmlinux size after applying each patch
to see which one does this if you feel like it :)

> 
> This is atari_defconfig on a tree based on v5.19-rc1, with
> m68k-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0, GNU ld (GNU
> Binutils for Ubuntu) 2.34).
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
@ 2022-06-07 15:47     ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-07 15:47 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	alpha, open list:QUALCOMM HEXAGON...,
	linux-ia64, linux-m68k, Linux-sh list, sparclinux, Linux-Arch,
	Linux Kernel Mailing List

From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Tue, 7 Jun 2022 14:45:41 +0200

> Hi Alexander,

Hi!

> 
> On Mon, Jun 6, 2022 at 1:50 PM Alexander Lobakin
> <alexandr.lobakin@intel.com> wrote:
> > While I was working on converting some structure fields from a fixed
> > type to a bitmap, I started observing code size increase not only in
> > places where the code works with the converted structure fields, but
> > also where the converted vars were on the stack. That said, the
> > following code:
> >
> >         DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> >         unsigned long bar = BIT(BAR_BIT);
> >         unsigned long baz = 0;
> >
> >         __set_bit(FOO_BIT, foo);
> >         baz |= BIT(BAZ_BIT);
> >
> >         BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> >         BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> >         BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> >
> > triggers the first assertion on x86_64, which means that the
> > compiler is unable to evaluate it to a compile-time initializer
> > when the architecture-specific bitop is used even if it's obvious.
> > I found that this is due to that many architecture-specific
> > non-atomic bitop implementations use inline asm or other hacks which
> > are faster or more robust when working with "real" variables (i.e.
> > fields from the structures etc.), but the compilers have no clue how
> > to optimize them out when called on compile-time constants.
> >
> > So, in order to let the compiler optimize out such cases, expand the
> > test_bit() and __*_bit() definitions with a compile-time condition
> > check, so that they will pick the generic C non-atomic bitop
> > implementations when all of the arguments passed are compile-time
> > constants, which means that the result will be a compile-time
> > constant as well and the compiler will produce more efficient and
> > simple code in 100% cases (no changes when there's at least one
> > non-compile-time-constant argument).
> > The condition itself:
> >
> > if (
> > __builtin_constant_p(nr) &&     /* <- bit position is constant */
> > __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
> >                                       always either NULL or not */
> > addr &&                         /* <- bitmap addr is not NULL */
> > __builtin_constant_p(*addr)     /* <- compiler knows the value of
> >                                       the target bitmap */
> > )
> >         /* then pick the generic C variant
> > else
> >         /* old code path, arch-specific
> >
> > I also tried __is_constexpr() as suggested by Andy, but it was
> > always returning 0 ('not a constant') for the 2,3 and 4th
> > conditions.
> >
> > The savings on x86_64 with LLVM are insane (.text):
> >
> > $ scripts/bloat-o-meter -c vmlinux.{base,test}
> > add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)
> >
> > $ scripts/bloat-o-meter -c vmlinux.{base,mod}
> > add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)
> >
> > $ scripts/bloat-o-meter -c vmlinux.{base,all}
> > add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> 
> Thank you!
> 
> I gave it a try on m68k, and am a bit disappointed seeing an increase
> in code size:
> 
>     add/remove: 49/13 grow/shrink: 279/138 up/down: 6434/-3342 (3092)

Ufff, that sucks =\
Could you please try to compile the following code snippet (with the
series applied)?

	unsigned long map;

	bitmap_zero(&map, BITS_PER_LONG);
	__set_bit(1, &map);
	BUILD_BUG_ON(!__builtin_constant_p(map));

If it fails during the vmlinux linkage, it will mean that on your
architecture/setup the compiler is unable to optimize the generic
implementations to compile-time constants and I'll need to debug
this more (probably via some compiler explorer).
You could also check the vmlinux size after applying each patch
to see which one does this if you feel like it :)

> 
> This is atari_defconfig on a tree based on v5.19-rc1, with
> m68k-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0, GNU ld (GNU
> Binutils for Ubuntu) 2.34).
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
@ 2022-06-07 15:47     ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-07 15:47 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	alpha, open list:QUALCOMM HEXAGON...,
	linux-ia64@vger.kernel.org

From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Tue, 7 Jun 2022 14:45:41 +0200

> Hi Alexander,

Hi!

> 
> On Mon, Jun 6, 2022 at 1:50 PM Alexander Lobakin
> <alexandr.lobakin@intel.com> wrote:
> > While I was working on converting some structure fields from a fixed
> > type to a bitmap, I started observing code size increase not only in
> > places where the code works with the converted structure fields, but
> > also where the converted vars were on the stack. That said, the
> > following code:
> >
> >         DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> >         unsigned long bar = BIT(BAR_BIT);
> >         unsigned long baz = 0;
> >
> >         __set_bit(FOO_BIT, foo);
> >         baz |= BIT(BAZ_BIT);
> >
> >         BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> >         BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> >         BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> >
> > triggers the first assertion on x86_64, which means that the
> > compiler is unable to evaluate it to a compile-time initializer
> > when the architecture-specific bitop is used even if it's obvious.
> > I found that this is due to that many architecture-specific
> > non-atomic bitop implementations use inline asm or other hacks which
> > are faster or more robust when working with "real" variables (i.e.
> > fields from the structures etc.), but the compilers have no clue how
> > to optimize them out when called on compile-time constants.
> >
> > So, in order to let the compiler optimize out such cases, expand the
> > test_bit() and __*_bit() definitions with a compile-time condition
> > check, so that they will pick the generic C non-atomic bitop
> > implementations when all of the arguments passed are compile-time
> > constants, which means that the result will be a compile-time
> > constant as well and the compiler will produce more efficient and
> > simple code in 100% cases (no changes when there's at least one
> > non-compile-time-constant argument).
> > The condition itself:
> >
> > if (
> > __builtin_constant_p(nr) &&     /* <- bit position is constant */
> > __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
> >                                       always either NULL or not */
> > addr &&                         /* <- bitmap addr is not NULL */
> > __builtin_constant_p(*addr)     /* <- compiler knows the value of
> >                                       the target bitmap */
> > )
> >         /* then pick the generic C variant
> > else
> >         /* old code path, arch-specific
> >
> > I also tried __is_constexpr() as suggested by Andy, but it was
> > always returning 0 ('not a constant') for the 2,3 and 4th
> > conditions.
> >
> > The savings on x86_64 with LLVM are insane (.text):
> >
> > $ scripts/bloat-o-meter -c vmlinux.{base,test}
> > add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)
> >
> > $ scripts/bloat-o-meter -c vmlinux.{base,mod}
> > add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)
> >
> > $ scripts/bloat-o-meter -c vmlinux.{base,all}
> > add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> 
> Thank you!
> 
> I gave it a try on m68k, and am a bit disappointed seeing an increase
> in code size:
> 
>     add/remove: 49/13 grow/shrink: 279/138 up/down: 6434/-3342 (3092)

Ufff, that sucks =\
Could you please try to compile the following code snippet (with the
series applied)?

	unsigned long map;

	bitmap_zero(&map, BITS_PER_LONG);
	__set_bit(1, &map);
	BUILD_BUG_ON(!__builtin_constant_p(map));

If it fails during the vmlinux linkage, it will mean that on your
architecture/setup the compiler is unable to optimize the generic
implementations to compile-time constants and I'll need to debug
this more (probably via some compiler explorer).
You could also check the vmlinux size after applying each patch
to see which one does this if you feel like it :)

> 
> This is atari_defconfig on a tree based on v5.19-rc1, with
> m68k-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0, GNU ld (GNU
> Binutils for Ubuntu) 2.34).
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
  2022-06-07 13:43     ` Marco Elver
  (?)
@ 2022-06-07 15:57       ` Alexander Lobakin
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-07 15:57 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Greg Kroah-Hartman, linux-alpha,
	linux-hexagon, linux-ia64, linux-m68k, linux-sh, sparclinux,
	linux-arch, linux-kernel

From: Marco Elver <elver@google.com>
Date: Tue, 7 Jun 2022 15:43:49 +0200

> On Mon, Jun 06, 2022 at 01:49PM +0200, Alexander Lobakin wrote:
> > Currently, the generic test_bit() function is defined as a one-liner
> > and in case with constant bitmaps the compiler is unable to optimize
> > it to a constant. At the same time, gen_test_and_*_bit() are being
> > optimized pretty good.
> > Define gen_test_bit() the same way as they are defined.
> > 
> > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > ---
> >  include/asm-generic/bitops/generic-non-atomic.h | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> > index 7a60adfa6e7d..202d8a3b40e1 100644
> > --- a/include/asm-generic/bitops/generic-non-atomic.h
> > +++ b/include/asm-generic/bitops/generic-non-atomic.h
> > @@ -118,7 +118,11 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> >  static __always_inline int
> >  gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
> >  {
> > -	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> > +	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> > +	unsigned long mask = BIT_MASK(nr);
> > +	unsigned long val = *p;
> > +
> > +	return !!(val & mask);
> 
> Unfortunately this makes the dereference of 'addr' non-volatile, and
> effectively weakens test_bit() to the point where I'd no longer consider
> it atomic. Per atomic_bitops.txt, test_bit() is atomic.
> 
> The generic version has been using a volatile access to make it atomic
> (akin to generic READ_ONCE() casting to volatile). The volatile is also
> the reason the compiler can't optimize much, because volatile forces a
> real memory access.

Ah-ha, I see now. Thanks for catching and explaining this!

> 
> Yes, confusingly, test_bit() lives in non-atomic.h, and this had caused
> confusion before, but the decision was made that moving it will cause
> headaches for ppc so it was left alone:
> https://lore.kernel.org/all/87a78xgu8o.fsf@dja-thinkpad.axtens.net/T/#u
> 
> As for how to make test_bit() more compiler-optimization friendly, I'm
> guessing that test_bit() needs some special casing where even the
> generic arch_test_bit() is different from the gen_test_bit().
> gen_test_bit() should probably assert that whatever it is called with
> can actually be evaluated at compile-time so it is never accidentally
> used otherwise.

I like the idea! Will do in v2.
I can move the generics and after, right below them, define
'const_*' helpers which will mostly redirect to 'generic_*', but
for test_bit() it will be a separate function with no `volatile`
and with an assertion that the input args are constants.

> 
> I would also propose adding a comment close to the deref that test_bit()
> is atomic and the deref needs to remain volatile, so future people will
> not try to do the same optimization.

I think that's also the reason why it's not underscored, right?

> 
> Thanks,
> -- Marco

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
@ 2022-06-07 15:57       ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-07 15:57 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Greg Kroah-Hartman, linux-alpha,
	linux-hexagon, linux-ia64, linux-m68k, linux-sh, sparclinux,
	linux-arch, linux-kernel

From: Marco Elver <elver@google.com>
Date: Tue, 7 Jun 2022 15:43:49 +0200

> On Mon, Jun 06, 2022 at 01:49PM +0200, Alexander Lobakin wrote:
> > Currently, the generic test_bit() function is defined as a one-liner
> > and in case with constant bitmaps the compiler is unable to optimize
> > it to a constant. At the same time, gen_test_and_*_bit() are being
> > optimized pretty good.
> > Define gen_test_bit() the same way as they are defined.
> > 
> > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > ---
> >  include/asm-generic/bitops/generic-non-atomic.h | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> > index 7a60adfa6e7d..202d8a3b40e1 100644
> > --- a/include/asm-generic/bitops/generic-non-atomic.h
> > +++ b/include/asm-generic/bitops/generic-non-atomic.h
> > @@ -118,7 +118,11 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> >  static __always_inline int
> >  gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
> >  {
> > -	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> > +	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> > +	unsigned long mask = BIT_MASK(nr);
> > +	unsigned long val = *p;
> > +
> > +	return !!(val & mask);
> 
> Unfortunately this makes the dereference of 'addr' non-volatile, and
> effectively weakens test_bit() to the point where I'd no longer consider
> it atomic. Per atomic_bitops.txt, test_bit() is atomic.
> 
> The generic version has been using a volatile access to make it atomic
> (akin to generic READ_ONCE() casting to volatile). The volatile is also
> the reason the compiler can't optimize much, because volatile forces a
> real memory access.

Ah-ha, I see now. Thanks for catching and explaining this!

> 
> Yes, confusingly, test_bit() lives in non-atomic.h, and this had caused
> confusion before, but the decision was made that moving it will cause
> headaches for ppc so it was left alone:
> https://lore.kernel.org/all/87a78xgu8o.fsf@dja-thinkpad.axtens.net/T/#u
> 
> As for how to make test_bit() more compiler-optimization friendly, I'm
> guessing that test_bit() needs some special casing where even the
> generic arch_test_bit() is different from the gen_test_bit().
> gen_test_bit() should probably assert that whatever it is called with
> can actually be evaluated at compile-time so it is never accidentally
> used otherwise.

I like the idea! Will do in v2.
I can move the generics and after, right below them, define
'const_*' helpers which will mostly redirect to 'generic_*', but
for test_bit() it will be a separate function with no `volatile`
and with an assertion that the input args are constants.

> 
> I would also propose adding a comment close to the deref that test_bit()
> is atomic and the deref needs to remain volatile, so future people will
> not try to do the same optimization.

I think that's also the reason why it's not underscored, right?

> 
> Thanks,
> -- Marco

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
@ 2022-06-07 15:57       ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-07 15:57 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Greg Kroah-Hartman, linux-alpha,
	linux-hexagon, linux-ia64, linux-m68k, linux-sh

From: Marco Elver <elver@google.com>
Date: Tue, 7 Jun 2022 15:43:49 +0200

> On Mon, Jun 06, 2022 at 01:49PM +0200, Alexander Lobakin wrote:
> > Currently, the generic test_bit() function is defined as a one-liner
> > and in case with constant bitmaps the compiler is unable to optimize
> > it to a constant. At the same time, gen_test_and_*_bit() are being
> > optimized pretty good.
> > Define gen_test_bit() the same way as they are defined.
> > 
> > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > ---
> >  include/asm-generic/bitops/generic-non-atomic.h | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> > index 7a60adfa6e7d..202d8a3b40e1 100644
> > --- a/include/asm-generic/bitops/generic-non-atomic.h
> > +++ b/include/asm-generic/bitops/generic-non-atomic.h
> > @@ -118,7 +118,11 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> >  static __always_inline int
> >  gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
> >  {
> > -	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> > +	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> > +	unsigned long mask = BIT_MASK(nr);
> > +	unsigned long val = *p;
> > +
> > +	return !!(val & mask);
> 
> Unfortunately this makes the dereference of 'addr' non-volatile, and
> effectively weakens test_bit() to the point where I'd no longer consider
> it atomic. Per atomic_bitops.txt, test_bit() is atomic.
> 
> The generic version has been using a volatile access to make it atomic
> (akin to generic READ_ONCE() casting to volatile). The volatile is also
> the reason the compiler can't optimize much, because volatile forces a
> real memory access.

Ah-ha, I see now. Thanks for catching and explaining this!

> 
> Yes, confusingly, test_bit() lives in non-atomic.h, and this had caused
> confusion before, but the decision was made that moving it will cause
> headaches for ppc so it was left alone:
> https://lore.kernel.org/all/87a78xgu8o.fsf@dja-thinkpad.axtens.net/T/#u
> 
> As for how to make test_bit() more compiler-optimization friendly, I'm
> guessing that test_bit() needs some special casing where even the
> generic arch_test_bit() is different from the gen_test_bit().
> gen_test_bit() should probably assert that whatever it is called with
> can actually be evaluated at compile-time so it is never accidentally
> used otherwise.

I like the idea! Will do in v2.
I can move the generics and after, right below them, define
'const_*' helpers which will mostly redirect to 'generic_*', but
for test_bit() it will be a separate function with no `volatile`
and with an assertion that the input args are constants.

> 
> I would also propose adding a comment close to the deref that test_bit()
> is atomic and the deref needs to remain volatile, so future people will
> not try to do the same optimization.

I think that's also the reason why it's not underscored, right?

> 
> Thanks,
> -- Marco

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
  2022-06-07 15:57       ` Alexander Lobakin
  (?)
@ 2022-06-07 16:15         ` Marco Elver
  -1 siblings, 0 replies; 75+ messages in thread
From: Marco Elver @ 2022-06-07 16:15 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Greg Kroah-Hartman, linux-alpha,
	linux-hexagon, linux-ia64, linux-m68k, linux-sh, sparclinux,
	linux-arch, linux-kernel

On Tue, 7 Jun 2022 at 18:05, Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
>
> From: Marco Elver <elver@google.com>
> Date: Tue, 7 Jun 2022 15:43:49 +0200
>
> > On Mon, Jun 06, 2022 at 01:49PM +0200, Alexander Lobakin wrote:
> > > Currently, the generic test_bit() function is defined as a one-liner
> > > and in case with constant bitmaps the compiler is unable to optimize
> > > it to a constant. At the same time, gen_test_and_*_bit() are being
> > > optimized pretty good.
> > > Define gen_test_bit() the same way as they are defined.
> > >
> > > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > > ---
> > >  include/asm-generic/bitops/generic-non-atomic.h | 6 +++++-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> > > index 7a60adfa6e7d..202d8a3b40e1 100644
> > > --- a/include/asm-generic/bitops/generic-non-atomic.h
> > > +++ b/include/asm-generic/bitops/generic-non-atomic.h
> > > @@ -118,7 +118,11 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> > >  static __always_inline int
> > >  gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
> > >  {
> > > -   return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> > > +   const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> > > +   unsigned long mask = BIT_MASK(nr);
> > > +   unsigned long val = *p;
> > > +
> > > +   return !!(val & mask);
> >
> > Unfortunately this makes the dereference of 'addr' non-volatile, and
> > effectively weakens test_bit() to the point where I'd no longer consider
> > it atomic. Per atomic_bitops.txt, test_bit() is atomic.
> >
> > The generic version has been using a volatile access to make it atomic
> > (akin to generic READ_ONCE() casting to volatile). The volatile is also
> > the reason the compiler can't optimize much, because volatile forces a
> > real memory access.
>
> Ah-ha, I see now. Thanks for catching and explaining this!
>
> >
> > Yes, confusingly, test_bit() lives in non-atomic.h, and this had caused
> > confusion before, but the decision was made that moving it will cause
> > headaches for ppc so it was left alone:
> > https://lore.kernel.org/all/87a78xgu8o.fsf@dja-thinkpad.axtens.net/T/#u
> >
> > As for how to make test_bit() more compiler-optimization friendly, I'm
> > guessing that test_bit() needs some special casing where even the
> > generic arch_test_bit() is different from the gen_test_bit().
> > gen_test_bit() should probably assert that whatever it is called with
> > can actually be evaluated at compile-time so it is never accidentally
> > used otherwise.
>
> I like the idea! Will do in v2.
> I can move the generics and after, right below them, define
> 'const_*' helpers which will mostly redirect to 'generic_*', but
> for test_bit() it will be a separate function with no `volatile`
> and with an assertion that the input args are constants.

Be aware that there's already a "constant_test_bit()" in
arch/x86/include/asm/bitops.h, which uses 2 versions of test_bit() if
'nr' is constant or not. I guess you can steer clear of that if you
use "const_", but they do sound similar.

> >
> > I would also propose adding a comment close to the deref that test_bit()
> > is atomic and the deref needs to remain volatile, so future people will
> > not try to do the same optimization.
>
> I think that's also the reason why it's not underscored, right?

Yes, the naming convention is that double-underscored ones are
non-atomic so that's one clue indeed. Documentation/atomic_bitops.txt
another, and unlike the other non-atomic bitops, its kernel-doc
comment also does not mention "This operation is non-atomic...".

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
@ 2022-06-07 16:15         ` Marco Elver
  0 siblings, 0 replies; 75+ messages in thread
From: Marco Elver @ 2022-06-07 16:15 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Greg Kroah-Hartman, linux-alpha,
	linux-hexagon, linux-ia64, linux-m68k, linux-sh, sparclinux,
	linux-arch, linux-kernel

On Tue, 7 Jun 2022 at 18:05, Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
>
> From: Marco Elver <elver@google.com>
> Date: Tue, 7 Jun 2022 15:43:49 +0200
>
> > On Mon, Jun 06, 2022 at 01:49PM +0200, Alexander Lobakin wrote:
> > > Currently, the generic test_bit() function is defined as a one-liner
> > > and in case with constant bitmaps the compiler is unable to optimize
> > > it to a constant. At the same time, gen_test_and_*_bit() are being
> > > optimized pretty good.
> > > Define gen_test_bit() the same way as they are defined.
> > >
> > > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > > ---
> > >  include/asm-generic/bitops/generic-non-atomic.h | 6 +++++-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> > > index 7a60adfa6e7d..202d8a3b40e1 100644
> > > --- a/include/asm-generic/bitops/generic-non-atomic.h
> > > +++ b/include/asm-generic/bitops/generic-non-atomic.h
> > > @@ -118,7 +118,11 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> > >  static __always_inline int
> > >  gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
> > >  {
> > > -   return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> > > +   const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> > > +   unsigned long mask = BIT_MASK(nr);
> > > +   unsigned long val = *p;
> > > +
> > > +   return !!(val & mask);
> >
> > Unfortunately this makes the dereference of 'addr' non-volatile, and
> > effectively weakens test_bit() to the point where I'd no longer consider
> > it atomic. Per atomic_bitops.txt, test_bit() is atomic.
> >
> > The generic version has been using a volatile access to make it atomic
> > (akin to generic READ_ONCE() casting to volatile). The volatile is also
> > the reason the compiler can't optimize much, because volatile forces a
> > real memory access.
>
> Ah-ha, I see now. Thanks for catching and explaining this!
>
> >
> > Yes, confusingly, test_bit() lives in non-atomic.h, and this had caused
> > confusion before, but the decision was made that moving it will cause
> > headaches for ppc so it was left alone:
> > https://lore.kernel.org/all/87a78xgu8o.fsf@dja-thinkpad.axtens.net/T/#u
> >
> > As for how to make test_bit() more compiler-optimization friendly, I'm
> > guessing that test_bit() needs some special casing where even the
> > generic arch_test_bit() is different from the gen_test_bit().
> > gen_test_bit() should probably assert that whatever it is called with
> > can actually be evaluated at compile-time so it is never accidentally
> > used otherwise.
>
> I like the idea! Will do in v2.
> I can move the generics and after, right below them, define
> 'const_*' helpers which will mostly redirect to 'generic_*', but
> for test_bit() it will be a separate function with no `volatile`
> and with an assertion that the input args are constants.

Be aware that there's already a "constant_test_bit()" in
arch/x86/include/asm/bitops.h, which uses 2 versions of test_bit() if
'nr' is constant or not. I guess you can steer clear of that if you
use "const_", but they do sound similar.

> >
> > I would also propose adding a comment close to the deref that test_bit()
> > is atomic and the deref needs to remain volatile, so future people will
> > not try to do the same optimization.
>
> I think that's also the reason why it's not underscored, right?

Yes, the naming convention is that double-underscored ones are
non-atomic so that's one clue indeed. Documentation/atomic_bitops.txt
another, and unlike the other non-atomic bitops, its kernel-doc
comment also does not mention "This operation is non-atomic...".

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
@ 2022-06-07 16:15         ` Marco Elver
  0 siblings, 0 replies; 75+ messages in thread
From: Marco Elver @ 2022-06-07 16:15 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Greg Kroah-Hartman, linux-alpha,
	linux-hexagon, linux-ia64, linux-m68k, linux-sh, sparclinux,
	linux-arch

On Tue, 7 Jun 2022 at 18:05, Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
>
> From: Marco Elver <elver@google.com>
> Date: Tue, 7 Jun 2022 15:43:49 +0200
>
> > On Mon, Jun 06, 2022 at 01:49PM +0200, Alexander Lobakin wrote:
> > > Currently, the generic test_bit() function is defined as a one-liner
> > > and in case with constant bitmaps the compiler is unable to optimize
> > > it to a constant. At the same time, gen_test_and_*_bit() are being
> > > optimized pretty good.
> > > Define gen_test_bit() the same way as they are defined.
> > >
> > > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > > ---
> > >  include/asm-generic/bitops/generic-non-atomic.h | 6 +++++-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> > > index 7a60adfa6e7d..202d8a3b40e1 100644
> > > --- a/include/asm-generic/bitops/generic-non-atomic.h
> > > +++ b/include/asm-generic/bitops/generic-non-atomic.h
> > > @@ -118,7 +118,11 @@ gen___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> > >  static __always_inline int
> > >  gen_test_bit(unsigned int nr, const volatile unsigned long *addr)
> > >  {
> > > -   return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> > > +   const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> > > +   unsigned long mask = BIT_MASK(nr);
> > > +   unsigned long val = *p;
> > > +
> > > +   return !!(val & mask);
> >
> > Unfortunately this makes the dereference of 'addr' non-volatile, and
> > effectively weakens test_bit() to the point where I'd no longer consider
> > it atomic. Per atomic_bitops.txt, test_bit() is atomic.
> >
> > The generic version has been using a volatile access to make it atomic
> > (akin to generic READ_ONCE() casting to volatile). The volatile is also
> > the reason the compiler can't optimize much, because volatile forces a
> > real memory access.
>
> Ah-ha, I see now. Thanks for catching and explaining this!
>
> >
> > Yes, confusingly, test_bit() lives in non-atomic.h, and this had caused
> > confusion before, but the decision was made that moving it will cause
> > headaches for ppc so it was left alone:
> > https://lore.kernel.org/all/87a78xgu8o.fsf@dja-thinkpad.axtens.net/T/#u
> >
> > As for how to make test_bit() more compiler-optimization friendly, I'm
> > guessing that test_bit() needs some special casing where even the
> > generic arch_test_bit() is different from the gen_test_bit().
> > gen_test_bit() should probably assert that whatever it is called with
> > can actually be evaluated at compile-time so it is never accidentally
> > used otherwise.
>
> I like the idea! Will do in v2.
> I can move the generics and after, right below them, define
> 'const_*' helpers which will mostly redirect to 'generic_*', but
> for test_bit() it will be a separate function with no `volatile`
> and with an assertion that the input args are constants.

Be aware that there's already a "constant_test_bit()" in
arch/x86/include/asm/bitops.h, which uses 2 versions of test_bit() if
'nr' is constant or not. I guess you can steer clear of that if you
use "const_", but they do sound similar.

> >
> > I would also propose adding a comment close to the deref that test_bit()
> > is atomic and the deref needs to remain volatile, so future people will
> > not try to do the same optimization.
>
> I think that's also the reason why it's not underscored, right?

Yes, the naming convention is that double-underscored ones are
non-atomic so that's one clue indeed. Documentation/atomic_bitops.txt
another, and unlike the other non-atomic bitops, its kernel-doc
comment also does not mention "This operation is non-atomic...".

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
  2022-06-07 15:57       ` Alexander Lobakin
  (?)
@ 2022-06-07 16:28         ` Andy Shevchenko
  -1 siblings, 0 replies; 75+ messages in thread
From: Andy Shevchenko @ 2022-06-07 16:28 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Marco Elver, Arnd Bergmann, Yury Norov, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Greg Kroah-Hartman, linux-alpha,
	linux-hexagon, linux-ia64, linux-m68k, linux-sh, sparclinux,
	linux-arch, linux-kernel

On Tue, Jun 07, 2022 at 05:57:22PM +0200, Alexander Lobakin wrote:
> From: Marco Elver <elver@google.com>
> Date: Tue, 7 Jun 2022 15:43:49 +0200
> > On Mon, Jun 06, 2022 at 01:49PM +0200, Alexander Lobakin wrote:

...

> > I would also propose adding a comment close to the deref that test_bit()
> > is atomic and the deref needs to remain volatile, so future people will
> > not try to do the same optimization.
> 
> I think that's also the reason why it's not underscored, right?

Non-__ prefixed bitops are atomic, __ non-atomic.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
@ 2022-06-07 16:28         ` Andy Shevchenko
  0 siblings, 0 replies; 75+ messages in thread
From: Andy Shevchenko @ 2022-06-07 16:28 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Marco Elver, Arnd Bergmann, Yury Norov, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Greg Kroah-Hartman, linux-alpha,
	linux-hexagon, linux-ia64, linux-m68k, linux-sh, sparclinux,
	linux-arch, linux-kernel

On Tue, Jun 07, 2022 at 05:57:22PM +0200, Alexander Lobakin wrote:
> From: Marco Elver <elver@google.com>
> Date: Tue, 7 Jun 2022 15:43:49 +0200
> > On Mon, Jun 06, 2022 at 01:49PM +0200, Alexander Lobakin wrote:

...

> > I would also propose adding a comment close to the deref that test_bit()
> > is atomic and the deref needs to remain volatile, so future people will
> > not try to do the same optimization.
> 
> I think that's also the reason why it's not underscored, right?

Non-__ prefixed bitops are atomic, __ non-atomic.

-- 
With Best Regards,
Andy Shevchenko


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions
@ 2022-06-07 16:28         ` Andy Shevchenko
  0 siblings, 0 replies; 75+ messages in thread
From: Andy Shevchenko @ 2022-06-07 16:28 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Marco Elver, Arnd Bergmann, Yury Norov, Richard Henderson,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Greg Kroah-Hartman, linux-alpha,
	linux-hexagon, linux-ia64, linux-m68k, linux-sh, sparclinux,
	linux-arch, linux-ke

On Tue, Jun 07, 2022 at 05:57:22PM +0200, Alexander Lobakin wrote:
> From: Marco Elver <elver@google.com>
> Date: Tue, 7 Jun 2022 15:43:49 +0200
> > On Mon, Jun 06, 2022 at 01:49PM +0200, Alexander Lobakin wrote:

...

> > I would also propose adding a comment close to the deref that test_bit()
> > is atomic and the deref needs to remain volatile, so future people will
> > not try to do the same optimization.
> 
> I think that's also the reason why it's not underscored, right?

Non-__ prefixed bitops are atomic, __ non-atomic.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-07 15:47     ` Alexander Lobakin
  (?)
@ 2022-06-08  9:55       ` Geert Uytterhoeven
  -1 siblings, 0 replies; 75+ messages in thread
From: Geert Uytterhoeven @ 2022-06-08  9:55 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	alpha, open list:QUALCOMM HEXAGON...,
	linux-ia64, linux-m68k, Linux-sh list, sparclinux, Linux-Arch,
	Linux Kernel Mailing List

Hi Olek,

On Tue, Jun 7, 2022 at 5:51 PM Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
> From: Geert Uytterhoeven <geert@linux-m68k.org>
> > On Mon, Jun 6, 2022 at 1:50 PM Alexander Lobakin
> > <alexandr.lobakin@intel.com> wrote:
> > > While I was working on converting some structure fields from a fixed
> > > type to a bitmap, I started observing code size increase not only in
> > > places where the code works with the converted structure fields, but
> > > also where the converted vars were on the stack. That said, the
> > > following code:
> > >
> > >         DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> > >         unsigned long bar = BIT(BAR_BIT);
> > >         unsigned long baz = 0;
> > >
> > >         __set_bit(FOO_BIT, foo);
> > >         baz |= BIT(BAZ_BIT);
> > >
> > >         BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> > >         BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> > >         BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> > >
> > > triggers the first assertion on x86_64, which means that the
> > > compiler is unable to evaluate it to a compile-time initializer
> > > when the architecture-specific bitop is used even if it's obvious.
> > > I found that this is due to that many architecture-specific
> > > non-atomic bitop implementations use inline asm or other hacks which
> > > are faster or more robust when working with "real" variables (i.e.
> > > fields from the structures etc.), but the compilers have no clue how
> > > to optimize them out when called on compile-time constants.
> > >
> > > So, in order to let the compiler optimize out such cases, expand the
> > > test_bit() and __*_bit() definitions with a compile-time condition
> > > check, so that they will pick the generic C non-atomic bitop
> > > implementations when all of the arguments passed are compile-time
> > > constants, which means that the result will be a compile-time
> > > constant as well and the compiler will produce more efficient and
> > > simple code in 100% cases (no changes when there's at least one
> > > non-compile-time-constant argument).
> > > The condition itself:
> > >
> > > if (
> > > __builtin_constant_p(nr) &&     /* <- bit position is constant */
> > > __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
> > >                                       always either NULL or not */
> > > addr &&                         /* <- bitmap addr is not NULL */
> > > __builtin_constant_p(*addr)     /* <- compiler knows the value of
> > >                                       the target bitmap */
> > > )
> > >         /* then pick the generic C variant
> > > else
> > >         /* old code path, arch-specific
> > >
> > > I also tried __is_constexpr() as suggested by Andy, but it was
> > > always returning 0 ('not a constant') for the 2,3 and 4th
> > > conditions.
> > >
> > > The savings on x86_64 with LLVM are insane (.text):
> > >
> > > $ scripts/bloat-o-meter -c vmlinux.{base,test}
> > > add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)
> > >
> > > $ scripts/bloat-o-meter -c vmlinux.{base,mod}
> > > add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)
> > >
> > > $ scripts/bloat-o-meter -c vmlinux.{base,all}
> > > add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> >
> > Thank you!
> >
> > I gave it a try on m68k, and am a bit disappointed seeing an increase
> > in code size:
> >
> >     add/remove: 49/13 grow/shrink: 279/138 up/down: 6434/-3342 (3092)
>
> Ufff, that sucks =\
> Could you please try to compile the following code snippet (with the
> series applied)?
>
>         unsigned long map;
>
>         bitmap_zero(&map, BITS_PER_LONG);
>         __set_bit(1, &map);
>         BUILD_BUG_ON(!__builtin_constant_p(map));
>
> If it fails during the vmlinux linkage, it will mean that on your
> architecture/setup the compiler is unable to optimize the generic
> implementations to compile-time constants and I'll need to debug
> this more (probably via some compiler explorer).

Builds and links fine.

> You could also check the vmlinux size after applying each patch
> to see which one does this if you feel like it :)

The (incremental) impact of the various patches is shown below:

    bitops: unify non-atomic bitops prototypes across architectures
      add/remove: 4/11 grow/shrink: 123/160 up/down: 1700/-2786 (-1086)

    bitops: let optimize out non-atomic bitops on compile-time constants
      add/remove: 50/7 grow/shrink: 280/101 up/down: 6798/-2620 (4178)

I.e. the total impact is -1086 + 4178 = +3092

Looking at the impact of the last change on a single file, with rather
small functions to make it easier to analyze, the results are:

    bloat-o-meter net/core/sock.o{.orig,}
    add/remove: 3/1 grow/shrink: 20/3 up/down: 286/-68 (218)
    Function                                     old     new   delta
    sock_flag                                      -      38     +38
    sock_set_flag                                  -      22     +22
    sock_reset_flag                                -      22     +22
    sock_recv_errqueue                           264     284     +20
    sock_alloc_send_pskb                         406     424     +18
    __sock_set_timestamps                        104     122     +18
    sock_setsockopt                             2412    2428     +16
    sock_pfree                                    52      66     +14
    sock_wfree                                   236     248     +12
    sk_wait_data                                 222     234     +12
    sk_destruct                                   70      82     +12
    sk_wake_async                                 40      50     +10
    sk_set_memalloc                               74      84     +10
    __sock_queue_rcv_skb                         254     264     +10
    __sk_backlog_rcv                              92     102     +10
    sock_getsockopt                             1734    1742      +8
    sock_no_linger                                36      42      +6
    sk_clone_lock                                478     484      +6
    sk_clear_memalloc                             98     104      +6
    __sk_receive_skb                             194     200      +6
    sock_init_data                               344     348      +4
    __sock_cmsg_send                             196     200      +4
    sk_common_release                            152     154      +2
    sock_set_keepalive                            62      60      -2
    sock_enable_timestamp                         80      72      -8
    sock_valbool_flag                             34      12     -22
    bset_mem_set_bit                              36       -     -36
    Total: Before=18862, After=19080, chg +1.16%

Several small inline functions are no longer inlined.
And e.g. __sk_backlog_rcv() increases as it now calls sock_flag()
out-of-line, and needs to save more on the stack:

 __sk_backlog_rcv:
+       move.l %a3,-(%sp)       |,
        move.l %d2,-(%sp)       |,
-       move.l 8(%sp),%a0       | sk, sk
-| arch/m68k/include/asm/bitops.h:163:  return (addr[nr >> 5] & (1UL
<< (nr & 31))) != 0;
-       move.l 76(%a0),%d0      | MEM[(const long unsigned int
*)sk_6(D) + 76B], _14
+       move.l 12(%sp),%a3      | sk, sk
 | net/core/sock.c:330:         BUG_ON(!sock_flag(sk, SOCK_MEMALLOC));
-       btst #14,%d0    |, _14
-       jne .L193               |
+       pea 14.w                |
+       move.l %a3,-(%sp)       | sk,
+       jsr sock_flag           |
+       addq.l #8,%sp   |,
+       tst.b %d0       | tmp50
+       jne .L192               |
        pea 330.w               |
        pea .LC0                |
        pea .LC3                |
        jsr _printk             |
        trap #7

Note that the above is with atari_defconfig, which has
CONFIG_CC_OPTIMIZE_FOR_SIZE=y.

Switching to CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y results in a kernel
that is ca. 25% larger, and the net impact of this series is:

    add/remove: 24/27 grow/shrink: 227/233 up/down: 7494/-8080 (-586)

i.e. a reduction in size...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
@ 2022-06-08  9:55       ` Geert Uytterhoeven
  0 siblings, 0 replies; 75+ messages in thread
From: Geert Uytterhoeven @ 2022-06-08  9:55 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	alpha, open list:QUALCOMM HEXAGON...,
	linux-ia64, linux-m68k, Linux-sh list, sparclinux, Linux-Arch,
	Linux Kernel Mailing List

Hi Olek,

On Tue, Jun 7, 2022 at 5:51 PM Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
> From: Geert Uytterhoeven <geert@linux-m68k.org>
> > On Mon, Jun 6, 2022 at 1:50 PM Alexander Lobakin
> > <alexandr.lobakin@intel.com> wrote:
> > > While I was working on converting some structure fields from a fixed
> > > type to a bitmap, I started observing code size increase not only in
> > > places where the code works with the converted structure fields, but
> > > also where the converted vars were on the stack. That said, the
> > > following code:
> > >
> > >         DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> > >         unsigned long bar = BIT(BAR_BIT);
> > >         unsigned long baz = 0;
> > >
> > >         __set_bit(FOO_BIT, foo);
> > >         baz |= BIT(BAZ_BIT);
> > >
> > >         BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> > >         BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> > >         BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> > >
> > > triggers the first assertion on x86_64, which means that the
> > > compiler is unable to evaluate it to a compile-time initializer
> > > when the architecture-specific bitop is used even if it's obvious.
> > > I found that this is due to that many architecture-specific
> > > non-atomic bitop implementations use inline asm or other hacks which
> > > are faster or more robust when working with "real" variables (i.e.
> > > fields from the structures etc.), but the compilers have no clue how
> > > to optimize them out when called on compile-time constants.
> > >
> > > So, in order to let the compiler optimize out such cases, expand the
> > > test_bit() and __*_bit() definitions with a compile-time condition
> > > check, so that they will pick the generic C non-atomic bitop
> > > implementations when all of the arguments passed are compile-time
> > > constants, which means that the result will be a compile-time
> > > constant as well and the compiler will produce more efficient and
> > > simple code in 100% cases (no changes when there's at least one
> > > non-compile-time-constant argument).
> > > The condition itself:
> > >
> > > if (
> > > __builtin_constant_p(nr) &&     /* <- bit position is constant */
> > > __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
> > >                                       always either NULL or not */
> > > addr &&                         /* <- bitmap addr is not NULL */
> > > __builtin_constant_p(*addr)     /* <- compiler knows the value of
> > >                                       the target bitmap */
> > > )
> > >         /* then pick the generic C variant
> > > else
> > >         /* old code path, arch-specific
> > >
> > > I also tried __is_constexpr() as suggested by Andy, but it was
> > > always returning 0 ('not a constant') for the 2,3 and 4th
> > > conditions.
> > >
> > > The savings on x86_64 with LLVM are insane (.text):
> > >
> > > $ scripts/bloat-o-meter -c vmlinux.{base,test}
> > > add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)
> > >
> > > $ scripts/bloat-o-meter -c vmlinux.{base,mod}
> > > add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)
> > >
> > > $ scripts/bloat-o-meter -c vmlinux.{base,all}
> > > add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> >
> > Thank you!
> >
> > I gave it a try on m68k, and am a bit disappointed seeing an increase
> > in code size:
> >
> >     add/remove: 49/13 grow/shrink: 279/138 up/down: 6434/-3342 (3092)
>
> Ufff, that sucks =\
> Could you please try to compile the following code snippet (with the
> series applied)?
>
>         unsigned long map;
>
>         bitmap_zero(&map, BITS_PER_LONG);
>         __set_bit(1, &map);
>         BUILD_BUG_ON(!__builtin_constant_p(map));
>
> If it fails during the vmlinux linkage, it will mean that on your
> architecture/setup the compiler is unable to optimize the generic
> implementations to compile-time constants and I'll need to debug
> this more (probably via some compiler explorer).

Builds and links fine.

> You could also check the vmlinux size after applying each patch
> to see which one does this if you feel like it :)

The (incremental) impact of the various patches is shown below:

    bitops: unify non-atomic bitops prototypes across architectures
      add/remove: 4/11 grow/shrink: 123/160 up/down: 1700/-2786 (-1086)

    bitops: let optimize out non-atomic bitops on compile-time constants
      add/remove: 50/7 grow/shrink: 280/101 up/down: 6798/-2620 (4178)

I.e. the total impact is -1086 + 4178 = +3092

Looking at the impact of the last change on a single file, with rather
small functions to make it easier to analyze, the results are:

    bloat-o-meter net/core/sock.o{.orig,}
    add/remove: 3/1 grow/shrink: 20/3 up/down: 286/-68 (218)
    Function                                     old     new   delta
    sock_flag                                      -      38     +38
    sock_set_flag                                  -      22     +22
    sock_reset_flag                                -      22     +22
    sock_recv_errqueue                           264     284     +20
    sock_alloc_send_pskb                         406     424     +18
    __sock_set_timestamps                        104     122     +18
    sock_setsockopt                             2412    2428     +16
    sock_pfree                                    52      66     +14
    sock_wfree                                   236     248     +12
    sk_wait_data                                 222     234     +12
    sk_destruct                                   70      82     +12
    sk_wake_async                                 40      50     +10
    sk_set_memalloc                               74      84     +10
    __sock_queue_rcv_skb                         254     264     +10
    __sk_backlog_rcv                              92     102     +10
    sock_getsockopt                             1734    1742      +8
    sock_no_linger                                36      42      +6
    sk_clone_lock                                478     484      +6
    sk_clear_memalloc                             98     104      +6
    __sk_receive_skb                             194     200      +6
    sock_init_data                               344     348      +4
    __sock_cmsg_send                             196     200      +4
    sk_common_release                            152     154      +2
    sock_set_keepalive                            62      60      -2
    sock_enable_timestamp                         80      72      -8
    sock_valbool_flag                             34      12     -22
    bset_mem_set_bit                              36       -     -36
    Total: Before\x18862, After\x19080, chg +1.16%

Several small inline functions are no longer inlined.
And e.g. __sk_backlog_rcv() increases as it now calls sock_flag()
out-of-line, and needs to save more on the stack:

 __sk_backlog_rcv:
+       move.l %a3,-(%sp)       |,
        move.l %d2,-(%sp)       |,
-       move.l 8(%sp),%a0       | sk, sk
-| arch/m68k/include/asm/bitops.h:163:  return (addr[nr >> 5] & (1UL
<< (nr & 31))) != 0;
-       move.l 76(%a0),%d0      | MEM[(const long unsigned int
*)sk_6(D) + 76B], _14
+       move.l 12(%sp),%a3      | sk, sk
 | net/core/sock.c:330:         BUG_ON(!sock_flag(sk, SOCK_MEMALLOC));
-       btst #14,%d0    |, _14
-       jne .L193               |
+       pea 14.w                |
+       move.l %a3,-(%sp)       | sk,
+       jsr sock_flag           |
+       addq.l #8,%sp   |,
+       tst.b %d0       | tmp50
+       jne .L192               |
        pea 330.w               |
        pea .LC0                |
        pea .LC3                |
        jsr _printk             |
        trap #7

Note that the above is with atari_defconfig, which has
CONFIG_CC_OPTIMIZE_FOR_SIZE=y.

Switching to CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y results in a kernel
that is ca. 25% larger, and the net impact of this series is:

    add/remove: 24/27 grow/shrink: 227/233 up/down: 7494/-8080 (-586)

i.e. a reduction in size...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
@ 2022-06-08  9:55       ` Geert Uytterhoeven
  0 siblings, 0 replies; 75+ messages in thread
From: Geert Uytterhoeven @ 2022-06-08  9:55 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Richard Henderson,
	Matt Turner, Brian Cain, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	alpha, open list:QUALCOMM HEXAGON...,
	linux-ia64, linux-m68k

Hi Olek,

On Tue, Jun 7, 2022 at 5:51 PM Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
> From: Geert Uytterhoeven <geert@linux-m68k.org>
> > On Mon, Jun 6, 2022 at 1:50 PM Alexander Lobakin
> > <alexandr.lobakin@intel.com> wrote:
> > > While I was working on converting some structure fields from a fixed
> > > type to a bitmap, I started observing code size increase not only in
> > > places where the code works with the converted structure fields, but
> > > also where the converted vars were on the stack. That said, the
> > > following code:
> > >
> > >         DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> > >         unsigned long bar = BIT(BAR_BIT);
> > >         unsigned long baz = 0;
> > >
> > >         __set_bit(FOO_BIT, foo);
> > >         baz |= BIT(BAZ_BIT);
> > >
> > >         BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> > >         BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> > >         BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> > >
> > > triggers the first assertion on x86_64, which means that the
> > > compiler is unable to evaluate it to a compile-time initializer
> > > when the architecture-specific bitop is used even if it's obvious.
> > > I found that this is due to that many architecture-specific
> > > non-atomic bitop implementations use inline asm or other hacks which
> > > are faster or more robust when working with "real" variables (i.e.
> > > fields from the structures etc.), but the compilers have no clue how
> > > to optimize them out when called on compile-time constants.
> > >
> > > So, in order to let the compiler optimize out such cases, expand the
> > > test_bit() and __*_bit() definitions with a compile-time condition
> > > check, so that they will pick the generic C non-atomic bitop
> > > implementations when all of the arguments passed are compile-time
> > > constants, which means that the result will be a compile-time
> > > constant as well and the compiler will produce more efficient and
> > > simple code in 100% cases (no changes when there's at least one
> > > non-compile-time-constant argument).
> > > The condition itself:
> > >
> > > if (
> > > __builtin_constant_p(nr) &&     /* <- bit position is constant */
> > > __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
> > >                                       always either NULL or not */
> > > addr &&                         /* <- bitmap addr is not NULL */
> > > __builtin_constant_p(*addr)     /* <- compiler knows the value of
> > >                                       the target bitmap */
> > > )
> > >         /* then pick the generic C variant
> > > else
> > >         /* old code path, arch-specific
> > >
> > > I also tried __is_constexpr() as suggested by Andy, but it was
> > > always returning 0 ('not a constant') for the 2,3 and 4th
> > > conditions.
> > >
> > > The savings on x86_64 with LLVM are insane (.text):
> > >
> > > $ scripts/bloat-o-meter -c vmlinux.{base,test}
> > > add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)
> > >
> > > $ scripts/bloat-o-meter -c vmlinux.{base,mod}
> > > add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)
> > >
> > > $ scripts/bloat-o-meter -c vmlinux.{base,all}
> > > add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> >
> > Thank you!
> >
> > I gave it a try on m68k, and am a bit disappointed seeing an increase
> > in code size:
> >
> >     add/remove: 49/13 grow/shrink: 279/138 up/down: 6434/-3342 (3092)
>
> Ufff, that sucks =\
> Could you please try to compile the following code snippet (with the
> series applied)?
>
>         unsigned long map;
>
>         bitmap_zero(&map, BITS_PER_LONG);
>         __set_bit(1, &map);
>         BUILD_BUG_ON(!__builtin_constant_p(map));
>
> If it fails during the vmlinux linkage, it will mean that on your
> architecture/setup the compiler is unable to optimize the generic
> implementations to compile-time constants and I'll need to debug
> this more (probably via some compiler explorer).

Builds and links fine.

> You could also check the vmlinux size after applying each patch
> to see which one does this if you feel like it :)

The (incremental) impact of the various patches is shown below:

    bitops: unify non-atomic bitops prototypes across architectures
      add/remove: 4/11 grow/shrink: 123/160 up/down: 1700/-2786 (-1086)

    bitops: let optimize out non-atomic bitops on compile-time constants
      add/remove: 50/7 grow/shrink: 280/101 up/down: 6798/-2620 (4178)

I.e. the total impact is -1086 + 4178 = +3092

Looking at the impact of the last change on a single file, with rather
small functions to make it easier to analyze, the results are:

    bloat-o-meter net/core/sock.o{.orig,}
    add/remove: 3/1 grow/shrink: 20/3 up/down: 286/-68 (218)
    Function                                     old     new   delta
    sock_flag                                      -      38     +38
    sock_set_flag                                  -      22     +22
    sock_reset_flag                                -      22     +22
    sock_recv_errqueue                           264     284     +20
    sock_alloc_send_pskb                         406     424     +18
    __sock_set_timestamps                        104     122     +18
    sock_setsockopt                             2412    2428     +16
    sock_pfree                                    52      66     +14
    sock_wfree                                   236     248     +12
    sk_wait_data                                 222     234     +12
    sk_destruct                                   70      82     +12
    sk_wake_async                                 40      50     +10
    sk_set_memalloc                               74      84     +10
    __sock_queue_rcv_skb                         254     264     +10
    __sk_backlog_rcv                              92     102     +10
    sock_getsockopt                             1734    1742      +8
    sock_no_linger                                36      42      +6
    sk_clone_lock                                478     484      +6
    sk_clear_memalloc                             98     104      +6
    __sk_receive_skb                             194     200      +6
    sock_init_data                               344     348      +4
    __sock_cmsg_send                             196     200      +4
    sk_common_release                            152     154      +2
    sock_set_keepalive                            62      60      -2
    sock_enable_timestamp                         80      72      -8
    sock_valbool_flag                             34      12     -22
    bset_mem_set_bit                              36       -     -36
    Total: Before=18862, After=19080, chg +1.16%

Several small inline functions are no longer inlined.
And e.g. __sk_backlog_rcv() increases as it now calls sock_flag()
out-of-line, and needs to save more on the stack:

 __sk_backlog_rcv:
+       move.l %a3,-(%sp)       |,
        move.l %d2,-(%sp)       |,
-       move.l 8(%sp),%a0       | sk, sk
-| arch/m68k/include/asm/bitops.h:163:  return (addr[nr >> 5] & (1UL
<< (nr & 31))) != 0;
-       move.l 76(%a0),%d0      | MEM[(const long unsigned int
*)sk_6(D) + 76B], _14
+       move.l 12(%sp),%a3      | sk, sk
 | net/core/sock.c:330:         BUG_ON(!sock_flag(sk, SOCK_MEMALLOC));
-       btst #14,%d0    |, _14
-       jne .L193               |
+       pea 14.w                |
+       move.l %a3,-(%sp)       | sk,
+       jsr sock_flag           |
+       addq.l #8,%sp   |,
+       tst.b %d0       | tmp50
+       jne .L192               |
        pea 330.w               |
        pea .LC0                |
        pea .LC3                |
        jsr _printk             |
        trap #7

Note that the above is with atari_defconfig, which has
CONFIG_CC_OPTIMIZE_FOR_SIZE=y.

Switching to CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y results in a kernel
that is ca. 25% larger, and the net impact of this series is:

    add/remove: 24/27 grow/shrink: 227/233 up/down: 7494/-8080 (-586)

i.e. a reduction in size...

Gr{oetje,eeting}s,

                        Geert

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-08  9:55       ` Geert Uytterhoeven
  (?)
@ 2022-06-08 13:31         ` Alexander Lobakin
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-08 13:31 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	alpha, open list:QUALCOMM HEXAGON...,
	linux-ia64, linux-m68k, Linux-sh list, sparclinux, Linux-Arch,
	Linux Kernel Mailing List

From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Wed, 8 Jun 2022 11:55:08 +0200

> Hi Olek,
> 
> On Tue, Jun 7, 2022 at 5:51 PM Alexander Lobakin
> <alexandr.lobakin@intel.com> wrote:
> > From: Geert Uytterhoeven <geert@linux-m68k.org>
> > > On Mon, Jun 6, 2022 at 1:50 PM Alexander Lobakin
> > > <alexandr.lobakin@intel.com> wrote:
> > > > While I was working on converting some structure fields from a fixed
> > > > type to a bitmap, I started observing code size increase not only in
> > > > places where the code works with the converted structure fields, but
> > > > also where the converted vars were on the stack. That said, the
> > > > following code:
> > > >
> > > >         DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> > > >         unsigned long bar = BIT(BAR_BIT);
> > > >         unsigned long baz = 0;
> > > >
> > > >         __set_bit(FOO_BIT, foo);
> > > >         baz |= BIT(BAZ_BIT);
> > > >
> > > >         BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> > > >         BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> > > >         BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> > > >
> > > > triggers the first assertion on x86_64, which means that the
> > > > compiler is unable to evaluate it to a compile-time initializer
> > > > when the architecture-specific bitop is used even if it's obvious.
> > > > I found that this is due to that many architecture-specific
> > > > non-atomic bitop implementations use inline asm or other hacks which
> > > > are faster or more robust when working with "real" variables (i.e.
> > > > fields from the structures etc.), but the compilers have no clue how
> > > > to optimize them out when called on compile-time constants.
> > > >
> > > > So, in order to let the compiler optimize out such cases, expand the
> > > > test_bit() and __*_bit() definitions with a compile-time condition
> > > > check, so that they will pick the generic C non-atomic bitop
> > > > implementations when all of the arguments passed are compile-time
> > > > constants, which means that the result will be a compile-time
> > > > constant as well and the compiler will produce more efficient and
> > > > simple code in 100% cases (no changes when there's at least one
> > > > non-compile-time-constant argument).
> > > > The condition itself:
> > > >
> > > > if (
> > > > __builtin_constant_p(nr) &&     /* <- bit position is constant */
> > > > __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
> > > >                                       always either NULL or not */
> > > > addr &&                         /* <- bitmap addr is not NULL */
> > > > __builtin_constant_p(*addr)     /* <- compiler knows the value of
> > > >                                       the target bitmap */
> > > > )
> > > >         /* then pick the generic C variant
> > > > else
> > > >         /* old code path, arch-specific
> > > >
> > > > I also tried __is_constexpr() as suggested by Andy, but it was
> > > > always returning 0 ('not a constant') for the 2,3 and 4th
> > > > conditions.
> > > >
> > > > The savings on x86_64 with LLVM are insane (.text):
> > > >
> > > > $ scripts/bloat-o-meter -c vmlinux.{base,test}
> > > > add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)
> > > >
> > > > $ scripts/bloat-o-meter -c vmlinux.{base,mod}
> > > > add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)
> > > >
> > > > $ scripts/bloat-o-meter -c vmlinux.{base,all}
> > > > add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> > >
> > > Thank you!
> > >
> > > I gave it a try on m68k, and am a bit disappointed seeing an increase
> > > in code size:
> > >
> > >     add/remove: 49/13 grow/shrink: 279/138 up/down: 6434/-3342 (3092)
> >
> > Ufff, that sucks =\
> > Could you please try to compile the following code snippet (with the
> > series applied)?
> >
> >         unsigned long map;
> >
> >         bitmap_zero(&map, BITS_PER_LONG);
> >         __set_bit(1, &map);
> >         BUILD_BUG_ON(!__builtin_constant_p(map));
> >
> > If it fails during the vmlinux linkage, it will mean that on your
> > architecture/setup the compiler is unable to optimize the generic
> > implementations to compile-time constants and I'll need to debug
> > this more (probably via some compiler explorer).
> 
> Builds and links fine.

Ok, so that means the main task is still achieved.

> 
> > You could also check the vmlinux size after applying each patch
> > to see which one does this if you feel like it :)
> 
> The (incremental) impact of the various patches is shown below:
> 
>     bitops: unify non-atomic bitops prototypes across architectures
>       add/remove: 4/11 grow/shrink: 123/160 up/down: 1700/-2786 (-1086)
> 
>     bitops: let optimize out non-atomic bitops on compile-time constants
>       add/remove: 50/7 grow/shrink: 280/101 up/down: 6798/-2620 (4178)
> 
> I.e. the total impact is -1086 + 4178 = +3092
> 
> Looking at the impact of the last change on a single file, with rather
> small functions to make it easier to analyze, the results are:
> 
>     bloat-o-meter net/core/sock.o{.orig,}
>     add/remove: 3/1 grow/shrink: 20/3 up/down: 286/-68 (218)
>     Function                                     old     new   delta
>     sock_flag                                      -      38     +38
>     sock_set_flag                                  -      22     +22
>     sock_reset_flag                                -      22     +22
>     sock_recv_errqueue                           264     284     +20
>     sock_alloc_send_pskb                         406     424     +18
>     __sock_set_timestamps                        104     122     +18
>     sock_setsockopt                             2412    2428     +16
>     sock_pfree                                    52      66     +14
>     sock_wfree                                   236     248     +12
>     sk_wait_data                                 222     234     +12
>     sk_destruct                                   70      82     +12
>     sk_wake_async                                 40      50     +10
>     sk_set_memalloc                               74      84     +10
>     __sock_queue_rcv_skb                         254     264     +10
>     __sk_backlog_rcv                              92     102     +10
>     sock_getsockopt                             1734    1742      +8
>     sock_no_linger                                36      42      +6
>     sk_clone_lock                                478     484      +6
>     sk_clear_memalloc                             98     104      +6
>     __sk_receive_skb                             194     200      +6
>     sock_init_data                               344     348      +4
>     __sock_cmsg_send                             196     200      +4
>     sk_common_release                            152     154      +2
>     sock_set_keepalive                            62      60      -2
>     sock_enable_timestamp                         80      72      -8
>     sock_valbool_flag                             34      12     -22
>     bset_mem_set_bit                              36       -     -36
>     Total: Before=18862, After=19080, chg +1.16%
> 
> Several small inline functions are no longer inlined.
> And e.g. __sk_backlog_rcv() increases as it now calls sock_flag()
> out-of-line, and needs to save more on the stack:
> 
>  __sk_backlog_rcv:
> +       move.l %a3,-(%sp)       |,
>         move.l %d2,-(%sp)       |,
> -       move.l 8(%sp),%a0       | sk, sk
> -| arch/m68k/include/asm/bitops.h:163:  return (addr[nr >> 5] & (1UL
> << (nr & 31))) != 0;
> -       move.l 76(%a0),%d0      | MEM[(const long unsigned int
> *)sk_6(D) + 76B], _14
> +       move.l 12(%sp),%a3      | sk, sk
>  | net/core/sock.c:330:         BUG_ON(!sock_flag(sk, SOCK_MEMALLOC));
> -       btst #14,%d0    |, _14
> -       jne .L193               |
> +       pea 14.w                |
> +       move.l %a3,-(%sp)       | sk,
> +       jsr sock_flag           |
> +       addq.l #8,%sp   |,
> +       tst.b %d0       | tmp50
> +       jne .L192               |
>         pea 330.w               |
>         pea .LC0                |
>         pea .LC3                |
>         jsr _printk             |
>         trap #7
> 
> Note that the above is with atari_defconfig, which has
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y.
> 
> Switching to CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y results in a kernel
> that is ca. 25% larger, and the net impact of this series is:
> 
>     add/remove: 24/27 grow/shrink: 227/233 up/down: 7494/-8080 (-586)
> 
> i.e. a reduction in size...

Much thanks for such an analysis!

I see that previously it was uninlining bset_mem_set_bit(), which is
m68k implementation of __set_bit(), and now it uninlined the "outer"
sock_*_flag().
I'd name it sort of normal, but the intention was to not change the
existing code when we're unable to optimize it out... And those sock
helpers are usually from that side, since @sk is never anything
constant. It's unexpected that the compiler is unable to resolve the
expression to the original bitop call on `-Os`. Dunno if it's
okay <O> __builtin_constant_p() still gives the desired results, so
I guess it is, but at the same time that function layout rebalance
can potentially hurt performance. Or improve it ._.

BTW, after updating LLVM 13 -> 14 I'm now getting -3Kb (was -86).
GCC 12 still gives -30.

> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
@ 2022-06-08 13:31         ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-08 13:31 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	alpha, open list:QUALCOMM HEXAGON...,
	linux-ia64, linux-m68k, Linux-sh list, sparclinux, Linux-Arch,
	Linux Kernel Mailing List

From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Wed, 8 Jun 2022 11:55:08 +0200

> Hi Olek,
> 
> On Tue, Jun 7, 2022 at 5:51 PM Alexander Lobakin
> <alexandr.lobakin@intel.com> wrote:
> > From: Geert Uytterhoeven <geert@linux-m68k.org>
> > > On Mon, Jun 6, 2022 at 1:50 PM Alexander Lobakin
> > > <alexandr.lobakin@intel.com> wrote:
> > > > While I was working on converting some structure fields from a fixed
> > > > type to a bitmap, I started observing code size increase not only in
> > > > places where the code works with the converted structure fields, but
> > > > also where the converted vars were on the stack. That said, the
> > > > following code:
> > > >
> > > >         DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> > > >         unsigned long bar = BIT(BAR_BIT);
> > > >         unsigned long baz = 0;
> > > >
> > > >         __set_bit(FOO_BIT, foo);
> > > >         baz |= BIT(BAZ_BIT);
> > > >
> > > >         BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> > > >         BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> > > >         BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> > > >
> > > > triggers the first assertion on x86_64, which means that the
> > > > compiler is unable to evaluate it to a compile-time initializer
> > > > when the architecture-specific bitop is used even if it's obvious.
> > > > I found that this is due to that many architecture-specific
> > > > non-atomic bitop implementations use inline asm or other hacks which
> > > > are faster or more robust when working with "real" variables (i.e.
> > > > fields from the structures etc.), but the compilers have no clue how
> > > > to optimize them out when called on compile-time constants.
> > > >
> > > > So, in order to let the compiler optimize out such cases, expand the
> > > > test_bit() and __*_bit() definitions with a compile-time condition
> > > > check, so that they will pick the generic C non-atomic bitop
> > > > implementations when all of the arguments passed are compile-time
> > > > constants, which means that the result will be a compile-time
> > > > constant as well and the compiler will produce more efficient and
> > > > simple code in 100% cases (no changes when there's at least one
> > > > non-compile-time-constant argument).
> > > > The condition itself:
> > > >
> > > > if (
> > > > __builtin_constant_p(nr) &&     /* <- bit position is constant */
> > > > __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
> > > >                                       always either NULL or not */
> > > > addr &&                         /* <- bitmap addr is not NULL */
> > > > __builtin_constant_p(*addr)     /* <- compiler knows the value of
> > > >                                       the target bitmap */
> > > > )
> > > >         /* then pick the generic C variant
> > > > else
> > > >         /* old code path, arch-specific
> > > >
> > > > I also tried __is_constexpr() as suggested by Andy, but it was
> > > > always returning 0 ('not a constant') for the 2,3 and 4th
> > > > conditions.
> > > >
> > > > The savings on x86_64 with LLVM are insane (.text):
> > > >
> > > > $ scripts/bloat-o-meter -c vmlinux.{base,test}
> > > > add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)
> > > >
> > > > $ scripts/bloat-o-meter -c vmlinux.{base,mod}
> > > > add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)
> > > >
> > > > $ scripts/bloat-o-meter -c vmlinux.{base,all}
> > > > add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> > >
> > > Thank you!
> > >
> > > I gave it a try on m68k, and am a bit disappointed seeing an increase
> > > in code size:
> > >
> > >     add/remove: 49/13 grow/shrink: 279/138 up/down: 6434/-3342 (3092)
> >
> > Ufff, that sucks =\
> > Could you please try to compile the following code snippet (with the
> > series applied)?
> >
> >         unsigned long map;
> >
> >         bitmap_zero(&map, BITS_PER_LONG);
> >         __set_bit(1, &map);
> >         BUILD_BUG_ON(!__builtin_constant_p(map));
> >
> > If it fails during the vmlinux linkage, it will mean that on your
> > architecture/setup the compiler is unable to optimize the generic
> > implementations to compile-time constants and I'll need to debug
> > this more (probably via some compiler explorer).
> 
> Builds and links fine.

Ok, so that means the main task is still achieved.

> 
> > You could also check the vmlinux size after applying each patch
> > to see which one does this if you feel like it :)
> 
> The (incremental) impact of the various patches is shown below:
> 
>     bitops: unify non-atomic bitops prototypes across architectures
>       add/remove: 4/11 grow/shrink: 123/160 up/down: 1700/-2786 (-1086)
> 
>     bitops: let optimize out non-atomic bitops on compile-time constants
>       add/remove: 50/7 grow/shrink: 280/101 up/down: 6798/-2620 (4178)
> 
> I.e. the total impact is -1086 + 4178 = +3092
> 
> Looking at the impact of the last change on a single file, with rather
> small functions to make it easier to analyze, the results are:
> 
>     bloat-o-meter net/core/sock.o{.orig,}
>     add/remove: 3/1 grow/shrink: 20/3 up/down: 286/-68 (218)
>     Function                                     old     new   delta
>     sock_flag                                      -      38     +38
>     sock_set_flag                                  -      22     +22
>     sock_reset_flag                                -      22     +22
>     sock_recv_errqueue                           264     284     +20
>     sock_alloc_send_pskb                         406     424     +18
>     __sock_set_timestamps                        104     122     +18
>     sock_setsockopt                             2412    2428     +16
>     sock_pfree                                    52      66     +14
>     sock_wfree                                   236     248     +12
>     sk_wait_data                                 222     234     +12
>     sk_destruct                                   70      82     +12
>     sk_wake_async                                 40      50     +10
>     sk_set_memalloc                               74      84     +10
>     __sock_queue_rcv_skb                         254     264     +10
>     __sk_backlog_rcv                              92     102     +10
>     sock_getsockopt                             1734    1742      +8
>     sock_no_linger                                36      42      +6
>     sk_clone_lock                                478     484      +6
>     sk_clear_memalloc                             98     104      +6
>     __sk_receive_skb                             194     200      +6
>     sock_init_data                               344     348      +4
>     __sock_cmsg_send                             196     200      +4
>     sk_common_release                            152     154      +2
>     sock_set_keepalive                            62      60      -2
>     sock_enable_timestamp                         80      72      -8
>     sock_valbool_flag                             34      12     -22
>     bset_mem_set_bit                              36       -     -36
>     Total: Before\x18862, After\x19080, chg +1.16%
> 
> Several small inline functions are no longer inlined.
> And e.g. __sk_backlog_rcv() increases as it now calls sock_flag()
> out-of-line, and needs to save more on the stack:
> 
>  __sk_backlog_rcv:
> +       move.l %a3,-(%sp)       |,
>         move.l %d2,-(%sp)       |,
> -       move.l 8(%sp),%a0       | sk, sk
> -| arch/m68k/include/asm/bitops.h:163:  return (addr[nr >> 5] & (1UL
> << (nr & 31))) != 0;
> -       move.l 76(%a0),%d0      | MEM[(const long unsigned int
> *)sk_6(D) + 76B], _14
> +       move.l 12(%sp),%a3      | sk, sk
>  | net/core/sock.c:330:         BUG_ON(!sock_flag(sk, SOCK_MEMALLOC));
> -       btst #14,%d0    |, _14
> -       jne .L193               |
> +       pea 14.w                |
> +       move.l %a3,-(%sp)       | sk,
> +       jsr sock_flag           |
> +       addq.l #8,%sp   |,
> +       tst.b %d0       | tmp50
> +       jne .L192               |
>         pea 330.w               |
>         pea .LC0                |
>         pea .LC3                |
>         jsr _printk             |
>         trap #7
> 
> Note that the above is with atari_defconfig, which has
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y.
> 
> Switching to CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y results in a kernel
> that is ca. 25% larger, and the net impact of this series is:
> 
>     add/remove: 24/27 grow/shrink: 227/233 up/down: 7494/-8080 (-586)
> 
> i.e. a reduction in size...

Much thanks for such an analysis!

I see that previously it was uninlining bset_mem_set_bit(), which is
m68k implementation of __set_bit(), and now it uninlined the "outer"
sock_*_flag().
I'd name it sort of normal, but the intention was to not change the
existing code when we're unable to optimize it out... And those sock
helpers are usually from that side, since @sk is never anything
constant. It's unexpected that the compiler is unable to resolve the
expression to the original bitop call on `-Os`. Dunno if it's
okay <O> __builtin_constant_p() still gives the desired results, so
I guess it is, but at the same time that function layout rebalance
can potentially hurt performance. Or improve it ._.

BTW, after updating LLVM 13 -> 14 I'm now getting -3Kb (was -86).
GCC 12 still gives -30.

> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
@ 2022-06-08 13:31         ` Alexander Lobakin
  0 siblings, 0 replies; 75+ messages in thread
From: Alexander Lobakin @ 2022-06-08 13:31 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Richard Henderson, Matt Turner, Brian Cain, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Greg Kroah-Hartman,
	alpha, open list:QUALCOMM HEXAGON...,
	linux-ia64@vger.kernel.org

From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Wed, 8 Jun 2022 11:55:08 +0200

> Hi Olek,
> 
> On Tue, Jun 7, 2022 at 5:51 PM Alexander Lobakin
> <alexandr.lobakin@intel.com> wrote:
> > From: Geert Uytterhoeven <geert@linux-m68k.org>
> > > On Mon, Jun 6, 2022 at 1:50 PM Alexander Lobakin
> > > <alexandr.lobakin@intel.com> wrote:
> > > > While I was working on converting some structure fields from a fixed
> > > > type to a bitmap, I started observing code size increase not only in
> > > > places where the code works with the converted structure fields, but
> > > > also where the converted vars were on the stack. That said, the
> > > > following code:
> > > >
> > > >         DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> > > >         unsigned long bar = BIT(BAR_BIT);
> > > >         unsigned long baz = 0;
> > > >
> > > >         __set_bit(FOO_BIT, foo);
> > > >         baz |= BIT(BAZ_BIT);
> > > >
> > > >         BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> > > >         BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> > > >         BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> > > >
> > > > triggers the first assertion on x86_64, which means that the
> > > > compiler is unable to evaluate it to a compile-time initializer
> > > > when the architecture-specific bitop is used even if it's obvious.
> > > > I found that this is due to that many architecture-specific
> > > > non-atomic bitop implementations use inline asm or other hacks which
> > > > are faster or more robust when working with "real" variables (i.e.
> > > > fields from the structures etc.), but the compilers have no clue how
> > > > to optimize them out when called on compile-time constants.
> > > >
> > > > So, in order to let the compiler optimize out such cases, expand the
> > > > test_bit() and __*_bit() definitions with a compile-time condition
> > > > check, so that they will pick the generic C non-atomic bitop
> > > > implementations when all of the arguments passed are compile-time
> > > > constants, which means that the result will be a compile-time
> > > > constant as well and the compiler will produce more efficient and
> > > > simple code in 100% cases (no changes when there's at least one
> > > > non-compile-time-constant argument).
> > > > The condition itself:
> > > >
> > > > if (
> > > > __builtin_constant_p(nr) &&     /* <- bit position is constant */
> > > > __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
> > > >                                       always either NULL or not */
> > > > addr &&                         /* <- bitmap addr is not NULL */
> > > > __builtin_constant_p(*addr)     /* <- compiler knows the value of
> > > >                                       the target bitmap */
> > > > )
> > > >         /* then pick the generic C variant
> > > > else
> > > >         /* old code path, arch-specific
> > > >
> > > > I also tried __is_constexpr() as suggested by Andy, but it was
> > > > always returning 0 ('not a constant') for the 2,3 and 4th
> > > > conditions.
> > > >
> > > > The savings on x86_64 with LLVM are insane (.text):
> > > >
> > > > $ scripts/bloat-o-meter -c vmlinux.{base,test}
> > > > add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)
> > > >
> > > > $ scripts/bloat-o-meter -c vmlinux.{base,mod}
> > > > add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)
> > > >
> > > > $ scripts/bloat-o-meter -c vmlinux.{base,all}
> > > > add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> > >
> > > Thank you!
> > >
> > > I gave it a try on m68k, and am a bit disappointed seeing an increase
> > > in code size:
> > >
> > >     add/remove: 49/13 grow/shrink: 279/138 up/down: 6434/-3342 (3092)
> >
> > Ufff, that sucks =\
> > Could you please try to compile the following code snippet (with the
> > series applied)?
> >
> >         unsigned long map;
> >
> >         bitmap_zero(&map, BITS_PER_LONG);
> >         __set_bit(1, &map);
> >         BUILD_BUG_ON(!__builtin_constant_p(map));
> >
> > If it fails during the vmlinux linkage, it will mean that on your
> > architecture/setup the compiler is unable to optimize the generic
> > implementations to compile-time constants and I'll need to debug
> > this more (probably via some compiler explorer).
> 
> Builds and links fine.

Ok, so that means the main task is still achieved.

> 
> > You could also check the vmlinux size after applying each patch
> > to see which one does this if you feel like it :)
> 
> The (incremental) impact of the various patches is shown below:
> 
>     bitops: unify non-atomic bitops prototypes across architectures
>       add/remove: 4/11 grow/shrink: 123/160 up/down: 1700/-2786 (-1086)
> 
>     bitops: let optimize out non-atomic bitops on compile-time constants
>       add/remove: 50/7 grow/shrink: 280/101 up/down: 6798/-2620 (4178)
> 
> I.e. the total impact is -1086 + 4178 = +3092
> 
> Looking at the impact of the last change on a single file, with rather
> small functions to make it easier to analyze, the results are:
> 
>     bloat-o-meter net/core/sock.o{.orig,}
>     add/remove: 3/1 grow/shrink: 20/3 up/down: 286/-68 (218)
>     Function                                     old     new   delta
>     sock_flag                                      -      38     +38
>     sock_set_flag                                  -      22     +22
>     sock_reset_flag                                -      22     +22
>     sock_recv_errqueue                           264     284     +20
>     sock_alloc_send_pskb                         406     424     +18
>     __sock_set_timestamps                        104     122     +18
>     sock_setsockopt                             2412    2428     +16
>     sock_pfree                                    52      66     +14
>     sock_wfree                                   236     248     +12
>     sk_wait_data                                 222     234     +12
>     sk_destruct                                   70      82     +12
>     sk_wake_async                                 40      50     +10
>     sk_set_memalloc                               74      84     +10
>     __sock_queue_rcv_skb                         254     264     +10
>     __sk_backlog_rcv                              92     102     +10
>     sock_getsockopt                             1734    1742      +8
>     sock_no_linger                                36      42      +6
>     sk_clone_lock                                478     484      +6
>     sk_clear_memalloc                             98     104      +6
>     __sk_receive_skb                             194     200      +6
>     sock_init_data                               344     348      +4
>     __sock_cmsg_send                             196     200      +4
>     sk_common_release                            152     154      +2
>     sock_set_keepalive                            62      60      -2
>     sock_enable_timestamp                         80      72      -8
>     sock_valbool_flag                             34      12     -22
>     bset_mem_set_bit                              36       -     -36
>     Total: Before=18862, After=19080, chg +1.16%
> 
> Several small inline functions are no longer inlined.
> And e.g. __sk_backlog_rcv() increases as it now calls sock_flag()
> out-of-line, and needs to save more on the stack:
> 
>  __sk_backlog_rcv:
> +       move.l %a3,-(%sp)       |,
>         move.l %d2,-(%sp)       |,
> -       move.l 8(%sp),%a0       | sk, sk
> -| arch/m68k/include/asm/bitops.h:163:  return (addr[nr >> 5] & (1UL
> << (nr & 31))) != 0;
> -       move.l 76(%a0),%d0      | MEM[(const long unsigned int
> *)sk_6(D) + 76B], _14
> +       move.l 12(%sp),%a3      | sk, sk
>  | net/core/sock.c:330:         BUG_ON(!sock_flag(sk, SOCK_MEMALLOC));
> -       btst #14,%d0    |, _14
> -       jne .L193               |
> +       pea 14.w                |
> +       move.l %a3,-(%sp)       | sk,
> +       jsr sock_flag           |
> +       addq.l #8,%sp   |,
> +       tst.b %d0       | tmp50
> +       jne .L192               |
>         pea 330.w               |
>         pea .LC0                |
>         pea .LC3                |
>         jsr _printk             |
>         trap #7
> 
> Note that the above is with atari_defconfig, which has
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y.
> 
> Switching to CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y results in a kernel
> that is ca. 25% larger, and the net impact of this series is:
> 
>     add/remove: 24/27 grow/shrink: 227/233 up/down: 7494/-8080 (-586)
> 
> i.e. a reduction in size...

Much thanks for such an analysis!

I see that previously it was uninlining bset_mem_set_bit(), which is
m68k implementation of __set_bit(), and now it uninlined the "outer"
sock_*_flag().
I'd name it sort of normal, but the intention was to not change the
existing code when we're unable to optimize it out... And those sock
helpers are usually from that side, since @sk is never anything
constant. It's unexpected that the compiler is unable to resolve the
expression to the original bitop call on `-Os`. Dunno if it's
okay <O> __builtin_constant_p() still gives the desired results, so
I guess it is, but at the same time that function layout rebalance
can potentially hurt performance. Or improve it ._.

BTW, after updating LLVM 13 -> 14 I'm now getting -3Kb (was -86).
GCC 12 still gives -30.

> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

Thanks,
Olek

^ permalink raw reply	[flat|nested] 75+ messages in thread

end of thread, other threads:[~2022-06-08 13:33 UTC | newest]

Thread overview: 75+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-06 11:49 [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants Alexander Lobakin
2022-06-06 11:49 ` Alexander Lobakin
2022-06-06 11:49 ` Alexander Lobakin
2022-06-06 11:49 ` [PATCH 1/6] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr() Alexander Lobakin
2022-06-06 11:49   ` Alexander Lobakin
2022-06-06 11:49   ` Alexander Lobakin
2022-06-06 11:49 ` [PATCH 2/6] bitops: always define asm-generic non-atomic bitops Alexander Lobakin
2022-06-06 11:49   ` Alexander Lobakin
2022-06-06 11:49   ` Alexander Lobakin
2022-06-06 12:44   ` Mark Rutland
2022-06-06 12:44     ` Mark Rutland
2022-06-06 12:44     ` Mark Rutland
2022-06-06 14:21     ` Alexander Lobakin
2022-06-06 14:21       ` Alexander Lobakin
2022-06-06 14:21       ` Alexander Lobakin
2022-06-06 11:49 ` [PATCH 3/6] bitops: define gen_test_bit() the same way as the rest of functions Alexander Lobakin
2022-06-06 11:49   ` Alexander Lobakin
2022-06-06 11:49   ` Alexander Lobakin
2022-06-06 16:19   ` Mark Rutland
2022-06-06 16:19     ` Mark Rutland
2022-06-06 16:19     ` Mark Rutland
2022-06-07 13:43   ` Marco Elver
2022-06-07 13:43     ` Marco Elver
2022-06-07 13:43     ` Marco Elver
2022-06-07 15:57     ` Alexander Lobakin
2022-06-07 15:57       ` Alexander Lobakin
2022-06-07 15:57       ` Alexander Lobakin
2022-06-07 16:15       ` Marco Elver
2022-06-07 16:15         ` Marco Elver
2022-06-07 16:15         ` Marco Elver
2022-06-07 16:28       ` Andy Shevchenko
2022-06-07 16:28         ` Andy Shevchenko
2022-06-07 16:28         ` Andy Shevchenko
2022-06-06 11:49 ` [PATCH 4/6] bitops: unify non-atomic bitops prototypes across architectures Alexander Lobakin
2022-06-06 11:49   ` Alexander Lobakin
2022-06-06 11:49   ` Alexander Lobakin
2022-06-06 16:25   ` Mark Rutland
2022-06-06 16:25     ` Mark Rutland
2022-06-06 16:25     ` Mark Rutland
2022-06-06 20:48   ` Yury Norov
2022-06-06 20:48     ` Yury Norov
2022-06-06 20:48     ` Yury Norov
2022-06-07 11:03     ` Alexander Lobakin
2022-06-07 11:03       ` Alexander Lobakin
2022-06-07 11:03       ` Alexander Lobakin
2022-06-06 11:49 ` [PATCH 5/6] bitops: wrap non-atomic bitops with a transparent macro Alexander Lobakin
2022-06-06 11:49   ` Alexander Lobakin
2022-06-06 11:49   ` Alexander Lobakin
2022-06-06 16:27   ` Mark Rutland
2022-06-06 16:27     ` Mark Rutland
2022-06-06 16:27     ` Mark Rutland
2022-06-07 10:57     ` Alexander Lobakin
2022-06-07 10:57       ` Alexander Lobakin
2022-06-07 10:57       ` Alexander Lobakin
2022-06-07 11:07       ` Alexander Lobakin
2022-06-07 11:07         ` Alexander Lobakin
2022-06-07 11:07         ` Alexander Lobakin
2022-06-06 11:49 ` [PATCH 6/6] bitops: let optimize out non-atomic bitops on compile-time constants Alexander Lobakin
2022-06-06 11:49   ` Alexander Lobakin
2022-06-06 11:49   ` Alexander Lobakin
2022-06-06 13:50 ` [PATCH 0/6] " Mark Rutland
2022-06-06 13:50   ` Mark Rutland
2022-06-06 13:50   ` Mark Rutland
2022-06-07 12:45 ` Geert Uytterhoeven
2022-06-07 12:45   ` Geert Uytterhoeven
2022-06-07 12:45   ` Geert Uytterhoeven
2022-06-07 15:47   ` Alexander Lobakin
2022-06-07 15:47     ` Alexander Lobakin
2022-06-07 15:47     ` Alexander Lobakin
2022-06-08  9:55     ` Geert Uytterhoeven
2022-06-08  9:55       ` Geert Uytterhoeven
2022-06-08  9:55       ` Geert Uytterhoeven
2022-06-08 13:31       ` Alexander Lobakin
2022-06-08 13:31         ` Alexander Lobakin
2022-06-08 13:31         ` Alexander Lobakin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.