* [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants
From: Alexander Lobakin @ 2022-06-17 14:40 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

While I was working on converting some structure fields from a fixed
type to a bitmap, I started observing a code size increase not only in
places where the code works with the converted structure fields, but
also where the converted variables were on the stack. That is, the
following code:

	DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
	unsigned long bar = BIT(BAR_BIT);
	unsigned long baz = 0;

	__set_bit(FOO_BIT, foo);
	baz |= BIT(BAZ_BIT);

	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo)));
	BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
	BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));

triggers the first assertion on x86_64, which means that the
compiler is unable to evaluate the expression to a compile-time
constant when the architecture-specific bitop is used, even though
the result is obvious.
I found that this is because many architecture-specific non-atomic
bitop implementations use inline asm or other hacks which are faster
or more robust when working with "real" variables (i.e. fields of
structures etc.), but the compilers have no clue how to optimize
them out when called on compile-time constants.

So, in order to let the compiler optimize out such cases, expand the
test_bit() and __*_bit() definitions with a compile-time condition
check, so that they pick the generic C non-atomic bitop
implementations when all of the passed arguments are compile-time
constants. The result is then a compile-time constant as well, and
the compiler produces more efficient and simpler code in all such
cases (nothing changes when there is at least one
non-compile-time-constant argument).
The condition itself:

if (
__builtin_constant_p(nr) &&	/* <- bit position is constant */
__builtin_constant_p(!!addr) &&	/* <- compiler knows bitmap addr is
				      always either NULL or not */
addr &&				/* <- bitmap addr is not NULL */
__builtin_constant_p(*addr)	/* <- compiler knows the value of
				      the target bitmap */
)
	/* then pick the generic C variant */
else
	/* old code path, arch-specific */

I also tried __is_constexpr() as suggested by Andy, but it was
always returning 0 ('not a constant') for the 2nd, 3rd and 4th
conditions.
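
Below is a minimal sketch of how such a wrapping macro could look. It
is illustrative only: the 'const_*' naming follows the changelog
below, while 'bitop()' and '_test_bit' are placeholder names, and the
actual macro in the series may differ in details:

	/*
	 * Sketch: pick the compile-time-evaluable 'const_*' variant only
	 * when the compiler can prove the bit number, the bitmap address
	 * and the bitmap contents are all constants; otherwise fall back
	 * to the regular (possibly arch-specific) op.
	 */
	#define bitop(op, nr, addr)					   \
		((__builtin_constant_p(nr) &&				   \
		  __builtin_constant_p((uintptr_t)(addr) != 0) &&	   \
		  (uintptr_t)(addr) != 0 &&				   \
		  __builtin_constant_p(*(const unsigned long *)(addr))) ? \
		 const##op(nr, addr) : op(nr, addr))

	#define test_bit(nr, addr)	bitop(_test_bit, nr, addr)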

The savings are architecture, compiler and compiler flags dependent,
for example, on x86_64 -O2:

GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)

and ARM64 (courtesy of Mark[0]):

GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)

And the following:

	DECLARE_BITMAP(flags, __IP_TUNNEL_FLAG_NUM) = { };
	__be16 tun_flags;

	__set_bit(IP_TUNNEL_CSUM_BIT, flags);

	tun_flags = cpu_to_be16(*flags & U16_MAX);

	if (test_bit(IP_TUNNEL_VTI_BIT, flags))
		tun_flags |= VTI_ISVTI;

	BUILD_BUG_ON(!__builtin_constant_p(tun_flags));

doesn't blow up anymore (and this is now checked at build time), so
we can now use e.g. fixed-size bitmaps in compile-time assertions
and the like.

The series has been in intel-next for a while with no reported issues.

From v2[1]:
* collect several Reviewed-bys (Andy, Yury);
* add a comment to generic_test_bit() that it is atomic-safe and
  must always stay like that (the first version of this series
  erroneously tried to change this) (Andy, Marco);
* unify the way architectures define platform-specific bitops, both
  with and without instrumentation support: now they define only
  'arch_' versions and the asm-generic includes take care of the rest;
* micro-optimize the diffstat of 0004/0007 (__check_bitop_pr())
  (Andy);
* add compile-time tests to lib/test_bitmap to make sure everything
  works as expected on any setup (Yury).

From v1[2]:
* change 'gen_' prefixes to 'generic_' to disambiguate from
  'generated' etc. (Mark);
* define a separate 'const_' set to use in the optimization to keep
  the generic test_bit() atomic-safe (Marco);
* unify arch_{test,__*}_bit() as well and include them in the type
  check;
* add more relevant and up-to-date bloat-o-meter results, including
  ARM64 (me, Mark);
* pick a couple '*-by' tags (Mark, Yury).

Also available on my open GH[3].

[0] https://lore.kernel.org/all/Yp4GQFQYD32Rs9Cw@FVFF77S0Q05N
[1] https://lore.kernel.org/linux-arch/20220610113427.908751-1-alexandr.lobakin@intel.com
[2] https://lore.kernel.org/all/20220606114908.962562-1-alexandr.lobakin@intel.com
[3] https://github.com/alobakin/linux/commits/bitops

Alexander Lobakin (7):
  ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
  bitops: always define asm-generic non-atomic bitops
  bitops: unify non-atomic bitops prototypes across architectures
  bitops: define const_*() versions of the non-atomics
  bitops: wrap non-atomic bitops with a transparent macro
  bitops: let optimize out non-atomic bitops on compile-time constants
  lib: test_bitmap: add compile-time optimization/evaluations assertions

 arch/alpha/include/asm/bitops.h               |  32 ++--
 arch/hexagon/include/asm/bitops.h             |  24 ++-
 arch/ia64/include/asm/bitops.h                |  42 ++---
 arch/ia64/include/asm/processor.h             |   2 +-
 arch/m68k/include/asm/bitops.h                |  49 ++++--
 arch/sh/include/asm/bitops-op32.h             |  34 ++--
 arch/sparc/include/asm/bitops_32.h            |  18 +-
 arch/sparc/lib/atomic32.c                     |  12 +-
 arch/x86/include/asm/bitops.h                 |  22 +--
 .../asm-generic/bitops/generic-non-atomic.h   | 161 ++++++++++++++++++
 .../bitops/instrumented-non-atomic.h          |  35 ++--
 include/asm-generic/bitops/non-atomic.h       | 121 +------------
 .../bitops/non-instrumented-non-atomic.h      |  16 ++
 include/linux/bitops.h                        |  50 ++++++
 lib/test_bitmap.c                             |  45 +++++
 tools/include/asm-generic/bitops/non-atomic.h |  34 ++--
 tools/include/linux/bitops.h                  |  16 ++
 17 files changed, 476 insertions(+), 237 deletions(-)
 create mode 100644 include/asm-generic/bitops/generic-non-atomic.h
 create mode 100644 include/asm-generic/bitops/non-instrumented-non-atomic.h

-- 
2.36.1


* [PATCH v3 1/7] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
From: Alexander Lobakin @ 2022-06-17 14:40 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel, stable, kernel test robot

test_bit(), like any other bitmap op, takes `unsigned long *` as its
second argument (a pointer to the actual bitmap), since any bitmap
is an array of unsigned longs. However, the ia64_get_irr() code
passes a reference to a `u64` as the second argument.
This works with the ia64 bitops implementation because they take
`void *` as the second argument and cast it later on.
It also works with the bitmap API itself because `unsigned long` has
the same size on ia64 as `u64` (`unsigned long long`), but from the
compiler's PoV the two types are different.
Define @irr as `unsigned long` to fix that. This implies no
functional changes. It has been hidden for 16 years!
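
To illustrate (a hypothetical snippet, not part of the patch): with
the unified prototypes, the old declaration makes the compiler warn
on the test_bit() call:

	u64 irr;	/* 64 bits wide, but not 'unsigned long' */

	/*
	 * &irr is a 'u64 *' while test_bit() expects a
	 * 'const volatile unsigned long *', so this triggers
	 * -Wincompatible-pointer-types even though both types have
	 * the same size on ia64.
	 */
	return test_bit(bit, &irr);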

Fixes: a58786917ce2 ("[IA64] avoid broken SAL_CACHE_FLUSH implementations")
Cc: stable@vger.kernel.org # 2.6.16+
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
---
 arch/ia64/include/asm/processor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 7cbce290f4e5..757c2f6d8d4b 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -538,7 +538,7 @@ ia64_get_irr(unsigned int vector)
 {
 	unsigned int reg = vector / 64;
 	unsigned int bit = vector % 64;
-	u64 irr;
+	unsigned long irr;
 
 	switch (reg) {
 	case 0: irr = ia64_getreg(_IA64_REG_CR_IRR0); break;
-- 
2.36.1



* [PATCH v3 2/7] bitops: always define asm-generic non-atomic bitops
From: Alexander Lobakin @ 2022-06-17 14:40 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

Move the generic non-atomic bitops from the asm-generic header, which
gets included only when there are no architecture-specific
alternatives, to a separate independent file to make them always
available.
There are almost no actual code changes, only one comment added to
generic_test_bit() saying that it is an atomic operation itself and
thus `volatile` must always stay there with no cast-aways.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> # comment
Suggested-by: Marco Elver <elver@google.com> # reference to kernel-doc
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 .../asm-generic/bitops/generic-non-atomic.h   | 130 ++++++++++++++++++
 include/asm-generic/bitops/non-atomic.h       | 110 ++-------------
 2 files changed, 138 insertions(+), 102 deletions(-)
 create mode 100644 include/asm-generic/bitops/generic-non-atomic.h

diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
new file mode 100644
index 000000000000..7226488810e5
--- /dev/null
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -0,0 +1,130 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
+#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
+
+#include <linux/bits.h>
+
+#ifndef _LINUX_BITOPS_H
+#error only <linux/bitops.h> can be included directly
+#endif
+
+/*
+ * Generic definitions for bit operations, should not be used in regular code
+ * directly.
+ */
+
+/**
+ * generic___set_bit - Set a bit in memory
+ * @nr: the bit to set
+ * @addr: the address to start counting from
+ *
+ * Unlike set_bit(), this function is non-atomic and may be reordered.
+ * If it's called on the same region of memory simultaneously, the effect
+ * may be that only one operation succeeds.
+ */
+static __always_inline void
+generic___set_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p  |= mask;
+}
+
+static __always_inline void
+generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p &= ~mask;
+}
+
+/**
+ * generic___change_bit - Toggle a bit in memory
+ * @nr: the bit to change
+ * @addr: the address to start counting from
+ *
+ * Unlike change_bit(), this function is non-atomic and may be reordered.
+ * If it's called on the same region of memory simultaneously, the effect
+ * may be that only one operation succeeds.
+ */
+static __always_inline
+void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p ^= mask;
+}
+
+/**
+ * generic___test_and_set_bit - Set a bit and return its old value
+ * @nr: Bit to set
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */
+static __always_inline int
+generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old | mask;
+	return (old & mask) != 0;
+}
+
+/**
+ * generic___test_and_clear_bit - Clear a bit and return its old value
+ * @nr: Bit to clear
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */
+static __always_inline int
+generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old & ~mask;
+	return (old & mask) != 0;
+}
+
+/* WARNING: non atomic and it can be reordered! */
+static __always_inline int
+generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old ^ mask;
+	return (old & mask) != 0;
+}
+
+/**
+ * generic_test_bit - Determine whether a bit is set
+ * @nr: bit number to test
+ * @addr: Address to start counting from
+ */
+static __always_inline int
+generic_test_bit(unsigned int nr, const volatile unsigned long *addr)
+{
+	/*
+	 * Unlike the bitops with the '__' prefix above, this one *is* atomic,
+	 * so `volatile` must always stay here with no cast-aways. See
+	 * `Documentation/atomic_bitops.txt` for the details.
+	 */
+	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
+}
+
+#endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index 078cc68be2f1..23d3abc1e10d 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -2,121 +2,27 @@
 #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 
-#include <asm/types.h>
+#include <asm-generic/bitops/generic-non-atomic.h>
 
-/**
- * arch___set_bit - Set a bit in memory
- * @nr: the bit to set
- * @addr: the address to start counting from
- *
- * Unlike set_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
- */
-static __always_inline void
-arch___set_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
-	*p  |= mask;
-}
+#define arch___set_bit generic___set_bit
 #define __set_bit arch___set_bit
 
-static __always_inline void
-arch___clear_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
-	*p &= ~mask;
-}
+#define arch___clear_bit generic___clear_bit
 #define __clear_bit arch___clear_bit
 
-/**
- * arch___change_bit - Toggle a bit in memory
- * @nr: the bit to change
- * @addr: the address to start counting from
- *
- * Unlike change_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
- */
-static __always_inline
-void arch___change_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
-	*p ^= mask;
-}
+#define arch___change_bit generic___change_bit
 #define __change_bit arch___change_bit
 
-/**
- * arch___test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static __always_inline int
-arch___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-	unsigned long old = *p;
-
-	*p = old | mask;
-	return (old & mask) != 0;
-}
+#define arch___test_and_set_bit generic___test_and_set_bit
 #define __test_and_set_bit arch___test_and_set_bit
 
-/**
- * arch___test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to clear
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static __always_inline int
-arch___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-	unsigned long old = *p;
-
-	*p = old & ~mask;
-	return (old & mask) != 0;
-}
+#define arch___test_and_clear_bit generic___test_and_clear_bit
 #define __test_and_clear_bit arch___test_and_clear_bit
 
-/* WARNING: non atomic and it can be reordered! */
-static __always_inline int
-arch___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-	unsigned long old = *p;
-
-	*p = old ^ mask;
-	return (old & mask) != 0;
-}
+#define arch___test_and_change_bit generic___test_and_change_bit
 #define __test_and_change_bit arch___test_and_change_bit
 
-/**
- * arch_test_bit - Determine whether a bit is set
- * @nr: bit number to test
- * @addr: Address to start counting from
- */
-static __always_inline int
-arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
-{
-	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
-}
+#define arch_test_bit generic_test_bit
 #define test_bit arch_test_bit
 
 #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
-- 
2.36.1



* [PATCH v3 3/7] bitops: unify non-atomic bitops prototypes across architectures
  2022-06-17 14:40   ` Alexander Lobakin
@ 2022-06-17 14:40     ` Alexander Lobakin
  -1 siblings, 0 replies; 93+ messages in thread
From: Alexander Lobakin @ 2022-06-17 14:40 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

Currently, there is a mess with the prototypes of the non-atomic
bitops across the different architectures:

ret	bool, int, unsigned long
nr	int, long, unsigned int, unsigned long
addr	volatile unsigned long *, volatile void *

Thankfully, this doesn't provoke any bugs, but it can sometimes make
the compiler angry at the least convenient moment.
Adjust all the prototypes to the following standard:

ret	bool				retval can be only 0 or 1
nr	unsigned long			native; signed makes no sense
addr	volatile unsigned long *	bitmaps are arrays of ulongs
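
For illustration only (this simply mirrors what the diff below does),
the unified prototypes then take the following shape:

	static __always_inline void
	arch___set_bit(unsigned long nr, volatile unsigned long *addr);

	static __always_inline bool
	arch_test_bit(unsigned long nr, const volatile unsigned long *addr);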

Next, some architectures don't define 'arch_' versions, as they don't
support instrumentation, while others do. To make sure the same set of
callables is always present and to ease any potential future changes,
make them all follow the rule:
 * architecture-specific files define only 'arch_' versions;
 * non-prefixed versions can be defined only in asm-generic files;
and place the non-prefixed definitions into a new file in
asm-generic to be included by non-instrumented architectures.
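
As a rough sketch (the real header introduced below is
non-instrumented-non-atomic.h), a non-instrumented architecture ends
up with something like:

	/* arch/<arch>/include/asm/bitops.h: only the arch_ version */
	static __always_inline bool
	arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
	{
		return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG - 1)));
	}

	/* asm-generic/bitops/non-instrumented-non-atomic.h: the alias */
	#define test_bit	arch_test_bit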

Finally, add some static assertions in order to prevent people from
making a mess of this again.
I also applied the %__always_inline attribute consistently, so that
the helpers always get resolved to the actual operations.
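
The check itself, as added to include/linux/bitops.h below, compares
each prototype against its generic_*() counterpart via __same_type(),
e.g. for test_bit():

	#define __check_bitop_pr(name)						\
		static_assert(__same_type(arch_##name, generic_##name) &&	\
			      __same_type(name, generic_##name))

	__check_bitop_pr(test_bit);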

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
---
 arch/alpha/include/asm/bitops.h               | 32 ++++++------
 arch/hexagon/include/asm/bitops.h             | 24 +++++----
 arch/ia64/include/asm/bitops.h                | 42 ++++++++--------
 arch/m68k/include/asm/bitops.h                | 49 +++++++++++++------
 arch/sh/include/asm/bitops-op32.h             | 34 ++++++++-----
 arch/x86/include/asm/bitops.h                 | 22 +++++----
 .../asm-generic/bitops/generic-non-atomic.h   | 24 ++++-----
 .../bitops/instrumented-non-atomic.h          | 21 +++++---
 include/asm-generic/bitops/non-atomic.h       | 13 +----
 .../bitops/non-instrumented-non-atomic.h      | 16 ++++++
 include/linux/bitops.h                        | 17 +++++++
 tools/include/asm-generic/bitops/non-atomic.h | 24 +++++----
 12 files changed, 198 insertions(+), 120 deletions(-)
 create mode 100644 include/asm-generic/bitops/non-instrumented-non-atomic.h

diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index e1d8483a45f2..492c7713ddae 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -46,8 +46,8 @@ set_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static inline void
-__set_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -82,8 +82,8 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ void
-__clear_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -94,7 +94,7 @@ static inline void
 __clear_bit_unlock(unsigned long nr, volatile void * addr)
 {
 	smp_mb();
-	__clear_bit(nr, addr);
+	arch___clear_bit(nr, addr);
 }
 
 static inline void
@@ -118,8 +118,8 @@ change_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ void
-__change_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -186,8 +186,8 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
 /*
  * WARNING: non atomic version.
  */
-static inline int
-__test_and_set_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -230,8 +230,8 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static inline int
-__test_and_clear_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -272,8 +272,8 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ int
-__test_and_change_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -283,8 +283,8 @@ __test_and_change_bit(unsigned long nr, volatile void * addr)
 	return (old & mask) != 0;
 }
 
-static inline int
-test_bit(int nr, const volatile void * addr)
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
 }
@@ -450,6 +450,8 @@ sched_find_first_bit(const unsigned long b[2])
 	return __ffs(tmp) + ofs;
 }
 
+#include <asm-generic/bitops/non-instrumented-non-atomic.h>
+
 #include <asm-generic/bitops/le.h>
 
 #include <asm-generic/bitops/ext2-atomic-setbit.h>
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index 75d6ba3643b8..da500471ac73 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -127,38 +127,45 @@ static inline void change_bit(int nr, volatile void *addr)
  * be atomic, particularly for things like slab_lock and slab_unlock.
  *
  */
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_clear_bit(nr, addr);
 }
 
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_set_bit(nr, addr);
 }
 
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_change_bit(nr, addr);
 }
 
 /*  Apparently, at least some of these are allowed to be non-atomic  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_clear_bit(nr, addr);
 }
 
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_set_bit(nr, addr);
 }
 
-static inline int __test_and_change_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_change_bit(nr, addr);
 }
 
-static inline int __test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	int retval;
 
@@ -172,8 +179,6 @@ static inline int __test_bit(int nr, const volatile unsigned long *addr)
 	return retval;
 }
 
-#define test_bit(nr, addr) __test_bit(nr, addr)
-
 /*
  * ffz - find first zero in word.
  * @word: The word to search
@@ -271,6 +276,7 @@ static inline unsigned long __fls(unsigned long word)
 }
 
 #include <asm-generic/bitops/lock.h>
+#include <asm-generic/bitops/non-instrumented-non-atomic.h>
 
 #include <asm-generic/bitops/fls64.h>
 #include <asm-generic/bitops/sched.h>
diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
index 577be93c0818..9f62af7fd7c4 100644
--- a/arch/ia64/include/asm/bitops.h
+++ b/arch/ia64/include/asm/bitops.h
@@ -53,7 +53,7 @@ set_bit (int nr, volatile void *addr)
 }
 
 /**
- * __set_bit - Set a bit in memory
+ * arch___set_bit - Set a bit in memory
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
@@ -61,8 +61,8 @@ set_bit (int nr, volatile void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__set_bit (int nr, volatile void *addr)
+static __always_inline void
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
 }
@@ -135,7 +135,7 @@ __clear_bit_unlock(int nr, void *addr)
 }
 
 /**
- * __clear_bit - Clears a bit in memory (non-atomic version)
+ * arch___clear_bit - Clears a bit in memory (non-atomic version)
  * @nr: the bit to clear
  * @addr: the address to start counting from
  *
@@ -143,8 +143,8 @@ __clear_bit_unlock(int nr, void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__clear_bit (int nr, volatile void *addr)
+static __always_inline void
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
 }
@@ -175,7 +175,7 @@ change_bit (int nr, volatile void *addr)
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * arch___change_bit - Toggle a bit in memory
  * @nr: the bit to toggle
  * @addr: the address to start counting from
  *
@@ -183,8 +183,8 @@ change_bit (int nr, volatile void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__change_bit (int nr, volatile void *addr)
+static __always_inline void
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
 }
@@ -224,7 +224,7 @@ test_and_set_bit (int nr, volatile void *addr)
 #define test_and_set_bit_lock test_and_set_bit
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * arch___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -232,8 +232,8 @@ test_and_set_bit (int nr, volatile void *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __inline__ int
-__test_and_set_bit (int nr, volatile void *addr)
+static __always_inline bool
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -269,7 +269,7 @@ test_and_clear_bit (int nr, volatile void *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * arch___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -277,8 +277,8 @@ test_and_clear_bit (int nr, volatile void *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __inline__ int
-__test_and_clear_bit(int nr, volatile void * addr)
+static __always_inline bool
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -314,14 +314,14 @@ test_and_change_bit (int nr, volatile void *addr)
 }
 
 /**
- * __test_and_change_bit - Change a bit and return its old value
+ * arch___test_and_change_bit - Change a bit and return its old value
  * @nr: Bit to change
  * @addr: Address to count from
  *
  * This operation is non-atomic and can be reordered.
  */
-static __inline__ int
-__test_and_change_bit (int nr, void *addr)
+static __always_inline bool
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 old, bit = (1 << (nr & 31));
 	__u32 *m = (__u32 *) addr + (nr >> 5);
@@ -331,8 +331,8 @@ __test_and_change_bit (int nr, void *addr)
 	return (old & bit) != 0;
 }
 
-static __inline__ int
-test_bit (int nr, const volatile void *addr)
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
 }
@@ -443,6 +443,8 @@ static __inline__ unsigned long __arch_hweight64(unsigned long x)
 
 #ifdef __KERNEL__
 
+#include <asm-generic/bitops/non-instrumented-non-atomic.h>
+
 #include <asm-generic/bitops/le.h>
 
 #include <asm-generic/bitops/ext2-atomic-setbit.h>
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 51283db53667..71495faf2a90 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -65,8 +65,11 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
 				bfset_mem_set_bit(nr, vaddr))
 #endif
 
-#define __set_bit(nr, vaddr)	set_bit(nr, vaddr)
-
+static __always_inline void
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	set_bit(nr, addr);
+}
 
 static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
 {
@@ -105,8 +108,11 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
 				bfclr_mem_clear_bit(nr, vaddr))
 #endif
 
-#define __clear_bit(nr, vaddr)	clear_bit(nr, vaddr)
-
+static __always_inline void
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	clear_bit(nr, addr);
+}
 
 static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
 {
@@ -145,14 +151,17 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
 				bfchg_mem_change_bit(nr, vaddr))
 #endif
 
-#define __change_bit(nr, vaddr)	change_bit(nr, vaddr)
-
-
-static inline int test_bit(int nr, const volatile unsigned long *vaddr)
+static __always_inline void
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
-	return (vaddr[nr >> 5] & (1UL << (nr & 31))) != 0;
+	change_bit(nr, addr);
 }
 
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
+{
+	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
+}
 
 static inline int bset_reg_test_and_set_bit(int nr,
 					    volatile unsigned long *vaddr)
@@ -201,8 +210,11 @@ static inline int bfset_mem_test_and_set_bit(int nr,
 					bfset_mem_test_and_set_bit(nr, vaddr))
 #endif
 
-#define __test_and_set_bit(nr, vaddr)	test_and_set_bit(nr, vaddr)
-
+static __always_inline bool
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_set_bit(nr, addr);
+}
 
 static inline int bclr_reg_test_and_clear_bit(int nr,
 					      volatile unsigned long *vaddr)
@@ -251,8 +263,11 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
 					bfclr_mem_test_and_clear_bit(nr, vaddr))
 #endif
 
-#define __test_and_clear_bit(nr, vaddr)	test_and_clear_bit(nr, vaddr)
-
+static __always_inline bool
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_clear_bit(nr, addr);
+}
 
 static inline int bchg_reg_test_and_change_bit(int nr,
 					       volatile unsigned long *vaddr)
@@ -301,8 +316,11 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
 					bfchg_mem_test_and_change_bit(nr, vaddr))
 #endif
 
-#define __test_and_change_bit(nr, vaddr) test_and_change_bit(nr, vaddr)
-
+static __always_inline bool
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_change_bit(nr, addr);
+}
 
 /*
  *	The true 68020 and more advanced processors support the "bfffo"
@@ -522,6 +540,7 @@ static inline int __fls(int x)
 #define clear_bit_unlock	clear_bit
 #define __clear_bit_unlock	clear_bit_unlock
 
+#include <asm-generic/bitops/non-instrumented-non-atomic.h>
 #include <asm-generic/bitops/ext2-atomic.h>
 #include <asm-generic/bitops/fls64.h>
 #include <asm-generic/bitops/sched.h>
diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
index cfe5465acce7..565a85d8b7fb 100644
--- a/arch/sh/include/asm/bitops-op32.h
+++ b/arch/sh/include/asm/bitops-op32.h
@@ -2,6 +2,8 @@
 #ifndef __ASM_SH_BITOPS_OP32_H
 #define __ASM_SH_BITOPS_OP32_H
 
+#include <linux/bits.h>
+
 /*
  * The bit modifying instructions on SH-2A are only capable of working
  * with a 3-bit immediate, which signifies the shift position for the bit
@@ -16,7 +18,8 @@
 #define BYTE_OFFSET(nr)		((nr) % BITS_PER_BYTE)
 #endif
 
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -33,7 +36,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
 	}
 }
 
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -52,7 +56,7 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * arch___change_bit - Toggle a bit in memory
  * @nr: the bit to change
  * @addr: the address to start counting from
  *
@@ -60,7 +64,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -79,7 +84,7 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * arch___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -87,7 +92,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -98,7 +104,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * arch___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -106,7 +112,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -117,8 +124,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
-					    volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -129,13 +136,16 @@ static inline int __test_and_change_bit(int nr,
 }
 
 /**
- * test_bit - Determine whether a bit is set
+ * arch_test_bit - Determine whether a bit is set
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
 
+#include <asm-generic/bitops/non-instrumented-non-atomic.h>
+
 #endif /* __ASM_SH_BITOPS_OP32_H */
diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index a288ecd230ab..973c6bd17f98 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -63,7 +63,7 @@ arch_set_bit(long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-arch___set_bit(long nr, volatile unsigned long *addr)
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	asm volatile(__ASM_SIZE(bts) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
 }
@@ -89,7 +89,7 @@ arch_clear_bit_unlock(long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-arch___clear_bit(long nr, volatile unsigned long *addr)
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	asm volatile(__ASM_SIZE(btr) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
 }
@@ -114,7 +114,7 @@ arch___clear_bit_unlock(long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-arch___change_bit(long nr, volatile unsigned long *addr)
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	asm volatile(__ASM_SIZE(btc) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
 }
@@ -145,7 +145,7 @@ arch_test_and_set_bit_lock(long nr, volatile unsigned long *addr)
 }
 
 static __always_inline bool
-arch___test_and_set_bit(long nr, volatile unsigned long *addr)
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	bool oldbit;
 
@@ -171,7 +171,7 @@ arch_test_and_clear_bit(long nr, volatile unsigned long *addr)
  * this without also updating arch/x86/kernel/kvm.c
  */
 static __always_inline bool
-arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	bool oldbit;
 
@@ -183,7 +183,7 @@ arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
 }
 
 static __always_inline bool
-arch___test_and_change_bit(long nr, volatile unsigned long *addr)
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	bool oldbit;
 
@@ -219,10 +219,12 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l
 	return oldbit;
 }
 
-#define arch_test_bit(nr, addr)			\
-	(__builtin_constant_p((nr))		\
-	 ? constant_test_bit((nr), (addr))	\
-	 : variable_test_bit((nr), (addr)))
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
+{
+	return __builtin_constant_p(nr) ? constant_test_bit(nr, addr) :
+					  variable_test_bit(nr, addr);
+}
 
 /**
  * __ffs - find first set bit in word
diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
index 7226488810e5..b85b8a2ac239 100644
--- a/include/asm-generic/bitops/generic-non-atomic.h
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -24,7 +24,7 @@
  * may be that only one operation succeeds.
  */
 static __always_inline void
-generic___set_bit(unsigned int nr, volatile unsigned long *addr)
+generic___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -33,7 +33,7 @@ generic___set_bit(unsigned int nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
+generic___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -50,8 +50,8 @@ generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __always_inline
-void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline void
+generic___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -68,8 +68,8 @@ void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __always_inline int
-generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+generic___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -88,8 +88,8 @@ generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __always_inline int
-generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+generic___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -100,8 +100,8 @@ generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static __always_inline int
-generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+generic___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -116,8 +116,8 @@ generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static __always_inline int
-generic_test_bit(unsigned int nr, const volatile unsigned long *addr)
+static __always_inline bool
+generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	/*
 	 * Unlike the bitops with the '__' prefix above, this one *is* atomic,
diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
index 7ab1ecc37782..b019f77ef21c 100644
--- a/include/asm-generic/bitops/instrumented-non-atomic.h
+++ b/include/asm-generic/bitops/instrumented-non-atomic.h
@@ -22,7 +22,8 @@
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___set_bit(nr, addr);
@@ -37,7 +38,8 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___clear_bit(nr, addr);
@@ -52,7 +54,8 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___change_bit(nr, addr);
@@ -90,7 +93,8 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_set_bit(nr, addr);
@@ -104,7 +108,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_clear_bit(nr, addr);
@@ -118,7 +123,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_change_bit(nr, addr);
@@ -129,7 +135,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_bit(nr, addr);
diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index 23d3abc1e10d..5c37ced343ae 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -5,24 +5,15 @@
 #include <asm-generic/bitops/generic-non-atomic.h>
 
 #define arch___set_bit generic___set_bit
-#define __set_bit arch___set_bit
-
 #define arch___clear_bit generic___clear_bit
-#define __clear_bit arch___clear_bit
-
 #define arch___change_bit generic___change_bit
-#define __change_bit arch___change_bit
 
 #define arch___test_and_set_bit generic___test_and_set_bit
-#define __test_and_set_bit arch___test_and_set_bit
-
 #define arch___test_and_clear_bit generic___test_and_clear_bit
-#define __test_and_clear_bit arch___test_and_clear_bit
-
 #define arch___test_and_change_bit generic___test_and_change_bit
-#define __test_and_change_bit arch___test_and_change_bit
 
 #define arch_test_bit generic_test_bit
-#define test_bit arch_test_bit
+
+#include <asm-generic/bitops/non-instrumented-non-atomic.h>
 
 #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
diff --git a/include/asm-generic/bitops/non-instrumented-non-atomic.h b/include/asm-generic/bitops/non-instrumented-non-atomic.h
new file mode 100644
index 000000000000..e0fd7bf72a56
--- /dev/null
+++ b/include/asm-generic/bitops/non-instrumented-non-atomic.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
+#define __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
+
+#define __set_bit		arch___set_bit
+#define __clear_bit		arch___clear_bit
+#define __change_bit		arch___change_bit
+
+#define __test_and_set_bit	arch___test_and_set_bit
+#define __test_and_clear_bit	arch___test_and_clear_bit
+#define __test_and_change_bit	arch___test_and_change_bit
+
+#define test_bit		arch_test_bit
+
+#endif /* __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H */
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 7aaed501f768..87087454a288 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -26,12 +26,29 @@ extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
 extern unsigned long __sw_hweight64(__u64 w);
 
+#include <asm-generic/bitops/generic-non-atomic.h>
+
 /*
  * Include this here because some architectures need generic_ffs/fls in
  * scope
  */
 #include <asm/bitops.h>
 
+/* Check that the bitops prototypes are sane */
+#define __check_bitop_pr(name)						\
+	static_assert(__same_type(arch_##name, generic_##name) &&	\
+		      __same_type(name, generic_##name))
+
+__check_bitop_pr(__set_bit);
+__check_bitop_pr(__clear_bit);
+__check_bitop_pr(__change_bit);
+__check_bitop_pr(__test_and_set_bit);
+__check_bitop_pr(__test_and_clear_bit);
+__check_bitop_pr(__test_and_change_bit);
+__check_bitop_pr(test_bit);
+
+#undef __check_bitop_pr
+
 static inline int get_bitmask_order(unsigned int count)
 {
 	int order;
diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
index 7e10c4b50c5d..e5e78e42e57b 100644
--- a/tools/include/asm-generic/bitops/non-atomic.h
+++ b/tools/include/asm-generic/bitops/non-atomic.h
@@ -2,7 +2,7 @@
 #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 
-#include <asm/types.h>
+#include <linux/bits.h>
 
 /**
  * __set_bit - Set a bit in memory
@@ -13,7 +13,8 @@
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -21,7 +22,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
 	*p  |= mask;
 }
 
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -38,7 +40,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -55,7 +58,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -74,7 +78,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -85,8 +90,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
-					    volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -101,7 +106,8 @@ static inline int __test_and_change_bit(int nr,
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

 }
 
 static __always_inline void
-arch___change_bit(long nr, volatile unsigned long *addr)
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	asm volatile(__ASM_SIZE(btc) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
 }
@@ -145,7 +145,7 @@ arch_test_and_set_bit_lock(long nr, volatile unsigned long *addr)
 }
 
 static __always_inline bool
-arch___test_and_set_bit(long nr, volatile unsigned long *addr)
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	bool oldbit;
 
@@ -171,7 +171,7 @@ arch_test_and_clear_bit(long nr, volatile unsigned long *addr)
  * this without also updating arch/x86/kernel/kvm.c
  */
 static __always_inline bool
-arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	bool oldbit;
 
@@ -183,7 +183,7 @@ arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
 }
 
 static __always_inline bool
-arch___test_and_change_bit(long nr, volatile unsigned long *addr)
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	bool oldbit;
 
@@ -219,10 +219,12 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l
 	return oldbit;
 }
 
-#define arch_test_bit(nr, addr)			\
-	(__builtin_constant_p((nr))		\
-	 ? constant_test_bit((nr), (addr))	\
-	 : variable_test_bit((nr), (addr)))
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
+{
+	return __builtin_constant_p(nr) ? constant_test_bit(nr, addr) :
+					  variable_test_bit(nr, addr);
+}
 
 /**
  * __ffs - find first set bit in word
diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
index 7226488810e5..b85b8a2ac239 100644
--- a/include/asm-generic/bitops/generic-non-atomic.h
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -24,7 +24,7 @@
  * may be that only one operation succeeds.
  */
 static __always_inline void
-generic___set_bit(unsigned int nr, volatile unsigned long *addr)
+generic___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -33,7 +33,7 @@ generic___set_bit(unsigned int nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
+generic___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -50,8 +50,8 @@ generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __always_inline
-void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline void
+generic___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -68,8 +68,8 @@ void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __always_inline int
-generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+generic___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -88,8 +88,8 @@ generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __always_inline int
-generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+generic___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -100,8 +100,8 @@ generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static __always_inline int
-generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+generic___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -116,8 +116,8 @@ generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static __always_inline int
-generic_test_bit(unsigned int nr, const volatile unsigned long *addr)
+static __always_inline bool
+generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	/*
 	 * Unlike the bitops with the '__' prefix above, this one *is* atomic,
diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
index 7ab1ecc37782..b019f77ef21c 100644
--- a/include/asm-generic/bitops/instrumented-non-atomic.h
+++ b/include/asm-generic/bitops/instrumented-non-atomic.h
@@ -22,7 +22,8 @@
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___set_bit(nr, addr);
@@ -37,7 +38,8 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___clear_bit(nr, addr);
@@ -52,7 +54,8 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___change_bit(nr, addr);
@@ -90,7 +93,8 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_set_bit(nr, addr);
@@ -104,7 +108,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_clear_bit(nr, addr);
@@ -118,7 +123,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_change_bit(nr, addr);
@@ -129,7 +135,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_bit(nr, addr);
diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index 23d3abc1e10d..5c37ced343ae 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -5,24 +5,15 @@
 #include <asm-generic/bitops/generic-non-atomic.h>
 
 #define arch___set_bit generic___set_bit
-#define __set_bit arch___set_bit
-
 #define arch___clear_bit generic___clear_bit
-#define __clear_bit arch___clear_bit
-
 #define arch___change_bit generic___change_bit
-#define __change_bit arch___change_bit
 
 #define arch___test_and_set_bit generic___test_and_set_bit
-#define __test_and_set_bit arch___test_and_set_bit
-
 #define arch___test_and_clear_bit generic___test_and_clear_bit
-#define __test_and_clear_bit arch___test_and_clear_bit
-
 #define arch___test_and_change_bit generic___test_and_change_bit
-#define __test_and_change_bit arch___test_and_change_bit
 
 #define arch_test_bit generic_test_bit
-#define test_bit arch_test_bit
+
+#include <asm-generic/bitops/non-instrumented-non-atomic.h>
 
 #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
diff --git a/include/asm-generic/bitops/non-instrumented-non-atomic.h b/include/asm-generic/bitops/non-instrumented-non-atomic.h
new file mode 100644
index 000000000000..e0fd7bf72a56
--- /dev/null
+++ b/include/asm-generic/bitops/non-instrumented-non-atomic.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
+#define __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
+
+#define __set_bit		arch___set_bit
+#define __clear_bit		arch___clear_bit
+#define __change_bit		arch___change_bit
+
+#define __test_and_set_bit	arch___test_and_set_bit
+#define __test_and_clear_bit	arch___test_and_clear_bit
+#define __test_and_change_bit	arch___test_and_change_bit
+
+#define test_bit		arch_test_bit
+
+#endif /* __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H */
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 7aaed501f768..87087454a288 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -26,12 +26,29 @@ extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
 extern unsigned long __sw_hweight64(__u64 w);
 
+#include <asm-generic/bitops/generic-non-atomic.h>
+
 /*
  * Include this here because some architectures need generic_ffs/fls in
  * scope
  */
 #include <asm/bitops.h>
 
+/* Check that the bitops prototypes are sane */
+#define __check_bitop_pr(name)						\
+	static_assert(__same_type(arch_##name, generic_##name) &&	\
+		      __same_type(name, generic_##name))
+
+__check_bitop_pr(__set_bit);
+__check_bitop_pr(__clear_bit);
+__check_bitop_pr(__change_bit);
+__check_bitop_pr(__test_and_set_bit);
+__check_bitop_pr(__test_and_clear_bit);
+__check_bitop_pr(__test_and_change_bit);
+__check_bitop_pr(test_bit);
+
+#undef __check_bitop_pr
+
 static inline int get_bitmask_order(unsigned int count)
 {
 	int order;
diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
index 7e10c4b50c5d..e5e78e42e57b 100644
--- a/tools/include/asm-generic/bitops/non-atomic.h
+++ b/tools/include/asm-generic/bitops/non-atomic.h
@@ -2,7 +2,7 @@
 #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 
-#include <asm/types.h>
+#include <linux/bits.h>
 
 /**
  * __set_bit - Set a bit in memory
@@ -13,7 +13,8 @@
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -21,7 +22,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
 	*p  |= mask;
 }
 
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -38,7 +40,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -55,7 +58,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -74,7 +78,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -85,8 +90,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
-					    volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -101,7 +106,8 @@ static inline int __test_and_change_bit(int nr,
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
-- 
2.36.1

^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v3 3/7] bitops: unify non-atomic bitops prototypes across architectures
@ 2022-06-17 14:40     ` Alexander Lobakin
  0 siblings, 0 replies; 93+ messages in thread
From: Alexander Lobakin @ 2022-06-17 14:40 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64

Currently, there is a mess with the prototypes of the non-atomic
bitops across the different architectures:

ret	bool, int, unsigned long
nr	int, long, unsigned int, unsigned long
addr	volatile unsigned long *, volatile void *

Thankfully, this doesn't provoke any bugs, but it can sometimes make
the compiler complain, and rarely at a convenient point.
Adjust all the prototypes to the following standard:

ret	bool				retval can be only 0 or 1
nr	unsigned long			native; signed makes no sense
addr	volatile unsigned long *	bitmaps are arrays of ulongs

Next, some architectures don't define 'arch_' versions, as they don't
support instrumentation, while others do. To make sure there is always the
same set of callables present and to ease any potential future
changes, make them all follow the rule:
 * architecture-specific files define only 'arch_' versions;
 * non-prefixed versions can be defined only in asm-generic files;
and place the non-prefixed definitions into a new file in
asm-generic to be included by non-instrumented architectures.

Finally, add some static assertions in order to prevent people from
making a mess in this area again.
I also used the %__always_inline attribute consistently, so that
they always get resolved to the actual operations.
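
Below is a condensed sketch of how the pieces fit together for a
non-instrumented architecture (illustration only, not part of the
patch; the "foo" arch is made up, the shapes mirror the hunks below):

	/* arch/foo/include/asm/bitops.h -- hypothetical architecture */
	static __always_inline bool
	arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
	{
		return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG - 1)));
	}
	/* ...the remaining arch___*() ops follow the same prototypes... */

	/* maps test_bit -> arch_test_bit, __set_bit -> arch___set_bit, ... */
	#include <asm-generic/bitops/non-instrumented-non-atomic.h>

	/* include/linux/bitops.h then enforces the unified prototypes: */
	static_assert(__same_type(arch_test_bit, generic_test_bit) &&
		      __same_type(test_bit, generic_test_bit));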

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
---
 arch/alpha/include/asm/bitops.h               | 32 ++++++------
 arch/hexagon/include/asm/bitops.h             | 24 +++++----
 arch/ia64/include/asm/bitops.h                | 42 ++++++++--------
 arch/m68k/include/asm/bitops.h                | 49 +++++++++++++------
 arch/sh/include/asm/bitops-op32.h             | 34 ++++++++-----
 arch/x86/include/asm/bitops.h                 | 22 +++++----
 .../asm-generic/bitops/generic-non-atomic.h   | 24 ++++-----
 .../bitops/instrumented-non-atomic.h          | 21 +++++---
 include/asm-generic/bitops/non-atomic.h       | 13 +----
 .../bitops/non-instrumented-non-atomic.h      | 16 ++++++
 include/linux/bitops.h                        | 17 +++++++
 tools/include/asm-generic/bitops/non-atomic.h | 24 +++++----
 12 files changed, 198 insertions(+), 120 deletions(-)
 create mode 100644 include/asm-generic/bitops/non-instrumented-non-atomic.h

diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index e1d8483a45f2..492c7713ddae 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -46,8 +46,8 @@ set_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static inline void
-__set_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -82,8 +82,8 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ void
-__clear_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -94,7 +94,7 @@ static inline void
 __clear_bit_unlock(unsigned long nr, volatile void * addr)
 {
 	smp_mb();
-	__clear_bit(nr, addr);
+	arch___clear_bit(nr, addr);
 }
 
 static inline void
@@ -118,8 +118,8 @@ change_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ void
-__change_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	int *m = ((int *) addr) + (nr >> 5);
 
@@ -186,8 +186,8 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
 /*
  * WARNING: non atomic version.
  */
-static inline int
-__test_and_set_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -230,8 +230,8 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static inline int
-__test_and_clear_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -272,8 +272,8 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
 /*
  * WARNING: non atomic version.
  */
-static __inline__ int
-__test_and_change_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = 1 << (nr & 0x1f);
 	int *m = ((int *) addr) + (nr >> 5);
@@ -283,8 +283,8 @@ __test_and_change_bit(unsigned long nr, volatile void * addr)
 	return (old & mask) != 0;
 }
 
-static inline int
-test_bit(int nr, const volatile void * addr)
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
 }
@@ -450,6 +450,8 @@ sched_find_first_bit(const unsigned long b[2])
 	return __ffs(tmp) + ofs;
 }
 
+#include <asm-generic/bitops/non-instrumented-non-atomic.h>
+
 #include <asm-generic/bitops/le.h>
 
 #include <asm-generic/bitops/ext2-atomic-setbit.h>
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index 75d6ba3643b8..da500471ac73 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -127,38 +127,45 @@ static inline void change_bit(int nr, volatile void *addr)
  * be atomic, particularly for things like slab_lock and slab_unlock.
  *
  */
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_clear_bit(nr, addr);
 }
 
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_set_bit(nr, addr);
 }
 
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	test_and_change_bit(nr, addr);
 }
 
 /*  Apparently, at least some of these are allowed to be non-atomic  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_clear_bit(nr, addr);
 }
 
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_set_bit(nr, addr);
 }
 
-static inline int __test_and_change_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	return test_and_change_bit(nr, addr);
 }
 
-static inline int __test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	int retval;
 
@@ -172,8 +179,6 @@ static inline int __test_bit(int nr, const volatile unsigned long *addr)
 	return retval;
 }
 
-#define test_bit(nr, addr) __test_bit(nr, addr)
-
 /*
  * ffz - find first zero in word.
  * @word: The word to search
@@ -271,6 +276,7 @@ static inline unsigned long __fls(unsigned long word)
 }
 
 #include <asm-generic/bitops/lock.h>
+#include <asm-generic/bitops/non-instrumented-non-atomic.h>
 
 #include <asm-generic/bitops/fls64.h>
 #include <asm-generic/bitops/sched.h>
diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
index 577be93c0818..9f62af7fd7c4 100644
--- a/arch/ia64/include/asm/bitops.h
+++ b/arch/ia64/include/asm/bitops.h
@@ -53,7 +53,7 @@ set_bit (int nr, volatile void *addr)
 }
 
 /**
- * __set_bit - Set a bit in memory
+ * arch___set_bit - Set a bit in memory
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
@@ -61,8 +61,8 @@ set_bit (int nr, volatile void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__set_bit (int nr, volatile void *addr)
+static __always_inline void
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
 }
@@ -135,7 +135,7 @@ __clear_bit_unlock(int nr, void *addr)
 }
 
 /**
- * __clear_bit - Clears a bit in memory (non-atomic version)
+ * arch___clear_bit - Clears a bit in memory (non-atomic version)
  * @nr: the bit to clear
  * @addr: the address to start counting from
  *
@@ -143,8 +143,8 @@ __clear_bit_unlock(int nr, void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__clear_bit (int nr, volatile void *addr)
+static __always_inline void
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
 }
@@ -175,7 +175,7 @@ change_bit (int nr, volatile void *addr)
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * arch___change_bit - Toggle a bit in memory
  * @nr: the bit to toggle
  * @addr: the address to start counting from
  *
@@ -183,8 +183,8 @@ change_bit (int nr, volatile void *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __inline__ void
-__change_bit (int nr, volatile void *addr)
+static __always_inline void
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
 }
@@ -224,7 +224,7 @@ test_and_set_bit (int nr, volatile void *addr)
 #define test_and_set_bit_lock test_and_set_bit
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * arch___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -232,8 +232,8 @@ test_and_set_bit (int nr, volatile void *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __inline__ int
-__test_and_set_bit (int nr, volatile void *addr)
+static __always_inline bool
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -269,7 +269,7 @@ test_and_clear_bit (int nr, volatile void *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * arch___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -277,8 +277,8 @@ test_and_clear_bit (int nr, volatile void *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __inline__ int
-__test_and_clear_bit(int nr, volatile void * addr)
+static __always_inline bool
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 *p = (__u32 *) addr + (nr >> 5);
 	__u32 m = 1 << (nr & 31);
@@ -314,14 +314,14 @@ test_and_change_bit (int nr, volatile void *addr)
 }
 
 /**
- * __test_and_change_bit - Change a bit and return its old value
+ * arch___test_and_change_bit - Change a bit and return its old value
  * @nr: Bit to change
  * @addr: Address to count from
  *
  * This operation is non-atomic and can be reordered.
  */
-static __inline__ int
-__test_and_change_bit (int nr, void *addr)
+static __always_inline bool
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__u32 old, bit = (1 << (nr & 31));
 	__u32 *m = (__u32 *) addr + (nr >> 5);
@@ -331,8 +331,8 @@ __test_and_change_bit (int nr, void *addr)
 	return (old & bit) != 0;
 }
 
-static __inline__ int
-test_bit (int nr, const volatile void *addr)
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
 }
@@ -443,6 +443,8 @@ static __inline__ unsigned long __arch_hweight64(unsigned long x)
 
 #ifdef __KERNEL__
 
+#include <asm-generic/bitops/non-instrumented-non-atomic.h>
+
 #include <asm-generic/bitops/le.h>
 
 #include <asm-generic/bitops/ext2-atomic-setbit.h>
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 51283db53667..71495faf2a90 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -65,8 +65,11 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
 				bfset_mem_set_bit(nr, vaddr))
 #endif
 
-#define __set_bit(nr, vaddr)	set_bit(nr, vaddr)
-
+static __always_inline void
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	set_bit(nr, addr);
+}
 
 static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
 {
@@ -105,8 +108,11 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
 				bfclr_mem_clear_bit(nr, vaddr))
 #endif
 
-#define __clear_bit(nr, vaddr)	clear_bit(nr, vaddr)
-
+static __always_inline void
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	clear_bit(nr, addr);
+}
 
 static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
 {
@@ -145,14 +151,17 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
 				bfchg_mem_change_bit(nr, vaddr))
 #endif
 
-#define __change_bit(nr, vaddr)	change_bit(nr, vaddr)
-
-
-static inline int test_bit(int nr, const volatile unsigned long *vaddr)
+static __always_inline void
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
-	return (vaddr[nr >> 5] & (1UL << (nr & 31))) != 0;
+	change_bit(nr, addr);
 }
 
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
+{
+	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
+}
 
 static inline int bset_reg_test_and_set_bit(int nr,
 					    volatile unsigned long *vaddr)
@@ -201,8 +210,11 @@ static inline int bfset_mem_test_and_set_bit(int nr,
 					bfset_mem_test_and_set_bit(nr, vaddr))
 #endif
 
-#define __test_and_set_bit(nr, vaddr)	test_and_set_bit(nr, vaddr)
-
+static __always_inline bool
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_set_bit(nr, addr);
+}
 
 static inline int bclr_reg_test_and_clear_bit(int nr,
 					      volatile unsigned long *vaddr)
@@ -251,8 +263,11 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
 					bfclr_mem_test_and_clear_bit(nr, vaddr))
 #endif
 
-#define __test_and_clear_bit(nr, vaddr)	test_and_clear_bit(nr, vaddr)
-
+static __always_inline bool
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_clear_bit(nr, addr);
+}
 
 static inline int bchg_reg_test_and_change_bit(int nr,
 					       volatile unsigned long *vaddr)
@@ -301,8 +316,11 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
 					bfchg_mem_test_and_change_bit(nr, vaddr))
 #endif
 
-#define __test_and_change_bit(nr, vaddr) test_and_change_bit(nr, vaddr)
-
+static __always_inline bool
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+{
+	return test_and_change_bit(nr, addr);
+}
 
 /*
  *	The true 68020 and more advanced processors support the "bfffo"
@@ -522,6 +540,7 @@ static inline int __fls(int x)
 #define clear_bit_unlock	clear_bit
 #define __clear_bit_unlock	clear_bit_unlock
 
+#include <asm-generic/bitops/non-instrumented-non-atomic.h>
 #include <asm-generic/bitops/ext2-atomic.h>
 #include <asm-generic/bitops/fls64.h>
 #include <asm-generic/bitops/sched.h>
diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
index cfe5465acce7..565a85d8b7fb 100644
--- a/arch/sh/include/asm/bitops-op32.h
+++ b/arch/sh/include/asm/bitops-op32.h
@@ -2,6 +2,8 @@
 #ifndef __ASM_SH_BITOPS_OP32_H
 #define __ASM_SH_BITOPS_OP32_H
 
+#include <linux/bits.h>
+
 /*
  * The bit modifying instructions on SH-2A are only capable of working
  * with a 3-bit immediate, which signifies the shift position for the bit
@@ -16,7 +18,8 @@
 #define BYTE_OFFSET(nr)		((nr) % BITS_PER_BYTE)
 #endif
 
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -33,7 +36,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
 	}
 }
 
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -52,7 +56,7 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * arch___change_bit - Toggle a bit in memory
  * @nr: the bit to change
  * @addr: the address to start counting from
  *
@@ -60,7 +64,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	if (__builtin_constant_p(nr)) {
 		__asm__ __volatile__ (
@@ -79,7 +84,7 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * arch___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -87,7 +92,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -98,7 +104,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * arch___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -106,7 +112,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -117,8 +124,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
-					    volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -129,13 +136,16 @@ static inline int __test_and_change_bit(int nr,
 }
 
 /**
- * test_bit - Determine whether a bit is set
+ * arch_test_bit - Determine whether a bit is set
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
 
+#include <asm-generic/bitops/non-instrumented-non-atomic.h>
+
 #endif /* __ASM_SH_BITOPS_OP32_H */
diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index a288ecd230ab..973c6bd17f98 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -63,7 +63,7 @@ arch_set_bit(long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-arch___set_bit(long nr, volatile unsigned long *addr)
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	asm volatile(__ASM_SIZE(bts) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
 }
@@ -89,7 +89,7 @@ arch_clear_bit_unlock(long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-arch___clear_bit(long nr, volatile unsigned long *addr)
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	asm volatile(__ASM_SIZE(btr) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
 }
@@ -114,7 +114,7 @@ arch___clear_bit_unlock(long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-arch___change_bit(long nr, volatile unsigned long *addr)
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	asm volatile(__ASM_SIZE(btc) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
 }
@@ -145,7 +145,7 @@ arch_test_and_set_bit_lock(long nr, volatile unsigned long *addr)
 }
 
 static __always_inline bool
-arch___test_and_set_bit(long nr, volatile unsigned long *addr)
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	bool oldbit;
 
@@ -171,7 +171,7 @@ arch_test_and_clear_bit(long nr, volatile unsigned long *addr)
  * this without also updating arch/x86/kernel/kvm.c
  */
 static __always_inline bool
-arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	bool oldbit;
 
@@ -183,7 +183,7 @@ arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
 }
 
 static __always_inline bool
-arch___test_and_change_bit(long nr, volatile unsigned long *addr)
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	bool oldbit;
 
@@ -219,10 +219,12 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l
 	return oldbit;
 }
 
-#define arch_test_bit(nr, addr)			\
-	(__builtin_constant_p((nr))		\
-	 ? constant_test_bit((nr), (addr))	\
-	 : variable_test_bit((nr), (addr)))
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
+{
+	return __builtin_constant_p(nr) ? constant_test_bit(nr, addr) :
+					  variable_test_bit(nr, addr);
+}
 
 /**
  * __ffs - find first set bit in word
diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
index 7226488810e5..b85b8a2ac239 100644
--- a/include/asm-generic/bitops/generic-non-atomic.h
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -24,7 +24,7 @@
  * may be that only one operation succeeds.
  */
 static __always_inline void
-generic___set_bit(unsigned int nr, volatile unsigned long *addr)
+generic___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -33,7 +33,7 @@ generic___set_bit(unsigned int nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
+generic___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -50,8 +50,8 @@ generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static __always_inline
-void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline void
+generic___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -68,8 +68,8 @@ void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __always_inline int
-generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+generic___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -88,8 +88,8 @@ generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static __always_inline int
-generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+generic___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -100,8 +100,8 @@ generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static __always_inline int
-generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+generic___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -116,8 +116,8 @@ generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static __always_inline int
-generic_test_bit(unsigned int nr, const volatile unsigned long *addr)
+static __always_inline bool
+generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	/*
 	 * Unlike the bitops with the '__' prefix above, this one *is* atomic,
diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
index 7ab1ecc37782..b019f77ef21c 100644
--- a/include/asm-generic/bitops/instrumented-non-atomic.h
+++ b/include/asm-generic/bitops/instrumented-non-atomic.h
@@ -22,7 +22,8 @@
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___set_bit(nr, addr);
@@ -37,7 +38,8 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___clear_bit(nr, addr);
@@ -52,7 +54,8 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
  * region of memory concurrently, the effect may be that only one operation
  * succeeds.
  */
-static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___change_bit(nr, addr);
@@ -90,7 +93,8 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_set_bit(nr, addr);
@@ -104,7 +108,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_clear_bit(nr, addr);
@@ -118,7 +123,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
  * This operation is non-atomic. If two instances of this operation race, one
  * can appear to succeed but actually fail.
  */
-static __always_inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_change_bit(nr, addr);
@@ -129,7 +135,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_bit(nr, addr);
diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index 23d3abc1e10d..5c37ced343ae 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -5,24 +5,15 @@
 #include <asm-generic/bitops/generic-non-atomic.h>
 
 #define arch___set_bit generic___set_bit
-#define __set_bit arch___set_bit
-
 #define arch___clear_bit generic___clear_bit
-#define __clear_bit arch___clear_bit
-
 #define arch___change_bit generic___change_bit
-#define __change_bit arch___change_bit
 
 #define arch___test_and_set_bit generic___test_and_set_bit
-#define __test_and_set_bit arch___test_and_set_bit
-
 #define arch___test_and_clear_bit generic___test_and_clear_bit
-#define __test_and_clear_bit arch___test_and_clear_bit
-
 #define arch___test_and_change_bit generic___test_and_change_bit
-#define __test_and_change_bit arch___test_and_change_bit
 
 #define arch_test_bit generic_test_bit
-#define test_bit arch_test_bit
+
+#include <asm-generic/bitops/non-instrumented-non-atomic.h>
 
 #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
diff --git a/include/asm-generic/bitops/non-instrumented-non-atomic.h b/include/asm-generic/bitops/non-instrumented-non-atomic.h
new file mode 100644
index 000000000000..e0fd7bf72a56
--- /dev/null
+++ b/include/asm-generic/bitops/non-instrumented-non-atomic.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
+#define __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
+
+#define __set_bit		arch___set_bit
+#define __clear_bit		arch___clear_bit
+#define __change_bit		arch___change_bit
+
+#define __test_and_set_bit	arch___test_and_set_bit
+#define __test_and_clear_bit	arch___test_and_clear_bit
+#define __test_and_change_bit	arch___test_and_change_bit
+
+#define test_bit		arch_test_bit
+
+#endif /* __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H */
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 7aaed501f768..87087454a288 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -26,12 +26,29 @@ extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
 extern unsigned long __sw_hweight64(__u64 w);
 
+#include <asm-generic/bitops/generic-non-atomic.h>
+
 /*
  * Include this here because some architectures need generic_ffs/fls in
  * scope
  */
 #include <asm/bitops.h>
 
+/* Check that the bitops prototypes are sane */
+#define __check_bitop_pr(name)						\
+	static_assert(__same_type(arch_##name, generic_##name) &&	\
+		      __same_type(name, generic_##name))
+
+__check_bitop_pr(__set_bit);
+__check_bitop_pr(__clear_bit);
+__check_bitop_pr(__change_bit);
+__check_bitop_pr(__test_and_set_bit);
+__check_bitop_pr(__test_and_clear_bit);
+__check_bitop_pr(__test_and_change_bit);
+__check_bitop_pr(test_bit);
+
+#undef __check_bitop_pr
+
 static inline int get_bitmask_order(unsigned int count)
 {
 	int order;
diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
index 7e10c4b50c5d..e5e78e42e57b 100644
--- a/tools/include/asm-generic/bitops/non-atomic.h
+++ b/tools/include/asm-generic/bitops/non-atomic.h
@@ -2,7 +2,7 @@
 #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
 
-#include <asm/types.h>
+#include <linux/bits.h>
 
 /**
  * __set_bit - Set a bit in memory
@@ -13,7 +13,8 @@
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -21,7 +22,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
 	*p  |= mask;
 }
 
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -38,7 +40,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -55,7 +58,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -74,7 +78,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -85,8 +90,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
-					    volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -101,7 +106,8 @@ static inline int __test_and_change_bit(int nr,
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v3 4/7] bitops: define const_*() versions of the non-atomics
  2022-06-17 14:40   ` Alexander Lobakin
@ 2022-06-17 14:40     ` Alexander Lobakin
  -1 siblings, 0 replies; 93+ messages in thread
From: Alexander Lobakin @ 2022-06-17 14:40 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

Define const_*() variants of the non-atomic bitops to be used when
the input arguments are compile-time constants, so that the compiler
will always be able to resolve those to compile-time constants as
well. They are mostly direct aliases of generic_*(), with one
exception, const_test_bit(): the original is declared atomic-safe
and thus doesn't discard the `volatile` qualifier, so in order to
let the compiler optimize the code, define it separately and drop
the qualifier there.

Add them to the compile-time type checks as well, just in case.
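
For illustration (not part of the patch), a minimal sketch of why
dropping the qualifier matters, assuming a local bitmap the compiler
can fully track; const_test_bit() is called directly here only for
the sake of the example:

	unsigned long map[1] = { 0 };

	/* generic_test_bit() performs a volatile read, so the result
	 * below could never be a compile-time constant.
	 * const_test_bit() reads through a plain pointer, so the
	 * compiler is free to fold the whole expression to 'false'.
	 */
	bool bit = const_test_bit(0, map);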

Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 .../asm-generic/bitops/generic-non-atomic.h   | 31 +++++++++++++++++++
 include/linux/bitops.h                        |  1 +
 2 files changed, 32 insertions(+)

diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
index b85b8a2ac239..3d5ebd24652b 100644
--- a/include/asm-generic/bitops/generic-non-atomic.h
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -127,4 +127,35 @@ generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
 
+/*
+ * const_*() definitions provide good compile-time optimizations when
+ * the passed arguments can be resolved at compile time.
+ */
+#define const___set_bit			generic___set_bit
+#define const___clear_bit		generic___clear_bit
+#define const___change_bit		generic___change_bit
+#define const___test_and_set_bit	generic___test_and_set_bit
+#define const___test_and_clear_bit	generic___test_and_clear_bit
+#define const___test_and_change_bit	generic___test_and_change_bit
+
+/**
+ * const_test_bit - Determine whether a bit is set
+ * @nr: bit number to test
+ * @addr: Address to start counting from
+ *
+ * A version of generic_test_bit() which discards the `volatile` qualifier to
+ * allow a compiler to optimize code harder. Non-atomic and to be called only
+ * for testing compile-time constants, e.g. by the corresponding macros, not
+ * directly from "regular" code.
+ */
+static __always_inline bool
+const_test_bit(unsigned long nr, const volatile unsigned long *addr)
+{
+	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long val = *p;
+
+	return !!(val & mask);
+}
+
 #endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 87087454a288..d393297287d5 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -37,6 +37,7 @@ extern unsigned long __sw_hweight64(__u64 w);
 /* Check that the bitops prototypes are sane */
 #define __check_bitop_pr(name)						\
 	static_assert(__same_type(arch_##name, generic_##name) &&	\
+		      __same_type(const_##name, generic_##name) &&	\
 		      __same_type(name, generic_##name))
 
 __check_bitop_pr(__set_bit);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v3 5/7] bitops: wrap non-atomic bitops with a transparent macro
@ 2022-06-17 14:40     ` Alexander Lobakin
  0 siblings, 0 replies; 93+ messages in thread
From: Alexander Lobakin @ 2022-06-17 14:40 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64

In preparation for altering the non-atomic bitops with a macro, wrap
them in a transparent definition. This requires prepending one more
'_' to their names so that the switch can be done seamlessly. It is a
simple change, given that all the non-prefixed definitions now live
in asm-generic.

sparc32 already has several triple-underscored functions, so I had to
rename them ('___' -> 'sp32_').
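
For example (an illustrative sketch, not part of the diff), with the
wrapper in place:

	#define bitop(op, nr, addr)	op(nr, addr)
	#define __set_bit(nr, addr)	bitop(___set_bit, nr, addr)

	/* so a call such as __set_bit(0, map) simply expands to
	 * ___set_bit(0, map), i.e. the arch_/instrumented variant,
	 * with no functional change at this point.
	 */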

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 arch/sparc/include/asm/bitops_32.h            | 18 ++++++------
 arch/sparc/lib/atomic32.c                     | 12 ++++----
 .../bitops/instrumented-non-atomic.h          | 28 +++++++++----------
 .../bitops/non-instrumented-non-atomic.h      | 14 +++++-----
 include/linux/bitops.h                        | 18 +++++++++++-
 tools/include/asm-generic/bitops/non-atomic.h | 24 ++++++++--------
 tools/include/linux/bitops.h                  | 16 +++++++++++
 7 files changed, 81 insertions(+), 49 deletions(-)

diff --git a/arch/sparc/include/asm/bitops_32.h b/arch/sparc/include/asm/bitops_32.h
index 889afa9f990f..3448c191b484 100644
--- a/arch/sparc/include/asm/bitops_32.h
+++ b/arch/sparc/include/asm/bitops_32.h
@@ -19,9 +19,9 @@
 #error only <linux/bitops.h> can be included directly
 #endif
 
-unsigned long ___set_bit(unsigned long *addr, unsigned long mask);
-unsigned long ___clear_bit(unsigned long *addr, unsigned long mask);
-unsigned long ___change_bit(unsigned long *addr, unsigned long mask);
+unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask);
+unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask);
+unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask);
 
 /*
  * Set bit 'nr' in 32-bit quantity at address 'addr' where bit '0'
@@ -36,7 +36,7 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *add
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	return ___set_bit(ADDR, mask) != 0;
+	return sp32___set_bit(ADDR, mask) != 0;
 }
 
 static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
@@ -46,7 +46,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	(void) ___set_bit(ADDR, mask);
+	(void) sp32___set_bit(ADDR, mask);
 }
 
 static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
@@ -56,7 +56,7 @@ static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *a
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	return ___clear_bit(ADDR, mask) != 0;
+	return sp32___clear_bit(ADDR, mask) != 0;
 }
 
 static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
@@ -66,7 +66,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	(void) ___clear_bit(ADDR, mask);
+	(void) sp32___clear_bit(ADDR, mask);
 }
 
 static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
@@ -76,7 +76,7 @@ static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	return ___change_bit(ADDR, mask) != 0;
+	return sp32___change_bit(ADDR, mask) != 0;
 }
 
 static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
@@ -86,7 +86,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 	ADDR = ((unsigned long *) addr) + (nr >> 5);
 	mask = 1 << (nr & 31);
 
-	(void) ___change_bit(ADDR, mask);
+	(void) sp32___change_bit(ADDR, mask);
 }
 
 #include <asm-generic/bitops/non-atomic.h>
diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index 8b81d0f00c97..cf80d1ae352b 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -120,7 +120,7 @@ void arch_atomic_set(atomic_t *v, int i)
 }
 EXPORT_SYMBOL(arch_atomic_set);
 
-unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
+unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask)
 {
 	unsigned long old, flags;
 
@@ -131,9 +131,9 @@ unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
 
 	return old & mask;
 }
-EXPORT_SYMBOL(___set_bit);
+EXPORT_SYMBOL(sp32___set_bit);
 
-unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
+unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask)
 {
 	unsigned long old, flags;
 
@@ -144,9 +144,9 @@ unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
 
 	return old & mask;
 }
-EXPORT_SYMBOL(___clear_bit);
+EXPORT_SYMBOL(sp32___clear_bit);
 
-unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
+unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
 {
 	unsigned long old, flags;
 
@@ -157,7 +157,7 @@ unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
 
 	return old & mask;
 }
-EXPORT_SYMBOL(___change_bit);
+EXPORT_SYMBOL(sp32___change_bit);
 
 unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
 {
diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
index b019f77ef21c..988a3bbfba34 100644
--- a/include/asm-generic/bitops/instrumented-non-atomic.h
+++ b/include/asm-generic/bitops/instrumented-non-atomic.h
@@ -14,7 +14,7 @@
 #include <linux/instrumented.h>
 
 /**
- * __set_bit - Set a bit in memory
+ * ___set_bit - Set a bit in memory
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
@@ -23,14 +23,14 @@
  * succeeds.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___set_bit(nr, addr);
 }
 
 /**
- * __clear_bit - Clears a bit in memory
+ * ___clear_bit - Clears a bit in memory
  * @nr: the bit to clear
  * @addr: the address to start counting from
  *
@@ -39,14 +39,14 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
  * succeeds.
  */
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___clear_bit(nr, addr);
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to change
  * @addr: the address to start counting from
  *
@@ -55,7 +55,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
  * succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___change_bit(nr, addr);
@@ -86,7 +86,7 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
 }
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -94,14 +94,14 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
  * can appear to succeed but actually fail.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_set_bit(nr, addr);
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -109,14 +109,14 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
  * can appear to succeed but actually fail.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_clear_bit(nr, addr);
 }
 
 /**
- * __test_and_change_bit - Change a bit and return its old value
+ * ___test_and_change_bit - Change a bit and return its old value
  * @nr: Bit to change
  * @addr: Address to count from
  *
@@ -124,19 +124,19 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
  * can appear to succeed but actually fail.
  */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	__instrument_read_write_bitop(nr, addr);
 	return arch___test_and_change_bit(nr, addr);
 }
 
 /**
- * test_bit - Determine whether a bit is set
+ * _test_bit - Determine whether a bit is set
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_bit(nr, addr);
diff --git a/include/asm-generic/bitops/non-instrumented-non-atomic.h b/include/asm-generic/bitops/non-instrumented-non-atomic.h
index e0fd7bf72a56..bdb9b1ffaee9 100644
--- a/include/asm-generic/bitops/non-instrumented-non-atomic.h
+++ b/include/asm-generic/bitops/non-instrumented-non-atomic.h
@@ -3,14 +3,14 @@
 #ifndef __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
 #define __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
 
-#define __set_bit		arch___set_bit
-#define __clear_bit		arch___clear_bit
-#define __change_bit		arch___change_bit
+#define ___set_bit		arch___set_bit
+#define ___clear_bit		arch___clear_bit
+#define ___change_bit		arch___change_bit
 
-#define __test_and_set_bit	arch___test_and_set_bit
-#define __test_and_clear_bit	arch___test_and_clear_bit
-#define __test_and_change_bit	arch___test_and_change_bit
+#define ___test_and_set_bit	arch___test_and_set_bit
+#define ___test_and_clear_bit	arch___test_and_clear_bit
+#define ___test_and_change_bit	arch___test_and_change_bit
 
-#define test_bit		arch_test_bit
+#define _test_bit		arch_test_bit
 
 #endif /* __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H */
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index d393297287d5..3c3afbae1533 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -26,8 +26,24 @@ extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
 extern unsigned long __sw_hweight64(__u64 w);
 
+/*
+ * Defined here because those may be needed by architecture-specific static
+ * inlines.
+ */
+
 #include <asm-generic/bitops/generic-non-atomic.h>
 
+#define bitop(op, nr, addr)						\
+	op(nr, addr)
+
+#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
+#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
+#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
+#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
+#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
+#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
+#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
+
 /*
  * Include this here because some architectures need generic_ffs/fls in
  * scope
@@ -38,7 +54,7 @@ extern unsigned long __sw_hweight64(__u64 w);
 #define __check_bitop_pr(name)						\
 	static_assert(__same_type(arch_##name, generic_##name) &&	\
 		      __same_type(const_##name, generic_##name) &&	\
-		      __same_type(name, generic_##name))
+		      __same_type(_##name, generic_##name))
 
 __check_bitop_pr(__set_bit);
 __check_bitop_pr(__clear_bit);
diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
index e5e78e42e57b..0c472a833408 100644
--- a/tools/include/asm-generic/bitops/non-atomic.h
+++ b/tools/include/asm-generic/bitops/non-atomic.h
@@ -5,7 +5,7 @@
 #include <linux/bits.h>
 
 /**
- * __set_bit - Set a bit in memory
+ * ___set_bit - Set a bit in memory
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
@@ -14,7 +14,7 @@
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__set_bit(unsigned long nr, volatile unsigned long *addr)
+___set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -23,7 +23,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 static __always_inline void
-__clear_bit(unsigned long nr, volatile unsigned long *addr)
+___clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -32,7 +32,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __change_bit - Toggle a bit in memory
+ * ___change_bit - Toggle a bit in memory
  * @nr: the bit to change
  * @addr: the address to start counting from
  *
@@ -41,7 +41,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
  * may be that only one operation succeeds.
  */
 static __always_inline void
-__change_bit(unsigned long nr, volatile unsigned long *addr)
+___change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -50,7 +50,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * ___test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -59,7 +59,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -70,7 +70,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * ___test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
  * @addr: Address to count from
  *
@@ -79,7 +79,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
  * but actually fail.  You must protect multiple accesses with a lock.
  */
 static __always_inline bool
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -91,7 +91,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 
 /* WARNING: non atomic and it can be reordered! */
 static __always_inline bool
-__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -102,12 +102,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /**
- * test_bit - Determine whether a bit is set
+ * _test_bit - Determine whether a bit is set
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
 static __always_inline bool
-test_bit(unsigned long nr, const volatile unsigned long *addr)
+_test_bit(unsigned long nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
diff --git a/tools/include/linux/bitops.h b/tools/include/linux/bitops.h
index 5fca38fe1ba8..f18683b95ea6 100644
--- a/tools/include/linux/bitops.h
+++ b/tools/include/linux/bitops.h
@@ -25,6 +25,22 @@ extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
 extern unsigned long __sw_hweight64(__u64 w);
 
+/*
+ * Defined here because those may be needed by architecture-specific static
+ * inlines.
+ */
+
+#define bitop(op, nr, addr)						\
+	op(nr, addr)
+
+#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
+#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
+#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
+#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
+#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
+#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
+#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
+
 /*
  * Include this here because some architectures need generic_ffs/fls in
  * scope
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v3 6/7] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-17 14:40   ` Alexander Lobakin
  (?)
@ 2022-06-17 14:40     ` Alexander Lobakin
  -1 siblings, 0 replies; 93+ messages in thread
From: Alexander Lobakin @ 2022-06-17 14:40 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

Currently, many architecture-specific non-atomic bitop
implementations use inline asm or other hacks which are faster or
more robust when working with "real" variables (i.e. fields from
the structures etc.), but the compilers have no clue how to optimize
them out when called on compile-time constants. That said, the
following code:

	DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
	unsigned long bar = BIT(BAR_BIT);
	unsigned long baz = 0;

	__set_bit(FOO_BIT, foo);
	baz |= BIT(BAZ_BIT);

	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo)));
	BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
	BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));

triggers the first assertion on x86_64, which means that the
compiler is unable to evaluate it to a compile-time initializer
when the architecture-specific bitop is used even if it's obvious.
In order to let the compiler optimize out such cases, expand the
bitop() macro to use the "constant" C non-atomic bitop
implementations when all of the arguments passed are compile-time
constants, which means that the result will be a compile-time
constant as well, so that it produces more efficient and simpler
code in 100% of cases, compared to the architecture-specific
counterparts.
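
For illustration only (not part of the patch), the intended effect on
an all-constant bitmap is roughly the following (bit numbers are just
example values):

	unsigned long map = 0;

	__set_bit(3, &map);	/* all args constant -> const___set_bit() */

	/* the compiler now sees `map` as the constant BIT(3) */
	BUILD_BUG_ON(!__builtin_constant_p(test_bit(3, &map)));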

The savings are architecture, compiler and compiler flags dependent,
for example, on x86_64 -O2:

GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)

and ARM64 (courtesy of Mark):

GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)

Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 include/linux/bitops.h | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 3c3afbae1533..26a43360c4ae 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -33,8 +33,24 @@ extern unsigned long __sw_hweight64(__u64 w);
 
 #include <asm-generic/bitops/generic-non-atomic.h>
 
+/*
+ * Many architecture-specific non-atomic bitops contain inline asm code and due
+ * to that the compiler can't optimize them to compile-time expressions or
+ * constants. In contrary, gen_*() helpers are defined in pure C and compilers
+ * optimize them just well.
+ * Therefore, to make `unsigned long foo = 0; __set_bit(BAR, &foo)` effectively
+ * equal to `unsigned long foo = BIT(BAR)`, pick the generic C alternative when
+ * the arguments can be resolved at compile time. That expression itself is a
+ * constant and doesn't bring any functional changes to the rest of cases.
+ * The casts to `uintptr_t` are needed to mitigate `-Waddress` warnings when
+ * passing a bitmap from .bss or .data (-> `!!addr` is always true).
+ */
 #define bitop(op, nr, addr)						\
-	op(nr, addr)
+	((__builtin_constant_p(nr) &&					\
+	  __builtin_constant_p((uintptr_t)(addr) != (uintptr_t)NULL) &&	\
+	  (uintptr_t)(addr) != (uintptr_t)NULL &&			\
+	  __builtin_constant_p(*(const unsigned long *)(addr))) ?	\
+	 const##op(nr, addr) : op(nr, addr))
 
 #define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
 #define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
-- 
2.36.1
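
As a small usage sketch of the case the `uintptr_t` casts above guard
against (illustrative only; frobnicate() is a placeholder):

	static DECLARE_BITMAP(map, BITS_PER_LONG);

	/*
	 * `map` lives in .bss, so its address can never be NULL; a bare
	 * `addr &&` check in bitop() would trip -Waddress on GCC, while
	 * the (uintptr_t) comparison stays silent and still lets the
	 * NULL check fold to true.
	 */
	if (test_bit(0, map))
		frobnicate();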


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v3 7/7] lib: test_bitmap: add compile-time optimization/evaluations assertions
  2022-06-17 14:40   ` Alexander Lobakin
  (?)
@ 2022-06-17 14:40     ` Alexander Lobakin
  -1 siblings, 0 replies; 93+ messages in thread
From: Alexander Lobakin @ 2022-06-17 14:40 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

Add a function to the bitmap test suite, which will ensure that
compilers are able to evaluate operations performed by the
bitops/bitmap helpers to compile-time constants when all of the
arguments are compile-time constants as well, or trigger a build
bug otherwise. This should work on all architectures and all the
optimization levels supported by Kbuild.
The function doesn't perform any runtime tests and gets optimized
out to nothing after passing the build assertions.
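
A condensed sketch of the pattern the new test relies on (illustrative
only, not copied from the patch below):

	unsigned long v = 0;

	__change_bit(5, &v);	/* folds to v = BIT(5) once patch 6/7
				 * picks the const_ variant */
	BUILD_BUG_ON(!__builtin_constant_p(v & BIT(5)));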

Suggested-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 lib/test_bitmap.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index d5923a640457..3a7b09b82794 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -869,6 +869,50 @@ static void __init test_bitmap_print_buf(void)
 	}
 }
 
+static void __init test_bitmap_const_eval(void)
+{
+	DECLARE_BITMAP(bitmap, BITS_PER_LONG);
+	unsigned long initvar = BIT(2);
+	unsigned long bitopvar = 0;
+	unsigned long var = 0;
+	int res;
+
+	/*
+	 * Compilers must be able to optimize all of those to compile-time
+	 * constants on any supported optimization level (-O2, -Os) and any
+	 * architecture. Otherwise, trigger a build bug.
+	 * The whole function gets optimized out then, there's nothing to do
+	 * in runtime.
+	 */
+
+	/* Equals to `unsigned long bitmap[1] = { BIT(5), }` */
+	bitmap_clear(bitmap, 0, BITS_PER_LONG);
+	if (!test_bit(7, bitmap))
+		bitmap_set(bitmap, 5, 1);
+
+	/* Equals to `unsigned long bitopvar = BIT(20)` */
+	__change_bit(31, &bitopvar);
+	bitmap_shift_right(&bitopvar, &bitopvar, 11, BITS_PER_LONG);
+
+	/* Equals to `unsigned long var = BIT(25)` */
+	var |= BIT(25);
+	if (var & BIT(0))
+		var ^= GENMASK(9, 6);
+
+	/* __const_hweight<32|64>(BIT(5)) == 1 */
+	res = bitmap_weight(bitmap, 20);
+	BUILD_BUG_ON(!__builtin_constant_p(res));
+
+	/* !(BIT(31) & BIT(18)) == 1 */
+	res = !test_bit(18, &bitopvar);
+	BUILD_BUG_ON(!__builtin_constant_p(res));
+
+	/* BIT(2) & GENMASK(14, 8) == 0 */
+	BUILD_BUG_ON(!__builtin_constant_p(initvar & GENMASK(14, 8)));
+	/* ~BIT(25) */
+	BUILD_BUG_ON(!__builtin_constant_p(~var));
+}
+
 static void __init selftest(void)
 {
 	test_zero_clear();
@@ -884,6 +928,7 @@ static void __init selftest(void)
 	test_for_each_set_clump8();
 	test_bitmap_cut();
 	test_bitmap_print_buf();
+	test_bitmap_const_eval();
 }
 
 KSTM_MODULE_LOADERS(test_bitmap);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [alobakin:bitops 3/7] block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t
@ 2022-06-19  9:20 kernel test robot
  2022-06-17 14:40   ` Alexander Lobakin
  0 siblings, 1 reply; 93+ messages in thread
From: kernel test robot @ 2022-06-19  9:20 UTC (permalink / raw)
  To: Alexander Lobakin; +Cc: kbuild-all, linux-kernel, Yury Norov

tree:   https://github.com/alobakin/linux bitops
head:   9bd39b17ce49d350eed93a031e0da6389067013e
commit: 521611f961a7dda92eefa26e1afd3914c06af64e [3/7] bitops: unify non-atomic bitops prototypes across architectures
config: mips-randconfig-s031-20220619 (https://download.01.org/0day-ci/archive/20220619/202206191726.wq70mbMK-lkp@intel.com/config)
compiler: mips64el-linux-gcc (GCC) 11.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-30-g92122700-dirty
        # https://github.com/alobakin/linux/commit/521611f961a7dda92eefa26e1afd3914c06af64e
        git remote add alobakin https://github.com/alobakin/linux
        git fetch --no-tags alobakin bitops
        git checkout 521611f961a7dda92eefa26e1afd3914c06af64e
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=mips SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   command-line: note: in included file:
   builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQUIRE redefined
   builtin:0:0: sparse: this was the original definition
   builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_SEQ_CST redefined
   builtin:0:0: sparse: this was the original definition
   builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQ_REL redefined
   builtin:0:0: sparse: this was the original definition
   builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_RELEASE redefined
   builtin:0:0: sparse: this was the original definition
   block/elevator.c: note: in included file (through include/linux/bitops.h, include/linux/kernel.h):
   include/asm-generic/bitops/generic-non-atomic.h:29:9: sparse: sparse: unreplaced symbol 'mask'
   include/asm-generic/bitops/generic-non-atomic.h:30:9: sparse: sparse: unreplaced symbol 'p'
   include/asm-generic/bitops/generic-non-atomic.h:32:10: sparse: sparse: unreplaced symbol 'p'
   include/asm-generic/bitops/generic-non-atomic.h:32:16: sparse: sparse: unreplaced symbol 'mask'
   include/asm-generic/bitops/generic-non-atomic.h:27:1: sparse: sparse: unreplaced symbol 'return'
   include/asm-generic/bitops/generic-non-atomic.h:38:9: sparse: sparse: unreplaced symbol 'mask'
   include/asm-generic/bitops/generic-non-atomic.h:39:9: sparse: sparse: unreplaced symbol 'p'
   include/asm-generic/bitops/generic-non-atomic.h:41:10: sparse: sparse: unreplaced symbol 'p'
   include/asm-generic/bitops/generic-non-atomic.h:41:16: sparse: sparse: unreplaced symbol 'mask'
   include/asm-generic/bitops/generic-non-atomic.h:36:1: sparse: sparse: unreplaced symbol 'return'
   include/asm-generic/bitops/generic-non-atomic.h:56:9: sparse: sparse: unreplaced symbol 'mask'
   include/asm-generic/bitops/generic-non-atomic.h:57:9: sparse: sparse: unreplaced symbol 'p'
   include/asm-generic/bitops/generic-non-atomic.h:59:10: sparse: sparse: unreplaced symbol 'p'
   include/asm-generic/bitops/generic-non-atomic.h:59:15: sparse: sparse: unreplaced symbol 'mask'
   include/asm-generic/bitops/generic-non-atomic.h:54:1: sparse: sparse: unreplaced symbol 'return'
   include/asm-generic/bitops/generic-non-atomic.h:74:9: sparse: sparse: unreplaced symbol 'mask'
   include/asm-generic/bitops/generic-non-atomic.h:75:9: sparse: sparse: unreplaced symbol 'p'
   include/asm-generic/bitops/generic-non-atomic.h:76:9: sparse: sparse: unreplaced symbol 'old'
   include/asm-generic/bitops/generic-non-atomic.h:78:10: sparse: sparse: unreplaced symbol 'p'
   include/asm-generic/bitops/generic-non-atomic.h:78:14: sparse: sparse: unreplaced symbol 'old'
   include/asm-generic/bitops/generic-non-atomic.h:78:20: sparse: sparse: unreplaced symbol 'mask'
   include/asm-generic/bitops/generic-non-atomic.h:79:17: sparse: sparse: unreplaced symbol 'old'
   include/asm-generic/bitops/generic-non-atomic.h:79:23: sparse: sparse: unreplaced symbol 'mask'
   include/asm-generic/bitops/generic-non-atomic.h:79:9: sparse: sparse: unreplaced symbol 'return'
   include/asm-generic/bitops/generic-non-atomic.h:72:1: sparse: sparse: unreplaced symbol 'return'
   include/asm-generic/bitops/generic-non-atomic.h:94:9: sparse: sparse: unreplaced symbol 'mask'
   include/asm-generic/bitops/generic-non-atomic.h:95:9: sparse: sparse: unreplaced symbol 'p'
   include/asm-generic/bitops/generic-non-atomic.h:96:9: sparse: sparse: unreplaced symbol 'old'
   include/asm-generic/bitops/generic-non-atomic.h:98:10: sparse: sparse: unreplaced symbol 'p'
   include/asm-generic/bitops/generic-non-atomic.h:98:14: sparse: sparse: unreplaced symbol 'old'
   include/asm-generic/bitops/generic-non-atomic.h:98:21: sparse: sparse: unreplaced symbol 'mask'
   include/asm-generic/bitops/generic-non-atomic.h:99:17: sparse: sparse: unreplaced symbol 'old'
   include/asm-generic/bitops/generic-non-atomic.h:99:23: sparse: sparse: unreplaced symbol 'mask'
   include/asm-generic/bitops/generic-non-atomic.h:99:9: sparse: sparse: unreplaced symbol 'return'
   include/asm-generic/bitops/generic-non-atomic.h:92:1: sparse: sparse: unreplaced symbol 'return'
   include/asm-generic/bitops/generic-non-atomic.h:106:9: sparse: sparse: unreplaced symbol 'mask'
   include/asm-generic/bitops/generic-non-atomic.h:107:9: sparse: sparse: unreplaced symbol 'p'
   include/asm-generic/bitops/generic-non-atomic.h:108:9: sparse: sparse: unreplaced symbol 'old'
   include/asm-generic/bitops/generic-non-atomic.h:110:10: sparse: sparse: unreplaced symbol 'p'
   include/asm-generic/bitops/generic-non-atomic.h:110:14: sparse: sparse: unreplaced symbol 'old'
   include/asm-generic/bitops/generic-non-atomic.h:110:20: sparse: sparse: unreplaced symbol 'mask'
   include/asm-generic/bitops/generic-non-atomic.h:111:17: sparse: sparse: unreplaced symbol 'old'
   include/asm-generic/bitops/generic-non-atomic.h:111:23: sparse: sparse: unreplaced symbol 'mask'
   include/asm-generic/bitops/generic-non-atomic.h:111:9: sparse: sparse: unreplaced symbol 'return'
   include/asm-generic/bitops/generic-non-atomic.h:104:1: sparse: sparse: unreplaced symbol 'return'
   include/asm-generic/bitops/generic-non-atomic.h:127:9: sparse: sparse: unreplaced symbol 'return'
   include/asm-generic/bitops/generic-non-atomic.h:120:1: sparse: sparse: unreplaced symbol 'return'
>> block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t

vim +222 block/elevator.c

9817064b68fef7 Jens Axboe        2006-07-28  217  
70b3ea056f3074 Jens Axboe        2016-12-07  218  void elv_rqhash_add(struct request_queue *q, struct request *rq)
9817064b68fef7 Jens Axboe        2006-07-28  219  {
b374d18a4bfce7 Jens Axboe        2008-10-31  220  	struct elevator_queue *e = q->elevator;
9817064b68fef7 Jens Axboe        2006-07-28  221  
9817064b68fef7 Jens Axboe        2006-07-28 @222  	BUG_ON(ELV_ON_HASH(rq));
242d98f077ac0a Sasha Levin       2012-12-17  223  	hash_add(e->hash, &rq->hash, rq_hash_key(rq));
e806402130c9c4 Christoph Hellwig 2016-10-20  224  	rq->rq_flags |= RQF_HASHED;
9817064b68fef7 Jens Axboe        2006-07-28  225  }
bd166ef183c263 Jens Axboe        2017-01-17  226  EXPORT_SYMBOL_GPL(elv_rqhash_add);
9817064b68fef7 Jens Axboe        2006-07-28  227  

:::::: The code at line 222 was first introduced by commit
:::::: 9817064b68fef7e4580c6df1ea597e106b9ff88b [PATCH] elevator: move the backmerging logic into the elevator core

:::::: TO: Jens Axboe <axboe@suse.de>
:::::: CC: Jens Axboe <axboe@nelson.home.kernel.dk>

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 2/7] bitops: always define asm-generic non-atomic bitops
  2022-06-17 14:40     ` Alexander Lobakin
  (?)
@ 2022-06-20  9:49       ` Marco Elver
  -1 siblings, 0 replies; 93+ messages in thread
From: Marco Elver @ 2022-06-20  9:49 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Mark Rutland,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Maciej Fijalkowski, Jesse Brandeburg,
	Greg Kroah-Hartman, linux-alpha, linux-hexagon, linux-ia64,
	linux-m68k, linux-sh, sparclinux, linux-arch, linux-kernel

On Fri, 17 Jun 2022 at 19:19, Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
>
> Move generic non-atomic bitops from the asm-generic header which
> gets included only when there are no architecture-specific
> alternatives, to a separate independent file to make them always
> available.
> Almost no actual code changes, only one comment added to
> generic_test_bit() saying that it's an atomic operation itself
> and thus `volatile` must always stay there with no cast-aways.
>
> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> # comment
> Suggested-by: Marco Elver <elver@google.com> # reference to kernel-doc
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

Reviewed-by: Marco Elver <elver@google.com>

> ---
>  .../asm-generic/bitops/generic-non-atomic.h   | 130 ++++++++++++++++++
>  include/asm-generic/bitops/non-atomic.h       | 110 ++-------------
>  2 files changed, 138 insertions(+), 102 deletions(-)
>  create mode 100644 include/asm-generic/bitops/generic-non-atomic.h
>
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> new file mode 100644
> index 000000000000..7226488810e5
> --- /dev/null
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -0,0 +1,130 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> +#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> +
> +#include <linux/bits.h>
> +
> +#ifndef _LINUX_BITOPS_H
> +#error only <linux/bitops.h> can be included directly
> +#endif
> +
> +/*
> + * Generic definitions for bit operations, should not be used in regular code
> + * directly.
> + */
> +
> +/**
> + * generic___set_bit - Set a bit in memory
> + * @nr: the bit to set
> + * @addr: the address to start counting from
> + *
> + * Unlike set_bit(), this function is non-atomic and may be reordered.
> + * If it's called on the same region of memory simultaneously, the effect
> + * may be that only one operation succeeds.
> + */
> +static __always_inline void
> +generic___set_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +       unsigned long mask = BIT_MASK(nr);
> +       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +       *p  |= mask;
> +}
> +
> +static __always_inline void
> +generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +       unsigned long mask = BIT_MASK(nr);
> +       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +       *p &= ~mask;
> +}
> +
> +/**
> + * generic___change_bit - Toggle a bit in memory
> + * @nr: the bit to change
> + * @addr: the address to start counting from
> + *
> + * Unlike change_bit(), this function is non-atomic and may be reordered.
> + * If it's called on the same region of memory simultaneously, the effect
> + * may be that only one operation succeeds.
> + */
> +static __always_inline
> +void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +       unsigned long mask = BIT_MASK(nr);
> +       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +       *p ^= mask;
> +}
> +
> +/**
> + * generic___test_and_set_bit - Set a bit and return its old value
> + * @nr: Bit to set
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static __always_inline int
> +generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +       unsigned long mask = BIT_MASK(nr);
> +       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +       unsigned long old = *p;
> +
> +       *p = old | mask;
> +       return (old & mask) != 0;
> +}
> +
> +/**
> + * generic___test_and_clear_bit - Clear a bit and return its old value
> + * @nr: Bit to clear
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static __always_inline int
> +generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +       unsigned long mask = BIT_MASK(nr);
> +       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +       unsigned long old = *p;
> +
> +       *p = old & ~mask;
> +       return (old & mask) != 0;
> +}
> +
> +/* WARNING: non atomic and it can be reordered! */
> +static __always_inline int
> +generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +       unsigned long mask = BIT_MASK(nr);
> +       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +       unsigned long old = *p;
> +
> +       *p = old ^ mask;
> +       return (old & mask) != 0;
> +}
> +
> +/**
> + * generic_test_bit - Determine whether a bit is set
> + * @nr: bit number to test
> + * @addr: Address to start counting from
> + */
> +static __always_inline int
> +generic_test_bit(unsigned int nr, const volatile unsigned long *addr)
> +{
> +       /*
> +        * Unlike the bitops with the '__' prefix above, this one *is* atomic,
> +        * so `volatile` must always stay here with no cast-aways. See
> +        * `Documentation/atomic_bitops.txt` for the details.
> +        */
> +       return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> +}
> +
> +#endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
> diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
> index 078cc68be2f1..23d3abc1e10d 100644
> --- a/include/asm-generic/bitops/non-atomic.h
> +++ b/include/asm-generic/bitops/non-atomic.h
> @@ -2,121 +2,27 @@
>  #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>
> -#include <asm/types.h>
> +#include <asm-generic/bitops/generic-non-atomic.h>
>
> -/**
> - * arch___set_bit - Set a bit in memory
> - * @nr: the bit to set
> - * @addr: the address to start counting from
> - *
> - * Unlike set_bit(), this function is non-atomic and may be reordered.
> - * If it's called on the same region of memory simultaneously, the effect
> - * may be that only one operation succeeds.
> - */
> -static __always_inline void
> -arch___set_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -       unsigned long mask = BIT_MASK(nr);
> -       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -       *p  |= mask;
> -}
> +#define arch___set_bit generic___set_bit
>  #define __set_bit arch___set_bit
>
> -static __always_inline void
> -arch___clear_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -       unsigned long mask = BIT_MASK(nr);
> -       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -       *p &= ~mask;
> -}
> +#define arch___clear_bit generic___clear_bit
>  #define __clear_bit arch___clear_bit
>
> -/**
> - * arch___change_bit - Toggle a bit in memory
> - * @nr: the bit to change
> - * @addr: the address to start counting from
> - *
> - * Unlike change_bit(), this function is non-atomic and may be reordered.
> - * If it's called on the same region of memory simultaneously, the effect
> - * may be that only one operation succeeds.
> - */
> -static __always_inline
> -void arch___change_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -       unsigned long mask = BIT_MASK(nr);
> -       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -       *p ^= mask;
> -}
> +#define arch___change_bit generic___change_bit
>  #define __change_bit arch___change_bit
>
> -/**
> - * arch___test_and_set_bit - Set a bit and return its old value
> - * @nr: Bit to set
> - * @addr: Address to count from
> - *
> - * This operation is non-atomic and can be reordered.
> - * If two examples of this operation race, one can appear to succeed
> - * but actually fail.  You must protect multiple accesses with a lock.
> - */
> -static __always_inline int
> -arch___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -       unsigned long mask = BIT_MASK(nr);
> -       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -       unsigned long old = *p;
> -
> -       *p = old | mask;
> -       return (old & mask) != 0;
> -}
> +#define arch___test_and_set_bit generic___test_and_set_bit
>  #define __test_and_set_bit arch___test_and_set_bit
>
> -/**
> - * arch___test_and_clear_bit - Clear a bit and return its old value
> - * @nr: Bit to clear
> - * @addr: Address to count from
> - *
> - * This operation is non-atomic and can be reordered.
> - * If two examples of this operation race, one can appear to succeed
> - * but actually fail.  You must protect multiple accesses with a lock.
> - */
> -static __always_inline int
> -arch___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -       unsigned long mask = BIT_MASK(nr);
> -       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -       unsigned long old = *p;
> -
> -       *p = old & ~mask;
> -       return (old & mask) != 0;
> -}
> +#define arch___test_and_clear_bit generic___test_and_clear_bit
>  #define __test_and_clear_bit arch___test_and_clear_bit
>
> -/* WARNING: non atomic and it can be reordered! */
> -static __always_inline int
> -arch___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -       unsigned long mask = BIT_MASK(nr);
> -       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -       unsigned long old = *p;
> -
> -       *p = old ^ mask;
> -       return (old & mask) != 0;
> -}
> +#define arch___test_and_change_bit generic___test_and_change_bit
>  #define __test_and_change_bit arch___test_and_change_bit
>
> -/**
> - * arch_test_bit - Determine whether a bit is set
> - * @nr: bit number to test
> - * @addr: Address to start counting from
> - */
> -static __always_inline int
> -arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> -{
> -       return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> -}
> +#define arch_test_bit generic_test_bit
>  #define test_bit arch_test_bit
>
>  #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
> --
> 2.36.1
>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 2/7] bitops: always define asm-generic non-atomic bitops
@ 2022-06-20  9:49       ` Marco Elver
  0 siblings, 0 replies; 93+ messages in thread
From: Marco Elver @ 2022-06-20  9:49 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Mark Rutland,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Maciej Fijalkowski, Jesse Brandeburg,
	Greg Kroah-Hartman, linux-alpha, linux-hexagon, linux-ia64, linu

On Fri, 17 Jun 2022 at 19:19, Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
>
> Move generic non-atomic bitops from the asm-generic header which
> gets included only when there are no architecture-specific
> alternatives, to a separate independent file to make them always
> available.
> There are almost no actual code changes; the only addition is a comment
> on generic_test_bit() noting that it is an atomic operation itself and
> thus the `volatile` qualifier must always stay there with no cast-aways.
>
> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> # comment
> Suggested-by: Marco Elver <elver@google.com> # reference to kernel-doc
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

Reviewed-by: Marco Elver <elver@google.com>
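
A minimal sketch of what this move gives, assuming an architecture that
keeps using the asm-generic fallback (the bitmap name and bit number are
made up): call sites are unaffected, since the arch___*() names now simply
alias the generic_*() helpers living in the new header.

	#include <linux/types.h>
	#include <linux/bitops.h>

	/* asm-generic/bitops/non-atomic.h is reduced to aliases:
	 *   #define arch___set_bit generic___set_bit
	 *   #define __set_bit      arch___set_bit
	 */

	static DECLARE_BITMAP(map, 64);

	static void example(void)
	{
		__set_bit(5, map);	/* ends up in generic___set_bit(5, map) */
	}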

> ---
>  .../asm-generic/bitops/generic-non-atomic.h   | 130 ++++++++++++++++++
>  include/asm-generic/bitops/non-atomic.h       | 110 ++-------------
>  2 files changed, 138 insertions(+), 102 deletions(-)
>  create mode 100644 include/asm-generic/bitops/generic-non-atomic.h
>
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> new file mode 100644
> index 000000000000..7226488810e5
> --- /dev/null
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -0,0 +1,130 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> +#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> +
> +#include <linux/bits.h>
> +
> +#ifndef _LINUX_BITOPS_H
> +#error only <linux/bitops.h> can be included directly
> +#endif
> +
> +/*
> + * Generic definitions for bit operations, should not be used in regular code
> + * directly.
> + */
> +
> +/**
> + * generic___set_bit - Set a bit in memory
> + * @nr: the bit to set
> + * @addr: the address to start counting from
> + *
> + * Unlike set_bit(), this function is non-atomic and may be reordered.
> + * If it's called on the same region of memory simultaneously, the effect
> + * may be that only one operation succeeds.
> + */
> +static __always_inline void
> +generic___set_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +       unsigned long mask = BIT_MASK(nr);
> +       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +       *p  |= mask;
> +}
> +
> +static __always_inline void
> +generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +       unsigned long mask = BIT_MASK(nr);
> +       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +       *p &= ~mask;
> +}
> +
> +/**
> + * generic___change_bit - Toggle a bit in memory
> + * @nr: the bit to change
> + * @addr: the address to start counting from
> + *
> + * Unlike change_bit(), this function is non-atomic and may be reordered.
> + * If it's called on the same region of memory simultaneously, the effect
> + * may be that only one operation succeeds.
> + */
> +static __always_inline
> +void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +       unsigned long mask = BIT_MASK(nr);
> +       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> +       *p ^= mask;
> +}
> +
> +/**
> + * generic___test_and_set_bit - Set a bit and return its old value
> + * @nr: Bit to set
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static __always_inline int
> +generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +       unsigned long mask = BIT_MASK(nr);
> +       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +       unsigned long old = *p;
> +
> +       *p = old | mask;
> +       return (old & mask) != 0;
> +}
> +
> +/**
> + * generic___test_and_clear_bit - Clear a bit and return its old value
> + * @nr: Bit to clear
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static __always_inline int
> +generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +       unsigned long mask = BIT_MASK(nr);
> +       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +       unsigned long old = *p;
> +
> +       *p = old & ~mask;
> +       return (old & mask) != 0;
> +}
> +
> +/* WARNING: non atomic and it can be reordered! */
> +static __always_inline int
> +generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> +       unsigned long mask = BIT_MASK(nr);
> +       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +       unsigned long old = *p;
> +
> +       *p = old ^ mask;
> +       return (old & mask) != 0;
> +}
> +
> +/**
> + * generic_test_bit - Determine whether a bit is set
> + * @nr: bit number to test
> + * @addr: Address to start counting from
> + */
> +static __always_inline int
> +generic_test_bit(unsigned int nr, const volatile unsigned long *addr)
> +{
> +       /*
> +        * Unlike the bitops with the '__' prefix above, this one *is* atomic,
> +        * so `volatile` must always stay here with no cast-aways. See
> +        * `Documentation/atomic_bitops.txt` for the details.
> +        */
> +       return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> +}
> +
> +#endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
> diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
> index 078cc68be2f1..23d3abc1e10d 100644
> --- a/include/asm-generic/bitops/non-atomic.h
> +++ b/include/asm-generic/bitops/non-atomic.h
> @@ -2,121 +2,27 @@
>  #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>
> -#include <asm/types.h>
> +#include <asm-generic/bitops/generic-non-atomic.h>
>
> -/**
> - * arch___set_bit - Set a bit in memory
> - * @nr: the bit to set
> - * @addr: the address to start counting from
> - *
> - * Unlike set_bit(), this function is non-atomic and may be reordered.
> - * If it's called on the same region of memory simultaneously, the effect
> - * may be that only one operation succeeds.
> - */
> -static __always_inline void
> -arch___set_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -       unsigned long mask = BIT_MASK(nr);
> -       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -       *p  |= mask;
> -}
> +#define arch___set_bit generic___set_bit
>  #define __set_bit arch___set_bit
>
> -static __always_inline void
> -arch___clear_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -       unsigned long mask = BIT_MASK(nr);
> -       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -       *p &= ~mask;
> -}
> +#define arch___clear_bit generic___clear_bit
>  #define __clear_bit arch___clear_bit
>
> -/**
> - * arch___change_bit - Toggle a bit in memory
> - * @nr: the bit to change
> - * @addr: the address to start counting from
> - *
> - * Unlike change_bit(), this function is non-atomic and may be reordered.
> - * If it's called on the same region of memory simultaneously, the effect
> - * may be that only one operation succeeds.
> - */
> -static __always_inline
> -void arch___change_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -       unsigned long mask = BIT_MASK(nr);
> -       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> -       *p ^= mask;
> -}
> +#define arch___change_bit generic___change_bit
>  #define __change_bit arch___change_bit
>
> -/**
> - * arch___test_and_set_bit - Set a bit and return its old value
> - * @nr: Bit to set
> - * @addr: Address to count from
> - *
> - * This operation is non-atomic and can be reordered.
> - * If two examples of this operation race, one can appear to succeed
> - * but actually fail.  You must protect multiple accesses with a lock.
> - */
> -static __always_inline int
> -arch___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -       unsigned long mask = BIT_MASK(nr);
> -       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -       unsigned long old = *p;
> -
> -       *p = old | mask;
> -       return (old & mask) != 0;
> -}
> +#define arch___test_and_set_bit generic___test_and_set_bit
>  #define __test_and_set_bit arch___test_and_set_bit
>
> -/**
> - * arch___test_and_clear_bit - Clear a bit and return its old value
> - * @nr: Bit to clear
> - * @addr: Address to count from
> - *
> - * This operation is non-atomic and can be reordered.
> - * If two examples of this operation race, one can appear to succeed
> - * but actually fail.  You must protect multiple accesses with a lock.
> - */
> -static __always_inline int
> -arch___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -       unsigned long mask = BIT_MASK(nr);
> -       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -       unsigned long old = *p;
> -
> -       *p = old & ~mask;
> -       return (old & mask) != 0;
> -}
> +#define arch___test_and_clear_bit generic___test_and_clear_bit
>  #define __test_and_clear_bit arch___test_and_clear_bit
>
> -/* WARNING: non atomic and it can be reordered! */
> -static __always_inline int
> -arch___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> -       unsigned long mask = BIT_MASK(nr);
> -       unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -       unsigned long old = *p;
> -
> -       *p = old ^ mask;
> -       return (old & mask) != 0;
> -}
> +#define arch___test_and_change_bit generic___test_and_change_bit
>  #define __test_and_change_bit arch___test_and_change_bit
>
> -/**
> - * arch_test_bit - Determine whether a bit is set
> - * @nr: bit number to test
> - * @addr: Address to start counting from
> - */
> -static __always_inline int
> -arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> -{
> -       return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> -}
> +#define arch_test_bit generic_test_bit
>  #define test_bit arch_test_bit
>
>  #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
> --
> 2.36.1
>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 4/7] bitops: define const_*() versions of the non-atomics
  2022-06-17 14:40     ` Alexander Lobakin
  (?)
@ 2022-06-20  9:49       ` Marco Elver
  -1 siblings, 0 replies; 93+ messages in thread
From: Marco Elver @ 2022-06-20  9:49 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Mark Rutland,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Maciej Fijalkowski, Jesse Brandeburg,
	Greg Kroah-Hartman, linux-alpha, linux-hexagon, linux-ia64,
	linux-m68k, linux-sh, sparclinux, linux-arch, linux-kernel

On Fri, 17 Jun 2022 at 19:21, Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
>
> Define const_*() variants of the non-atomic bitops to be used when
> the input arguments are compile-time constants, so that the compiler
> will always be able to resolve those to compile-time constants as
> well. Those are mostly direct aliases for generic_*(), with one
> exception, const_test_bit(): the original is declared atomic-safe and
> thus doesn't discard the `volatile` qualifier, so, to let the compiler
> optimize the code, define it separately, disregarding the qualifier.
> Add them to the compile-time type checks as well, just in case.
>
> Suggested-by: Marco Elver <elver@google.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>

Reviewed-by: Marco Elver <elver@google.com>
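
A minimal sketch of the intent, with made-up names and the helpers called
directly only for illustration (regular code keeps using the test_bit()
family): the `volatile` read in generic_test_bit() forces a real load, so
its result can never become a compile-time constant, while const_test_bit()
goes through a plain pointer and can be folded once the compiler knows the
bitmap's contents.

	#include <linux/bitops.h>

	static const unsigned long map[1] = { BIT(2) };

	static void example(void)
	{
		/* volatile access: the load stays, the result is never constant */
		generic_test_bit(2, map);

		/* plain access: free to fold to 1 when *map is known */
		const_test_bit(2, map);
	}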

> ---
>  .../asm-generic/bitops/generic-non-atomic.h   | 31 +++++++++++++++++++
>  include/linux/bitops.h                        |  1 +
>  2 files changed, 32 insertions(+)
>
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index b85b8a2ac239..3d5ebd24652b 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -127,4 +127,35 @@ generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
>         return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
>
> +/*
> + * const_*() definitions provide good compile-time optimizations when
> + * the passed arguments can be resolved at compile time.
> + */
> +#define const___set_bit                        generic___set_bit
> +#define const___clear_bit              generic___clear_bit
> +#define const___change_bit             generic___change_bit
> +#define const___test_and_set_bit       generic___test_and_set_bit
> +#define const___test_and_clear_bit     generic___test_and_clear_bit
> +#define const___test_and_change_bit    generic___test_and_change_bit
> +
> +/**
> + * const_test_bit - Determine whether a bit is set
> + * @nr: bit number to test
> + * @addr: Address to start counting from
> + *
> + * A version of generic_test_bit() which discards the `volatile` qualifier to
> + * allow a compiler to optimize code harder. Non-atomic and to be called only
> + * for testing compile-time constants, e.g. by the corresponding macros, not
> + * directly from "regular" code.
> + */
> +static __always_inline bool
> +const_test_bit(unsigned long nr, const volatile unsigned long *addr)
> +{
> +       const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> +       unsigned long mask = BIT_MASK(nr);
> +       unsigned long val = *p;
> +
> +       return !!(val & mask);
> +}
> +
>  #endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 87087454a288..d393297287d5 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -37,6 +37,7 @@ extern unsigned long __sw_hweight64(__u64 w);
>  /* Check that the bitops prototypes are sane */
>  #define __check_bitop_pr(name)                                         \
>         static_assert(__same_type(arch_##name, generic_##name) &&       \
> +                     __same_type(const_##name, generic_##name) &&      \
>                       __same_type(name, generic_##name))
>
>  __check_bitop_pr(__set_bit);
> --
> 2.36.1
>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 5/7] bitops: wrap non-atomic bitops with a transparent macro
  2022-06-17 14:40     ` Alexander Lobakin
  (?)
@ 2022-06-20  9:50       ` Marco Elver
  -1 siblings, 0 replies; 93+ messages in thread
From: Marco Elver @ 2022-06-20  9:50 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Mark Rutland,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Maciej Fijalkowski, Jesse Brandeburg,
	Greg Kroah-Hartman, linux-alpha, linux-hexagon, linux-ia64,
	linux-m68k, linux-sh, sparclinux, linux-arch, linux-kernel

On Fri, 17 Jun 2022 at 19:19, Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
>
> In preparation for altering the non-atomic bitops with a macro, wrap
> them in a transparent definition. This requires prepending one more
> '_' to their names in order to be able to do that seamlessly. It is
> a simple change, given that all the non-prefixed definitions are now
> in asm-generic.
> sparc32 already has several triple-underscored functions, so I had
> to rename them ('___' -> 'sp32_').
>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>

Reviewed-by: Marco Elver <elver@google.com>
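
A minimal sketch of the transparent-wrapper pattern introduced here (the
two defines are taken from the patch; the dispatch remark describes the
stated intent, not code from this series): bitop() forwards its arguments
unchanged for now, so every call site compiles to exactly what it did
before, and a later change only has to touch bitop() itself.

	#define bitop(op, nr, addr)		op(nr, addr)

	#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)

	/*
	 * __set_bit(nr, addr)
	 *   -> bitop(___set_bit, nr, addr)
	 *   -> ___set_bit(nr, addr)
	 * where ___set_bit() is the instrumented wrapper or the direct
	 * arch___set_bit alias, exactly as selected before this patch.
	 */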

> ---
>  arch/sparc/include/asm/bitops_32.h            | 18 ++++++------
>  arch/sparc/lib/atomic32.c                     | 12 ++++----
>  .../bitops/instrumented-non-atomic.h          | 28 +++++++++----------
>  .../bitops/non-instrumented-non-atomic.h      | 14 +++++-----
>  include/linux/bitops.h                        | 18 +++++++++++-
>  tools/include/asm-generic/bitops/non-atomic.h | 24 ++++++++--------
>  tools/include/linux/bitops.h                  | 16 +++++++++++
>  7 files changed, 81 insertions(+), 49 deletions(-)
>
> diff --git a/arch/sparc/include/asm/bitops_32.h b/arch/sparc/include/asm/bitops_32.h
> index 889afa9f990f..3448c191b484 100644
> --- a/arch/sparc/include/asm/bitops_32.h
> +++ b/arch/sparc/include/asm/bitops_32.h
> @@ -19,9 +19,9 @@
>  #error only <linux/bitops.h> can be included directly
>  #endif
>
> -unsigned long ___set_bit(unsigned long *addr, unsigned long mask);
> -unsigned long ___clear_bit(unsigned long *addr, unsigned long mask);
> -unsigned long ___change_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask);
>
>  /*
>   * Set bit 'nr' in 32-bit quantity at address 'addr' where bit '0'
> @@ -36,7 +36,7 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *add
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       return ___set_bit(ADDR, mask) != 0;
> +       return sp32___set_bit(ADDR, mask) != 0;
>  }
>
>  static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -46,7 +46,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       (void) ___set_bit(ADDR, mask);
> +       (void) sp32___set_bit(ADDR, mask);
>  }
>
>  static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -56,7 +56,7 @@ static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *a
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       return ___clear_bit(ADDR, mask) != 0;
> +       return sp32___clear_bit(ADDR, mask) != 0;
>  }
>
>  static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -66,7 +66,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       (void) ___clear_bit(ADDR, mask);
> +       (void) sp32___clear_bit(ADDR, mask);
>  }
>
>  static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -76,7 +76,7 @@ static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       return ___change_bit(ADDR, mask) != 0;
> +       return sp32___change_bit(ADDR, mask) != 0;
>  }
>
>  static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -86,7 +86,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       (void) ___change_bit(ADDR, mask);
> +       (void) sp32___change_bit(ADDR, mask);
>  }
>
>  #include <asm-generic/bitops/non-atomic.h>
> diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
> index 8b81d0f00c97..cf80d1ae352b 100644
> --- a/arch/sparc/lib/atomic32.c
> +++ b/arch/sparc/lib/atomic32.c
> @@ -120,7 +120,7 @@ void arch_atomic_set(atomic_t *v, int i)
>  }
>  EXPORT_SYMBOL(arch_atomic_set);
>
> -unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask)
>  {
>         unsigned long old, flags;
>
> @@ -131,9 +131,9 @@ unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
>
>         return old & mask;
>  }
> -EXPORT_SYMBOL(___set_bit);
> +EXPORT_SYMBOL(sp32___set_bit);
>
> -unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask)
>  {
>         unsigned long old, flags;
>
> @@ -144,9 +144,9 @@ unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
>
>         return old & mask;
>  }
> -EXPORT_SYMBOL(___clear_bit);
> +EXPORT_SYMBOL(sp32___clear_bit);
>
> -unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
>  {
>         unsigned long old, flags;
>
> @@ -157,7 +157,7 @@ unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
>
>         return old & mask;
>  }
> -EXPORT_SYMBOL(___change_bit);
> +EXPORT_SYMBOL(sp32___change_bit);
>
>  unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
>  {
> diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
> index b019f77ef21c..988a3bbfba34 100644
> --- a/include/asm-generic/bitops/instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/instrumented-non-atomic.h
> @@ -14,7 +14,7 @@
>  #include <linux/instrumented.h>
>
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -23,14 +23,14 @@
>   * succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         instrument_write(addr + BIT_WORD(nr), sizeof(long));
>         arch___set_bit(nr, addr);
>  }
>
>  /**
> - * __clear_bit - Clears a bit in memory
> + * ___clear_bit - Clears a bit in memory
>   * @nr: the bit to clear
>   * @addr: the address to start counting from
>   *
> @@ -39,14 +39,14 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>   * succeeds.
>   */
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         instrument_write(addr + BIT_WORD(nr), sizeof(long));
>         arch___clear_bit(nr, addr);
>  }
>
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -55,7 +55,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         instrument_write(addr + BIT_WORD(nr), sizeof(long));
>         arch___change_bit(nr, addr);
> @@ -86,7 +86,7 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>  }
>
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -94,14 +94,14 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         __instrument_read_write_bitop(nr, addr);
>         return arch___test_and_set_bit(nr, addr);
>  }
>
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -109,14 +109,14 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         __instrument_read_write_bitop(nr, addr);
>         return arch___test_and_clear_bit(nr, addr);
>  }
>
>  /**
> - * __test_and_change_bit - Change a bit and return its old value
> + * ___test_and_change_bit - Change a bit and return its old value
>   * @nr: Bit to change
>   * @addr: Address to count from
>   *
> @@ -124,19 +124,19 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         __instrument_read_write_bitop(nr, addr);
>         return arch___test_and_change_bit(nr, addr);
>  }
>
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>         instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
>         return arch_test_bit(nr, addr);
> diff --git a/include/asm-generic/bitops/non-instrumented-non-atomic.h b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> index e0fd7bf72a56..bdb9b1ffaee9 100644
> --- a/include/asm-generic/bitops/non-instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> @@ -3,14 +3,14 @@
>  #ifndef __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
>  #define __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
>
> -#define __set_bit              arch___set_bit
> -#define __clear_bit            arch___clear_bit
> -#define __change_bit           arch___change_bit
> +#define ___set_bit             arch___set_bit
> +#define ___clear_bit           arch___clear_bit
> +#define ___change_bit          arch___change_bit
>
> -#define __test_and_set_bit     arch___test_and_set_bit
> -#define __test_and_clear_bit   arch___test_and_clear_bit
> -#define __test_and_change_bit  arch___test_and_change_bit
> +#define ___test_and_set_bit    arch___test_and_set_bit
> +#define ___test_and_clear_bit  arch___test_and_clear_bit
> +#define ___test_and_change_bit arch___test_and_change_bit
>
> -#define test_bit               arch_test_bit
> +#define _test_bit              arch_test_bit
>
>  #endif /* __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H */
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index d393297287d5..3c3afbae1533 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,8 +26,24 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>
> +/*
> + * Defined here because those may be needed by architecture-specific static
> + * inlines.
> + */
> +
>  #include <asm-generic/bitops/generic-non-atomic.h>
>
> +#define bitop(op, nr, addr)                                            \
> +       op(nr, addr)
> +
> +#define __set_bit(nr, addr)            bitop(___set_bit, nr, addr)
> +#define __clear_bit(nr, addr)          bitop(___clear_bit, nr, addr)
> +#define __change_bit(nr, addr)         bitop(___change_bit, nr, addr)
> +#define __test_and_set_bit(nr, addr)   bitop(___test_and_set_bit, nr, addr)
> +#define __test_and_clear_bit(nr, addr) bitop(___test_and_clear_bit, nr, addr)
> +#define __test_and_change_bit(nr, addr)        bitop(___test_and_change_bit, nr, addr)
> +#define test_bit(nr, addr)             bitop(_test_bit, nr, addr)
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
> @@ -38,7 +54,7 @@ extern unsigned long __sw_hweight64(__u64 w);
>  #define __check_bitop_pr(name)                                         \
>         static_assert(__same_type(arch_##name, generic_##name) &&       \
>                       __same_type(const_##name, generic_##name) &&      \
> -                     __same_type(name, generic_##name))
> +                     __same_type(_##name, generic_##name))
>
>  __check_bitop_pr(__set_bit);
>  __check_bitop_pr(__clear_bit);
> diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
> index e5e78e42e57b..0c472a833408 100644
> --- a/tools/include/asm-generic/bitops/non-atomic.h
> +++ b/tools/include/asm-generic/bitops/non-atomic.h
> @@ -5,7 +5,7 @@
>  #include <linux/bits.h>
>
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -14,7 +14,7 @@
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -23,7 +23,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -32,7 +32,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -41,7 +41,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -50,7 +50,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -59,7 +59,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -70,7 +70,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -79,7 +79,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -91,7 +91,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>
>  /* WARNING: non atomic and it can be reordered! */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -102,12 +102,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>         return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> diff --git a/tools/include/linux/bitops.h b/tools/include/linux/bitops.h
> index 5fca38fe1ba8..f18683b95ea6 100644
> --- a/tools/include/linux/bitops.h
> +++ b/tools/include/linux/bitops.h
> @@ -25,6 +25,22 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>
> +/*
> + * Defined here because those may be needed by architecture-specific static
> + * inlines.
> + */
> +
> +#define bitop(op, nr, addr)                                            \
> +       op(nr, addr)
> +
> +#define __set_bit(nr, addr)            bitop(___set_bit, nr, addr)
> +#define __clear_bit(nr, addr)          bitop(___clear_bit, nr, addr)
> +#define __change_bit(nr, addr)         bitop(___change_bit, nr, addr)
> +#define __test_and_set_bit(nr, addr)   bitop(___test_and_set_bit, nr, addr)
> +#define __test_and_clear_bit(nr, addr) bitop(___test_and_clear_bit, nr, addr)
> +#define __test_and_change_bit(nr, addr)        bitop(___test_and_change_bit, nr, addr)
> +#define test_bit(nr, addr)             bitop(_test_bit, nr, addr)
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
> --
> 2.36.1
>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 5/7] bitops: wrap non-atomic bitops with a transparent macro
@ 2022-06-20  9:50       ` Marco Elver
  0 siblings, 0 replies; 93+ messages in thread
From: Marco Elver @ 2022-06-20  9:50 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Mark Rutland,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Maciej Fijalkowski, Jesse Brandeburg,
	Greg Kroah-Hartman, linux-alpha, linux-hexagon, linux-ia64,
	linux-m68k, linux-sh, sparclinux, linux-arch, linux-kernel

On Fri, 17 Jun 2022 at 19:19, Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
>
> In preparation for altering the non-atomic bitops with a macro, wrap
> them in a transparent definition. This requires prepending one more
> '_' to their names in order to be able to do that seamlessly. It is
> a simple change, given that all the non-prefixed definitions are now
> in asm-generic.
> sparc32 already has several triple-underscored functions, so I had
> to rename them ('___' -> 'sp32_').
>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>

Reviewed-by: Marco Elver <elver@google.com>

> ---
>  arch/sparc/include/asm/bitops_32.h            | 18 ++++++------
>  arch/sparc/lib/atomic32.c                     | 12 ++++----
>  .../bitops/instrumented-non-atomic.h          | 28 +++++++++----------
>  .../bitops/non-instrumented-non-atomic.h      | 14 +++++-----
>  include/linux/bitops.h                        | 18 +++++++++++-
>  tools/include/asm-generic/bitops/non-atomic.h | 24 ++++++++--------
>  tools/include/linux/bitops.h                  | 16 +++++++++++
>  7 files changed, 81 insertions(+), 49 deletions(-)
>
> diff --git a/arch/sparc/include/asm/bitops_32.h b/arch/sparc/include/asm/bitops_32.h
> index 889afa9f990f..3448c191b484 100644
> --- a/arch/sparc/include/asm/bitops_32.h
> +++ b/arch/sparc/include/asm/bitops_32.h
> @@ -19,9 +19,9 @@
>  #error only <linux/bitops.h> can be included directly
>  #endif
>
> -unsigned long ___set_bit(unsigned long *addr, unsigned long mask);
> -unsigned long ___clear_bit(unsigned long *addr, unsigned long mask);
> -unsigned long ___change_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask);
>
>  /*
>   * Set bit 'nr' in 32-bit quantity at address 'addr' where bit '0'
> @@ -36,7 +36,7 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *add
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       return ___set_bit(ADDR, mask) != 0;
> +       return sp32___set_bit(ADDR, mask) != 0;
>  }
>
>  static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -46,7 +46,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       (void) ___set_bit(ADDR, mask);
> +       (void) sp32___set_bit(ADDR, mask);
>  }
>
>  static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -56,7 +56,7 @@ static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *a
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       return ___clear_bit(ADDR, mask) != 0;
> +       return sp32___clear_bit(ADDR, mask) != 0;
>  }
>
>  static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -66,7 +66,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       (void) ___clear_bit(ADDR, mask);
> +       (void) sp32___clear_bit(ADDR, mask);
>  }
>
>  static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -76,7 +76,7 @@ static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       return ___change_bit(ADDR, mask) != 0;
> +       return sp32___change_bit(ADDR, mask) != 0;
>  }
>
>  static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -86,7 +86,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       (void) ___change_bit(ADDR, mask);
> +       (void) sp32___change_bit(ADDR, mask);
>  }
>
>  #include <asm-generic/bitops/non-atomic.h>
> diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
> index 8b81d0f00c97..cf80d1ae352b 100644
> --- a/arch/sparc/lib/atomic32.c
> +++ b/arch/sparc/lib/atomic32.c
> @@ -120,7 +120,7 @@ void arch_atomic_set(atomic_t *v, int i)
>  }
>  EXPORT_SYMBOL(arch_atomic_set);
>
> -unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask)
>  {
>         unsigned long old, flags;
>
> @@ -131,9 +131,9 @@ unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
>
>         return old & mask;
>  }
> -EXPORT_SYMBOL(___set_bit);
> +EXPORT_SYMBOL(sp32___set_bit);
>
> -unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask)
>  {
>         unsigned long old, flags;
>
> @@ -144,9 +144,9 @@ unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
>
>         return old & mask;
>  }
> -EXPORT_SYMBOL(___clear_bit);
> +EXPORT_SYMBOL(sp32___clear_bit);
>
> -unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
>  {
>         unsigned long old, flags;
>
> @@ -157,7 +157,7 @@ unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
>
>         return old & mask;
>  }
> -EXPORT_SYMBOL(___change_bit);
> +EXPORT_SYMBOL(sp32___change_bit);
>
>  unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
>  {
> diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
> index b019f77ef21c..988a3bbfba34 100644
> --- a/include/asm-generic/bitops/instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/instrumented-non-atomic.h
> @@ -14,7 +14,7 @@
>  #include <linux/instrumented.h>
>
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -23,14 +23,14 @@
>   * succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         instrument_write(addr + BIT_WORD(nr), sizeof(long));
>         arch___set_bit(nr, addr);
>  }
>
>  /**
> - * __clear_bit - Clears a bit in memory
> + * ___clear_bit - Clears a bit in memory
>   * @nr: the bit to clear
>   * @addr: the address to start counting from
>   *
> @@ -39,14 +39,14 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>   * succeeds.
>   */
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         instrument_write(addr + BIT_WORD(nr), sizeof(long));
>         arch___clear_bit(nr, addr);
>  }
>
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -55,7 +55,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         instrument_write(addr + BIT_WORD(nr), sizeof(long));
>         arch___change_bit(nr, addr);
> @@ -86,7 +86,7 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>  }
>
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -94,14 +94,14 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         __instrument_read_write_bitop(nr, addr);
>         return arch___test_and_set_bit(nr, addr);
>  }
>
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -109,14 +109,14 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         __instrument_read_write_bitop(nr, addr);
>         return arch___test_and_clear_bit(nr, addr);
>  }
>
>  /**
> - * __test_and_change_bit - Change a bit and return its old value
> + * ___test_and_change_bit - Change a bit and return its old value
>   * @nr: Bit to change
>   * @addr: Address to count from
>   *
> @@ -124,19 +124,19 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         __instrument_read_write_bitop(nr, addr);
>         return arch___test_and_change_bit(nr, addr);
>  }
>
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>         instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
>         return arch_test_bit(nr, addr);
> diff --git a/include/asm-generic/bitops/non-instrumented-non-atomic.h b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> index e0fd7bf72a56..bdb9b1ffaee9 100644
> --- a/include/asm-generic/bitops/non-instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> @@ -3,14 +3,14 @@
>  #ifndef __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
>  #define __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
>
> -#define __set_bit              arch___set_bit
> -#define __clear_bit            arch___clear_bit
> -#define __change_bit           arch___change_bit
> +#define ___set_bit             arch___set_bit
> +#define ___clear_bit           arch___clear_bit
> +#define ___change_bit          arch___change_bit
>
> -#define __test_and_set_bit     arch___test_and_set_bit
> -#define __test_and_clear_bit   arch___test_and_clear_bit
> -#define __test_and_change_bit  arch___test_and_change_bit
> +#define ___test_and_set_bit    arch___test_and_set_bit
> +#define ___test_and_clear_bit  arch___test_and_clear_bit
> +#define ___test_and_change_bit arch___test_and_change_bit
>
> -#define test_bit               arch_test_bit
> +#define _test_bit              arch_test_bit
>
>  #endif /* __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H */
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index d393297287d5..3c3afbae1533 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,8 +26,24 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>
> +/*
> + * Defined here because those may be needed by architecture-specific static
> + * inlines.
> + */
> +
>  #include <asm-generic/bitops/generic-non-atomic.h>
>
> +#define bitop(op, nr, addr)                                            \
> +       op(nr, addr)
> +
> +#define __set_bit(nr, addr)            bitop(___set_bit, nr, addr)
> +#define __clear_bit(nr, addr)          bitop(___clear_bit, nr, addr)
> +#define __change_bit(nr, addr)         bitop(___change_bit, nr, addr)
> +#define __test_and_set_bit(nr, addr)   bitop(___test_and_set_bit, nr, addr)
> +#define __test_and_clear_bit(nr, addr) bitop(___test_and_clear_bit, nr, addr)
> +#define __test_and_change_bit(nr, addr)        bitop(___test_and_change_bit, nr, addr)
> +#define test_bit(nr, addr)             bitop(_test_bit, nr, addr)
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
> @@ -38,7 +54,7 @@ extern unsigned long __sw_hweight64(__u64 w);
>  #define __check_bitop_pr(name)                                         \
>         static_assert(__same_type(arch_##name, generic_##name) &&       \
>                       __same_type(const_##name, generic_##name) &&      \
> -                     __same_type(name, generic_##name))
> +                     __same_type(_##name, generic_##name))
>
>  __check_bitop_pr(__set_bit);
>  __check_bitop_pr(__clear_bit);
> diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
> index e5e78e42e57b..0c472a833408 100644
> --- a/tools/include/asm-generic/bitops/non-atomic.h
> +++ b/tools/include/asm-generic/bitops/non-atomic.h
> @@ -5,7 +5,7 @@
>  #include <linux/bits.h>
>
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -14,7 +14,7 @@
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -23,7 +23,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -32,7 +32,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -41,7 +41,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -50,7 +50,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -59,7 +59,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -70,7 +70,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -79,7 +79,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -91,7 +91,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>
>  /* WARNING: non atomic and it can be reordered! */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -102,12 +102,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>         return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> diff --git a/tools/include/linux/bitops.h b/tools/include/linux/bitops.h
> index 5fca38fe1ba8..f18683b95ea6 100644
> --- a/tools/include/linux/bitops.h
> +++ b/tools/include/linux/bitops.h
> @@ -25,6 +25,22 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>
> +/*
> + * Defined here because those may be needed by architecture-specific static
> + * inlines.
> + */
> +
> +#define bitop(op, nr, addr)                                            \
> +       op(nr, addr)
> +
> +#define __set_bit(nr, addr)            bitop(___set_bit, nr, addr)
> +#define __clear_bit(nr, addr)          bitop(___clear_bit, nr, addr)
> +#define __change_bit(nr, addr)         bitop(___change_bit, nr, addr)
> +#define __test_and_set_bit(nr, addr)   bitop(___test_and_set_bit, nr, addr)
> +#define __test_and_clear_bit(nr, addr) bitop(___test_and_clear_bit, nr, addr)
> +#define __test_and_change_bit(nr, addr)        bitop(___test_and_change_bit, nr, addr)
> +#define test_bit(nr, addr)             bitop(_test_bit, nr, addr)
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
> --
> 2.36.1
>


* Re: [PATCH v3 5/7] bitops: wrap non-atomic bitops with a transparent macro
@ 2022-06-20  9:50       ` Marco Elver
  0 siblings, 0 replies; 93+ messages in thread
From: Marco Elver @ 2022-06-20  9:50 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Mark Rutland,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Maciej Fijalkowski, Jesse Brandeburg,
	Greg Kroah-Hartman, linux-alpha, linux-hexagon, linux-ia64, linu

On Fri, 17 Jun 2022 at 19:19, Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
>
> In preparation for altering the non-atomic bitops with a macro, wrap
> them in a transparent definition. This requires prepending one more
> '_' to their names in order to be able to do that seamlessly. It is
> a simple change, given that all the non-prefixed definitions are now
> in asm-generic.
> sparc32 already has several triple-underscored functions, so I had
> to rename them ('___' -> 'sp32___').
>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>

Reviewed-by: Marco Elver <elver@google.com>

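A minimal sketch of the layering this creates, for the non-instrumented
path only (the instrumented path goes through static inlines instead of
defines; the real definitions are in the diff below):

	/* asm-generic maps the triple-underscored name to the arch op */
	#define ___set_bit		arch___set_bit

	/* linux/bitops.h wraps it in the (for now transparent) bitop() macro */
	#define bitop(op, nr, addr)	op(nr, addr)
	#define __set_bit(nr, addr)	bitop(___set_bit, nr, addr)

	/* so __set_bit(0, map) still expands to arch___set_bit(0, map) */
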
> ---
>  arch/sparc/include/asm/bitops_32.h            | 18 ++++++------
>  arch/sparc/lib/atomic32.c                     | 12 ++++----
>  .../bitops/instrumented-non-atomic.h          | 28 +++++++++----------
>  .../bitops/non-instrumented-non-atomic.h      | 14 +++++-----
>  include/linux/bitops.h                        | 18 +++++++++++-
>  tools/include/asm-generic/bitops/non-atomic.h | 24 ++++++++--------
>  tools/include/linux/bitops.h                  | 16 +++++++++++
>  7 files changed, 81 insertions(+), 49 deletions(-)
>
> diff --git a/arch/sparc/include/asm/bitops_32.h b/arch/sparc/include/asm/bitops_32.h
> index 889afa9f990f..3448c191b484 100644
> --- a/arch/sparc/include/asm/bitops_32.h
> +++ b/arch/sparc/include/asm/bitops_32.h
> @@ -19,9 +19,9 @@
>  #error only <linux/bitops.h> can be included directly
>  #endif
>
> -unsigned long ___set_bit(unsigned long *addr, unsigned long mask);
> -unsigned long ___clear_bit(unsigned long *addr, unsigned long mask);
> -unsigned long ___change_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask);
>
>  /*
>   * Set bit 'nr' in 32-bit quantity at address 'addr' where bit '0'
> @@ -36,7 +36,7 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *add
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       return ___set_bit(ADDR, mask) != 0;
> +       return sp32___set_bit(ADDR, mask) != 0;
>  }
>
>  static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -46,7 +46,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       (void) ___set_bit(ADDR, mask);
> +       (void) sp32___set_bit(ADDR, mask);
>  }
>
>  static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -56,7 +56,7 @@ static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *a
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       return ___clear_bit(ADDR, mask) != 0;
> +       return sp32___clear_bit(ADDR, mask) != 0;
>  }
>
>  static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -66,7 +66,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       (void) ___clear_bit(ADDR, mask);
> +       (void) sp32___clear_bit(ADDR, mask);
>  }
>
>  static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -76,7 +76,7 @@ static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       return ___change_bit(ADDR, mask) != 0;
> +       return sp32___change_bit(ADDR, mask) != 0;
>  }
>
>  static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -86,7 +86,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
>         ADDR = ((unsigned long *) addr) + (nr >> 5);
>         mask = 1 << (nr & 31);
>
> -       (void) ___change_bit(ADDR, mask);
> +       (void) sp32___change_bit(ADDR, mask);
>  }
>
>  #include <asm-generic/bitops/non-atomic.h>
> diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
> index 8b81d0f00c97..cf80d1ae352b 100644
> --- a/arch/sparc/lib/atomic32.c
> +++ b/arch/sparc/lib/atomic32.c
> @@ -120,7 +120,7 @@ void arch_atomic_set(atomic_t *v, int i)
>  }
>  EXPORT_SYMBOL(arch_atomic_set);
>
> -unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask)
>  {
>         unsigned long old, flags;
>
> @@ -131,9 +131,9 @@ unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
>
>         return old & mask;
>  }
> -EXPORT_SYMBOL(___set_bit);
> +EXPORT_SYMBOL(sp32___set_bit);
>
> -unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask)
>  {
>         unsigned long old, flags;
>
> @@ -144,9 +144,9 @@ unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
>
>         return old & mask;
>  }
> -EXPORT_SYMBOL(___clear_bit);
> +EXPORT_SYMBOL(sp32___clear_bit);
>
> -unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
>  {
>         unsigned long old, flags;
>
> @@ -157,7 +157,7 @@ unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
>
>         return old & mask;
>  }
> -EXPORT_SYMBOL(___change_bit);
> +EXPORT_SYMBOL(sp32___change_bit);
>
>  unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
>  {
> diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
> index b019f77ef21c..988a3bbfba34 100644
> --- a/include/asm-generic/bitops/instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/instrumented-non-atomic.h
> @@ -14,7 +14,7 @@
>  #include <linux/instrumented.h>
>
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -23,14 +23,14 @@
>   * succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         instrument_write(addr + BIT_WORD(nr), sizeof(long));
>         arch___set_bit(nr, addr);
>  }
>
>  /**
> - * __clear_bit - Clears a bit in memory
> + * ___clear_bit - Clears a bit in memory
>   * @nr: the bit to clear
>   * @addr: the address to start counting from
>   *
> @@ -39,14 +39,14 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>   * succeeds.
>   */
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         instrument_write(addr + BIT_WORD(nr), sizeof(long));
>         arch___clear_bit(nr, addr);
>  }
>
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -55,7 +55,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         instrument_write(addr + BIT_WORD(nr), sizeof(long));
>         arch___change_bit(nr, addr);
> @@ -86,7 +86,7 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>  }
>
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -94,14 +94,14 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         __instrument_read_write_bitop(nr, addr);
>         return arch___test_and_set_bit(nr, addr);
>  }
>
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -109,14 +109,14 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         __instrument_read_write_bitop(nr, addr);
>         return arch___test_and_clear_bit(nr, addr);
>  }
>
>  /**
> - * __test_and_change_bit - Change a bit and return its old value
> + * ___test_and_change_bit - Change a bit and return its old value
>   * @nr: Bit to change
>   * @addr: Address to count from
>   *
> @@ -124,19 +124,19 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         __instrument_read_write_bitop(nr, addr);
>         return arch___test_and_change_bit(nr, addr);
>  }
>
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>         instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
>         return arch_test_bit(nr, addr);
> diff --git a/include/asm-generic/bitops/non-instrumented-non-atomic.h b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> index e0fd7bf72a56..bdb9b1ffaee9 100644
> --- a/include/asm-generic/bitops/non-instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> @@ -3,14 +3,14 @@
>  #ifndef __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
>  #define __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
>
> -#define __set_bit              arch___set_bit
> -#define __clear_bit            arch___clear_bit
> -#define __change_bit           arch___change_bit
> +#define ___set_bit             arch___set_bit
> +#define ___clear_bit           arch___clear_bit
> +#define ___change_bit          arch___change_bit
>
> -#define __test_and_set_bit     arch___test_and_set_bit
> -#define __test_and_clear_bit   arch___test_and_clear_bit
> -#define __test_and_change_bit  arch___test_and_change_bit
> +#define ___test_and_set_bit    arch___test_and_set_bit
> +#define ___test_and_clear_bit  arch___test_and_clear_bit
> +#define ___test_and_change_bit arch___test_and_change_bit
>
> -#define test_bit               arch_test_bit
> +#define _test_bit              arch_test_bit
>
>  #endif /* __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H */
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index d393297287d5..3c3afbae1533 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,8 +26,24 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>
> +/*
> + * Defined here because those may be needed by architecture-specific static
> + * inlines.
> + */
> +
>  #include <asm-generic/bitops/generic-non-atomic.h>
>
> +#define bitop(op, nr, addr)                                            \
> +       op(nr, addr)
> +
> +#define __set_bit(nr, addr)            bitop(___set_bit, nr, addr)
> +#define __clear_bit(nr, addr)          bitop(___clear_bit, nr, addr)
> +#define __change_bit(nr, addr)         bitop(___change_bit, nr, addr)
> +#define __test_and_set_bit(nr, addr)   bitop(___test_and_set_bit, nr, addr)
> +#define __test_and_clear_bit(nr, addr) bitop(___test_and_clear_bit, nr, addr)
> +#define __test_and_change_bit(nr, addr)        bitop(___test_and_change_bit, nr, addr)
> +#define test_bit(nr, addr)             bitop(_test_bit, nr, addr)
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
> @@ -38,7 +54,7 @@ extern unsigned long __sw_hweight64(__u64 w);
>  #define __check_bitop_pr(name)                                         \
>         static_assert(__same_type(arch_##name, generic_##name) &&       \
>                       __same_type(const_##name, generic_##name) &&      \
> -                     __same_type(name, generic_##name))
> +                     __same_type(_##name, generic_##name))
>
>  __check_bitop_pr(__set_bit);
>  __check_bitop_pr(__clear_bit);
> diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
> index e5e78e42e57b..0c472a833408 100644
> --- a/tools/include/asm-generic/bitops/non-atomic.h
> +++ b/tools/include/asm-generic/bitops/non-atomic.h
> @@ -5,7 +5,7 @@
>  #include <linux/bits.h>
>
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -14,7 +14,7 @@
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -23,7 +23,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -32,7 +32,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -41,7 +41,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -50,7 +50,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -59,7 +59,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -70,7 +70,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -79,7 +79,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -91,7 +91,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>
>  /* WARNING: non atomic and it can be reordered! */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>         unsigned long mask = BIT_MASK(nr);
>         unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -102,12 +102,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>         return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> diff --git a/tools/include/linux/bitops.h b/tools/include/linux/bitops.h
> index 5fca38fe1ba8..f18683b95ea6 100644
> --- a/tools/include/linux/bitops.h
> +++ b/tools/include/linux/bitops.h
> @@ -25,6 +25,22 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>
> +/*
> + * Defined here because those may be needed by architecture-specific static
> + * inlines.
> + */
> +
> +#define bitop(op, nr, addr)                                            \
> +       op(nr, addr)
> +
> +#define __set_bit(nr, addr)            bitop(___set_bit, nr, addr)
> +#define __clear_bit(nr, addr)          bitop(___clear_bit, nr, addr)
> +#define __change_bit(nr, addr)         bitop(___change_bit, nr, addr)
> +#define __test_and_set_bit(nr, addr)   bitop(___test_and_set_bit, nr, addr)
> +#define __test_and_clear_bit(nr, addr) bitop(___test_and_clear_bit, nr, addr)
> +#define __test_and_change_bit(nr, addr)        bitop(___test_and_change_bit, nr, addr)
> +#define test_bit(nr, addr)             bitop(_test_bit, nr, addr)
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
> --
> 2.36.1
>

* Re: [PATCH v3 6/7] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-17 14:40     ` Alexander Lobakin
@ 2022-06-20  9:51       ` Marco Elver
  0 siblings, 0 replies; 93+ messages in thread
From: Marco Elver @ 2022-06-20  9:51 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Mark Rutland,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Borislav Petkov, Tony Luck, Maciej Fijalkowski, Jesse Brandeburg,
	Greg Kroah-Hartman, linux-alpha, linux-hexagon, linux-ia64,
	linux-m68k, linux-sh, sparclinux, linux-arch, linux-kernel

On Fri, 17 Jun 2022 at 19:00, Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
>
> Currently, many architecture-specific non-atomic bitop
> implementations use inline asm or other hacks which are faster or
> more robust when working with "real" variables (i.e. fields from
> the structures etc.), but the compilers have no clue how to optimize
> them out when called on compile-time constants. That said, the
> following code:
>
>         DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
>         unsigned long bar = BIT(BAR_BIT);
>         unsigned long baz = 0;
>
>         __set_bit(FOO_BIT, foo);
>         baz |= BIT(BAZ_BIT);
>
> 	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo)));
>         BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
>         BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
>
> triggers the first assertion on x86_64, which means that the
> compiler is unable to evaluate it to a compile-time initializer
> when the architecture-specific bitop is used even if it's obvious.
> In order to let the compiler optimize out such cases, expand the
> bitop() macro to use the "constant" C non-atomic bitop
> implementations when all of the arguments passed are compile-time
> constants, which means that the result will be a compile-time
> constant as well, so that it produces more efficient and simpler
> code in 100% of cases, compared to the architecture-specific
> counterparts.
>
> The savings are architecture, compiler and compiler flags dependent,
> for example, on x86_64 -O2:
>
> GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
>
> and ARM64 (courtesy of Mark):
>
> GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>

Reviewed-by: Marco Elver <elver@google.com>

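A minimal usage sketch of what the conditional bitop() below buys;
feature_on() is a made-up example, and the folding assumes the compiler
can see the whole on-stack value (the case this series targets):

	static bool feature_on(void)
	{
		unsigned long map[1] = { };	/* on-stack bitmap */

		__set_bit(3, map);

		/* all arguments are resolvable at compile time, so the
		 * const_*() C helper is picked and, with optimization,
		 * this folds to `return true;` */
		return test_bit(3, map);
	}
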
> ---
>  include/linux/bitops.h | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 3c3afbae1533..26a43360c4ae 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -33,8 +33,24 @@ extern unsigned long __sw_hweight64(__u64 w);
>
>  #include <asm-generic/bitops/generic-non-atomic.h>
>
> +/*
> + * Many architecture-specific non-atomic bitops contain inline asm code and due
> + * to that the compiler can't optimize them to compile-time expressions or
> + * constants. In contrary, gen_*() helpers are defined in pure C and compilers
> + * optimize them just well.
> + * Therefore, to make `unsigned long foo = 0; __set_bit(BAR, &foo)` effectively
> + * equal to `unsigned long foo = BIT(BAR)`, pick the generic C alternative when
> + * the arguments can be resolved at compile time. That expression itself is a
> + * constant and doesn't bring any functional changes to the rest of cases.
> + * The casts to `uintptr_t` are needed to mitigate `-Waddress` warnings when
> + * passing a bitmap from .bss or .data (-> `!!addr` is always true).
> + */
>  #define bitop(op, nr, addr)                                            \
> -       op(nr, addr)
> +       ((__builtin_constant_p(nr) &&                                   \
> +         __builtin_constant_p((uintptr_t)(addr) != (uintptr_t)NULL) && \
> +         (uintptr_t)(addr) != (uintptr_t)NULL &&                       \
> +         __builtin_constant_p(*(const unsigned long *)(addr))) ?       \
> +        const##op(nr, addr) : op(nr, addr))
>
>  #define __set_bit(nr, addr)            bitop(___set_bit, nr, addr)
>  #define __clear_bit(nr, addr)          bitop(___clear_bit, nr, addr)
> --
> 2.36.1
>


* Re: [PATCH v3 3/7] bitops: unify non-atomic bitops prototypes across architectures
  2022-06-17 14:40     ` Alexander Lobakin
@ 2022-06-20 10:02       ` Andy Shevchenko
  0 siblings, 0 replies; 93+ messages in thread
From: Andy Shevchenko @ 2022-06-20 10:02 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Mark Rutland, Matt Turner, Brian Cain,
	Geert Uytterhoeven, Yoshinori Sato, Rich Felker, David S. Miller,
	Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

On Fri, Jun 17, 2022 at 04:40:27PM +0200, Alexander Lobakin wrote:
> Currently, there is a mess with the prototypes of the non-atomic
> bitops across the different architectures:
> 
> ret	bool, int, unsigned long
> nr	int, long, unsigned int, unsigned long
> addr	volatile unsigned long *, volatile void *
> 
> Thankfully, it doesn't provoke any bugs, but can sometimes make
> the compiler angry, which is not handy at all.
> Adjust all the prototypes to the following standard:
> 
> ret	bool				retval can be only 0 or 1
> nr	unsigned long			native; signed makes no sense
> addr	volatile unsigned long *	bitmaps are arrays of ulongs
> 
> Next, some architectures don't define 'arch_' versions as they don't
> support instrumentation, while others do. To make sure there is always the
> same set of callables present and to ease any potential future
> changes, make them all follow the rule:
>  * architecture-specific files define only 'arch_' versions;
>  * non-prefixed versions can be defined only in asm-generic files;
> and place the non-prefixed definitions into a new file in
> asm-generic to be included by non-instrumented architectures.
> 
> Finally, add some static assertions in order to prevent people from
> making a mess in this room again.
> I also used the %__always_inline attribute consistently, so that
> they always get resolved to the actual operations.
> 
> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> Reviewed-by: Yury Norov <yury.norov@gmail.com>

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Thanks for doing this!

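A condensed sketch of the convention the patch establishes; the prototype
matches the unified one below, and the assertion follows the
__check_bitop_pr() helpers this series adds (names used elsewhere in the
series, shown here only for illustration):

	/* every architecture now provides exactly this shape */
	static __always_inline bool
	arch_test_bit(unsigned long nr, const volatile unsigned long *addr);

	/* a build-time check keeps the arch and generic variants in sync */
	static_assert(__same_type(arch_test_bit, generic_test_bit));
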
> ---
>  arch/alpha/include/asm/bitops.h               | 32 ++++++------
>  arch/hexagon/include/asm/bitops.h             | 24 +++++----
>  arch/ia64/include/asm/bitops.h                | 42 ++++++++--------
>  arch/m68k/include/asm/bitops.h                | 49 +++++++++++++------
>  arch/sh/include/asm/bitops-op32.h             | 34 ++++++++-----
>  arch/x86/include/asm/bitops.h                 | 22 +++++----
>  .../asm-generic/bitops/generic-non-atomic.h   | 24 ++++-----
>  .../bitops/instrumented-non-atomic.h          | 21 +++++---
>  include/asm-generic/bitops/non-atomic.h       | 13 +----
>  .../bitops/non-instrumented-non-atomic.h      | 16 ++++++
>  include/linux/bitops.h                        | 17 +++++++
>  tools/include/asm-generic/bitops/non-atomic.h | 24 +++++----
>  12 files changed, 198 insertions(+), 120 deletions(-)
>  create mode 100644 include/asm-generic/bitops/non-instrumented-non-atomic.h
> 
> diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
> index e1d8483a45f2..492c7713ddae 100644
> --- a/arch/alpha/include/asm/bitops.h
> +++ b/arch/alpha/include/asm/bitops.h
> @@ -46,8 +46,8 @@ set_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline void
> -__set_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -82,8 +82,8 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ void
> -__clear_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -94,7 +94,7 @@ static inline void
>  __clear_bit_unlock(unsigned long nr, volatile void * addr)
>  {
>  	smp_mb();
> -	__clear_bit(nr, addr);
> +	arch___clear_bit(nr, addr);
>  }
>  
>  static inline void
> @@ -118,8 +118,8 @@ change_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ void
> -__change_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -186,8 +186,8 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline int
> -__test_and_set_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -230,8 +230,8 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline int
> -__test_and_clear_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -272,8 +272,8 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ int
> -__test_and_change_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -283,8 +283,8 @@ __test_and_change_bit(unsigned long nr, volatile void * addr)
>  	return (old & mask) != 0;
>  }
>  
> -static inline int
> -test_bit(int nr, const volatile void * addr)
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
>  }
> @@ -450,6 +450,8 @@ sched_find_first_bit(const unsigned long b[2])
>  	return __ffs(tmp) + ofs;
>  }
>  
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
> +
>  #include <asm-generic/bitops/le.h>
>  
>  #include <asm-generic/bitops/ext2-atomic-setbit.h>
> diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
> index 75d6ba3643b8..da500471ac73 100644
> --- a/arch/hexagon/include/asm/bitops.h
> +++ b/arch/hexagon/include/asm/bitops.h
> @@ -127,38 +127,45 @@ static inline void change_bit(int nr, volatile void *addr)
>   * be atomic, particularly for things like slab_lock and slab_unlock.
>   *
>   */
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_clear_bit(nr, addr);
>  }
>  
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_set_bit(nr, addr);
>  }
>  
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_change_bit(nr, addr);
>  }
>  
>  /*  Apparently, at least some of these are allowed to be non-atomic  */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_clear_bit(nr, addr);
>  }
>  
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_set_bit(nr, addr);
>  }
>  
> -static inline int __test_and_change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_change_bit(nr, addr);
>  }
>  
> -static inline int __test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	int retval;
>  
> @@ -172,8 +179,6 @@ static inline int __test_bit(int nr, const volatile unsigned long *addr)
>  	return retval;
>  }
>  
> -#define test_bit(nr, addr) __test_bit(nr, addr)
> -
>  /*
>   * ffz - find first zero in word.
>   * @word: The word to search
> @@ -271,6 +276,7 @@ static inline unsigned long __fls(unsigned long word)
>  }
>  
>  #include <asm-generic/bitops/lock.h>
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
>  
>  #include <asm-generic/bitops/fls64.h>
>  #include <asm-generic/bitops/sched.h>
> diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
> index 577be93c0818..9f62af7fd7c4 100644
> --- a/arch/ia64/include/asm/bitops.h
> +++ b/arch/ia64/include/asm/bitops.h
> @@ -53,7 +53,7 @@ set_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __set_bit - Set a bit in memory
> + * arch___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -61,8 +61,8 @@ set_bit (int nr, volatile void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__set_bit (int nr, volatile void *addr)
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
>  }
> @@ -135,7 +135,7 @@ __clear_bit_unlock(int nr, void *addr)
>  }
>  
>  /**
> - * __clear_bit - Clears a bit in memory (non-atomic version)
> + * arch___clear_bit - Clears a bit in memory (non-atomic version)
>   * @nr: the bit to clear
>   * @addr: the address to start counting from
>   *
> @@ -143,8 +143,8 @@ __clear_bit_unlock(int nr, void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__clear_bit (int nr, volatile void *addr)
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
>  }
> @@ -175,7 +175,7 @@ change_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * arch___change_bit - Toggle a bit in memory
>   * @nr: the bit to toggle
>   * @addr: the address to start counting from
>   *
> @@ -183,8 +183,8 @@ change_bit (int nr, volatile void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__change_bit (int nr, volatile void *addr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
>  }
> @@ -224,7 +224,7 @@ test_and_set_bit (int nr, volatile void *addr)
>  #define test_and_set_bit_lock test_and_set_bit
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * arch___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -232,8 +232,8 @@ test_and_set_bit (int nr, volatile void *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __inline__ int
> -__test_and_set_bit (int nr, volatile void *addr)
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -269,7 +269,7 @@ test_and_clear_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * arch___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -277,8 +277,8 @@ test_and_clear_bit (int nr, volatile void *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __inline__ int
> -__test_and_clear_bit(int nr, volatile void * addr)
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -314,14 +314,14 @@ test_and_change_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __test_and_change_bit - Change a bit and return its old value
> + * arch___test_and_change_bit - Change a bit and return its old value
>   * @nr: Bit to change
>   * @addr: Address to count from
>   *
>   * This operation is non-atomic and can be reordered.
>   */
> -static __inline__ int
> -__test_and_change_bit (int nr, void *addr)
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 old, bit = (1 << (nr & 31));
>  	__u32 *m = (__u32 *) addr + (nr >> 5);
> @@ -331,8 +331,8 @@ __test_and_change_bit (int nr, void *addr)
>  	return (old & bit) != 0;
>  }
>  
> -static __inline__ int
> -test_bit (int nr, const volatile void *addr)
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
>  }
> @@ -443,6 +443,8 @@ static __inline__ unsigned long __arch_hweight64(unsigned long x)
>  
>  #ifdef __KERNEL__
>  
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
> +
>  #include <asm-generic/bitops/le.h>
>  
>  #include <asm-generic/bitops/ext2-atomic-setbit.h>
> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> index 51283db53667..71495faf2a90 100644
> --- a/arch/m68k/include/asm/bitops.h
> +++ b/arch/m68k/include/asm/bitops.h
> @@ -65,8 +65,11 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
>  				bfset_mem_set_bit(nr, vaddr))
>  #endif
>  
> -#define __set_bit(nr, vaddr)	set_bit(nr, vaddr)
> -
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	set_bit(nr, addr);
> +}
>  
>  static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
>  {
> @@ -105,8 +108,11 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
>  				bfclr_mem_clear_bit(nr, vaddr))
>  #endif
>  
> -#define __clear_bit(nr, vaddr)	clear_bit(nr, vaddr)
> -
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	clear_bit(nr, addr);
> +}
>  
>  static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
>  {
> @@ -145,14 +151,17 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
>  				bfchg_mem_change_bit(nr, vaddr))
>  #endif
>  
> -#define __change_bit(nr, vaddr)	change_bit(nr, vaddr)
> -
> -
> -static inline int test_bit(int nr, const volatile unsigned long *vaddr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
> -	return (vaddr[nr >> 5] & (1UL << (nr & 31))) != 0;
> +	change_bit(nr, addr);
>  }
>  
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
> +{
> +	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
> +}
>  
>  static inline int bset_reg_test_and_set_bit(int nr,
>  					    volatile unsigned long *vaddr)
> @@ -201,8 +210,11 @@ static inline int bfset_mem_test_and_set_bit(int nr,
>  					bfset_mem_test_and_set_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_set_bit(nr, vaddr)	test_and_set_bit(nr, vaddr)
> -
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_set_bit(nr, addr);
> +}
>  
>  static inline int bclr_reg_test_and_clear_bit(int nr,
>  					      volatile unsigned long *vaddr)
> @@ -251,8 +263,11 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
>  					bfclr_mem_test_and_clear_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_clear_bit(nr, vaddr)	test_and_clear_bit(nr, vaddr)
> -
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_clear_bit(nr, addr);
> +}
>  
>  static inline int bchg_reg_test_and_change_bit(int nr,
>  					       volatile unsigned long *vaddr)
> @@ -301,8 +316,11 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
>  					bfchg_mem_test_and_change_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_change_bit(nr, vaddr) test_and_change_bit(nr, vaddr)
> -
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_change_bit(nr, addr);
> +}
>  
>  /*
>   *	The true 68020 and more advanced processors support the "bfffo"
> @@ -522,6 +540,7 @@ static inline int __fls(int x)
>  #define clear_bit_unlock	clear_bit
>  #define __clear_bit_unlock	clear_bit_unlock
>  
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
>  #include <asm-generic/bitops/ext2-atomic.h>
>  #include <asm-generic/bitops/fls64.h>
>  #include <asm-generic/bitops/sched.h>
> diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
> index cfe5465acce7..565a85d8b7fb 100644
> --- a/arch/sh/include/asm/bitops-op32.h
> +++ b/arch/sh/include/asm/bitops-op32.h
> @@ -2,6 +2,8 @@
>  #ifndef __ASM_SH_BITOPS_OP32_H
>  #define __ASM_SH_BITOPS_OP32_H
>  
> +#include <linux/bits.h>
> +
>  /*
>   * The bit modifying instructions on SH-2A are only capable of working
>   * with a 3-bit immediate, which signifies the shift position for the bit
> @@ -16,7 +18,8 @@
>  #define BYTE_OFFSET(nr)		((nr) % BITS_PER_BYTE)
>  #endif
>  
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -33,7 +36,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
>  	}
>  }
>  
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -52,7 +56,7 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * arch___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -60,7 +64,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -79,7 +84,7 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * arch___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -87,7 +92,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -98,7 +104,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * arch___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -106,7 +112,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -117,8 +124,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static inline int __test_and_change_bit(int nr,
> -					    volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -129,13 +136,16 @@ static inline int __test_and_change_bit(int nr,
>  }
>  
>  /**
> - * test_bit - Determine whether a bit is set
> + * arch_test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static inline int test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
>  
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
> +
>  #endif /* __ASM_SH_BITOPS_OP32_H */
> diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> index a288ecd230ab..973c6bd17f98 100644
> --- a/arch/x86/include/asm/bitops.h
> +++ b/arch/x86/include/asm/bitops.h
> @@ -63,7 +63,7 @@ arch_set_bit(long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -arch___set_bit(long nr, volatile unsigned long *addr)
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	asm volatile(__ASM_SIZE(bts) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
>  }
> @@ -89,7 +89,7 @@ arch_clear_bit_unlock(long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -arch___clear_bit(long nr, volatile unsigned long *addr)
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	asm volatile(__ASM_SIZE(btr) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
>  }
> @@ -114,7 +114,7 @@ arch___clear_bit_unlock(long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -arch___change_bit(long nr, volatile unsigned long *addr)
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	asm volatile(__ASM_SIZE(btc) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
>  }
> @@ -145,7 +145,7 @@ arch_test_and_set_bit_lock(long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline bool
> -arch___test_and_set_bit(long nr, volatile unsigned long *addr)
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	bool oldbit;
>  
> @@ -171,7 +171,7 @@ arch_test_and_clear_bit(long nr, volatile unsigned long *addr)
>   * this without also updating arch/x86/kernel/kvm.c
>   */
>  static __always_inline bool
> -arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	bool oldbit;
>  
> @@ -183,7 +183,7 @@ arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline bool
> -arch___test_and_change_bit(long nr, volatile unsigned long *addr)
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	bool oldbit;
>  
> @@ -219,10 +219,12 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l
>  	return oldbit;
>  }
>  
> -#define arch_test_bit(nr, addr)			\
> -	(__builtin_constant_p((nr))		\
> -	 ? constant_test_bit((nr), (addr))	\
> -	 : variable_test_bit((nr), (addr)))
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
> +{
> +	return __builtin_constant_p(nr) ? constant_test_bit(nr, addr) :
> +					  variable_test_bit(nr, addr);
> +}
>  
>  /**
>   * __ffs - find first set bit in word
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index 7226488810e5..b85b8a2ac239 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -24,7 +24,7 @@
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -generic___set_bit(unsigned int nr, volatile unsigned long *addr)
> +generic___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -33,7 +33,7 @@ generic___set_bit(unsigned int nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
> +generic___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -50,8 +50,8 @@ generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __always_inline
> -void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline void
> +generic___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -68,8 +68,8 @@ void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __always_inline int
> -generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +generic___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -88,8 +88,8 @@ generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __always_inline int
> -generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +generic___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -100,8 +100,8 @@ generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static __always_inline int
> -generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +generic___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -116,8 +116,8 @@ generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static __always_inline int
> -generic_test_bit(unsigned int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	/*
>  	 * Unlike the bitops with the '__' prefix above, this one *is* atomic,
> diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
> index 7ab1ecc37782..b019f77ef21c 100644
> --- a/include/asm-generic/bitops/instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/instrumented-non-atomic.h
> @@ -22,7 +22,8 @@
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___set_bit(nr, addr);
> @@ -37,7 +38,8 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___clear_bit(nr, addr);
> @@ -52,7 +54,8 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___change_bit(nr, addr);
> @@ -90,7 +93,8 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_set_bit(nr, addr);
> @@ -104,7 +108,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_clear_bit(nr, addr);
> @@ -118,7 +123,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_change_bit(nr, addr);
> @@ -129,7 +135,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
>  	return arch_test_bit(nr, addr);
> diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
> index 23d3abc1e10d..5c37ced343ae 100644
> --- a/include/asm-generic/bitops/non-atomic.h
> +++ b/include/asm-generic/bitops/non-atomic.h
> @@ -5,24 +5,15 @@
>  #include <asm-generic/bitops/generic-non-atomic.h>
>  
>  #define arch___set_bit generic___set_bit
> -#define __set_bit arch___set_bit
> -
>  #define arch___clear_bit generic___clear_bit
> -#define __clear_bit arch___clear_bit
> -
>  #define arch___change_bit generic___change_bit
> -#define __change_bit arch___change_bit
>  
>  #define arch___test_and_set_bit generic___test_and_set_bit
> -#define __test_and_set_bit arch___test_and_set_bit
> -
>  #define arch___test_and_clear_bit generic___test_and_clear_bit
> -#define __test_and_clear_bit arch___test_and_clear_bit
> -
>  #define arch___test_and_change_bit generic___test_and_change_bit
> -#define __test_and_change_bit arch___test_and_change_bit
>  
>  #define arch_test_bit generic_test_bit
> -#define test_bit arch_test_bit
> +
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
>  
>  #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
> diff --git a/include/asm-generic/bitops/non-instrumented-non-atomic.h b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> new file mode 100644
> index 000000000000..e0fd7bf72a56
> --- /dev/null
> +++ b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
> +#define __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
> +
> +#define __set_bit		arch___set_bit
> +#define __clear_bit		arch___clear_bit
> +#define __change_bit		arch___change_bit
> +
> +#define __test_and_set_bit	arch___test_and_set_bit
> +#define __test_and_clear_bit	arch___test_and_clear_bit
> +#define __test_and_change_bit	arch___test_and_change_bit
> +
> +#define test_bit		arch_test_bit
> +
> +#endif /* __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H */
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 7aaed501f768..87087454a288 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,12 +26,29 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +#include <asm-generic/bitops/generic-non-atomic.h>
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
>   */
>  #include <asm/bitops.h>
>  
> +/* Check that the bitops prototypes are sane */
> +#define __check_bitop_pr(name)						\
> +	static_assert(__same_type(arch_##name, generic_##name) &&	\
> +		      __same_type(name, generic_##name))
> +
> +__check_bitop_pr(__set_bit);
> +__check_bitop_pr(__clear_bit);
> +__check_bitop_pr(__change_bit);
> +__check_bitop_pr(__test_and_set_bit);
> +__check_bitop_pr(__test_and_clear_bit);
> +__check_bitop_pr(__test_and_change_bit);
> +__check_bitop_pr(test_bit);
> +
> +#undef __check_bitop_pr
> +
>  static inline int get_bitmask_order(unsigned int count)
>  {
>  	int order;
> diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
> index 7e10c4b50c5d..e5e78e42e57b 100644
> --- a/tools/include/asm-generic/bitops/non-atomic.h
> +++ b/tools/include/asm-generic/bitops/non-atomic.h
> @@ -2,7 +2,7 @@
>  #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  
> -#include <asm/types.h>
> +#include <linux/bits.h>
>  
>  /**
>   * __set_bit - Set a bit in memory
> @@ -13,7 +13,8 @@
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -21,7 +22,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
>  	*p  |= mask;
>  }
>  
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -38,7 +40,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -55,7 +58,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -74,7 +78,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -85,8 +90,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static inline int __test_and_change_bit(int nr,
> -					    volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -101,7 +106,8 @@ static inline int __test_and_change_bit(int nr,
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static inline int test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> -- 
> 2.36.1
> 

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH v3 3/7] bitops: unify non-atomic bitops prototypes across architectures
@ 2022-06-20 10:02       ` Andy Shevchenko
  0 siblings, 0 replies; 93+ messages in thread
From: Andy Shevchenko @ 2022-06-20 10:02 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Mark Rutland, Matt Turner, Brian Cain,
	Geert Uytterhoeven, Yoshinori Sato, Rich Felker, David S. Miller,
	Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

On Fri, Jun 17, 2022 at 04:40:27PM +0200, Alexander Lobakin wrote:
> Currently, there is a mess with the prototypes of the non-atomic
> bitops across the different architectures:
> 
> ret	bool, int, unsigned long
> nr	int, long, unsigned int, unsigned long
> addr	volatile unsigned long *, volatile void *
> 
> Thankfully, it doesn't provoke any bugs, but it can sometimes make
> the compiler angry at the least convenient moments.
> Adjust all the prototypes to the following standard:
> 
> ret	bool				retval can be only 0 or 1
> nr	unsigned long			native; signed makes no sense
> addr	volatile unsigned long *	bitmaps are arrays of ulongs
> 
> Next, some architectures don't define 'arch_' versions as they don't
> support instrumentation, others do. To make sure there is always the
> same set of callables present and to ease any potential future
> changes, make them all follow the rule:
>  * architecture-specific files define only 'arch_' versions;
>  * non-prefixed versions can be defined only in asm-generic files;
> and place the non-prefixed definitions into a new file in
> asm-generic to be included by non-instrumented architectures.
> 
> Finally, add some static assertions in order to prevent people from
> making a mess in this room again.
> I also used the %__always_inline attribute consistently, so that
> they always get resolved to the actual operations.
> 
> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> Reviewed-by: Yury Norov <yury.norov@gmail.com>

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Thanks for doing this!
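
For reference, the convention in the table above boils down to something
like the following self-contained sketch (bool return, unsigned long bit
number, volatile unsigned long * bitmap). BITS_PER_LONG, BIT_MASK() and
BIT_WORD() are re-declared locally only so the snippet builds outside the
tree, and the sketch_*() names are made up purely for illustration:

	#include <stdbool.h>
	#include <stdio.h>

	#define BITS_PER_LONG	(8 * sizeof(unsigned long))
	#define BIT_MASK(nr)	(1UL << ((nr) % BITS_PER_LONG))
	#define BIT_WORD(nr)	((nr) / BITS_PER_LONG)

	/* writes take a non-const bitmap, reads take a const one */
	static inline void
	sketch___set_bit(unsigned long nr, volatile unsigned long *addr)
	{
		unsigned long mask = BIT_MASK(nr);
		unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);

		*p |= mask;
	}

	static inline bool
	sketch_test_bit(unsigned long nr, const volatile unsigned long *addr)
	{
		return 1UL & (addr[BIT_WORD(nr)] >> (nr % BITS_PER_LONG));
	}

	int main(void)
	{
		unsigned long bitmap[4] = { 0 };

		sketch___set_bit(70, bitmap);
		printf("bit 70: %d\n", sketch_test_bit(70, bitmap));	/* 1 */
		return 0;
	}

This only illustrates the calling convention; the real per-arch and
generic_*() implementations are the ones in the diff below.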

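The static assertions mentioned in the changelog are a type-compatibility
check between the arch_ and generic_ variants (the actual macro is
__check_bitop_pr() in include/linux/bitops.h further down). Roughly, and
with simplified stand-in bodies, the mechanism reduces to this GNU C sketch:

	#include <stdbool.h>

	#define __same_type(a, b)	__builtin_types_compatible_p(typeof(a), typeof(b))

	/* canonical prototype, as in generic-non-atomic.h */
	static bool generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
	{
		return 1UL & (addr[nr / (8 * sizeof(long))] >> (nr % (8 * sizeof(long))));
	}

	/* an arch override that keeps the same prototype */
	static bool arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
	{
		return generic_test_bit(nr, addr);
	}

	/* the build breaks here if the two prototypes ever diverge again */
	_Static_assert(__same_type(arch_test_bit, generic_test_bit),
		       "arch_test_bit() must match the generic prototype");

	int main(void)
	{
		unsigned long word = 1UL << 3;

		return !arch_test_bit(3, &word);	/* exit 0: the bit is set */
	}
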
> ---
>  arch/alpha/include/asm/bitops.h               | 32 ++++++------
>  arch/hexagon/include/asm/bitops.h             | 24 +++++----
>  arch/ia64/include/asm/bitops.h                | 42 ++++++++--------
>  arch/m68k/include/asm/bitops.h                | 49 +++++++++++++------
>  arch/sh/include/asm/bitops-op32.h             | 34 ++++++++-----
>  arch/x86/include/asm/bitops.h                 | 22 +++++----
>  .../asm-generic/bitops/generic-non-atomic.h   | 24 ++++-----
>  .../bitops/instrumented-non-atomic.h          | 21 +++++---
>  include/asm-generic/bitops/non-atomic.h       | 13 +----
>  .../bitops/non-instrumented-non-atomic.h      | 16 ++++++
>  include/linux/bitops.h                        | 17 +++++++
>  tools/include/asm-generic/bitops/non-atomic.h | 24 +++++----
>  12 files changed, 198 insertions(+), 120 deletions(-)
>  create mode 100644 include/asm-generic/bitops/non-instrumented-non-atomic.h
> 
> diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
> index e1d8483a45f2..492c7713ddae 100644
> --- a/arch/alpha/include/asm/bitops.h
> +++ b/arch/alpha/include/asm/bitops.h
> @@ -46,8 +46,8 @@ set_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline void
> -__set_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -82,8 +82,8 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ void
> -__clear_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -94,7 +94,7 @@ static inline void
>  __clear_bit_unlock(unsigned long nr, volatile void * addr)
>  {
>  	smp_mb();
> -	__clear_bit(nr, addr);
> +	arch___clear_bit(nr, addr);
>  }
>  
>  static inline void
> @@ -118,8 +118,8 @@ change_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ void
> -__change_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -186,8 +186,8 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline int
> -__test_and_set_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -230,8 +230,8 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline int
> -__test_and_clear_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -272,8 +272,8 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ int
> -__test_and_change_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -283,8 +283,8 @@ __test_and_change_bit(unsigned long nr, volatile void * addr)
>  	return (old & mask) != 0;
>  }
>  
> -static inline int
> -test_bit(int nr, const volatile void * addr)
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
>  }
> @@ -450,6 +450,8 @@ sched_find_first_bit(const unsigned long b[2])
>  	return __ffs(tmp) + ofs;
>  }
>  
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
> +
>  #include <asm-generic/bitops/le.h>
>  
>  #include <asm-generic/bitops/ext2-atomic-setbit.h>
> diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
> index 75d6ba3643b8..da500471ac73 100644
> --- a/arch/hexagon/include/asm/bitops.h
> +++ b/arch/hexagon/include/asm/bitops.h
> @@ -127,38 +127,45 @@ static inline void change_bit(int nr, volatile void *addr)
>   * be atomic, particularly for things like slab_lock and slab_unlock.
>   *
>   */
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_clear_bit(nr, addr);
>  }
>  
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_set_bit(nr, addr);
>  }
>  
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_change_bit(nr, addr);
>  }
>  
>  /*  Apparently, at least some of these are allowed to be non-atomic  */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_clear_bit(nr, addr);
>  }
>  
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_set_bit(nr, addr);
>  }
>  
> -static inline int __test_and_change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_change_bit(nr, addr);
>  }
>  
> -static inline int __test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	int retval;
>  
> @@ -172,8 +179,6 @@ static inline int __test_bit(int nr, const volatile unsigned long *addr)
>  	return retval;
>  }
>  
> -#define test_bit(nr, addr) __test_bit(nr, addr)
> -
>  /*
>   * ffz - find first zero in word.
>   * @word: The word to search
> @@ -271,6 +276,7 @@ static inline unsigned long __fls(unsigned long word)
>  }
>  
>  #include <asm-generic/bitops/lock.h>
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
>  
>  #include <asm-generic/bitops/fls64.h>
>  #include <asm-generic/bitops/sched.h>
> diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
> index 577be93c0818..9f62af7fd7c4 100644
> --- a/arch/ia64/include/asm/bitops.h
> +++ b/arch/ia64/include/asm/bitops.h
> @@ -53,7 +53,7 @@ set_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __set_bit - Set a bit in memory
> + * arch___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -61,8 +61,8 @@ set_bit (int nr, volatile void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__set_bit (int nr, volatile void *addr)
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
>  }
> @@ -135,7 +135,7 @@ __clear_bit_unlock(int nr, void *addr)
>  }
>  
>  /**
> - * __clear_bit - Clears a bit in memory (non-atomic version)
> + * arch___clear_bit - Clears a bit in memory (non-atomic version)
>   * @nr: the bit to clear
>   * @addr: the address to start counting from
>   *
> @@ -143,8 +143,8 @@ __clear_bit_unlock(int nr, void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__clear_bit (int nr, volatile void *addr)
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
>  }
> @@ -175,7 +175,7 @@ change_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * arch___change_bit - Toggle a bit in memory
>   * @nr: the bit to toggle
>   * @addr: the address to start counting from
>   *
> @@ -183,8 +183,8 @@ change_bit (int nr, volatile void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__change_bit (int nr, volatile void *addr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
>  }
> @@ -224,7 +224,7 @@ test_and_set_bit (int nr, volatile void *addr)
>  #define test_and_set_bit_lock test_and_set_bit
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * arch___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -232,8 +232,8 @@ test_and_set_bit (int nr, volatile void *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __inline__ int
> -__test_and_set_bit (int nr, volatile void *addr)
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -269,7 +269,7 @@ test_and_clear_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * arch___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -277,8 +277,8 @@ test_and_clear_bit (int nr, volatile void *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __inline__ int
> -__test_and_clear_bit(int nr, volatile void * addr)
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -314,14 +314,14 @@ test_and_change_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __test_and_change_bit - Change a bit and return its old value
> + * arch___test_and_change_bit - Change a bit and return its old value
>   * @nr: Bit to change
>   * @addr: Address to count from
>   *
>   * This operation is non-atomic and can be reordered.
>   */
> -static __inline__ int
> -__test_and_change_bit (int nr, void *addr)
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 old, bit = (1 << (nr & 31));
>  	__u32 *m = (__u32 *) addr + (nr >> 5);
> @@ -331,8 +331,8 @@ __test_and_change_bit (int nr, void *addr)
>  	return (old & bit) != 0;
>  }
>  
> -static __inline__ int
> -test_bit (int nr, const volatile void *addr)
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
>  }
> @@ -443,6 +443,8 @@ static __inline__ unsigned long __arch_hweight64(unsigned long x)
>  
>  #ifdef __KERNEL__
>  
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
> +
>  #include <asm-generic/bitops/le.h>
>  
>  #include <asm-generic/bitops/ext2-atomic-setbit.h>
> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> index 51283db53667..71495faf2a90 100644
> --- a/arch/m68k/include/asm/bitops.h
> +++ b/arch/m68k/include/asm/bitops.h
> @@ -65,8 +65,11 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
>  				bfset_mem_set_bit(nr, vaddr))
>  #endif
>  
> -#define __set_bit(nr, vaddr)	set_bit(nr, vaddr)
> -
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	set_bit(nr, addr);
> +}
>  
>  static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
>  {
> @@ -105,8 +108,11 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
>  				bfclr_mem_clear_bit(nr, vaddr))
>  #endif
>  
> -#define __clear_bit(nr, vaddr)	clear_bit(nr, vaddr)
> -
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	clear_bit(nr, addr);
> +}
>  
>  static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
>  {
> @@ -145,14 +151,17 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
>  				bfchg_mem_change_bit(nr, vaddr))
>  #endif
>  
> -#define __change_bit(nr, vaddr)	change_bit(nr, vaddr)
> -
> -
> -static inline int test_bit(int nr, const volatile unsigned long *vaddr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
> -	return (vaddr[nr >> 5] & (1UL << (nr & 31))) != 0;
> +	change_bit(nr, addr);
>  }
>  
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
> +{
> +	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
> +}
>  
>  static inline int bset_reg_test_and_set_bit(int nr,
>  					    volatile unsigned long *vaddr)
> @@ -201,8 +210,11 @@ static inline int bfset_mem_test_and_set_bit(int nr,
>  					bfset_mem_test_and_set_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_set_bit(nr, vaddr)	test_and_set_bit(nr, vaddr)
> -
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_set_bit(nr, addr);
> +}
>  
>  static inline int bclr_reg_test_and_clear_bit(int nr,
>  					      volatile unsigned long *vaddr)
> @@ -251,8 +263,11 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
>  					bfclr_mem_test_and_clear_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_clear_bit(nr, vaddr)	test_and_clear_bit(nr, vaddr)
> -
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_clear_bit(nr, addr);
> +}
>  
>  static inline int bchg_reg_test_and_change_bit(int nr,
>  					       volatile unsigned long *vaddr)
> @@ -301,8 +316,11 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
>  					bfchg_mem_test_and_change_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_change_bit(nr, vaddr) test_and_change_bit(nr, vaddr)
> -
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_change_bit(nr, addr);
> +}
>  
>  /*
>   *	The true 68020 and more advanced processors support the "bfffo"
> @@ -522,6 +540,7 @@ static inline int __fls(int x)
>  #define clear_bit_unlock	clear_bit
>  #define __clear_bit_unlock	clear_bit_unlock
>  
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
>  #include <asm-generic/bitops/ext2-atomic.h>
>  #include <asm-generic/bitops/fls64.h>
>  #include <asm-generic/bitops/sched.h>
> diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
> index cfe5465acce7..565a85d8b7fb 100644
> --- a/arch/sh/include/asm/bitops-op32.h
> +++ b/arch/sh/include/asm/bitops-op32.h
> @@ -2,6 +2,8 @@
>  #ifndef __ASM_SH_BITOPS_OP32_H
>  #define __ASM_SH_BITOPS_OP32_H
>  
> +#include <linux/bits.h>
> +
>  /*
>   * The bit modifying instructions on SH-2A are only capable of working
>   * with a 3-bit immediate, which signifies the shift position for the bit
> @@ -16,7 +18,8 @@
>  #define BYTE_OFFSET(nr)		((nr) % BITS_PER_BYTE)
>  #endif
>  
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -33,7 +36,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
>  	}
>  }
>  
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -52,7 +56,7 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * arch___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -60,7 +64,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -79,7 +84,7 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * arch___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -87,7 +92,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -98,7 +104,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * arch___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -106,7 +112,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -117,8 +124,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static inline int __test_and_change_bit(int nr,
> -					    volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -129,13 +136,16 @@ static inline int __test_and_change_bit(int nr,
>  }
>  
>  /**
> - * test_bit - Determine whether a bit is set
> + * arch_test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static inline int test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
>  
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
> +
>  #endif /* __ASM_SH_BITOPS_OP32_H */
> diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> index a288ecd230ab..973c6bd17f98 100644
> --- a/arch/x86/include/asm/bitops.h
> +++ b/arch/x86/include/asm/bitops.h
> @@ -63,7 +63,7 @@ arch_set_bit(long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -arch___set_bit(long nr, volatile unsigned long *addr)
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	asm volatile(__ASM_SIZE(bts) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
>  }
> @@ -89,7 +89,7 @@ arch_clear_bit_unlock(long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -arch___clear_bit(long nr, volatile unsigned long *addr)
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	asm volatile(__ASM_SIZE(btr) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
>  }
> @@ -114,7 +114,7 @@ arch___clear_bit_unlock(long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -arch___change_bit(long nr, volatile unsigned long *addr)
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	asm volatile(__ASM_SIZE(btc) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
>  }
> @@ -145,7 +145,7 @@ arch_test_and_set_bit_lock(long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline bool
> -arch___test_and_set_bit(long nr, volatile unsigned long *addr)
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	bool oldbit;
>  
> @@ -171,7 +171,7 @@ arch_test_and_clear_bit(long nr, volatile unsigned long *addr)
>   * this without also updating arch/x86/kernel/kvm.c
>   */
>  static __always_inline bool
> -arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	bool oldbit;
>  
> @@ -183,7 +183,7 @@ arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline bool
> -arch___test_and_change_bit(long nr, volatile unsigned long *addr)
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	bool oldbit;
>  
> @@ -219,10 +219,12 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l
>  	return oldbit;
>  }
>  
> -#define arch_test_bit(nr, addr)			\
> -	(__builtin_constant_p((nr))		\
> -	 ? constant_test_bit((nr), (addr))	\
> -	 : variable_test_bit((nr), (addr)))
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
> +{
> +	return __builtin_constant_p(nr) ? constant_test_bit(nr, addr) :
> +					  variable_test_bit(nr, addr);
> +}
>  
>  /**
>   * __ffs - find first set bit in word
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index 7226488810e5..b85b8a2ac239 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -24,7 +24,7 @@
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -generic___set_bit(unsigned int nr, volatile unsigned long *addr)
> +generic___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -33,7 +33,7 @@ generic___set_bit(unsigned int nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
> +generic___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -50,8 +50,8 @@ generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __always_inline
> -void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline void
> +generic___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -68,8 +68,8 @@ void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __always_inline int
> -generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +generic___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -88,8 +88,8 @@ generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __always_inline int
> -generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +generic___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -100,8 +100,8 @@ generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static __always_inline int
> -generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +generic___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -116,8 +116,8 @@ generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static __always_inline int
> -generic_test_bit(unsigned int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	/*
>  	 * Unlike the bitops with the '__' prefix above, this one *is* atomic,
> diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
> index 7ab1ecc37782..b019f77ef21c 100644
> --- a/include/asm-generic/bitops/instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/instrumented-non-atomic.h
> @@ -22,7 +22,8 @@
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___set_bit(nr, addr);
> @@ -37,7 +38,8 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___clear_bit(nr, addr);
> @@ -52,7 +54,8 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___change_bit(nr, addr);
> @@ -90,7 +93,8 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_set_bit(nr, addr);
> @@ -104,7 +108,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_clear_bit(nr, addr);
> @@ -118,7 +123,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_change_bit(nr, addr);
> @@ -129,7 +135,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
>  	return arch_test_bit(nr, addr);
> diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
> index 23d3abc1e10d..5c37ced343ae 100644
> --- a/include/asm-generic/bitops/non-atomic.h
> +++ b/include/asm-generic/bitops/non-atomic.h
> @@ -5,24 +5,15 @@
>  #include <asm-generic/bitops/generic-non-atomic.h>
>  
>  #define arch___set_bit generic___set_bit
> -#define __set_bit arch___set_bit
> -
>  #define arch___clear_bit generic___clear_bit
> -#define __clear_bit arch___clear_bit
> -
>  #define arch___change_bit generic___change_bit
> -#define __change_bit arch___change_bit
>  
>  #define arch___test_and_set_bit generic___test_and_set_bit
> -#define __test_and_set_bit arch___test_and_set_bit
> -
>  #define arch___test_and_clear_bit generic___test_and_clear_bit
> -#define __test_and_clear_bit arch___test_and_clear_bit
> -
>  #define arch___test_and_change_bit generic___test_and_change_bit
> -#define __test_and_change_bit arch___test_and_change_bit
>  
>  #define arch_test_bit generic_test_bit
> -#define test_bit arch_test_bit
> +
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
>  
>  #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
> diff --git a/include/asm-generic/bitops/non-instrumented-non-atomic.h b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> new file mode 100644
> index 000000000000..e0fd7bf72a56
> --- /dev/null
> +++ b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
> +#define __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
> +
> +#define __set_bit		arch___set_bit
> +#define __clear_bit		arch___clear_bit
> +#define __change_bit		arch___change_bit
> +
> +#define __test_and_set_bit	arch___test_and_set_bit
> +#define __test_and_clear_bit	arch___test_and_clear_bit
> +#define __test_and_change_bit	arch___test_and_change_bit
> +
> +#define test_bit		arch_test_bit
> +
> +#endif /* __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H */
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 7aaed501f768..87087454a288 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,12 +26,29 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +#include <asm-generic/bitops/generic-non-atomic.h>
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
>   */
>  #include <asm/bitops.h>
>  
> +/* Check that the bitops prototypes are sane */
> +#define __check_bitop_pr(name)						\
> +	static_assert(__same_type(arch_##name, generic_##name) &&	\
> +		      __same_type(name, generic_##name))
> +
> +__check_bitop_pr(__set_bit);
> +__check_bitop_pr(__clear_bit);
> +__check_bitop_pr(__change_bit);
> +__check_bitop_pr(__test_and_set_bit);
> +__check_bitop_pr(__test_and_clear_bit);
> +__check_bitop_pr(__test_and_change_bit);
> +__check_bitop_pr(test_bit);
> +
> +#undef __check_bitop_pr
> +
>  static inline int get_bitmask_order(unsigned int count)
>  {
>  	int order;
> diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
> index 7e10c4b50c5d..e5e78e42e57b 100644
> --- a/tools/include/asm-generic/bitops/non-atomic.h
> +++ b/tools/include/asm-generic/bitops/non-atomic.h
> @@ -2,7 +2,7 @@
>  #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  
> -#include <asm/types.h>
> +#include <linux/bits.h>
>  
>  /**
>   * __set_bit - Set a bit in memory
> @@ -13,7 +13,8 @@
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -21,7 +22,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
>  	*p  |= mask;
>  }
>  
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -38,7 +40,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -55,7 +58,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -74,7 +78,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -85,8 +90,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static inline int __test_and_change_bit(int nr,
> -					    volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -101,7 +106,8 @@ static inline int __test_and_change_bit(int nr,
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static inline int test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> -- 
> 2.36.1
> 

-- 
With Best Regards,
Andy Shevchenko


^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 3/7] bitops: unify non-atomic bitops prototypes across architectures
@ 2022-06-20 10:02       ` Andy Shevchenko
  0 siblings, 0 replies; 93+ messages in thread
From: Andy Shevchenko @ 2022-06-20 10:02 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Mark Rutland, Matt Turner, Brian Cain,
	Geert Uytterhoeven, Yoshinori Sato, Rich Felker, David S. Miller,
	Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k

On Fri, Jun 17, 2022 at 04:40:27PM +0200, Alexander Lobakin wrote:
> Currently, there is a mess with the prototypes of the non-atomic
> bitops across the different architectures:
> 
> ret	bool, int, unsigned long
> nr	int, long, unsigned int, unsigned long
> addr	volatile unsigned long *, volatile void *
> 
> Thankfully, it doesn't provoke any bugs, but it can sometimes make
> the compiler angry, which is not handy at all.
> Adjust all the prototypes to the following standard:
> 
> ret	bool				retval can be only 0 or 1
> nr	unsigned long			native; signed makes no sense
> addr	volatile unsigned long *	bitmaps are arrays of ulongs
> 
> Next, some architectures don't define 'arch_' versions as they don't
> support instrumentation, while others do. To make sure there is always the
> same set of callables present and to ease any potential future
> changes, make them all follow the rule:
>  * architecture-specific files define only 'arch_' versions;
>  * non-prefixed versions can be defined only in asm-generic files;
> and place the non-prefixed definitions into a new file in
> asm-generic to be included by non-instrumented architectures.
> 
> Finally, add some static assertions in order to prevent people from
> making a mess in this room again.
> I also used the %__always_inline attribute consistently, so that
> they always get resolved to the actual operations.
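> 
> For illustration only (this sketch is not part of the diff below):
> with the assertion in place, reintroducing a divergent prototype such
> as
> 
> 	static inline int arch_test_bit(int nr, const volatile void *addr);
> 
> trips static_assert(__same_type(...)) at build time instead of
> compiling silently.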
> 
> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> Reviewed-by: Yury Norov <yury.norov@gmail.com>

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Thanks for doing this!

> ---
>  arch/alpha/include/asm/bitops.h               | 32 ++++++------
>  arch/hexagon/include/asm/bitops.h             | 24 +++++----
>  arch/ia64/include/asm/bitops.h                | 42 ++++++++--------
>  arch/m68k/include/asm/bitops.h                | 49 +++++++++++++------
>  arch/sh/include/asm/bitops-op32.h             | 34 ++++++++-----
>  arch/x86/include/asm/bitops.h                 | 22 +++++----
>  .../asm-generic/bitops/generic-non-atomic.h   | 24 ++++-----
>  .../bitops/instrumented-non-atomic.h          | 21 +++++---
>  include/asm-generic/bitops/non-atomic.h       | 13 +----
>  .../bitops/non-instrumented-non-atomic.h      | 16 ++++++
>  include/linux/bitops.h                        | 17 +++++++
>  tools/include/asm-generic/bitops/non-atomic.h | 24 +++++----
>  12 files changed, 198 insertions(+), 120 deletions(-)
>  create mode 100644 include/asm-generic/bitops/non-instrumented-non-atomic.h
> 
> diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
> index e1d8483a45f2..492c7713ddae 100644
> --- a/arch/alpha/include/asm/bitops.h
> +++ b/arch/alpha/include/asm/bitops.h
> @@ -46,8 +46,8 @@ set_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline void
> -__set_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -82,8 +82,8 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ void
> -__clear_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -94,7 +94,7 @@ static inline void
>  __clear_bit_unlock(unsigned long nr, volatile void * addr)
>  {
>  	smp_mb();
> -	__clear_bit(nr, addr);
> +	arch___clear_bit(nr, addr);
>  }
>  
>  static inline void
> @@ -118,8 +118,8 @@ change_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ void
> -__change_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	int *m = ((int *) addr) + (nr >> 5);
>  
> @@ -186,8 +186,8 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline int
> -__test_and_set_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -230,8 +230,8 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static inline int
> -__test_and_clear_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -272,8 +272,8 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
>  /*
>   * WARNING: non atomic version.
>   */
> -static __inline__ int
> -__test_and_change_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = 1 << (nr & 0x1f);
>  	int *m = ((int *) addr) + (nr >> 5);
> @@ -283,8 +283,8 @@ __test_and_change_bit(unsigned long nr, volatile void * addr)
>  	return (old & mask) != 0;
>  }
>  
> -static inline int
> -test_bit(int nr, const volatile void * addr)
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
>  }
> @@ -450,6 +450,8 @@ sched_find_first_bit(const unsigned long b[2])
>  	return __ffs(tmp) + ofs;
>  }
>  
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
> +
>  #include <asm-generic/bitops/le.h>
>  
>  #include <asm-generic/bitops/ext2-atomic-setbit.h>
> diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
> index 75d6ba3643b8..da500471ac73 100644
> --- a/arch/hexagon/include/asm/bitops.h
> +++ b/arch/hexagon/include/asm/bitops.h
> @@ -127,38 +127,45 @@ static inline void change_bit(int nr, volatile void *addr)
>   * be atomic, particularly for things like slab_lock and slab_unlock.
>   *
>   */
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_clear_bit(nr, addr);
>  }
>  
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_set_bit(nr, addr);
>  }
>  
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	test_and_change_bit(nr, addr);
>  }
>  
>  /*  Apparently, at least some of these are allowed to be non-atomic  */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_clear_bit(nr, addr);
>  }
>  
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_set_bit(nr, addr);
>  }
>  
> -static inline int __test_and_change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	return test_and_change_bit(nr, addr);
>  }
>  
> -static inline int __test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	int retval;
>  
> @@ -172,8 +179,6 @@ static inline int __test_bit(int nr, const volatile unsigned long *addr)
>  	return retval;
>  }
>  
> -#define test_bit(nr, addr) __test_bit(nr, addr)
> -
>  /*
>   * ffz - find first zero in word.
>   * @word: The word to search
> @@ -271,6 +276,7 @@ static inline unsigned long __fls(unsigned long word)
>  }
>  
>  #include <asm-generic/bitops/lock.h>
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
>  
>  #include <asm-generic/bitops/fls64.h>
>  #include <asm-generic/bitops/sched.h>
> diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
> index 577be93c0818..9f62af7fd7c4 100644
> --- a/arch/ia64/include/asm/bitops.h
> +++ b/arch/ia64/include/asm/bitops.h
> @@ -53,7 +53,7 @@ set_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __set_bit - Set a bit in memory
> + * arch___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -61,8 +61,8 @@ set_bit (int nr, volatile void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__set_bit (int nr, volatile void *addr)
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
>  }
> @@ -135,7 +135,7 @@ __clear_bit_unlock(int nr, void *addr)
>  }
>  
>  /**
> - * __clear_bit - Clears a bit in memory (non-atomic version)
> + * arch___clear_bit - Clears a bit in memory (non-atomic version)
>   * @nr: the bit to clear
>   * @addr: the address to start counting from
>   *
> @@ -143,8 +143,8 @@ __clear_bit_unlock(int nr, void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__clear_bit (int nr, volatile void *addr)
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
>  }
> @@ -175,7 +175,7 @@ change_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * arch___change_bit - Toggle a bit in memory
>   * @nr: the bit to toggle
>   * @addr: the address to start counting from
>   *
> @@ -183,8 +183,8 @@ change_bit (int nr, volatile void *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __inline__ void
> -__change_bit (int nr, volatile void *addr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
>  }
> @@ -224,7 +224,7 @@ test_and_set_bit (int nr, volatile void *addr)
>  #define test_and_set_bit_lock test_and_set_bit
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * arch___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -232,8 +232,8 @@ test_and_set_bit (int nr, volatile void *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __inline__ int
> -__test_and_set_bit (int nr, volatile void *addr)
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -269,7 +269,7 @@ test_and_clear_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * arch___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -277,8 +277,8 @@ test_and_clear_bit (int nr, volatile void *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __inline__ int
> -__test_and_clear_bit(int nr, volatile void * addr)
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 *p = (__u32 *) addr + (nr >> 5);
>  	__u32 m = 1 << (nr & 31);
> @@ -314,14 +314,14 @@ test_and_change_bit (int nr, volatile void *addr)
>  }
>  
>  /**
> - * __test_and_change_bit - Change a bit and return its old value
> + * arch___test_and_change_bit - Change a bit and return its old value
>   * @nr: Bit to change
>   * @addr: Address to count from
>   *
>   * This operation is non-atomic and can be reordered.
>   */
> -static __inline__ int
> -__test_and_change_bit (int nr, void *addr)
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__u32 old, bit = (1 << (nr & 31));
>  	__u32 *m = (__u32 *) addr + (nr >> 5);
> @@ -331,8 +331,8 @@ __test_and_change_bit (int nr, void *addr)
>  	return (old & bit) != 0;
>  }
>  
> -static __inline__ int
> -test_bit (int nr, const volatile void *addr)
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
>  }
> @@ -443,6 +443,8 @@ static __inline__ unsigned long __arch_hweight64(unsigned long x)
>  
>  #ifdef __KERNEL__
>  
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
> +
>  #include <asm-generic/bitops/le.h>
>  
>  #include <asm-generic/bitops/ext2-atomic-setbit.h>
> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> index 51283db53667..71495faf2a90 100644
> --- a/arch/m68k/include/asm/bitops.h
> +++ b/arch/m68k/include/asm/bitops.h
> @@ -65,8 +65,11 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
>  				bfset_mem_set_bit(nr, vaddr))
>  #endif
>  
> -#define __set_bit(nr, vaddr)	set_bit(nr, vaddr)
> -
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	set_bit(nr, addr);
> +}
>  
>  static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
>  {
> @@ -105,8 +108,11 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
>  				bfclr_mem_clear_bit(nr, vaddr))
>  #endif
>  
> -#define __clear_bit(nr, vaddr)	clear_bit(nr, vaddr)
> -
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	clear_bit(nr, addr);
> +}
>  
>  static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
>  {
> @@ -145,14 +151,17 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
>  				bfchg_mem_change_bit(nr, vaddr))
>  #endif
>  
> -#define __change_bit(nr, vaddr)	change_bit(nr, vaddr)
> -
> -
> -static inline int test_bit(int nr, const volatile unsigned long *vaddr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
> -	return (vaddr[nr >> 5] & (1UL << (nr & 31))) != 0;
> +	change_bit(nr, addr);
>  }
>  
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
> +{
> +	return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
> +}
>  
>  static inline int bset_reg_test_and_set_bit(int nr,
>  					    volatile unsigned long *vaddr)
> @@ -201,8 +210,11 @@ static inline int bfset_mem_test_and_set_bit(int nr,
>  					bfset_mem_test_and_set_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_set_bit(nr, vaddr)	test_and_set_bit(nr, vaddr)
> -
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_set_bit(nr, addr);
> +}
>  
>  static inline int bclr_reg_test_and_clear_bit(int nr,
>  					      volatile unsigned long *vaddr)
> @@ -251,8 +263,11 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
>  					bfclr_mem_test_and_clear_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_clear_bit(nr, vaddr)	test_and_clear_bit(nr, vaddr)
> -
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_clear_bit(nr, addr);
> +}
>  
>  static inline int bchg_reg_test_and_change_bit(int nr,
>  					       volatile unsigned long *vaddr)
> @@ -301,8 +316,11 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
>  					bfchg_mem_test_and_change_bit(nr, vaddr))
>  #endif
>  
> -#define __test_and_change_bit(nr, vaddr) test_and_change_bit(nr, vaddr)
> -
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> +	return test_and_change_bit(nr, addr);
> +}
>  
>  /*
>   *	The true 68020 and more advanced processors support the "bfffo"
> @@ -522,6 +540,7 @@ static inline int __fls(int x)
>  #define clear_bit_unlock	clear_bit
>  #define __clear_bit_unlock	clear_bit_unlock
>  
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
>  #include <asm-generic/bitops/ext2-atomic.h>
>  #include <asm-generic/bitops/fls64.h>
>  #include <asm-generic/bitops/sched.h>
> diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
> index cfe5465acce7..565a85d8b7fb 100644
> --- a/arch/sh/include/asm/bitops-op32.h
> +++ b/arch/sh/include/asm/bitops-op32.h
> @@ -2,6 +2,8 @@
>  #ifndef __ASM_SH_BITOPS_OP32_H
>  #define __ASM_SH_BITOPS_OP32_H
>  
> +#include <linux/bits.h>
> +
>  /*
>   * The bit modifying instructions on SH-2A are only capable of working
>   * with a 3-bit immediate, which signifies the shift position for the bit
> @@ -16,7 +18,8 @@
>  #define BYTE_OFFSET(nr)		((nr) % BITS_PER_BYTE)
>  #endif
>  
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -33,7 +36,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
>  	}
>  }
>  
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -52,7 +56,7 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * arch___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -60,7 +64,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	if (__builtin_constant_p(nr)) {
>  		__asm__ __volatile__ (
> @@ -79,7 +84,7 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * arch___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -87,7 +92,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -98,7 +104,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * arch___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -106,7 +112,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -117,8 +124,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static inline int __test_and_change_bit(int nr,
> -					    volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -129,13 +136,16 @@ static inline int __test_and_change_bit(int nr,
>  }
>  
>  /**
> - * test_bit - Determine whether a bit is set
> + * arch_test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static inline int test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
>  
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
> +
>  #endif /* __ASM_SH_BITOPS_OP32_H */
> diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> index a288ecd230ab..973c6bd17f98 100644
> --- a/arch/x86/include/asm/bitops.h
> +++ b/arch/x86/include/asm/bitops.h
> @@ -63,7 +63,7 @@ arch_set_bit(long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -arch___set_bit(long nr, volatile unsigned long *addr)
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	asm volatile(__ASM_SIZE(bts) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
>  }
> @@ -89,7 +89,7 @@ arch_clear_bit_unlock(long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -arch___clear_bit(long nr, volatile unsigned long *addr)
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	asm volatile(__ASM_SIZE(btr) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
>  }
> @@ -114,7 +114,7 @@ arch___clear_bit_unlock(long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -arch___change_bit(long nr, volatile unsigned long *addr)
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	asm volatile(__ASM_SIZE(btc) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
>  }
> @@ -145,7 +145,7 @@ arch_test_and_set_bit_lock(long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline bool
> -arch___test_and_set_bit(long nr, volatile unsigned long *addr)
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	bool oldbit;
>  
> @@ -171,7 +171,7 @@ arch_test_and_clear_bit(long nr, volatile unsigned long *addr)
>   * this without also updating arch/x86/kernel/kvm.c
>   */
>  static __always_inline bool
> -arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	bool oldbit;
>  
> @@ -183,7 +183,7 @@ arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline bool
> -arch___test_and_change_bit(long nr, volatile unsigned long *addr)
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	bool oldbit;
>  
> @@ -219,10 +219,12 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l
>  	return oldbit;
>  }
>  
> -#define arch_test_bit(nr, addr)			\
> -	(__builtin_constant_p((nr))		\
> -	 ? constant_test_bit((nr), (addr))	\
> -	 : variable_test_bit((nr), (addr)))
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
> +{
> +	return __builtin_constant_p(nr) ? constant_test_bit(nr, addr) :
> +					  variable_test_bit(nr, addr);
> +}
>  
>  /**
>   * __ffs - find first set bit in word
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index 7226488810e5..b85b8a2ac239 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -24,7 +24,7 @@
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -generic___set_bit(unsigned int nr, volatile unsigned long *addr)
> +generic___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -33,7 +33,7 @@ generic___set_bit(unsigned int nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
> +generic___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -50,8 +50,8 @@ generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static __always_inline
> -void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline void
> +generic___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -68,8 +68,8 @@ void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __always_inline int
> -generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +generic___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -88,8 +88,8 @@ generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static __always_inline int
> -generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +generic___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -100,8 +100,8 @@ generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static __always_inline int
> -generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +generic___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -116,8 +116,8 @@ generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static __always_inline int
> -generic_test_bit(unsigned int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	/*
>  	 * Unlike the bitops with the '__' prefix above, this one *is* atomic,
> diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
> index 7ab1ecc37782..b019f77ef21c 100644
> --- a/include/asm-generic/bitops/instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/instrumented-non-atomic.h
> @@ -22,7 +22,8 @@
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___set_bit(nr, addr);
> @@ -37,7 +38,8 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___clear_bit(nr, addr);
> @@ -52,7 +54,8 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
>   * region of memory concurrently, the effect may be that only one operation
>   * succeeds.
>   */
> -static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___change_bit(nr, addr);
> @@ -90,7 +93,8 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_set_bit(nr, addr);
> @@ -104,7 +108,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_clear_bit(nr, addr);
> @@ -118,7 +123,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
>   * This operation is non-atomic. If two instances of this operation race, one
>   * can appear to succeed but actually fail.
>   */
> -static __always_inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_change_bit(nr, addr);
> @@ -129,7 +135,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
>  	return arch_test_bit(nr, addr);
> diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
> index 23d3abc1e10d..5c37ced343ae 100644
> --- a/include/asm-generic/bitops/non-atomic.h
> +++ b/include/asm-generic/bitops/non-atomic.h
> @@ -5,24 +5,15 @@
>  #include <asm-generic/bitops/generic-non-atomic.h>
>  
>  #define arch___set_bit generic___set_bit
> -#define __set_bit arch___set_bit
> -
>  #define arch___clear_bit generic___clear_bit
> -#define __clear_bit arch___clear_bit
> -
>  #define arch___change_bit generic___change_bit
> -#define __change_bit arch___change_bit
>  
>  #define arch___test_and_set_bit generic___test_and_set_bit
> -#define __test_and_set_bit arch___test_and_set_bit
> -
>  #define arch___test_and_clear_bit generic___test_and_clear_bit
> -#define __test_and_clear_bit arch___test_and_clear_bit
> -
>  #define arch___test_and_change_bit generic___test_and_change_bit
> -#define __test_and_change_bit arch___test_and_change_bit
>  
>  #define arch_test_bit generic_test_bit
> -#define test_bit arch_test_bit
> +
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
>  
>  #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
> diff --git a/include/asm-generic/bitops/non-instrumented-non-atomic.h b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> new file mode 100644
> index 000000000000..e0fd7bf72a56
> --- /dev/null
> +++ b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
> +#define __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
> +
> +#define __set_bit		arch___set_bit
> +#define __clear_bit		arch___clear_bit
> +#define __change_bit		arch___change_bit
> +
> +#define __test_and_set_bit	arch___test_and_set_bit
> +#define __test_and_clear_bit	arch___test_and_clear_bit
> +#define __test_and_change_bit	arch___test_and_change_bit
> +
> +#define test_bit		arch_test_bit
> +
> +#endif /* __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H */
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 7aaed501f768..87087454a288 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,12 +26,29 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +#include <asm-generic/bitops/generic-non-atomic.h>
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
>   */
>  #include <asm/bitops.h>
>  
> +/* Check that the bitops prototypes are sane */
> +#define __check_bitop_pr(name)						\
> +	static_assert(__same_type(arch_##name, generic_##name) &&	\
> +		      __same_type(name, generic_##name))
> +
> +__check_bitop_pr(__set_bit);
> +__check_bitop_pr(__clear_bit);
> +__check_bitop_pr(__change_bit);
> +__check_bitop_pr(__test_and_set_bit);
> +__check_bitop_pr(__test_and_clear_bit);
> +__check_bitop_pr(__test_and_change_bit);
> +__check_bitop_pr(test_bit);
> +
> +#undef __check_bitop_pr
> +
>  static inline int get_bitmask_order(unsigned int count)
>  {
>  	int order;
> diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
> index 7e10c4b50c5d..e5e78e42e57b 100644
> --- a/tools/include/asm-generic/bitops/non-atomic.h
> +++ b/tools/include/asm-generic/bitops/non-atomic.h
> @@ -2,7 +2,7 @@
>  #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>  
> -#include <asm/types.h>
> +#include <linux/bits.h>
>  
>  /**
>   * __set_bit - Set a bit in memory
> @@ -13,7 +13,8 @@
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -21,7 +22,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
>  	*p  |= mask;
>  }
>  
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -38,7 +40,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
>   * If it's called on the same region of memory simultaneously, the effect
>   * may be that only one operation succeeds.
>   */
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -55,7 +58,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -74,7 +78,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -85,8 +90,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static inline int __test_and_change_bit(int nr,
> -					    volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -101,7 +106,8 @@ static inline int __test_and_change_bit(int nr,
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
> -static inline int test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> -- 
> 2.36.1
> 

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 4/7] bitops: define const_*() versions of the non-atomics
  2022-06-17 14:40     ` Alexander Lobakin
  (?)
@ 2022-06-20 10:03       ` Andy Shevchenko
  -1 siblings, 0 replies; 93+ messages in thread
From: Andy Shevchenko @ 2022-06-20 10:03 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Mark Rutland, Matt Turner, Brian Cain,
	Geert Uytterhoeven, Yoshinori Sato, Rich Felker, David S. Miller,
	Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

On Fri, Jun 17, 2022 at 04:40:28PM +0200, Alexander Lobakin wrote:
> Define const_*() variants of the non-atomic bitops to be used when
> the input arguments are compile-time constants, so that the compiler
> will always be able to resolve those to compile-time constants as
> well. Those are mostly direct aliases for generic_*() with one
> exception for const_test_bit(): the original one is declared
> atomic-safe and thus doesn't discard the `volatile` qualifier, so
> in order to let the compiler optimize the code, define it separately,
> disregarding the qualifier.
> Add them to the compile-time type checks as well just in case.
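> 
> A minimal sketch of the effect (illustrative only, not taken from the
> patch):
> 
> 	/* a bitmap whose contents the compiler can see */
> 	const unsigned long map[1] = { BIT(5) };
> 
> 	/*
> 	 * const_test_bit() reads through a plain (non-volatile) pointer,
> 	 * so the whole expression can be folded to 1 at compile time,
> 	 * while generic_test_bit() keeps the volatile load.
> 	 */
> 	bool set = const_test_bit(5, map);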

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

> Suggested-by: Marco Elver <elver@google.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  .../asm-generic/bitops/generic-non-atomic.h   | 31 +++++++++++++++++++
>  include/linux/bitops.h                        |  1 +
>  2 files changed, 32 insertions(+)
> 
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index b85b8a2ac239..3d5ebd24652b 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -127,4 +127,35 @@ generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
>  
> +/*
> + * const_*() definitions provide good compile-time optimizations when
> + * the passed arguments can be resolved at compile time.
> + */
> +#define const___set_bit			generic___set_bit
> +#define const___clear_bit		generic___clear_bit
> +#define const___change_bit		generic___change_bit
> +#define const___test_and_set_bit	generic___test_and_set_bit
> +#define const___test_and_clear_bit	generic___test_and_clear_bit
> +#define const___test_and_change_bit	generic___test_and_change_bit
> +
> +/**
> + * const_test_bit - Determine whether a bit is set
> + * @nr: bit number to test
> + * @addr: Address to start counting from
> + *
> + * A version of generic_test_bit() which discards the `volatile` qualifier to
> + * allow a compiler to optimize code harder. Non-atomic and to be called only
> + * for testing compile-time constants, e.g. by the corresponding macros, not
> + * directly from "regular" code.
> + */
> +static __always_inline bool
> +const_test_bit(unsigned long nr, const volatile unsigned long *addr)
> +{
> +	const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> +	unsigned long mask = BIT_MASK(nr);
> +	unsigned long val = *p;
> +
> +	return !!(val & mask);
> +}
> +
>  #endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 87087454a288..d393297287d5 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -37,6 +37,7 @@ extern unsigned long __sw_hweight64(__u64 w);
>  /* Check that the bitops prototypes are sane */
>  #define __check_bitop_pr(name)						\
>  	static_assert(__same_type(arch_##name, generic_##name) &&	\
> +		      __same_type(const_##name, generic_##name) &&	\
>  		      __same_type(name, generic_##name))
>  
>  __check_bitop_pr(__set_bit);
> -- 
> 2.36.1
> 

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 6/7] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-17 14:40     ` Alexander Lobakin
  (?)
@ 2022-06-20 10:05       ` Andy Shevchenko
  -1 siblings, 0 replies; 93+ messages in thread
From: Andy Shevchenko @ 2022-06-20 10:05 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Mark Rutland, Matt Turner, Brian Cain,
	Geert Uytterhoeven, Yoshinori Sato, Rich Felker, David S. Miller,
	Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

On Fri, Jun 17, 2022 at 04:40:30PM +0200, Alexander Lobakin wrote:
> Currently, many architecture-specific non-atomic bitop
> implementations use inline asm or other hacks which are faster or
> more robust when working with "real" variables (i.e. fields from
> the structures etc.), but the compilers have no clue how to optimize
> them out when called on compile-time constants. For example, the
> following code:
> 
> 	DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> 	unsigned long bar = BIT(BAR_BIT);
> 	unsigned long baz = 0;
> 
> 	__set_bit(FOO_BIT, foo);
> 	baz |= BIT(BAZ_BIT);
> 
> 	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo)));
> 	BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> 	BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> 
> triggers the first assertion on x86_64, which means that the
> compiler is unable to evaluate it to a compile-time initializer
> when the architecture-specific bitop is used even if it's obvious.
> In order to let the compiler optimize out such cases, expand the
> bitop() macro to use the "constant" C non-atomic bitop
> implementations when all of the arguments passed are compile-time
> constants, which means that the result will be a compile-time
> constant as well, so that it produces simpler and more efficient
> code in all cases compared to the architecture-specific
> counterparts.
> 
> The savings are architecture, compiler and compiler flags dependent,
> for example, on x86_64 -O2:
> 
> GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
> 
> and ARM64 (courtesy of Mark):
> 
> GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)

...

> +/*
> + * Many architecture-specific non-atomic bitops contain inline asm code and due
> + * to that the compiler can't optimize them to compile-time expressions or
> + * constants. In contrary, gen_*() helpers are defined in pure C and compilers

generic_*() ?

> + * optimize them just well.
> + * Therefore, to make `unsigned long foo = 0; __set_bit(BAR, &foo)` effectively
> + * equal to `unsigned long foo = BIT(BAR)`, pick the generic C alternative when
> + * the arguments can be resolved at compile time. That expression itself is a
> + * constant and doesn't bring any functional changes to the rest of cases.
> + * The casts to `uintptr_t` are needed to mitigate `-Waddress` warnings when
> + * passing a bitmap from .bss or .data (-> `!!addr` is always true).
> + */
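
For reference, with those casts the whole selection boils down to roughly
the following (a simplified sketch of the bitop() macro using the
const_*()/___*() names from patches 4 and 5, not a verbatim copy):

	#define bitop(op, nr, addr)						\
		((__builtin_constant_p(nr) &&					\
		  __builtin_constant_p((uintptr_t)(addr) != (uintptr_t)NULL) &&\
		  (uintptr_t)(addr) != (uintptr_t)NULL &&			\
		  __builtin_constant_p(*(const unsigned long *)(addr))) ?	\
		 const##op(nr, addr) : op(nr, addr))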

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 7/7] lib: test_bitmap: add compile-time optimization/evaluations assertions
  2022-06-17 14:40     ` Alexander Lobakin
  (?)
@ 2022-06-20 10:07       ` Andy Shevchenko
  -1 siblings, 0 replies; 93+ messages in thread
From: Andy Shevchenko @ 2022-06-20 10:07 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Mark Rutland, Matt Turner, Brian Cain,
	Geert Uytterhoeven, Yoshinori Sato, Rich Felker, David S. Miller,
	Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

On Fri, Jun 17, 2022 at 04:40:31PM +0200, Alexander Lobakin wrote:
> Add a function to the bitmap test suite, which will ensure that
> compilers are able to evaluate operations performed by the
> bitops/bitmap helpers to compile-time constants when all of the
> arguments are compile-time constants as well, or trigger a build
> bug otherwise. This should work on all architectures and all the
> optimization levels supported by Kbuild.
> The function doesn't perform any runtime tests and gets optimized
> out to nothing after passing the build assertions.

Always in favour of new test cases!
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

> Suggested-by: Yury Norov <yury.norov@gmail.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  lib/test_bitmap.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
> 
> diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
> index d5923a640457..3a7b09b82794 100644
> --- a/lib/test_bitmap.c
> +++ b/lib/test_bitmap.c
> @@ -869,6 +869,50 @@ static void __init test_bitmap_print_buf(void)
>  	}
>  }
>  
> +static void __init test_bitmap_const_eval(void)
> +{
> +	DECLARE_BITMAP(bitmap, BITS_PER_LONG);
> +	unsigned long initvar = BIT(2);
> +	unsigned long bitopvar = 0;
> +	unsigned long var = 0;
> +	int res;
> +
> +	/*
> +	 * Compilers must be able to optimize all of those to compile-time
> +	 * constants on any supported optimization level (-O2, -Os) and any
> +	 * architecture. Otherwise, trigger a build bug.
> +	 * The whole function gets optimized out then, there's nothing to do
> +	 * in runtime.
> +	 */
> +
> +	/* Equals to `unsigned long bitmap[1] = { BIT(5), }` */
> +	bitmap_clear(bitmap, 0, BITS_PER_LONG);
> +	if (!test_bit(7, bitmap))
> +		bitmap_set(bitmap, 5, 1);
> +
> +	/* Equals to `unsigned long bitopvar = BIT(20)` */
> +	__change_bit(31, &bitopvar);
> +	bitmap_shift_right(&bitopvar, &bitopvar, 11, BITS_PER_LONG);
> +
> +	/* Equals to `unsigned long var = BIT(25)` */
> +	var |= BIT(25);
> +	if (var & BIT(0))
> +		var ^= GENMASK(9, 6);
> +
> +	/* __const_hweight<32|64>(BIT(5)) == 1 */
> +	res = bitmap_weight(bitmap, 20);
> +	BUILD_BUG_ON(!__builtin_constant_p(res));
> +
> +	/* !(BIT(31) & BIT(18)) == 1 */
> +	res = !test_bit(18, &bitopvar);
> +	BUILD_BUG_ON(!__builtin_constant_p(res));
> +
> +	/* BIT(2) & GENMASK(14, 8) == 0 */
> +	BUILD_BUG_ON(!__builtin_constant_p(initvar & GENMASK(14, 8)));
> +	/* ~BIT(25) */
> +	BUILD_BUG_ON(!__builtin_constant_p(~var));
> +}
> +
>  static void __init selftest(void)
>  {
>  	test_zero_clear();
> @@ -884,6 +928,7 @@ static void __init selftest(void)
>  	test_for_each_set_clump8();
>  	test_bitmap_cut();
>  	test_bitmap_print_buf();
> +	test_bitmap_const_eval();
>  }
>  
>  KSTM_MODULE_LOADERS(test_bitmap);
> -- 
> 2.36.1
> 

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 5/7] bitops: wrap non-atomic bitops with a transparent macro
  2022-06-17 14:40     ` Alexander Lobakin
  (?)
@ 2022-06-20 10:08       ` Andy Shevchenko
  -1 siblings, 0 replies; 93+ messages in thread
From: Andy Shevchenko @ 2022-06-20 10:08 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Mark Rutland, Matt Turner, Brian Cain,
	Geert Uytterhoeven, Yoshinori Sato, Rich Felker, David S. Miller,
	Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

On Fri, Jun 17, 2022 at 04:40:29PM +0200, Alexander Lobakin wrote:
> In preparation for altering the non-atomic bitops with a macro, wrap
> them in a transparent definition. This requires prepending one more
> '_' to their names in order to be able to do that seamlessly. It is
> a simple change, given that all the non-prefixed definitions are now
> in asm-generic.
> sparc32 already has several triple-underscored functions, so I had
> to rename them ('___' -> 'sp32_').
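
In other words, after this patch a call expands as below (simplified
view; whether _test_bit() resolves to arch_test_bit() or the
instrumented helper depends on the configuration):

	test_bit(nr, addr)
		-> bitop(_test_bit, nr, addr)	/* transparent for now */
		-> _test_bit(nr, addr)		/* arch_ or instrumented */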

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  arch/sparc/include/asm/bitops_32.h            | 18 ++++++------
>  arch/sparc/lib/atomic32.c                     | 12 ++++----
>  .../bitops/instrumented-non-atomic.h          | 28 +++++++++----------
>  .../bitops/non-instrumented-non-atomic.h      | 14 +++++-----
>  include/linux/bitops.h                        | 18 +++++++++++-
>  tools/include/asm-generic/bitops/non-atomic.h | 24 ++++++++--------
>  tools/include/linux/bitops.h                  | 16 +++++++++++
>  7 files changed, 81 insertions(+), 49 deletions(-)
> 
> diff --git a/arch/sparc/include/asm/bitops_32.h b/arch/sparc/include/asm/bitops_32.h
> index 889afa9f990f..3448c191b484 100644
> --- a/arch/sparc/include/asm/bitops_32.h
> +++ b/arch/sparc/include/asm/bitops_32.h
> @@ -19,9 +19,9 @@
>  #error only <linux/bitops.h> can be included directly
>  #endif
>  
> -unsigned long ___set_bit(unsigned long *addr, unsigned long mask);
> -unsigned long ___clear_bit(unsigned long *addr, unsigned long mask);
> -unsigned long ___change_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask);
>  
>  /*
>   * Set bit 'nr' in 32-bit quantity at address 'addr' where bit '0'
> @@ -36,7 +36,7 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *add
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	return ___set_bit(ADDR, mask) != 0;
> +	return sp32___set_bit(ADDR, mask) != 0;
>  }
>  
>  static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -46,7 +46,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	(void) ___set_bit(ADDR, mask);
> +	(void) sp32___set_bit(ADDR, mask);
>  }
>  
>  static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -56,7 +56,7 @@ static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *a
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	return ___clear_bit(ADDR, mask) != 0;
> +	return sp32___clear_bit(ADDR, mask) != 0;
>  }
>  
>  static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -66,7 +66,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	(void) ___clear_bit(ADDR, mask);
> +	(void) sp32___clear_bit(ADDR, mask);
>  }
>  
>  static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -76,7 +76,7 @@ static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	return ___change_bit(ADDR, mask) != 0;
> +	return sp32___change_bit(ADDR, mask) != 0;
>  }
>  
>  static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -86,7 +86,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	(void) ___change_bit(ADDR, mask);
> +	(void) sp32___change_bit(ADDR, mask);
>  }
>  
>  #include <asm-generic/bitops/non-atomic.h>
> diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
> index 8b81d0f00c97..cf80d1ae352b 100644
> --- a/arch/sparc/lib/atomic32.c
> +++ b/arch/sparc/lib/atomic32.c
> @@ -120,7 +120,7 @@ void arch_atomic_set(atomic_t *v, int i)
>  }
>  EXPORT_SYMBOL(arch_atomic_set);
>  
> -unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask)
>  {
>  	unsigned long old, flags;
>  
> @@ -131,9 +131,9 @@ unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
>  
>  	return old & mask;
>  }
> -EXPORT_SYMBOL(___set_bit);
> +EXPORT_SYMBOL(sp32___set_bit);
>  
> -unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask)
>  {
>  	unsigned long old, flags;
>  
> @@ -144,9 +144,9 @@ unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
>  
>  	return old & mask;
>  }
> -EXPORT_SYMBOL(___clear_bit);
> +EXPORT_SYMBOL(sp32___clear_bit);
>  
> -unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
>  {
>  	unsigned long old, flags;
>  
> @@ -157,7 +157,7 @@ unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
>  
>  	return old & mask;
>  }
> -EXPORT_SYMBOL(___change_bit);
> +EXPORT_SYMBOL(sp32___change_bit);
>  
>  unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
>  {
> diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
> index b019f77ef21c..988a3bbfba34 100644
> --- a/include/asm-generic/bitops/instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/instrumented-non-atomic.h
> @@ -14,7 +14,7 @@
>  #include <linux/instrumented.h>
>  
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -23,14 +23,14 @@
>   * succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___set_bit(nr, addr);
>  }
>  
>  /**
> - * __clear_bit - Clears a bit in memory
> + * ___clear_bit - Clears a bit in memory
>   * @nr: the bit to clear
>   * @addr: the address to start counting from
>   *
> @@ -39,14 +39,14 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>   * succeeds.
>   */
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___clear_bit(nr, addr);
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -55,7 +55,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___change_bit(nr, addr);
> @@ -86,7 +86,7 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>  }
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -94,14 +94,14 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_set_bit(nr, addr);
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -109,14 +109,14 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_clear_bit(nr, addr);
>  }
>  
>  /**
> - * __test_and_change_bit - Change a bit and return its old value
> + * ___test_and_change_bit - Change a bit and return its old value
>   * @nr: Bit to change
>   * @addr: Address to count from
>   *
> @@ -124,19 +124,19 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_change_bit(nr, addr);
>  }
>  
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
>  	return arch_test_bit(nr, addr);
> diff --git a/include/asm-generic/bitops/non-instrumented-non-atomic.h b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> index e0fd7bf72a56..bdb9b1ffaee9 100644
> --- a/include/asm-generic/bitops/non-instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> @@ -3,14 +3,14 @@
>  #ifndef __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
>  #define __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
>  
> -#define __set_bit		arch___set_bit
> -#define __clear_bit		arch___clear_bit
> -#define __change_bit		arch___change_bit
> +#define ___set_bit		arch___set_bit
> +#define ___clear_bit		arch___clear_bit
> +#define ___change_bit		arch___change_bit
>  
> -#define __test_and_set_bit	arch___test_and_set_bit
> -#define __test_and_clear_bit	arch___test_and_clear_bit
> -#define __test_and_change_bit	arch___test_and_change_bit
> +#define ___test_and_set_bit	arch___test_and_set_bit
> +#define ___test_and_clear_bit	arch___test_and_clear_bit
> +#define ___test_and_change_bit	arch___test_and_change_bit
>  
> -#define test_bit		arch_test_bit
> +#define _test_bit		arch_test_bit
>  
>  #endif /* __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H */
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index d393297287d5..3c3afbae1533 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,8 +26,24 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +/*
> + * Defined here because those may be needed by architecture-specific static
> + * inlines.
> + */
> +
>  #include <asm-generic/bitops/generic-non-atomic.h>
>  
> +#define bitop(op, nr, addr)						\
> +	op(nr, addr)
> +
> +#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
> +#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
> +#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
> +#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
> +#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
> +#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
> +#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
> @@ -38,7 +54,7 @@ extern unsigned long __sw_hweight64(__u64 w);
>  #define __check_bitop_pr(name)						\
>  	static_assert(__same_type(arch_##name, generic_##name) &&	\
>  		      __same_type(const_##name, generic_##name) &&	\
> -		      __same_type(name, generic_##name))
> +		      __same_type(_##name, generic_##name))
>  
>  __check_bitop_pr(__set_bit);
>  __check_bitop_pr(__clear_bit);
> diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
> index e5e78e42e57b..0c472a833408 100644
> --- a/tools/include/asm-generic/bitops/non-atomic.h
> +++ b/tools/include/asm-generic/bitops/non-atomic.h
> @@ -5,7 +5,7 @@
>  #include <linux/bits.h>
>  
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -14,7 +14,7 @@
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -23,7 +23,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -32,7 +32,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -41,7 +41,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -50,7 +50,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -59,7 +59,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -70,7 +70,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -79,7 +79,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -91,7 +91,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  
>  /* WARNING: non atomic and it can be reordered! */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -102,12 +102,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> diff --git a/tools/include/linux/bitops.h b/tools/include/linux/bitops.h
> index 5fca38fe1ba8..f18683b95ea6 100644
> --- a/tools/include/linux/bitops.h
> +++ b/tools/include/linux/bitops.h
> @@ -25,6 +25,22 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +/*
> + * Defined here because those may be needed by architecture-specific static
> + * inlines.
> + */
> +
> +#define bitop(op, nr, addr)						\
> +	op(nr, addr)
> +
> +#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
> +#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
> +#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
> +#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
> +#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
> +#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
> +#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
> -- 
> 2.36.1
> 

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 5/7] bitops: wrap non-atomic bitops with a transparent macro
@ 2022-06-20 10:08       ` Andy Shevchenko
  0 siblings, 0 replies; 93+ messages in thread
From: Andy Shevchenko @ 2022-06-20 10:08 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Mark Rutland, Matt Turner, Brian Cain,
	Geert Uytterhoeven, Yoshinori Sato, Rich Felker, David S. Miller,
	Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

On Fri, Jun 17, 2022 at 04:40:29PM +0200, Alexander Lobakin wrote:
> In preparation for altering the non-atomic bitops with a macro, wrap
> them in a transparent definition. This requires prepending one more
> '_' to their names in order to be able to do that seamlessly. It is
> a simple change, given that all the non-prefixed definitions are now
> in asm-generic.
> sparc32 already has several triple-underscored functions, so I had
> to rename them ('___' -> 'sp32_').

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> ---
>  arch/sparc/include/asm/bitops_32.h            | 18 ++++++------
>  arch/sparc/lib/atomic32.c                     | 12 ++++----
>  .../bitops/instrumented-non-atomic.h          | 28 +++++++++----------
>  .../bitops/non-instrumented-non-atomic.h      | 14 +++++-----
>  include/linux/bitops.h                        | 18 +++++++++++-
>  tools/include/asm-generic/bitops/non-atomic.h | 24 ++++++++--------
>  tools/include/linux/bitops.h                  | 16 +++++++++++
>  7 files changed, 81 insertions(+), 49 deletions(-)
> 
> diff --git a/arch/sparc/include/asm/bitops_32.h b/arch/sparc/include/asm/bitops_32.h
> index 889afa9f990f..3448c191b484 100644
> --- a/arch/sparc/include/asm/bitops_32.h
> +++ b/arch/sparc/include/asm/bitops_32.h
> @@ -19,9 +19,9 @@
>  #error only <linux/bitops.h> can be included directly
>  #endif
>  
> -unsigned long ___set_bit(unsigned long *addr, unsigned long mask);
> -unsigned long ___clear_bit(unsigned long *addr, unsigned long mask);
> -unsigned long ___change_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask);
> +unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask);
>  
>  /*
>   * Set bit 'nr' in 32-bit quantity at address 'addr' where bit '0'
> @@ -36,7 +36,7 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *add
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	return ___set_bit(ADDR, mask) != 0;
> +	return sp32___set_bit(ADDR, mask) != 0;
>  }
>  
>  static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -46,7 +46,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	(void) ___set_bit(ADDR, mask);
> +	(void) sp32___set_bit(ADDR, mask);
>  }
>  
>  static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -56,7 +56,7 @@ static inline int test_and_clear_bit(unsigned long nr, volatile unsigned long *a
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	return ___clear_bit(ADDR, mask) != 0;
> +	return sp32___clear_bit(ADDR, mask) != 0;
>  }
>  
>  static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -66,7 +66,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	(void) ___clear_bit(ADDR, mask);
> +	(void) sp32___clear_bit(ADDR, mask);
>  }
>  
>  static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -76,7 +76,7 @@ static inline int test_and_change_bit(unsigned long nr, volatile unsigned long *
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	return ___change_bit(ADDR, mask) != 0;
> +	return sp32___change_bit(ADDR, mask) != 0;
>  }
>  
>  static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
> @@ -86,7 +86,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
>  	ADDR = ((unsigned long *) addr) + (nr >> 5);
>  	mask = 1 << (nr & 31);
>  
> -	(void) ___change_bit(ADDR, mask);
> +	(void) sp32___change_bit(ADDR, mask);
>  }
>  
>  #include <asm-generic/bitops/non-atomic.h>
> diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
> index 8b81d0f00c97..cf80d1ae352b 100644
> --- a/arch/sparc/lib/atomic32.c
> +++ b/arch/sparc/lib/atomic32.c
> @@ -120,7 +120,7 @@ void arch_atomic_set(atomic_t *v, int i)
>  }
>  EXPORT_SYMBOL(arch_atomic_set);
>  
> -unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___set_bit(unsigned long *addr, unsigned long mask)
>  {
>  	unsigned long old, flags;
>  
> @@ -131,9 +131,9 @@ unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
>  
>  	return old & mask;
>  }
> -EXPORT_SYMBOL(___set_bit);
> +EXPORT_SYMBOL(sp32___set_bit);
>  
> -unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___clear_bit(unsigned long *addr, unsigned long mask)
>  {
>  	unsigned long old, flags;
>  
> @@ -144,9 +144,9 @@ unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
>  
>  	return old & mask;
>  }
> -EXPORT_SYMBOL(___clear_bit);
> +EXPORT_SYMBOL(sp32___clear_bit);
>  
> -unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
> +unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
>  {
>  	unsigned long old, flags;
>  
> @@ -157,7 +157,7 @@ unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
>  
>  	return old & mask;
>  }
> -EXPORT_SYMBOL(___change_bit);
> +EXPORT_SYMBOL(sp32___change_bit);
>  
>  unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
>  {
> diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
> index b019f77ef21c..988a3bbfba34 100644
> --- a/include/asm-generic/bitops/instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/instrumented-non-atomic.h
> @@ -14,7 +14,7 @@
>  #include <linux/instrumented.h>
>  
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -23,14 +23,14 @@
>   * succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___set_bit(nr, addr);
>  }
>  
>  /**
> - * __clear_bit - Clears a bit in memory
> + * ___clear_bit - Clears a bit in memory
>   * @nr: the bit to clear
>   * @addr: the address to start counting from
>   *
> @@ -39,14 +39,14 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>   * succeeds.
>   */
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___clear_bit(nr, addr);
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -55,7 +55,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	instrument_write(addr + BIT_WORD(nr), sizeof(long));
>  	arch___change_bit(nr, addr);
> @@ -86,7 +86,7 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>  }
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -94,14 +94,14 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_set_bit(nr, addr);
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -109,14 +109,14 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_clear_bit(nr, addr);
>  }
>  
>  /**
> - * __test_and_change_bit - Change a bit and return its old value
> + * ___test_and_change_bit - Change a bit and return its old value
>   * @nr: Bit to change
>   * @addr: Address to count from
>   *
> @@ -124,19 +124,19 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * can appear to succeed but actually fail.
>   */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	__instrument_read_write_bitop(nr, addr);
>  	return arch___test_and_change_bit(nr, addr);
>  }
>  
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
>  	return arch_test_bit(nr, addr);
> diff --git a/include/asm-generic/bitops/non-instrumented-non-atomic.h b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> index e0fd7bf72a56..bdb9b1ffaee9 100644
> --- a/include/asm-generic/bitops/non-instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> @@ -3,14 +3,14 @@
>  #ifndef __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
>  #define __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
>  
> -#define __set_bit		arch___set_bit
> -#define __clear_bit		arch___clear_bit
> -#define __change_bit		arch___change_bit
> +#define ___set_bit		arch___set_bit
> +#define ___clear_bit		arch___clear_bit
> +#define ___change_bit		arch___change_bit
>  
> -#define __test_and_set_bit	arch___test_and_set_bit
> -#define __test_and_clear_bit	arch___test_and_clear_bit
> -#define __test_and_change_bit	arch___test_and_change_bit
> +#define ___test_and_set_bit	arch___test_and_set_bit
> +#define ___test_and_clear_bit	arch___test_and_clear_bit
> +#define ___test_and_change_bit	arch___test_and_change_bit
>  
> -#define test_bit		arch_test_bit
> +#define _test_bit		arch_test_bit
>  
>  #endif /* __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H */
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index d393297287d5..3c3afbae1533 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,8 +26,24 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +/*
> + * Defined here because those may be needed by architecture-specific static
> + * inlines.
> + */
> +
>  #include <asm-generic/bitops/generic-non-atomic.h>
>  
> +#define bitop(op, nr, addr)						\
> +	op(nr, addr)
> +
> +#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
> +#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
> +#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
> +#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
> +#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
> +#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
> +#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
> @@ -38,7 +54,7 @@ extern unsigned long __sw_hweight64(__u64 w);
>  #define __check_bitop_pr(name)						\
>  	static_assert(__same_type(arch_##name, generic_##name) &&	\
>  		      __same_type(const_##name, generic_##name) &&	\
> -		      __same_type(name, generic_##name))
> +		      __same_type(_##name, generic_##name))
>  
>  __check_bitop_pr(__set_bit);
>  __check_bitop_pr(__clear_bit);
> diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
> index e5e78e42e57b..0c472a833408 100644
> --- a/tools/include/asm-generic/bitops/non-atomic.h
> +++ b/tools/include/asm-generic/bitops/non-atomic.h
> @@ -5,7 +5,7 @@
>  #include <linux/bits.h>
>  
>  /**
> - * __set_bit - Set a bit in memory
> + * ___set_bit - Set a bit in memory
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> @@ -14,7 +14,7 @@
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__set_bit(unsigned long nr, volatile unsigned long *addr)
> +___set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -23,7 +23,7 @@ __set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  static __always_inline void
> -__clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -32,7 +32,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __change_bit - Toggle a bit in memory
> + * ___change_bit - Toggle a bit in memory
>   * @nr: the bit to change
>   * @addr: the address to start counting from
>   *
> @@ -41,7 +41,7 @@ __clear_bit(unsigned long nr, volatile unsigned long *addr)
>   * may be that only one operation succeeds.
>   */
>  static __always_inline void
> -__change_bit(unsigned long nr, volatile unsigned long *addr)
> +___change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -50,7 +50,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * ___test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
>   *
> @@ -59,7 +59,7 @@ __change_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -70,7 +70,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * ___test_and_clear_bit - Clear a bit and return its old value
>   * @nr: Bit to clear
>   * @addr: Address to count from
>   *
> @@ -79,7 +79,7 @@ __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
>  static __always_inline bool
> -__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -91,7 +91,7 @@ __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
>  
>  /* WARNING: non atomic and it can be reordered! */
>  static __always_inline bool
> -__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  {
>  	unsigned long mask = BIT_MASK(nr);
>  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -102,12 +102,12 @@ __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * test_bit - Determine whether a bit is set
> + * _test_bit - Determine whether a bit is set
>   * @nr: bit number to test
>   * @addr: Address to start counting from
>   */
>  static __always_inline bool
> -test_bit(unsigned long nr, const volatile unsigned long *addr)
> +_test_bit(unsigned long nr, const volatile unsigned long *addr)
>  {
>  	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
> diff --git a/tools/include/linux/bitops.h b/tools/include/linux/bitops.h
> index 5fca38fe1ba8..f18683b95ea6 100644
> --- a/tools/include/linux/bitops.h
> +++ b/tools/include/linux/bitops.h
> @@ -25,6 +25,22 @@ extern unsigned int __sw_hweight16(unsigned int w);
>  extern unsigned int __sw_hweight32(unsigned int w);
>  extern unsigned long __sw_hweight64(__u64 w);
>  
> +/*
> + * Defined here because those may be needed by architecture-specific static
> + * inlines.
> + */
> +
> +#define bitop(op, nr, addr)						\
> +	op(nr, addr)
> +
> +#define __set_bit(nr, addr)		bitop(___set_bit, nr, addr)
> +#define __clear_bit(nr, addr)		bitop(___clear_bit, nr, addr)
> +#define __change_bit(nr, addr)		bitop(___change_bit, nr, addr)
> +#define __test_and_set_bit(nr, addr)	bitop(___test_and_set_bit, nr, addr)
> +#define __test_and_clear_bit(nr, addr)	bitop(___test_and_clear_bit, nr, addr)
> +#define __test_and_change_bit(nr, addr)	bitop(___test_and_change_bit, nr, addr)
> +#define test_bit(nr, addr)		bitop(_test_bit, nr, addr)
> +
>  /*
>   * Include this here because some architectures need generic_ffs/fls in
>   * scope
> -- 
> 2.36.1
> 

-- 
With Best Regards,
Andy Shevchenko


^ permalink raw reply	[flat|nested] 93+ messages in thread
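
A minimal standalone sketch of the transparent-wrapper pattern added in the
hunk above; the simplified ___set_bit() body here is purely illustrative and
is not the kernel's implementation:

	#include <stdio.h>

	/* Simplified stand-in for the kernel's ___set_bit(), for illustration only */
	static inline void ___set_bit(unsigned long nr, unsigned long *addr)
	{
		addr[nr / (8 * sizeof(long))] |= 1UL << (nr % (8 * sizeof(long)));
	}

	/* The transparent wrapper: __set_bit() becomes a macro routed through bitop() */
	#define bitop(op, nr, addr)	op(nr, addr)
	#define __set_bit(nr, addr)	bitop(___set_bit, nr, addr)

	int main(void)
	{
		unsigned long map[1] = { 0 };

		__set_bit(3, map);			/* expands to ___set_bit(3, map) */
		printf("map[0] = %#lx\n", map[0]);	/* prints map[0] = 0x8 */
		return 0;
	}

With this indirection in place, the later patch that adds the compile-time
selection only has to touch the single bitop() macro; none of the seven
wrapper spellings (__set_bit() ... test_bit()) change at the call sites.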

* Re: [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-17 14:40   ` Alexander Lobakin
  (?)
@ 2022-06-20 12:07     ` Geert Uytterhoeven
  -1 siblings, 0 replies; 93+ messages in thread
From: Geert Uytterhoeven @ 2022-06-20 12:07 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Mark Rutland,
	Matt Turner, Brian Cain, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, alpha,
	open list:QUALCOMM HEXAGON...,
	linux-ia64, linux-m68k, Linux-sh list, sparclinux, Linux-Arch,
	Linux Kernel Mailing List

Hi Olek,

On Fri, Jun 17, 2022 at 6:51 PM Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
> While I was working on converting some structure fields from a fixed
> type to a bitmap, I started observing code size increase not only in
> places where the code works with the converted structure fields, but
> also where the converted vars were on the stack. That said, the
> following code:
>
>         DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
>         unsigned long bar = BIT(BAR_BIT);
>         unsigned long baz = 0;
>
>         __set_bit(FOO_BIT, foo);
>         baz |= BIT(BAZ_BIT);
>
>         BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
>         BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
>         BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
>
> triggers the first assertion on x86_64, which means that the
> compiler is unable to evaluate it to a compile-time initializer
> when the architecture-specific bitop is used even if it's obvious.
> I found that this is due to that many architecture-specific
> non-atomic bitop implementations use inline asm or other hacks which
> are faster or more robust when working with "real" variables (i.e.
> fields from the structures etc.), but the compilers have no clue how
> to optimize them out when called on compile-time constants.
>
> So, in order to let the compiler optimize out such cases, expand the
> test_bit() and __*_bit() definitions with a compile-time condition
> check, so that they will pick the generic C non-atomic bitop
> implementations when all of the arguments passed are compile-time
> constants, which means that the result will be a compile-time
> constant as well and the compiler will produce more efficient and
> simple code in 100% cases (no changes when there's at least one
> non-compile-time-constant argument).
> The condition itself:
>
> if (
> __builtin_constant_p(nr) &&     /* <- bit position is constant */
> __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
>                                       always either NULL or not */
> addr &&                         /* <- bitmap addr is not NULL */
> __builtin_constant_p(*addr)     /* <- compiler knows the value of
>                                       the target bitmap */
> )
>         /* then pick the generic C variant
> else
>         /* old code path, arch-specific
>
> I also tried __is_constexpr() as suggested by Andy, but it was
> always returning 0 ('not a constant') for the 2,3 and 4th
> conditions.
>
> The savings are architecture, compiler and compiler flags dependent,
> for example, on x86_64 -O2:
>
> GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
>
> and ARM64 (courtesy of Mark[0]):
>
> GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
>
> And the following:
>
>         DECLARE_BITMAP(flags, __IP_TUNNEL_FLAG_NUM) = { };
>         __be16 flags;
>
>         __set_bit(IP_TUNNEL_CSUM_BIT, flags);
>
>         tun_flags = cpu_to_be16(*flags & U16_MAX);
>
>         if (test_bit(IP_TUNNEL_VTI_BIT, flags))
>                 tun_flags |= VTI_ISVTI;
>
>         BUILD_BUG_ON(!__builtin_constant_p(tun_flags));
>
> doesn't blow up anymore (which is being checked now at build time),
> so that we can now e.g. use fixed bitmaps in compile-time assertions
> etc.
>
> The series has been in intel-next for a while with no reported issues.
>
> From v2[1]:
> * collect several Reviewed-bys (Andy, Yury);
> * add a comment to generic_test_bit() that it is atomic-safe and
>   must always stay like that (the first version of this series
>   erroneously tried to change this) (Andy, Marco);
> * unify the way architectures define platform-specific bitops,
>   both supporting instrumentation and not: now they define only
>   'arch_' versions and asm-generic includes take care of the rest;
> * micro-optimize the diffstat of 0004/0007 (__check_bitop_pr())
>   (Andy);
> * add compile-time tests to lib/test_bitmap to make sure everything
>   works as expected on any setup (Yury).

Thanks for the update!

Still seeing
add/remove: 49/13 grow/shrink: 280/137 up/down: 6464/-3328 (3136)

on m68k atari_defconfig (i.e. CONFIG_CC_OPTIMIZE_FOR_SIZE=y)
with gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04).

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 93+ messages in thread
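
For reference, the add/remove/grow/shrink summary quoted above is the output
format of the kernel's scripts/bloat-o-meter; such a comparison is typically
produced with something along the lines of
`./scripts/bloat-o-meter vmlinux.before vmlinux.after` (the file names here
are illustrative, taken from two builds with and without the series applied).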

* Re: [PATCH v3 6/7] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-20 10:05       ` Andy Shevchenko
  (?)
@ 2022-06-20 13:12         ` Alexander Lobakin
  -1 siblings, 0 replies; 93+ messages in thread
From: Alexander Lobakin @ 2022-06-20 13:12 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Mark Rutland,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date: Mon, 20 Jun 2022 13:05:06 +0300

> On Fri, Jun 17, 2022 at 04:40:30PM +0200, Alexander Lobakin wrote:
> > Currently, many architecture-specific non-atomic bitop
> > implementations use inline asm or other hacks which are faster or
> > more robust when working with "real" variables (i.e. fields from
> > the structures etc.), but the compilers have no clue how to optimize
> > them out when called on compile-time constants. That said, the
> > following code:
> > 
> > 	DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> > 	unsigned long bar = BIT(BAR_BIT);
> > 	unsigned long baz = 0;
> > 
> > 	__set_bit(FOO_BIT, foo);
> > 	baz |= BIT(BAZ_BIT);
> > 
> > 	BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> > 	BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> > 	BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> > 
> > triggers the first assertion on x86_64, which means that the
> > compiler is unable to evaluate it to a compile-time initializer
> > when the architecture-specific bitop is used even if it's obvious.
> > In order to let the compiler optimize out such cases, expand the
> > bitop() macro to use the "constant" C non-atomic bitop
> > implementations when all of the arguments passed are compile-time
> > constants, which means that the result will be a compile-time
> > constant as well, so that it produces more efficient and simple
> > code in 100% cases, comparing to the architecture-specific
> > counterparts.
> > 
> > The savings are architecture, compiler and compiler flags dependent,
> > for example, on x86_64 -O2:
> > 
> > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
> > 
> > and ARM64 (courtesy of Mark):
> > 
> > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
> 
> ...
> 
> > +/*
> > + * Many architecture-specific non-atomic bitops contain inline asm code and due
> > + * to that the compiler can't optimize them to compile-time expressions or
> > + * constants. In contrary, gen_*() helpers are defined in pure C and compilers
> 
> generic_*() ?

Ah right, bah, forgot to change that in v2. Will fix in v4, as the
__builtin_constant_p() test from 0007 triggered build bugs on ARC;
will look into that.

> 
> > + * optimize them just well.
> > + * Therefore, to make `unsigned long foo = 0; __set_bit(BAR, &foo)` effectively
> > + * equal to `unsigned long foo = BIT(BAR)`, pick the generic C alternative when
> > + * the arguments can be resolved at compile time. That expression itself is a
> > + * constant and doesn't bring any functional changes to the rest of cases.
> > + * The casts to `uintptr_t` are needed to mitigate `-Waddress` warnings when
> > + * passing a bitmap from .bss or .data (-> `!!addr` is always true).
> > + */
> 
> -- 
> With Best Regards,
> Andy Shevchenko

Thanks,
Al

^ permalink raw reply	[flat|nested] 93+ messages in thread
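
To put the quoted comment in context, here is a sketch of what the expanded
bitop() selection looks like with those uintptr_t casts in place
(reconstructed for illustration; the exact spelling in the applied patch may
differ):

	#define bitop(op, nr, addr)						\
		((__builtin_constant_p(nr) &&					\
		  __builtin_constant_p((uintptr_t)(addr) != (uintptr_t)NULL) &&\
		  (uintptr_t)(addr) != (uintptr_t)NULL &&			\
		  __builtin_constant_p(*(const unsigned long *)(addr))) ?	\
		 const##op(nr, addr) : op(nr, addr))

With that, __set_bit(nr, addr) resolves either to const___set_bit(nr, addr),
the pure-C generic variant the compiler can fold to a constant, or to
___set_bit(nr, addr), the (possibly arch-specific) implementation, depending
on whether all arguments are known at compile time.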

* Re: [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-20 12:07     ` Geert Uytterhoeven
  (?)
@ 2022-06-20 13:22       ` Alexander Lobakin
  -1 siblings, 0 replies; 93+ messages in thread
From: Alexander Lobakin @ 2022-06-20 13:22 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Mark Rutland, Matt Turner, Brian Cain, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, alpha,
	open list:QUALCOMM HEXAGON...,
	linux-ia64, linux-m68k, Linux-sh list, sparclinux, Linux-Arch,
	Linux Kernel Mailing List

From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Mon, 20 Jun 2022 14:07:27 +0200

> Hi Olek,

Hey!

> 
> On Fri, Jun 17, 2022 at 6:51 PM Alexander Lobakin
> <alexandr.lobakin@intel.com> wrote:
> > While I was working on converting some structure fields from a fixed
> > type to a bitmap, I started observing code size increase not only in
> > places where the code works with the converted structure fields, but
> > also where the converted vars were on the stack. That said, the
> > following code:
> >
> >         DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> >         unsigned long bar = BIT(BAR_BIT);
> >         unsigned long baz = 0;
> >
> >         __set_bit(FOO_BIT, foo);
> >         baz |= BIT(BAZ_BIT);
> >
> >         BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> >         BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> >         BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> >
> > triggers the first assertion on x86_64, which means that the
> > compiler is unable to evaluate it to a compile-time initializer
> > when the architecture-specific bitop is used even if it's obvious.
> > I found that this is due to that many architecture-specific
> > non-atomic bitop implementations use inline asm or other hacks which
> > are faster or more robust when working with "real" variables (i.e.
> > fields from the structures etc.), but the compilers have no clue how
> > to optimize them out when called on compile-time constants.
> >
> > So, in order to let the compiler optimize out such cases, expand the
> > test_bit() and __*_bit() definitions with a compile-time condition
> > check, so that they will pick the generic C non-atomic bitop
> > implementations when all of the arguments passed are compile-time
> > constants, which means that the result will be a compile-time
> > constant as well and the compiler will produce more efficient and
> > simple code in 100% cases (no changes when there's at least one
> > non-compile-time-constant argument).
> > The condition itself:
> >
> > if (
> > __builtin_constant_p(nr) &&     /* <- bit position is constant */
> > __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
> >                                       always either NULL or not */
> > addr &&                         /* <- bitmap addr is not NULL */
> > __builtin_constant_p(*addr)     /* <- compiler knows the value of
> >                                       the target bitmap */
> > )
> >         /* then pick the generic C variant
> > else
> >         /* old code path, arch-specific
> >
> > I also tried __is_constexpr() as suggested by Andy, but it was
> > always returning 0 ('not a constant') for the 2,3 and 4th
> > conditions.
> >
> > The savings are architecture, compiler and compiler flags dependent,
> > for example, on x86_64 -O2:
> >
> > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
> >
> > and ARM64 (courtesy of Mark[0]):
> >
> > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
> >
> > And the following:
> >
> >         DECLARE_BITMAP(flags, __IP_TUNNEL_FLAG_NUM) = { };
> >         __be16 flags;
> >
> >         __set_bit(IP_TUNNEL_CSUM_BIT, flags);
> >
> >         tun_flags = cpu_to_be16(*flags & U16_MAX);
> >
> >         if (test_bit(IP_TUNNEL_VTI_BIT, flags))
> >                 tun_flags |= VTI_ISVTI;
> >
> >         BUILD_BUG_ON(!__builtin_constant_p(tun_flags));
> >
> > doesn't blow up anymore (which is being checked now at build time),
> > so that we can now e.g. use fixed bitmaps in compile-time assertions
> > etc.
> >
> > The series has been in intel-next for a while with no reported issues.
> >
> > From v2[1]:
> > * collect several Reviewed-bys (Andy, Yury);
> > * add a comment to generic_test_bit() that it is atomic-safe and
> >   must always stay like that (the first version of this series
> >   erroneously tried to change this) (Andy, Marco);
> > * unify the way architectures define platform-specific bitops,
> >   both supporting instrumentation and not: now they define only
> >   'arch_' versions and asm-generic includes take care of the rest;
> > * micro-optimize the diffstat of 0004/0007 (__check_bitop_pr())
> >   (Andy);
> > * add compile-time tests to lib/test_bitmap to make sure everything
> >   works as expected on any setup (Yury).
> 
> Thanks for the update!
> 
> Still seeing
> add/remove: 49/13 grow/shrink: 280/137 up/down: 6464/-3328 (3136)

Meh. What about -O2 (CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE)? I have a thought to
make it depend on the config option above, but that would make the code
behave differently, so it's not safe.
Are those 3 KB critical for m68k machines? I'm asking because for
some embedded systems they are :)
Another thing, this could happen due to an inlining rebalance, e.g.
the compiler could inline or uninline some functions after
resolving bit{maps,ops} to compile-time constants. I've seen
that happen several times in the past. Also, IIRC you already sent some
bloat-o-meter results here, and that was the case.
On the other hand, if lib/test_bitmap.o builds successfully (I
assume it does), the series works as expected.

> 
> on m68k atari_defconfig (i.e.CONFIG_CC_OPTIMIZE_FOR_SIZE=y)
> with gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04).
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

Thanks,
Olek

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [alobakin:bitops 3/7] block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t
  2022-06-17 14:40   ` Alexander Lobakin
  (?)
@ 2022-06-20 13:51     ` Alexander Lobakin
  -1 siblings, 0 replies; 93+ messages in thread
From: Alexander Lobakin @ 2022-06-20 13:51 UTC (permalink / raw)
  To: Arnd Bergmann, Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Mark Rutland, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel, kbuild-all, kernel test robot

From: kernel test robot <lkp@intel.com>
Date: Sun, 19 Jun 2022 17:20:05 +0800

Also, could someone please help me with this? I don't get what went
wrong with sparse here; it's not even new code, just old code being
moved around.

> tree:   https://github.com/alobakin/linux bitops
> head:   9bd39b17ce49d350eed93a031e0da6389067013e
> commit: 521611f961a7dda92eefa26e1afd3914c06af64e [3/7] bitops: unify non-atomic bitops prototypes across architectures
> config: mips-randconfig-s031-20220619 (https://download.01.org/0day-ci/archive/20220619/202206191726.wq70mbMK-lkp@intel.com/config)
> compiler: mips64el-linux-gcc (GCC) 11.3.0
> reproduce:
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # apt-get install sparse
>         # sparse version: v0.6.4-30-g92122700-dirty
>         # https://github.com/alobakin/linux/commit/521611f961a7dda92eefa26e1afd3914c06af64e
>         git remote add alobakin https://github.com/alobakin/linux
>         git fetch --no-tags alobakin bitops
>         git checkout 521611f961a7dda92eefa26e1afd3914c06af64e
>         # save the config file
>         mkdir build_dir && cp config build_dir/.config
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=mips SHELL=/bin/bash
> 
> If you fix the issue, kindly add following tag where applicable
> Reported-by: kernel test robot <lkp@intel.com>
> 
> 
> sparse warnings: (new ones prefixed by >>)
>    command-line: note: in included file:
>    builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQUIRE redefined
>    builtin:0:0: sparse: this was the original definition
>    builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_SEQ_CST redefined
>    builtin:0:0: sparse: this was the original definition
>    builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQ_REL redefined
>    builtin:0:0: sparse: this was the original definition
>    builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_RELEASE redefined
>    builtin:0:0: sparse: this was the original definition
>    block/elevator.c: note: in included file (through include/linux/bitops.h, include/linux/kernel.h):
>    include/asm-generic/bitops/generic-non-atomic.h:29:9: sparse: sparse: unreplaced symbol 'mask'
>    include/asm-generic/bitops/generic-non-atomic.h:30:9: sparse: sparse: unreplaced symbol 'p'
>    include/asm-generic/bitops/generic-non-atomic.h:32:10: sparse: sparse: unreplaced symbol 'p'
>    include/asm-generic/bitops/generic-non-atomic.h:32:16: sparse: sparse: unreplaced symbol 'mask'
>    include/asm-generic/bitops/generic-non-atomic.h:27:1: sparse: sparse: unreplaced symbol 'return'
>    include/asm-generic/bitops/generic-non-atomic.h:38:9: sparse: sparse: unreplaced symbol 'mask'
>    include/asm-generic/bitops/generic-non-atomic.h:39:9: sparse: sparse: unreplaced symbol 'p'
>    include/asm-generic/bitops/generic-non-atomic.h:41:10: sparse: sparse: unreplaced symbol 'p'
>    include/asm-generic/bitops/generic-non-atomic.h:41:16: sparse: sparse: unreplaced symbol 'mask'
>    include/asm-generic/bitops/generic-non-atomic.h:36:1: sparse: sparse: unreplaced symbol 'return'
>    include/asm-generic/bitops/generic-non-atomic.h:56:9: sparse: sparse: unreplaced symbol 'mask'
>    include/asm-generic/bitops/generic-non-atomic.h:57:9: sparse: sparse: unreplaced symbol 'p'
>    include/asm-generic/bitops/generic-non-atomic.h:59:10: sparse: sparse: unreplaced symbol 'p'
>    include/asm-generic/bitops/generic-non-atomic.h:59:15: sparse: sparse: unreplaced symbol 'mask'
>    include/asm-generic/bitops/generic-non-atomic.h:54:1: sparse: sparse: unreplaced symbol 'return'
>    include/asm-generic/bitops/generic-non-atomic.h:74:9: sparse: sparse: unreplaced symbol 'mask'
>    include/asm-generic/bitops/generic-non-atomic.h:75:9: sparse: sparse: unreplaced symbol 'p'
>    include/asm-generic/bitops/generic-non-atomic.h:76:9: sparse: sparse: unreplaced symbol 'old'
>    include/asm-generic/bitops/generic-non-atomic.h:78:10: sparse: sparse: unreplaced symbol 'p'
>    include/asm-generic/bitops/generic-non-atomic.h:78:14: sparse: sparse: unreplaced symbol 'old'
>    include/asm-generic/bitops/generic-non-atomic.h:78:20: sparse: sparse: unreplaced symbol 'mask'
>    include/asm-generic/bitops/generic-non-atomic.h:79:17: sparse: sparse: unreplaced symbol 'old'
>    include/asm-generic/bitops/generic-non-atomic.h:79:23: sparse: sparse: unreplaced symbol 'mask'
>    include/asm-generic/bitops/generic-non-atomic.h:79:9: sparse: sparse: unreplaced symbol 'return'
>    include/asm-generic/bitops/generic-non-atomic.h:72:1: sparse: sparse: unreplaced symbol 'return'
>    include/asm-generic/bitops/generic-non-atomic.h:94:9: sparse: sparse: unreplaced symbol 'mask'
>    include/asm-generic/bitops/generic-non-atomic.h:95:9: sparse: sparse: unreplaced symbol 'p'
>    include/asm-generic/bitops/generic-non-atomic.h:96:9: sparse: sparse: unreplaced symbol 'old'
>    include/asm-generic/bitops/generic-non-atomic.h:98:10: sparse: sparse: unreplaced symbol 'p'
>    include/asm-generic/bitops/generic-non-atomic.h:98:14: sparse: sparse: unreplaced symbol 'old'
>    include/asm-generic/bitops/generic-non-atomic.h:98:21: sparse: sparse: unreplaced symbol 'mask'
>    include/asm-generic/bitops/generic-non-atomic.h:99:17: sparse: sparse: unreplaced symbol 'old'
>    include/asm-generic/bitops/generic-non-atomic.h:99:23: sparse: sparse: unreplaced symbol 'mask'
>    include/asm-generic/bitops/generic-non-atomic.h:99:9: sparse: sparse: unreplaced symbol 'return'
>    include/asm-generic/bitops/generic-non-atomic.h:92:1: sparse: sparse: unreplaced symbol 'return'
>    include/asm-generic/bitops/generic-non-atomic.h:106:9: sparse: sparse: unreplaced symbol 'mask'
>    include/asm-generic/bitops/generic-non-atomic.h:107:9: sparse: sparse: unreplaced symbol 'p'
>    include/asm-generic/bitops/generic-non-atomic.h:108:9: sparse: sparse: unreplaced symbol 'old'
>    include/asm-generic/bitops/generic-non-atomic.h:110:10: sparse: sparse: unreplaced symbol 'p'
>    include/asm-generic/bitops/generic-non-atomic.h:110:14: sparse: sparse: unreplaced symbol 'old'
>    include/asm-generic/bitops/generic-non-atomic.h:110:20: sparse: sparse: unreplaced symbol 'mask'
>    include/asm-generic/bitops/generic-non-atomic.h:111:17: sparse: sparse: unreplaced symbol 'old'
>    include/asm-generic/bitops/generic-non-atomic.h:111:23: sparse: sparse: unreplaced symbol 'mask'
>    include/asm-generic/bitops/generic-non-atomic.h:111:9: sparse: sparse: unreplaced symbol 'return'
>    include/asm-generic/bitops/generic-non-atomic.h:104:1: sparse: sparse: unreplaced symbol 'return'
>    include/asm-generic/bitops/generic-non-atomic.h:127:9: sparse: sparse: unreplaced symbol 'return'
>    include/asm-generic/bitops/generic-non-atomic.h:120:1: sparse: sparse: unreplaced symbol 'return'
> >> block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t
> 
> vim +222 block/elevator.c
> 
> 9817064b68fef7 Jens Axboe        2006-07-28  217  
> 70b3ea056f3074 Jens Axboe        2016-12-07  218  void elv_rqhash_add(struct request_queue *q, struct request *rq)
> 9817064b68fef7 Jens Axboe        2006-07-28  219  {
> b374d18a4bfce7 Jens Axboe        2008-10-31  220  	struct elevator_queue *e = q->elevator;
> 9817064b68fef7 Jens Axboe        2006-07-28  221  
> 9817064b68fef7 Jens Axboe        2006-07-28 @222  	BUG_ON(ELV_ON_HASH(rq));
> 242d98f077ac0a Sasha Levin       2012-12-17  223  	hash_add(e->hash, &rq->hash, rq_hash_key(rq));
> e806402130c9c4 Christoph Hellwig 2016-10-20  224  	rq->rq_flags |= RQF_HASHED;
> 9817064b68fef7 Jens Axboe        2006-07-28  225  }
> bd166ef183c263 Jens Axboe        2017-01-17  226  EXPORT_SYMBOL_GPL(elv_rqhash_add);
> 9817064b68fef7 Jens Axboe        2006-07-28  227  
> 
> :::::: The code at line 222 was first introduced by commit
> :::::: 9817064b68fef7e4580c6df1ea597e106b9ff88b [PATCH] elevator: move the backmerging logic into the elevator core
> 
> :::::: TO: Jens Axboe <axboe@suse.de>
> :::::: CC: Jens Axboe <axboe@nelson.home.kernel.dk>
> 
> -- 
> 0-DAY CI Kernel Test Service
> https://01.org/lkp

Thanks,
Olek

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-17 14:40   ` Alexander Lobakin
  (?)
@ 2022-06-20 14:19     ` Mark Rutland
  -1 siblings, 0 replies; 93+ messages in thread
From: Mark Rutland @ 2022-06-20 14:19 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

On Fri, Jun 17, 2022 at 04:40:24PM +0200, Alexander Lobakin wrote:
> So, in order to let the compiler optimize out such cases, expand the
> test_bit() and __*_bit() definitions with a compile-time condition
> check, so that they will pick the generic C non-atomic bitop
> implementations when all of the arguments passed are compile-time
> constants, which means that the result will be a compile-time
> constant as well and the compiler will produce more efficient and
> simple code in 100% cases (no changes when there's at least one
> non-compile-time-constant argument).

> The savings are architecture, compiler and compiler flags dependent,
> for example, on x86_64 -O2:
> 
> GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
> 
> and ARM64 (courtesy of Mark[0]):
> 
> GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)

Hmm... with *this version* of the series, I'm not getting results nearly as
good as that when building defconfig atop v5.19-rc3:

  GCC 8.5.0:   add/remove: 83/49 grow/shrink: 973/1147 up/down: 32020/-47824 (-15804)
  GCC 9.3.0:   add/remove: 68/51 grow/shrink: 1167/592 up/down: 30720/-31352 (-632)
  GCC 10.3.0:  add/remove: 84/37 grow/shrink: 1711/1003 up/down: 45392/-41844 (3548)
  GCC 11.1.0:  add/remove: 88/31 grow/shrink: 1635/963 up/down: 51540/-46096 (5444)
  GCC 11.3.0:  add/remove: 89/32 grow/shrink: 1629/966 up/down: 51456/-46056 (5400)
  GCC 12.1.0:  add/remove: 84/31 grow/shrink: 1540/829 up/down: 48772/-43164 (5608)

  LLVM 12.0.1: add/remove: 118/58 grow/shrink: 437/381 up/down: 45312/-65668 (-20356)
  LLVM 13.0.1: add/remove: 35/19 grow/shrink: 416/243 up/down: 14408/-22200 (-7792)
  LLVM 14.0.0: add/remove: 42/16 grow/shrink: 415/234 up/down: 15296/-21008 (-5712)

... and that now seems to be regressing codegen with recent versions of GCC as
much as it improves it with LLVM.

I'm not sure if we've improved some other code and removed the benefit between
v5.19-rc1 and v5.19-rc3, or whether something else is at play, but this doesn't
look as compelling as it did.
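
For reference, numbers in this form come from the in-tree
scripts/bloat-o-meter, comparing the vmlinux of the two builds (the
file names below are just placeholders):

  $ ./scripts/bloat-o-meter vmlinux.before vmlinux.after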

Overall that's mostly hidden in the Image size, due to 64K alignment and
padding requirements:

  Toolchain      Before      After       Difference

  GCC 8.5.0      36178432    36178432    0
  GCC 9.3.0      36112896    36112896    0
  GCC 10.3.0     36442624    36377088    -65536
  GCC 11.1.0     36311552    36377088    +65536
  GCC 11.3.0     36311552    36311552    0
  GCC 12.1.0     36377088    36377088    0

  LLVM 12.0.1    31418880    31418880    0
  LLVM 13.0.1    31418880    31418880    0
  LLVM 14.0.0    31218176    31218176    0

... so aside from the blip around GCC 10.3.0 and 11.1.0, there's not a massive
change overall (due to 64KiB alignment restrictions for portions of the kernel
Image).

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-20 14:19     ` Mark Rutland
  (?)
@ 2022-06-20 15:08       ` Alexander Lobakin
  -1 siblings, 0 replies; 93+ messages in thread
From: Alexander Lobakin @ 2022-06-20 15:08 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

From: Mark Rutland <mark.rutland@arm.com>
Date: Mon, 20 Jun 2022 15:19:42 +0100

> On Fri, Jun 17, 2022 at 04:40:24PM +0200, Alexander Lobakin wrote:
> > So, in order to let the compiler optimize out such cases, expand the
> > test_bit() and __*_bit() definitions with a compile-time condition
> > check, so that they will pick the generic C non-atomic bitop
> > implementations when all of the arguments passed are compile-time
> > constants, which means that the result will be a compile-time
> > constant as well and the compiler will produce more efficient and
> > simple code in 100% cases (no changes when there's at least one
> > non-compile-time-constant argument).
> 
> > The savings are architecture, compiler and compiler flags dependent,
> > for example, on x86_64 -O2:
> > 
> > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
> > 
> > and ARM64 (courtesy of Mark[0]):
> > 
> > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
> 
> Hmm... with *this version* of the series, I'm not getting results nearly as
> good as that when building defconfig atop v5.19-rc3:
> 
>   GCC 8.5.0:   add/remove: 83/49 grow/shrink: 973/1147 up/down: 32020/-47824 (-15804)
>   GCC 9.3.0:   add/remove: 68/51 grow/shrink: 1167/592 up/down: 30720/-31352 (-632)
>   GCC 10.3.0:  add/remove: 84/37 grow/shrink: 1711/1003 up/down: 45392/-41844 (3548)
>   GCC 11.1.0:  add/remove: 88/31 grow/shrink: 1635/963 up/down: 51540/-46096 (5444)
>   GCC 11.3.0:  add/remove: 89/32 grow/shrink: 1629/966 up/down: 51456/-46056 (5400)
>   GCC 12.1.0:  add/remove: 84/31 grow/shrink: 1540/829 up/down: 48772/-43164 (5608)
> 
>   LLVM 12.0.1: add/remove: 118/58 grow/shrink: 437/381 up/down: 45312/-65668 (-20356)
>   LLVM 13.0.1: add/remove: 35/19 grow/shrink: 416/243 up/down: 14408/-22200 (-7792)
>   LLVM 14.0.0: add/remove: 42/16 grow/shrink: 415/234 up/down: 15296/-21008 (-5712)
> 
> ... and that now seems to be regressing codegen with recent versions of GCC as
> much as it improves it LLVM.
> 
> I'm not sure if we've improved some other code and removed the benefit between
> v5.19-rc1 and v5.19-rc3, or whether something else it at play, but this doesn't
> look as compelling as it did.

Most likely it's because in v1 I mistakenly removed `volatile` from
gen[eric]_test_bit(), so there was an impact on non-constant cases
as well.
+5 KB sounds bad, though. Do you have CONFIG_TEST_BITMAP enabled,
and does it work? Probably the same reason as on m68k: more constant
optimization -> more aggressive inlining or an inlining rebalance ->
larger code. OTOH I've no idea why the compiler sometimes decides to
uninline really tiny functions in which this series converted some
bitops to constants, like it does on m68k.
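
For the `volatile` part, this is roughly what the helper looks like
(a sketch following the long-standing asm-generic implementation,
not a verbatim quote of this series):

	static __always_inline bool
	generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
	{
		/*
		 * The volatile access is what the "atomic-safe and must
		 * always stay like that" note in the changelog refers to;
		 * v1 of this series mistakenly dropped it, which also
		 * affected codegen for the non-constant cases.
		 */
		return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG - 1)));
	}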

> 
> Overall that's mostly hidden in the Image size, due to 64K alignment and
> padding requirements:
> 
>   Toolchain      Before      After       Difference
> 
>   GCC 8.5.0      36178432    36178432    0
>   GCC 9.3.0      36112896    36112896    0
>   GCC 10.3.0     36442624    36377088    -65536
>   GCC 11.1.0     36311552    36377088    +65536
>   GCC 11.3.0     36311552    36311552    0
>   GCC 12.1.0     36377088    36377088    0
> 
>   LLVM 12.0.1    31418880    31418880    0
>   LLVM 13.0.1    31418880    31418880    0
>   LLVM 14.0.0    31218176    31218176    0
> 
> ... so aside from the blip around GCC 10.3.0 and 11.1.0, there's not a massive
> change overall (due to 64KiB alignment restrictions for portions of the kernel
> Image).
> 
> Thanks,
> Mark.

Thanks,
Olek

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants
@ 2022-06-20 15:08       ` Alexander Lobakin
  0 siblings, 0 replies; 93+ messages in thread
From: Alexander Lobakin @ 2022-06-20 15:08 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-he

From: Mark Rutland <mark.rutland@arm.com>
Date: Mon, 20 Jun 2022 15:19:42 +0100

> On Fri, Jun 17, 2022 at 04:40:24PM +0200, Alexander Lobakin wrote:
> > So, in order to let the compiler optimize out such cases, expand the
> > test_bit() and __*_bit() definitions with a compile-time condition
> > check, so that they will pick the generic C non-atomic bitop
> > implementations when all of the arguments passed are compile-time
> > constants, which means that the result will be a compile-time
> > constant as well and the compiler will produce more efficient and
> > simple code in 100% cases (no changes when there's at least one
> > non-compile-time-constant argument).
> 
> > The savings are architecture, compiler and compiler flags dependent,
> > for example, on x86_64 -O2:
> > 
> > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
> > 
> > and ARM64 (courtesy of Mark[0]):
> > 
> > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
> 
> Hmm... with *this version* of the series, I'm not getting results nearly as
> good as that when building defconfig atop v5.19-rc3:
> 
>   GCC 8.5.0:   add/remove: 83/49 grow/shrink: 973/1147 up/down: 32020/-47824 (-15804)
>   GCC 9.3.0:   add/remove: 68/51 grow/shrink: 1167/592 up/down: 30720/-31352 (-632)
>   GCC 10.3.0:  add/remove: 84/37 grow/shrink: 1711/1003 up/down: 45392/-41844 (3548)
>   GCC 11.1.0:  add/remove: 88/31 grow/shrink: 1635/963 up/down: 51540/-46096 (5444)
>   GCC 11.3.0:  add/remove: 89/32 grow/shrink: 1629/966 up/down: 51456/-46056 (5400)
>   GCC 12.1.0:  add/remove: 84/31 grow/shrink: 1540/829 up/down: 48772/-43164 (5608)
> 
>   LLVM 12.0.1: add/remove: 118/58 grow/shrink: 437/381 up/down: 45312/-65668 (-20356)
>   LLVM 13.0.1: add/remove: 35/19 grow/shrink: 416/243 up/down: 14408/-22200 (-7792)
>   LLVM 14.0.0: add/remove: 42/16 grow/shrink: 415/234 up/down: 15296/-21008 (-5712)
> 
> ... and that now seems to be regressing codegen with recent versions of GCC as
> much as it improves it LLVM.
> 
> I'm not sure if we've improved some other code and removed the benefit between
> v5.19-rc1 and v5.19-rc3, or whether something else it at play, but this doesn't
> look as compelling as it did.

Mostly likely it's due to that in v1 I mistakingly removed
`volatile` from gen[eric]_test_bit(), so there was an impact for
non-constant cases as well.
+5 Kb sounds bad tho. Do you have CONFIG_TEST_BITMAP enabled, does
it work? Probably the same reason as for m68k, more constant
optimization -> more aggressive inlining or inlining rebalance ->
larger code. OTOH I've no idea why sometimes compiler decides to
uninline really tiny functions where due to this patch series some
bitops have been converted to constants, like it goes on m68k.

> 
> Overall that's mostly hidden in the Image size, due to 64K alignment and
> padding requirements:
> 
>   Toolchain      Before      After       Difference
> 
>   GCC 8.5.0      36178432    36178432    0
>   GCC 9.3.0      36112896    36112896    0
>   GCC 10.3.0     36442624    36377088    -65536
>   GCC 11.1.0     36311552    36377088    +65536
>   GCC 11.3.0     36311552    36311552    0
>   GCC 12.1.0     36377088    36377088    0
> 
>   LLVM 12.0.1    31418880    31418880    0
>   LLVM 13.0.1    31418880    31418880    0
>   LLVM 14.0.0    31218176    31218176    0
> 
> ... so aside from the blip around GCC 10.3.0 and 11.1.0, there's not a massive
> change overall (due to 64KiB alignment restrictions for portions of the kernel
> Image).
> 
> Thanks,
> Mark.

Thanks,
Olek

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [alobakin:bitops 3/7] block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t
  2022-06-20 13:51     ` Alexander Lobakin
@ 2022-06-20 15:18       ` Andy Shevchenko
  -1 siblings, 0 replies; 93+ messages in thread
From: Andy Shevchenko @ 2022-06-20 15:18 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Mark Rutland,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, Linux-SH, Sparc kernel list, Linux-Arch,
	Linux Kernel Mailing List, kbuild-all, kernel test robot

On Mon, Jun 20, 2022 at 4:48 PM Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
>
> From: kernel test robot <lkp@intel.com>
> Date: Sun, 19 Jun 2022 17:20:05 +0800
>
> Also, could someone please help me with this? I don't get what went
> wrong with sparse, it's not even some new code, just moving old
> stuff.
>
> > tree:   https://github.com/alobakin/linux bitops
> > head:   9bd39b17ce49d350eed93a031e0da6389067013e
> > commit: 521611f961a7dda92eefa26e1afd3914c06af64e [3/7] bitops: unify non-atomic bitops prototypes across architectures
> > config: mips-randconfig-s031-20220619 (https://download.01.org/0day-ci/archive/20220619/202206191726.wq70mbMK-lkp@intel.com/config)
> > compiler: mips64el-linux-gcc (GCC) 11.3.0
> > reproduce:
> >         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> >         chmod +x ~/bin/make.cross
> >         # apt-get install sparse
> >         # sparse version: v0.6.4-30-g92122700-dirty
> >         # https://github.com/alobakin/linux/commit/521611f961a7dda92eefa26e1afd3914c06af64e
> >         git remote add alobakin https://github.com/alobakin/linux
> >         git fetch --no-tags alobakin bitops
> >         git checkout 521611f961a7dda92eefa26e1afd3914c06af64e
> >         # save the config file
> >         mkdir build_dir && cp config build_dir/.config
> >         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=mips SHELL=/bin/bash
> >
> > If you fix the issue, kindly add following tag where applicable
> > Reported-by: kernel test robot <lkp@intel.com>
> >
> >
> > sparse warnings: (new ones prefixed by >>)
> >    command-line: note: in included file:
> >    builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQUIRE redefined
> >    builtin:0:0: sparse: this was the original definition
> >    builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_SEQ_CST redefined
> >    builtin:0:0: sparse: this was the original definition
> >    builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQ_REL redefined
> >    builtin:0:0: sparse: this was the original definition
> >    builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_RELEASE redefined
> >    builtin:0:0: sparse: this was the original definition
> >    block/elevator.c: note: in included file (through include/linux/bitops.h, include/linux/kernel.h):
> >    include/asm-generic/bitops/generic-non-atomic.h:29:9: sparse: sparse: unreplaced symbol 'mask'
> >    include/asm-generic/bitops/generic-non-atomic.h:30:9: sparse: sparse: unreplaced symbol 'p'
> >    include/asm-generic/bitops/generic-non-atomic.h:32:10: sparse: sparse: unreplaced symbol 'p'
> >    include/asm-generic/bitops/generic-non-atomic.h:32:16: sparse: sparse: unreplaced symbol 'mask'
> >    include/asm-generic/bitops/generic-non-atomic.h:27:1: sparse: sparse: unreplaced symbol 'return'
> >    include/asm-generic/bitops/generic-non-atomic.h:38:9: sparse: sparse: unreplaced symbol 'mask'
> >    include/asm-generic/bitops/generic-non-atomic.h:39:9: sparse: sparse: unreplaced symbol 'p'
> >    include/asm-generic/bitops/generic-non-atomic.h:41:10: sparse: sparse: unreplaced symbol 'p'
> >    include/asm-generic/bitops/generic-non-atomic.h:41:16: sparse: sparse: unreplaced symbol 'mask'
> >    include/asm-generic/bitops/generic-non-atomic.h:36:1: sparse: sparse: unreplaced symbol 'return'
> >    include/asm-generic/bitops/generic-non-atomic.h:56:9: sparse: sparse: unreplaced symbol 'mask'
> >    include/asm-generic/bitops/generic-non-atomic.h:57:9: sparse: sparse: unreplaced symbol 'p'
> >    include/asm-generic/bitops/generic-non-atomic.h:59:10: sparse: sparse: unreplaced symbol 'p'
> >    include/asm-generic/bitops/generic-non-atomic.h:59:15: sparse: sparse: unreplaced symbol 'mask'
> >    include/asm-generic/bitops/generic-non-atomic.h:54:1: sparse: sparse: unreplaced symbol 'return'
> >    include/asm-generic/bitops/generic-non-atomic.h:74:9: sparse: sparse: unreplaced symbol 'mask'
> >    include/asm-generic/bitops/generic-non-atomic.h:75:9: sparse: sparse: unreplaced symbol 'p'
> >    include/asm-generic/bitops/generic-non-atomic.h:76:9: sparse: sparse: unreplaced symbol 'old'
> >    include/asm-generic/bitops/generic-non-atomic.h:78:10: sparse: sparse: unreplaced symbol 'p'
> >    include/asm-generic/bitops/generic-non-atomic.h:78:14: sparse: sparse: unreplaced symbol 'old'
> >    include/asm-generic/bitops/generic-non-atomic.h:78:20: sparse: sparse: unreplaced symbol 'mask'
> >    include/asm-generic/bitops/generic-non-atomic.h:79:17: sparse: sparse: unreplaced symbol 'old'
> >    include/asm-generic/bitops/generic-non-atomic.h:79:23: sparse: sparse: unreplaced symbol 'mask'
> >    include/asm-generic/bitops/generic-non-atomic.h:79:9: sparse: sparse: unreplaced symbol 'return'
> >    include/asm-generic/bitops/generic-non-atomic.h:72:1: sparse: sparse: unreplaced symbol 'return'
> >    include/asm-generic/bitops/generic-non-atomic.h:94:9: sparse: sparse: unreplaced symbol 'mask'
> >    include/asm-generic/bitops/generic-non-atomic.h:95:9: sparse: sparse: unreplaced symbol 'p'
> >    include/asm-generic/bitops/generic-non-atomic.h:96:9: sparse: sparse: unreplaced symbol 'old'
> >    include/asm-generic/bitops/generic-non-atomic.h:98:10: sparse: sparse: unreplaced symbol 'p'
> >    include/asm-generic/bitops/generic-non-atomic.h:98:14: sparse: sparse: unreplaced symbol 'old'
> >    include/asm-generic/bitops/generic-non-atomic.h:98:21: sparse: sparse: unreplaced symbol 'mask'
> >    include/asm-generic/bitops/generic-non-atomic.h:99:17: sparse: sparse: unreplaced symbol 'old'
> >    include/asm-generic/bitops/generic-non-atomic.h:99:23: sparse: sparse: unreplaced symbol 'mask'
> >    include/asm-generic/bitops/generic-non-atomic.h:99:9: sparse: sparse: unreplaced symbol 'return'
> >    include/asm-generic/bitops/generic-non-atomic.h:92:1: sparse: sparse: unreplaced symbol 'return'
> >    include/asm-generic/bitops/generic-non-atomic.h:106:9: sparse: sparse: unreplaced symbol 'mask'
> >    include/asm-generic/bitops/generic-non-atomic.h:107:9: sparse: sparse: unreplaced symbol 'p'
> >    include/asm-generic/bitops/generic-non-atomic.h:108:9: sparse: sparse: unreplaced symbol 'old'
> >    include/asm-generic/bitops/generic-non-atomic.h:110:10: sparse: sparse: unreplaced symbol 'p'
> >    include/asm-generic/bitops/generic-non-atomic.h:110:14: sparse: sparse: unreplaced symbol 'old'
> >    include/asm-generic/bitops/generic-non-atomic.h:110:20: sparse: sparse: unreplaced symbol 'mask'
> >    include/asm-generic/bitops/generic-non-atomic.h:111:17: sparse: sparse: unreplaced symbol 'old'
> >    include/asm-generic/bitops/generic-non-atomic.h:111:23: sparse: sparse: unreplaced symbol 'mask'
> >    include/asm-generic/bitops/generic-non-atomic.h:111:9: sparse: sparse: unreplaced symbol 'return'
> >    include/asm-generic/bitops/generic-non-atomic.h:104:1: sparse: sparse: unreplaced symbol 'return'
> >    include/asm-generic/bitops/generic-non-atomic.h:127:9: sparse: sparse: unreplaced symbol 'return'
> >    include/asm-generic/bitops/generic-non-atomic.h:120:1: sparse: sparse: unreplaced symbol 'return'
> > >> block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t
> >
> > vim +222 block/elevator.c
> >
> > 9817064b68fef7 Jens Axboe        2006-07-28  217
> > 70b3ea056f3074 Jens Axboe        2016-12-07  218  void elv_rqhash_add(struct request_queue *q, struct request *rq)
> > 9817064b68fef7 Jens Axboe        2006-07-28  219  {
> > b374d18a4bfce7 Jens Axboe        2008-10-31  220      struct elevator_queue *e = q->elevator;
> > 9817064b68fef7 Jens Axboe        2006-07-28  221
> > 9817064b68fef7 Jens Axboe        2006-07-28 @222      BUG_ON(ELV_ON_HASH(rq));
> > 242d98f077ac0a Sasha Levin       2012-12-17  223      hash_add(e->hash, &rq->hash, rq_hash_key(rq));
> > e806402130c9c4 Christoph Hellwig 2016-10-20  224      rq->rq_flags |= RQF_HASHED;
> > 9817064b68fef7 Jens Axboe        2006-07-28  225  }
> > bd166ef183c263 Jens Axboe        2017-01-17  226  EXPORT_SYMBOL_GPL(elv_rqhash_add);
> > 9817064b68fef7 Jens Axboe        2006-07-28  227

It looks like a false positive for _your_ case, but if you want to fix
it, here is the background.

sparse can track custom types whose values must stay within a limited
set of bits. For this, the special __bitwise annotation is used. Since
the culprit type is defined that way, plain integer values (signed or
unsigned) coming from plain definitions can't be mixed with it safely.
To solve this, each such definition should be converted (cast) to the
very same type (req_flags_t). See the serial core, where some UART
flags are defined in a similar way, and how that code copes with it.
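
Roughly, the idiom looks like this (an illustrative sketch with made-up
names, not the exact req_flags_t or serial-core definitions):

	/* The type is only "restricted" when sparse (__CHECKER__) runs;
	 * for the compiler it's a plain unsigned int. */
	typedef unsigned int __bitwise example_flags_t;

	/* Each flag value gets forced into the restricted type once, at
	 * its definition... */
	#define EXAMPLE_FLAG_HASHED	((__force example_flags_t)(1 << 2))

	/* ...so sparse warns about any place that mixes the flags with
	 * plain integers without an explicit (__force ...) cast. */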

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [alobakin:bitops 3/7] block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t
  2022-06-20 15:18       ` Andy Shevchenko
@ 2022-06-20 15:27         ` Alexander Lobakin
  -1 siblings, 0 replies; 93+ messages in thread
From: Alexander Lobakin @ 2022-06-20 15:27 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Alexander Lobakin, Arnd Bergmann, Yury Norov, Andy Shevchenko,
	Mark Rutland, Matt Turner, Brian Cain, Geert Uytterhoeven,
	Yoshinori Sato, Rich Felker, David S. Miller, Kees Cook,
	Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, Linux-SH, Sparc kernel list, Linux-Arch,
	Linux Kernel Mailing List, kbuild-all, kernel test robot

From: Andy Shevchenko <andy.shevchenko@gmail.com>
Date: Mon, 20 Jun 2022 17:18:40 +0200

> On Mon, Jun 20, 2022 at 4:48 PM Alexander Lobakin
> <alexandr.lobakin@intel.com> wrote:
> >
> > From: kernel test robot <lkp@intel.com>
> > Date: Sun, 19 Jun 2022 17:20:05 +0800
> >
> > Also, could someone please help me with this? I don't get what went
> > wrong with sparse, it's not even some new code, just moving old
> > stuff.
> >
> > > tree:   https://github.com/alobakin/linux bitops
> > > head:   9bd39b17ce49d350eed93a031e0da6389067013e
> > > commit: 521611f961a7dda92eefa26e1afd3914c06af64e [3/7] bitops: unify non-atomic bitops prototypes across architectures
> > > config: mips-randconfig-s031-20220619 (https://download.01.org/0day-ci/archive/20220619/202206191726.wq70mbMK-lkp@intel.com/config)
> > > compiler: mips64el-linux-gcc (GCC) 11.3.0
> > > reproduce:
> > >         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> > >         chmod +x ~/bin/make.cross
> > >         # apt-get install sparse
> > >         # sparse version: v0.6.4-30-g92122700-dirty
> > >         # https://github.com/alobakin/linux/commit/521611f961a7dda92eefa26e1afd3914c06af64e
> > >         git remote add alobakin https://github.com/alobakin/linux
> > >         git fetch --no-tags alobakin bitops
> > >         git checkout 521611f961a7dda92eefa26e1afd3914c06af64e
> > >         # save the config file
> > >         mkdir build_dir && cp config build_dir/.config
> > >         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=mips SHELL=/bin/bash
> > >
> > > If you fix the issue, kindly add following tag where applicable
> > > Reported-by: kernel test robot <lkp@intel.com>
> > >
> > >
> > > sparse warnings: (new ones prefixed by >>)
> > >    command-line: note: in included file:
> > >    builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQUIRE redefined
> > >    builtin:0:0: sparse: this was the original definition
> > >    builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_SEQ_CST redefined
> > >    builtin:0:0: sparse: this was the original definition
> > >    builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQ_REL redefined
> > >    builtin:0:0: sparse: this was the original definition
> > >    builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_RELEASE redefined
> > >    builtin:0:0: sparse: this was the original definition
> > >    block/elevator.c: note: in included file (through include/linux/bitops.h, include/linux/kernel.h):
> > >    include/asm-generic/bitops/generic-non-atomic.h:29:9: sparse: sparse: unreplaced symbol 'mask'
> > >    include/asm-generic/bitops/generic-non-atomic.h:30:9: sparse: sparse: unreplaced symbol 'p'
> > >    include/asm-generic/bitops/generic-non-atomic.h:32:10: sparse: sparse: unreplaced symbol 'p'
> > >    include/asm-generic/bitops/generic-non-atomic.h:32:16: sparse: sparse: unreplaced symbol 'mask'
> > >    include/asm-generic/bitops/generic-non-atomic.h:27:1: sparse: sparse: unreplaced symbol 'return'
> > >    include/asm-generic/bitops/generic-non-atomic.h:38:9: sparse: sparse: unreplaced symbol 'mask'
> > >    include/asm-generic/bitops/generic-non-atomic.h:39:9: sparse: sparse: unreplaced symbol 'p'
> > >    include/asm-generic/bitops/generic-non-atomic.h:41:10: sparse: sparse: unreplaced symbol 'p'
> > >    include/asm-generic/bitops/generic-non-atomic.h:41:16: sparse: sparse: unreplaced symbol 'mask'
> > >    include/asm-generic/bitops/generic-non-atomic.h:36:1: sparse: sparse: unreplaced symbol 'return'
> > >    include/asm-generic/bitops/generic-non-atomic.h:56:9: sparse: sparse: unreplaced symbol 'mask'
> > >    include/asm-generic/bitops/generic-non-atomic.h:57:9: sparse: sparse: unreplaced symbol 'p'
> > >    include/asm-generic/bitops/generic-non-atomic.h:59:10: sparse: sparse: unreplaced symbol 'p'
> > >    include/asm-generic/bitops/generic-non-atomic.h:59:15: sparse: sparse: unreplaced symbol 'mask'
> > >    include/asm-generic/bitops/generic-non-atomic.h:54:1: sparse: sparse: unreplaced symbol 'return'
> > >    include/asm-generic/bitops/generic-non-atomic.h:74:9: sparse: sparse: unreplaced symbol 'mask'
> > >    include/asm-generic/bitops/generic-non-atomic.h:75:9: sparse: sparse: unreplaced symbol 'p'
> > >    include/asm-generic/bitops/generic-non-atomic.h:76:9: sparse: sparse: unreplaced symbol 'old'
> > >    include/asm-generic/bitops/generic-non-atomic.h:78:10: sparse: sparse: unreplaced symbol 'p'
> > >    include/asm-generic/bitops/generic-non-atomic.h:78:14: sparse: sparse: unreplaced symbol 'old'
> > >    include/asm-generic/bitops/generic-non-atomic.h:78:20: sparse: sparse: unreplaced symbol 'mask'
> > >    include/asm-generic/bitops/generic-non-atomic.h:79:17: sparse: sparse: unreplaced symbol 'old'
> > >    include/asm-generic/bitops/generic-non-atomic.h:79:23: sparse: sparse: unreplaced symbol 'mask'
> > >    include/asm-generic/bitops/generic-non-atomic.h:79:9: sparse: sparse: unreplaced symbol 'return'
> > >    include/asm-generic/bitops/generic-non-atomic.h:72:1: sparse: sparse: unreplaced symbol 'return'
> > >    include/asm-generic/bitops/generic-non-atomic.h:94:9: sparse: sparse: unreplaced symbol 'mask'
> > >    include/asm-generic/bitops/generic-non-atomic.h:95:9: sparse: sparse: unreplaced symbol 'p'
> > >    include/asm-generic/bitops/generic-non-atomic.h:96:9: sparse: sparse: unreplaced symbol 'old'
> > >    include/asm-generic/bitops/generic-non-atomic.h:98:10: sparse: sparse: unreplaced symbol 'p'
> > >    include/asm-generic/bitops/generic-non-atomic.h:98:14: sparse: sparse: unreplaced symbol 'old'
> > >    include/asm-generic/bitops/generic-non-atomic.h:98:21: sparse: sparse: unreplaced symbol 'mask'
> > >    include/asm-generic/bitops/generic-non-atomic.h:99:17: sparse: sparse: unreplaced symbol 'old'
> > >    include/asm-generic/bitops/generic-non-atomic.h:99:23: sparse: sparse: unreplaced symbol 'mask'
> > >    include/asm-generic/bitops/generic-non-atomic.h:99:9: sparse: sparse: unreplaced symbol 'return'
> > >    include/asm-generic/bitops/generic-non-atomic.h:92:1: sparse: sparse: unreplaced symbol 'return'
> > >    include/asm-generic/bitops/generic-non-atomic.h:106:9: sparse: sparse: unreplaced symbol 'mask'
> > >    include/asm-generic/bitops/generic-non-atomic.h:107:9: sparse: sparse: unreplaced symbol 'p'
> > >    include/asm-generic/bitops/generic-non-atomic.h:108:9: sparse: sparse: unreplaced symbol 'old'
> > >    include/asm-generic/bitops/generic-non-atomic.h:110:10: sparse: sparse: unreplaced symbol 'p'
> > >    include/asm-generic/bitops/generic-non-atomic.h:110:14: sparse: sparse: unreplaced symbol 'old'
> > >    include/asm-generic/bitops/generic-non-atomic.h:110:20: sparse: sparse: unreplaced symbol 'mask'
> > >    include/asm-generic/bitops/generic-non-atomic.h:111:17: sparse: sparse: unreplaced symbol 'old'
> > >    include/asm-generic/bitops/generic-non-atomic.h:111:23: sparse: sparse: unreplaced symbol 'mask'
> > >    include/asm-generic/bitops/generic-non-atomic.h:111:9: sparse: sparse: unreplaced symbol 'return'
> > >    include/asm-generic/bitops/generic-non-atomic.h:104:1: sparse: sparse: unreplaced symbol 'return'
> > >    include/asm-generic/bitops/generic-non-atomic.h:127:9: sparse: sparse: unreplaced symbol 'return'
> > >    include/asm-generic/bitops/generic-non-atomic.h:120:1: sparse: sparse: unreplaced symbol 'return'
> > > >> block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t
> > >
> > > vim +222 block/elevator.c
> > >
> > > 9817064b68fef7 Jens Axboe        2006-07-28  217
> > > 70b3ea056f3074 Jens Axboe        2016-12-07  218  void elv_rqhash_add(struct request_queue *q, struct request *rq)
> > > 9817064b68fef7 Jens Axboe        2006-07-28  219  {
> > > b374d18a4bfce7 Jens Axboe        2008-10-31  220      struct elevator_queue *e = q->elevator;
> > > 9817064b68fef7 Jens Axboe        2006-07-28  221
> > > 9817064b68fef7 Jens Axboe        2006-07-28 @222      BUG_ON(ELV_ON_HASH(rq));
> > > 242d98f077ac0a Sasha Levin       2012-12-17  223      hash_add(e->hash, &rq->hash, rq_hash_key(rq));
> > > e806402130c9c4 Christoph Hellwig 2016-10-20  224      rq->rq_flags |= RQF_HASHED;
> > > 9817064b68fef7 Jens Axboe        2006-07-28  225  }
> > > bd166ef183c263 Jens Axboe        2017-01-17  226  EXPORT_SYMBOL_GPL(elv_rqhash_add);
> > > 9817064b68fef7 Jens Axboe        2006-07-28  227
> 
> It looks like a false positive for _your_ case, but if you want to fix
> it, here is the background.
> 
> sparse can track custom types whose values must stay within a limited
> set of bits. For this, the special __bitwise annotation is used. Since
> the culprit type is defined that way, plain integer values (signed or
> unsigned) coming from plain definitions can't be mixed with it safely.
> To solve this, each such definition should be converted (cast) to the
> very same type (req_flags_t). See the serial core, where some UART
> flags are defined in a similar way, and how that code copes with it.

Ah, I hadn't looked at the req_flags_t definition and didn't know it
uses the __bitwise annotation. So it's a local block layer problem
(a missing cast from/to the bitwise type), not a generic header issue.
Thanks for the hint!

> 
> -- 
> With Best Regards,
> Andy Shevchenko

Thanks,
Olek

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [alobakin:bitops 3/7] block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t
  2022-06-20 13:51     ` Alexander Lobakin
  (?)
  (?)
@ 2022-06-20 19:21       ` Luc Van Oostenryck
  -1 siblings, 0 replies; 93+ messages in thread
From: Luc Van Oostenryck @ 2022-06-20 19:21 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Mark Rutland,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel, kbuild-all, kernel test robot

On Mon, Jun 20, 2022 at 03:51:46PM +0200, Alexander Lobakin wrote:
> From: kernel test robot <lkp@intel.com>
> Date: Sun, 19 Jun 2022 17:20:05 +0800
> 
> Also, could someone please help me with this? I don't get what went
> wrong with sparse, it's not even some new code, just moving old
> stuff.

Hi,

The first sparse warnings (sparse: preprocessor token __ATOMIC_*) are already
fixed (and most probably the bots have already picked up the fix).
I'm working on the second ones (sparse: unreplaced symbol).

-- Luc (Sparse's maintainer)

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-20 15:08       ` Alexander Lobakin
  (?)
@ 2022-06-21  6:03         ` Mark Rutland
  -1 siblings, 0 replies; 93+ messages in thread
From: Mark Rutland @ 2022-06-21  6:03 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

On Mon, Jun 20, 2022 at 05:08:55PM +0200, Alexander Lobakin wrote:
> From: Mark Rutland <mark.rutland@arm.com>
> Date: Mon, 20 Jun 2022 15:19:42 +0100
> 
> > On Fri, Jun 17, 2022 at 04:40:24PM +0200, Alexander Lobakin wrote:
> > 
> > > The savings are architecture, compiler and compiler flags dependent,
> > > for example, on x86_64 -O2:
> > > 
> > > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> > > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> > > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
> > > 
> > > and ARM64 (courtesy of Mark[0]):
> > > 
> > > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> > > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
> > 
> > Hmm... with *this version* of the series, I'm not getting results nearly as
> > good as that when building defconfig atop v5.19-rc3:
> > 
> >   GCC 8.5.0:   add/remove: 83/49 grow/shrink: 973/1147 up/down: 32020/-47824 (-15804)
> >   GCC 9.3.0:   add/remove: 68/51 grow/shrink: 1167/592 up/down: 30720/-31352 (-632)
> >   GCC 10.3.0:  add/remove: 84/37 grow/shrink: 1711/1003 up/down: 45392/-41844 (3548)
> >   GCC 11.1.0:  add/remove: 88/31 grow/shrink: 1635/963 up/down: 51540/-46096 (5444)
> >   GCC 11.3.0:  add/remove: 89/32 grow/shrink: 1629/966 up/down: 51456/-46056 (5400)
> >   GCC 12.1.0:  add/remove: 84/31 grow/shrink: 1540/829 up/down: 48772/-43164 (5608)
> > 
> >   LLVM 12.0.1: add/remove: 118/58 grow/shrink: 437/381 up/down: 45312/-65668 (-20356)
> >   LLVM 13.0.1: add/remove: 35/19 grow/shrink: 416/243 up/down: 14408/-22200 (-7792)
> >   LLVM 14.0.0: add/remove: 42/16 grow/shrink: 415/234 up/down: 15296/-21008 (-5712)
> > 
> > ... and that now seems to be regressing codegen with recent versions of GCC as
> > much as it improves it with LLVM.
> > 
> > I'm not sure if we've improved some other code and removed the benefit between
> > v5.19-rc1 and v5.19-rc3, or whether something else is at play, but this doesn't
> > look as compelling as it did.
> 
> Most likely it's because in v1 I mistakenly removed
> `volatile` from gen[eric]_test_bit(), so there was an impact for
> non-constant cases as well.
> +5 Kb sounds bad tho. Do you have CONFIG_TEST_BITMAP enabled, does
> it work? 

I didn't have it enabled, but I tried that just now with GCC 12.1.0, and it
builds cleanly; test_bitmap_const_eval() gets optimized away entirely. If I
remove the `static` from it, GCC generates:

| <test_bitmap_const_eval>:
|     paciasp
|     autiasp
|     ret

... which is a trivial stub.

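(For reference, the kind of function that can fold away like this looks
roughly as follows -- an illustrative sketch with a made-up name, not
the actual lib/test_bitmap.c code; the folding depends on the
optimization level and on the series being applied:)

	#include <linux/types.h>
	#include <linux/bits.h>
	#include <linux/bitops.h>

	static bool bit_folds_to_const(void)
	{
		/* Everything below is known at compile time... */
		unsigned long map = BIT(1) | BIT(4);

		/* ...so with the series applied, test_bit() can pick the
		 * plain C helper and the whole call can fold to a constant;
		 * a static, unused function like this can then vanish. */
		return test_bit(4, &map);
	}
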
> Probably the same reason as for m68k, more constant
> optimization -> more aggressive inlining or inlining rebalance ->
> larger code. OTOH I've no idea why sometimes compiler decides to
> uninline really tiny functions where due to this patch series some
> bitops have been converted to constants, like it goes on m68k.

Hmm.... it'd be interesting to take a look at a few architectures and see what
the common case is.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-20 15:08       ` Alexander Lobakin
  (?)
@ 2022-06-21 17:39         ` Yury Norov
  -1 siblings, 0 replies; 93+ messages in thread
From: Yury Norov @ 2022-06-21 17:39 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Mark Rutland, Arnd Bergmann, Andy Shevchenko, Matt Turner,
	Brian Cain, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

On Mon, Jun 20, 2022 at 05:08:55PM +0200, Alexander Lobakin wrote:
> From: Mark Rutland <mark.rutland@arm.com>
> Date: Mon, 20 Jun 2022 15:19:42 +0100
> 
> > On Fri, Jun 17, 2022 at 04:40:24PM +0200, Alexander Lobakin wrote:
> > > So, in order to let the compiler optimize out such cases, expand the
> > > test_bit() and __*_bit() definitions with a compile-time condition
> > > check, so that they will pick the generic C non-atomic bitop
> > > implementations when all of the arguments passed are compile-time
> > > constants, which means that the result will be a compile-time
> > > constant as well and the compiler will produce more efficient and
> > > simple code in 100% cases (no changes when there's at least one
> > > non-compile-time-constant argument).
> > 
> > > The savings are architecture, compiler and compiler flags dependent,
> > > for example, on x86_64 -O2:
> > > 
> > > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> > > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> > > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
> > > 
> > > and ARM64 (courtesy of Mark[0]):
> > > 
> > > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> > > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
> > 
> > Hmm... with *this version* of the series, I'm not getting results nearly as
> > good as that when building defconfig atop v5.19-rc3:
> > 
> >   GCC 8.5.0:   add/remove: 83/49 grow/shrink: 973/1147 up/down: 32020/-47824 (-15804)
> >   GCC 9.3.0:   add/remove: 68/51 grow/shrink: 1167/592 up/down: 30720/-31352 (-632)
> >   GCC 10.3.0:  add/remove: 84/37 grow/shrink: 1711/1003 up/down: 45392/-41844 (3548)
> >   GCC 11.1.0:  add/remove: 88/31 grow/shrink: 1635/963 up/down: 51540/-46096 (5444)
> >   GCC 11.3.0:  add/remove: 89/32 grow/shrink: 1629/966 up/down: 51456/-46056 (5400)
> >   GCC 12.1.0:  add/remove: 84/31 grow/shrink: 1540/829 up/down: 48772/-43164 (5608)
> > 
> >   LLVM 12.0.1: add/remove: 118/58 grow/shrink: 437/381 up/down: 45312/-65668 (-20356)
> >   LLVM 13.0.1: add/remove: 35/19 grow/shrink: 416/243 up/down: 14408/-22200 (-7792)
> >   LLVM 14.0.0: add/remove: 42/16 grow/shrink: 415/234 up/down: 15296/-21008 (-5712)
> > 
> > ... and that now seems to be regressing codegen with recent versions of GCC as
> > much as it improves it with LLVM.
> > 
> > I'm not sure if we've improved some other code and removed the benefit between
> > v5.19-rc1 and v5.19-rc3, or whether something else is at play, but this doesn't
> > look as compelling as it did.
> 
> Most likely it's because in v1 I mistakenly removed
> `volatile` from gen[eric]_test_bit(), so there was an impact for
> non-constant cases as well.
> +5 Kb sounds bad tho. Do you have CONFIG_TEST_BITMAP enabled, does
> it work? Probably the same reason as for m68k, more constant
> optimization -> more aggressive inlining or inlining rebalance ->
> larger code. OTOH I've no idea why sometimes compiler decides to
> uninline really tiny functions where due to this patch series some
> bitops have been converted to constants, like it goes on m68k.
> 
> > 
> > Overall that's mostly hidden in the Image size, due to 64K alignment and
> > padding requirements:
> > 
> >   Toolchain      Before      After       Difference
> > 
> >   GCC 8.5.0      36178432    36178432    0
> >   GCC 9.3.0      36112896    36112896    0
> >   GCC 10.3.0     36442624    36377088    -65536
> >   GCC 11.1.0     36311552    36377088    +65536
> >   GCC 11.3.0     36311552    36311552    0
> >   GCC 12.1.0     36377088    36377088    0
> > 
> >   LLVM 12.0.1    31418880    31418880    0
> >   LLVM 13.0.1    31418880    31418880    0
> >   LLVM 14.0.0    31218176    31218176    0
> > 
> > ... so aside from the blip around GCC 10.3.0 and 11.1.0, there's not a massive
> > change overall (due to 64KiB alignment restrictions for portions of the kernel
> > Image).

I gave it a try on v5.19-rc3 for arm64 with my default GCC 11.2, and it's:
add/remove: 89/33 grow/shrink: 1629/966 up/down: 51456/-46064 (5392)

Which is not great in terms of layout size. But I don't think we should
focus too much on those numbers. The goal of the series is not to shrink
the image; the true goal is to provide more information to the compiler
in the hope that it will make a better decision regarding optimizations.

Looking at results provided by Mark, both GCC and LLVM have a tendency
to inline and use other techniques that increase the image more
aggressively in newer releases, compared to older ones. From this
perspective, unless we find some terribly wrong behavior, I'm OK with
+5K for the Image, because I trust my compiler and believe it spent
those 5K wisely.

For the reasons stated above, I think we shouldn't disable const
bitops for -Os builds.

I think this series has a net positive impact because it adds a lot
of information for the compiler with just a few lines of code.

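(For readers skimming the thread, the idea behind those few lines looks
roughly like the sketch below -- hypothetical names and a simplified
condition, not the series' actual helpers or macros:)

	#include <linux/types.h>
	#include <linux/bitops.h>

	/* Plain C helper with no volatile access or asm, so the compiler is
	 * free to evaluate it at compile time. */
	static inline bool my_const_test_bit(unsigned int nr,
					     const unsigned long *addr)
	{
		return (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1;
	}

	/* Stand-in for the arch-specific implementation (volatile access,
	 * possibly inline asm), which the compiler cannot fold. */
	static inline bool my_arch_test_bit(unsigned int nr,
					    const volatile unsigned long *addr)
	{
		return (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1;
	}

	/* Pick the foldable helper only when both arguments are known at
	 * compile time; otherwise behave exactly as before. */
	#define my_test_bit(nr, addr)						\
		(__builtin_constant_p(nr) && __builtin_constant_p(*(addr)) ?	\
		 my_const_test_bit(nr, addr) : my_arch_test_bit(nr, addr))
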
If there are no objections, I think it's good to try it in -next. Alexander,
would you like me to fix the gen/generic typo in the comment and take it into
bitmap-for-next, or would you prefer to send v4?

Thanks,
Yury

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants
  2022-06-21 17:39         ` Yury Norov
  (?)
@ 2022-06-21 18:51           ` Alexander Lobakin
  -1 siblings, 0 replies; 93+ messages in thread
From: Alexander Lobakin @ 2022-06-21 18:51 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Mark Rutland, Arnd Bergmann, Andy Shevchenko,
	Matt Turner, Brian Cain, Geert Uytterhoeven, Yoshinori Sato,
	Rich Felker, David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, linux-alpha, linux-hexagon,
	linux-ia64, linux-m68k, linux-sh, sparclinux, linux-arch,
	linux-kernel

From: Yury Norov <yury.norov@gmail.com>
Date: Tue, 21 Jun 2022 10:39:44 -0700

> On Mon, Jun 20, 2022 at 05:08:55PM +0200, Alexander Lobakin wrote:
> > From: Mark Rutland <mark.rutland@arm.com>
> > Date: Mon, 20 Jun 2022 15:19:42 +0100
> > 
> > > On Fri, Jun 17, 2022 at 04:40:24PM +0200, Alexander Lobakin wrote:
> > > > So, in order to let the compiler optimize out such cases, expand the
> > > > test_bit() and __*_bit() definitions with a compile-time condition
> > > > check, so that they will pick the generic C non-atomic bitop
> > > > implementations when all of the arguments passed are compile-time
> > > > constants, which means that the result will be a compile-time
> > > > constant as well and the compiler will produce more efficient and
> > > > simple code in 100% cases (no changes when there's at least one
> > > > non-compile-time-constant argument).
> > > 
> > > > The savings are architecture, compiler and compiler flags dependent,
> > > > for example, on x86_64 -O2:
> > > > 
> > > > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> > > > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> > > > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
> > > > 
> > > > and ARM64 (courtesy of Mark[0]):
> > > > 
> > > > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> > > > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
> > > 
> > > Hmm... with *this version* of the series, I'm not getting results nearly as
> > > good as that when building defconfig atop v5.19-rc3:
> > > 
> > >   GCC 8.5.0:   add/remove: 83/49 grow/shrink: 973/1147 up/down: 32020/-47824 (-15804)
> > >   GCC 9.3.0:   add/remove: 68/51 grow/shrink: 1167/592 up/down: 30720/-31352 (-632)
> > >   GCC 10.3.0:  add/remove: 84/37 grow/shrink: 1711/1003 up/down: 45392/-41844 (3548)
> > >   GCC 11.1.0:  add/remove: 88/31 grow/shrink: 1635/963 up/down: 51540/-46096 (5444)
> > >   GCC 11.3.0:  add/remove: 89/32 grow/shrink: 1629/966 up/down: 51456/-46056 (5400)
> > >   GCC 12.1.0:  add/remove: 84/31 grow/shrink: 1540/829 up/down: 48772/-43164 (5608)
> > > 
> > >   LLVM 12.0.1: add/remove: 118/58 grow/shrink: 437/381 up/down: 45312/-65668 (-20356)
> > >   LLVM 13.0.1: add/remove: 35/19 grow/shrink: 416/243 up/down: 14408/-22200 (-7792)
> > >   LLVM 14.0.0: add/remove: 42/16 grow/shrink: 415/234 up/down: 15296/-21008 (-5712)
> > > 
> > > ... and that now seems to be regressing codegen with recent versions of GCC as
> > > much as it improves it with LLVM.
> > > 
> > > I'm not sure if we've improved some other code and removed the benefit between
> > > v5.19-rc1 and v5.19-rc3, or whether something else is at play, but this doesn't
> > > look as compelling as it did.
> > 
> > Most likely it's because in v1 I mistakenly removed
> > `volatile` from gen[eric]_test_bit(), so there was an impact for
> > non-constant cases as well.
> > +5 Kb sounds bad tho. Do you have CONFIG_TEST_BITMAP enabled, does
> > it work? Probably the same reason as for m68k, more constant
> > optimization -> more aggressive inlining or inlining rebalance ->
> > larger code. OTOH I've no idea why sometimes compiler decides to
> > uninline really tiny functions where due to this patch series some
> > bitops have been converted to constants, like it goes on m68k.
> > 
> > > 
> > > Overall that's mostly hidden in the Image size, due to 64K alignment and
> > > padding requirements:
> > > 
> > >   Toolchain      Before      After       Difference
> > > 
> > >   GCC 8.5.0      36178432    36178432    0
> > >   GCC 9.3.0      36112896    36112896    0
> > >   GCC 10.3.0     36442624    36377088    -65536
> > >   GCC 11.1.0     36311552    36377088    +65536
> > >   GCC 11.3.0     36311552    36311552    0
> > >   GCC 12.1.0     36377088    36377088    0
> > > 
> > >   LLVM 12.0.1    31418880    31418880    0
> > >   LLVM 13.0.1    31418880    31418880    0
> > >   LLVM 14.0.0    31218176    31218176    0
> > > 
> > > ... so aside from the blip around GCC 10.3.0 and 11.1.0, there's not a massive
> > > change overall (due to 64KiB alignment restrictions for portions of the kernel
> > > Image).
> 
> I gave it a try on v5.19-rc3 for arm64 with my default GCC 11.2, and it's:
> add/remove: 89/33 grow/shrink: 1629/966 up/down: 51456/-46064 (5392)
> 
> Which is not great in terms of layout size. But I don't think we should
> focus too much on those numbers. The goal of the series is not to shrink
> the image; the true goal is to provide more information to the compiler
> in the hope that it will make better decisions regarding optimizations.
> 
> Looking at the results provided by Mark, both GCC and LLVM have a tendency
> to inline and use other techniques that increase the image more
> aggressively in newer releases compared to older ones. From this
> perspective, unless we find some terribly wrong behavior, I'm OK with
> +5K for the Image, because I trust my compiler and believe it spent
> those 5K wisely.
> 
> For the reasons stated above, I think we shouldn't disable const
> bitops for -Os builds.
> 
> I think this series has a net positive impact because it adds a lot
> of information for the compiler with just a few lines of code.

Right, that was the primary intention. But then I got some text size
decreases and thought it would apply to any setup :)

> 
> If no objections, I think it's good to try it in -next. Alexander,
> would you like me to fix the gen/generic typo in the comment and take it
> into bitmap-for-next, or would you prefer to send v4?

I'm sending v4 in a couple of minutes. lkp reported that on ARC, GCC
never expands mem*() builtins to plain assignments, which is unfortunate
and broke my compile-time tests, so I adjusted the code a bit. I hope
that change will be okay for everyone, so that you can pick it up.
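[ The check that broke is of this kind -- a simplified sketch with
  hypothetical names, not the exact assertion from lib/test_bitmap.c.
  If the compiler emits a real memset() call for the empty initializer
  instead of folding it, the bitmap contents are no longer known at
  build time and the BUILD_BUG_ON() fires:

	DECLARE_BITMAP(bmap, BITS_PER_LONG) = { };

	__set_bit(1, bmap);
	BUILD_BUG_ON(!__builtin_constant_p(test_bit(1, bmap)));
]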

> 
> Thanks,
> Yury

Thanks,
Olek

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 3/7] bitops: unify non-atomic bitops prototypes across architectures
  2022-06-17 14:40     ` Alexander Lobakin
  (?)
@ 2022-07-06 10:07       ` Geert Uytterhoeven
  -1 siblings, 0 replies; 93+ messages in thread
From: Geert Uytterhoeven @ 2022-07-06 10:07 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Arnd Bergmann, Yury Norov, Andy Shevchenko, Mark Rutland,
	Matt Turner, Brian Cain, Yoshinori Sato, Rich Felker,
	David S. Miller, Kees Cook, Peter Zijlstra (Intel),
	Marco Elver, Borislav Petkov, Tony Luck, Maciej Fijalkowski,
	Jesse Brandeburg, Greg Kroah-Hartman, alpha,
	open list:QUALCOMM HEXAGON...,
	linux-ia64, linux-m68k, Linux-sh list, sparclinux, Linux-Arch,
	Linux Kernel Mailing List

On Fri, Jun 17, 2022 at 7:09 PM Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
> Currently, there is a mess with the prototypes of the non-atomic
> bitops across the different architectures:
>
> ret     bool, int, unsigned long
> nr      int, long, unsigned int, unsigned long
> addr    volatile unsigned long *, volatile void *
>
> Thankfully, it doesn't provoke any bugs, but it can sometimes make
> the compiler complain when that's not wanted at all.
> Adjust all the prototypes to the following standard:
>
> ret     bool                            retval can be only 0 or 1
> nr      unsigned long                   native; signed makes no sense
> addr    volatile unsigned long *        bitmaps are arrays of ulongs
>
> Next, some architectures don't define 'arch_' versions as they don't
> support instrumentation, while others do. To make sure there is always the
> same set of callables present and to ease any potential future
> changes, make them all follow the rule:
>  * architecture-specific files define only 'arch_' versions;
>  * non-prefixed versions can be defined only in asm-generic files;
> and place the non-prefixed definitions into a new file in
> asm-generic to be included by non-instrumented architectures.
>
> Finally, add some static assertions in order to prevent people from
> making a mess in this room again.
> I also used the %__always_inline attribute consistently, so that
> they always get resolved to the actual operations.
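[ For illustration, the standardized shape plus the kind of static
  assertion being added -- a sketch, not the literal kernel code:

	/* Every architecture now provides exactly this prototype shape. */
	static __always_inline bool
	arch_test_bit(unsigned long nr, const volatile unsigned long *addr);

	/* Catch future prototype drift between arch and generic variants. */
	static_assert(__same_type(arch_test_bit, generic_test_bit));
]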
>
> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> Reviewed-by: Yury Norov <yury.norov@gmail.com>

>  arch/m68k/include/asm/bitops.h                | 49 +++++++++++++------

Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 93+ messages in thread

end of thread, other threads:[~2022-07-06 10:14 UTC | newest]

Thread overview: 93+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-19  9:20 [alobakin:bitops 3/7] block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t kernel test robot
2022-06-17 14:40 ` [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants Alexander Lobakin
2022-06-17 14:40   ` Alexander Lobakin
2022-06-17 14:40   ` Alexander Lobakin
2022-06-17 14:40   ` [PATCH v3 1/7] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr() Alexander Lobakin
2022-06-17 14:40     ` Alexander Lobakin
2022-06-17 14:40     ` Alexander Lobakin
2022-06-17 14:40   ` [PATCH v3 2/7] bitops: always define asm-generic non-atomic bitops Alexander Lobakin
2022-06-17 14:40     ` Alexander Lobakin
2022-06-17 14:40     ` Alexander Lobakin
2022-06-20  9:49     ` Marco Elver
2022-06-20  9:49       ` Marco Elver
2022-06-20  9:49       ` Marco Elver
2022-06-17 14:40   ` [PATCH v3 3/7] bitops: unify non-atomic bitops prototypes across architectures Alexander Lobakin
2022-06-17 14:40     ` Alexander Lobakin
2022-06-17 14:40     ` Alexander Lobakin
2022-06-20 10:02     ` Andy Shevchenko
2022-06-20 10:02       ` Andy Shevchenko
2022-06-20 10:02       ` Andy Shevchenko
2022-07-06 10:07     ` Geert Uytterhoeven
2022-07-06 10:07       ` Geert Uytterhoeven
2022-07-06 10:07       ` Geert Uytterhoeven
2022-06-17 14:40   ` [PATCH v3 4/7] bitops: define const_*() versions of the non-atomics Alexander Lobakin
2022-06-17 14:40     ` Alexander Lobakin
2022-06-17 14:40     ` Alexander Lobakin
2022-06-20  9:49     ` Marco Elver
2022-06-20  9:49       ` Marco Elver
2022-06-20  9:49       ` Marco Elver
2022-06-20 10:03     ` Andy Shevchenko
2022-06-20 10:03       ` Andy Shevchenko
2022-06-20 10:03       ` Andy Shevchenko
2022-06-17 14:40   ` [PATCH v3 5/7] bitops: wrap non-atomic bitops with a transparent macro Alexander Lobakin
2022-06-17 14:40     ` Alexander Lobakin
2022-06-17 14:40     ` Alexander Lobakin
2022-06-20  9:50     ` Marco Elver
2022-06-20  9:50       ` Marco Elver
2022-06-20  9:50       ` Marco Elver
2022-06-20 10:08     ` Andy Shevchenko
2022-06-20 10:08       ` Andy Shevchenko
2022-06-20 10:08       ` Andy Shevchenko
2022-06-17 14:40   ` [PATCH v3 6/7] bitops: let optimize out non-atomic bitops on compile-time constants Alexander Lobakin
2022-06-17 14:40     ` Alexander Lobakin
2022-06-17 14:40     ` Alexander Lobakin
2022-06-20  9:51     ` Marco Elver
2022-06-20  9:51       ` Marco Elver
2022-06-20  9:51       ` Marco Elver
2022-06-20 10:05     ` Andy Shevchenko
2022-06-20 10:05       ` Andy Shevchenko
2022-06-20 10:05       ` Andy Shevchenko
2022-06-20 13:12       ` Alexander Lobakin
2022-06-20 13:12         ` Alexander Lobakin
2022-06-20 13:12         ` Alexander Lobakin
2022-06-17 14:40   ` [PATCH v3 7/7] lib: test_bitmap: add compile-time optimization/evaluations assertions Alexander Lobakin
2022-06-17 14:40     ` Alexander Lobakin
2022-06-17 14:40     ` Alexander Lobakin
2022-06-20 10:07     ` Andy Shevchenko
2022-06-20 10:07       ` Andy Shevchenko
2022-06-20 10:07       ` Andy Shevchenko
2022-06-20 12:07   ` [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants Geert Uytterhoeven
2022-06-20 12:07     ` Geert Uytterhoeven
2022-06-20 12:07     ` Geert Uytterhoeven
2022-06-20 13:22     ` Alexander Lobakin
2022-06-20 13:22       ` Alexander Lobakin
2022-06-20 13:22       ` Alexander Lobakin
2022-06-20 13:51   ` [alobakin:bitops 3/7] block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t Alexander Lobakin
2022-06-20 13:51     ` Alexander Lobakin
2022-06-20 13:51     ` Alexander Lobakin
2022-06-20 15:18     ` Andy Shevchenko
2022-06-20 15:18       ` Andy Shevchenko
2022-06-20 15:18       ` Andy Shevchenko
2022-06-20 15:18       ` Andy Shevchenko
2022-06-20 15:27       ` Alexander Lobakin
2022-06-20 15:27         ` Alexander Lobakin
2022-06-20 15:27         ` Alexander Lobakin
2022-06-20 19:21     ` Luc Van Oostenryck
2022-06-20 19:21       ` Luc Van Oostenryck
2022-06-20 19:21       ` Luc Van Oostenryck
2022-06-20 19:21       ` Luc Van Oostenryck
2022-06-20 14:19   ` [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants Mark Rutland
2022-06-20 14:19     ` Mark Rutland
2022-06-20 14:19     ` Mark Rutland
2022-06-20 15:08     ` Alexander Lobakin
2022-06-20 15:08       ` Alexander Lobakin
2022-06-20 15:08       ` Alexander Lobakin
2022-06-21  6:03       ` Mark Rutland
2022-06-21  6:03         ` Mark Rutland
2022-06-21  6:03         ` Mark Rutland
2022-06-21 17:39       ` Yury Norov
2022-06-21 17:39         ` Yury Norov
2022-06-21 17:39         ` Yury Norov
2022-06-21 18:51         ` Alexander Lobakin
2022-06-21 18:51           ` Alexander Lobakin
2022-06-21 18:51           ` Alexander Lobakin
