All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps
@ 2021-04-01  0:31 Yury Norov
  2021-04-01  0:31 ` [PATCH 01/12] tools: disable -Wno-type-limits Yury Norov
                   ` (12 more replies)
  0 siblings, 13 replies; 34+ messages in thread
From: Yury Norov @ 2021-04-01  0:31 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton
  Cc: Yury Norov, linux-m68k, linux-arch, linux-sh, Alexey Klimov,
	Andy Shevchenko, Arnd Bergmann, David Sterba, Dennis Zhou,
	Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rasmus Villemoes,
	Rich Felker, Stefano Brivio, Wei Yang, Wolfram Sang,
	Yoshinori Sato

Bitmap operations are much simpler and faster in case of small bitmaps
which fit into a single word. In linux/bitmap.c we have a machinery that
allows compiler to replace actual function call with a few instructions
if bitmaps passed into the function are small and their size is known at
compile time.

find_*_bit() API lacks this functionality; but users will benefit from it
a lot. One important example is cpumask subsystem when
NR_CPUS <= BITS_PER_LONG.

v6 is mostly a resend. The only change comparing to v5 is a fix of
small_const_nbits() synchronization patch.

v1: https://www.spinics.net/lists/kernel/msg3804727.html
v2: https://www.spinics.net/lists/linux-m68k/msg16945.html
v3: https://www.spinics.net/lists/kernel/msg3837020.html
v4: https://patchwork.kernel.org/project/linux-sh/cover/20210316015424.1999082-1-yury.norov@gmail.com/
v5: https://lore.kernel.org/linux-arch/20210321215457.588554-1-yury.norov@gmail.com/T/
v6: - sync small_const_nbits() properly (patch 6).
    - Rasmus' ack added.

Yury Norov (12):
  tools: disable -Wno-type-limits
  tools: bitmap: sync function declarations with the kernel
  tools: sync BITMAP_LAST_WORD_MASK() macro with the kernel
  arch: rearrange headers inclusion order in asm/bitops for m68k and sh
  lib: extend the scope of small_const_nbits() macro
  tools: sync small_const_nbits() macro with the kernel
  lib: inline _find_next_bit() wrappers
  tools: sync find_next_bit implementation
  lib: add fast path for find_next_*_bit()
  lib: add fast path for find_first_*_bit() and find_last_bit()
  tools: sync lib/find_bit implementation
  MAINTAINERS: Add entry for the bitmap API

 MAINTAINERS                             |  16 ++++
 arch/m68k/include/asm/bitops.h          |   6 +-
 arch/sh/include/asm/bitops.h            |   5 +-
 include/asm-generic/bitops/find.h       | 108 +++++++++++++++++++++---
 include/asm-generic/bitops/le.h         |  38 ++++++++-
 include/asm-generic/bitsperlong.h       |  12 +++
 include/linux/bitmap.h                  |   8 --
 include/linux/bitops.h                  |  12 ---
 lib/find_bit.c                          |  68 ++-------------
 tools/include/asm-generic/bitops/find.h |  85 +++++++++++++++++--
 tools/include/asm-generic/bitsperlong.h |   3 +
 tools/include/linux/bitmap.h            |  18 ++--
 tools/lib/bitmap.c                      |   4 +-
 tools/lib/find_bit.c                    |  56 +++++-------
 tools/scripts/Makefile.include          |   1 +
 15 files changed, 284 insertions(+), 156 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 01/12] tools: disable -Wno-type-limits
  2021-04-01  0:31 [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Yury Norov
@ 2021-04-01  0:31 ` Yury Norov
  2021-04-01  0:31 ` [PATCH 02/12] tools: bitmap: sync function declarations with the kernel Yury Norov
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Yury Norov @ 2021-04-01  0:31 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton
  Cc: Yury Norov, linux-m68k, linux-arch, linux-sh, Alexey Klimov,
	Andy Shevchenko, Arnd Bergmann, David Sterba, Dennis Zhou,
	Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rasmus Villemoes,
	Rich Felker, Stefano Brivio, Wei Yang, Wolfram Sang,
	Yoshinori Sato

GENMASK(h, l) may be passed with unsigned types. In such case, type-limits
warning is generated for example in case of GENMASK(h, 0).

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
 tools/scripts/Makefile.include | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/scripts/Makefile.include b/tools/scripts/Makefile.include
index 84dbf61a7eca..15e99905cb7d 100644
--- a/tools/scripts/Makefile.include
+++ b/tools/scripts/Makefile.include
@@ -38,6 +38,7 @@ EXTRA_WARNINGS += -Wswitch-enum
 EXTRA_WARNINGS += -Wundef
 EXTRA_WARNINGS += -Wwrite-strings
 EXTRA_WARNINGS += -Wformat
+EXTRA_WARNINGS += -Wno-type-limits
 
 CC_NO_CLANG := $(shell $(CC) -dM -E -x c /dev/null | grep -Fq "__clang__"; echo $$?)
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 02/12] tools: bitmap: sync function declarations with the kernel
  2021-04-01  0:31 [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Yury Norov
  2021-04-01  0:31 ` [PATCH 01/12] tools: disable -Wno-type-limits Yury Norov
@ 2021-04-01  0:31 ` Yury Norov
  2021-04-01  0:31 ` [PATCH 03/12] tools: sync BITMAP_LAST_WORD_MASK() macro " Yury Norov
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Yury Norov @ 2021-04-01  0:31 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton
  Cc: Yury Norov, linux-m68k, linux-arch, linux-sh, Alexey Klimov,
	Andy Shevchenko, Arnd Bergmann, David Sterba, Dennis Zhou,
	Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rasmus Villemoes,
	Rich Felker, Stefano Brivio, Wei Yang, Wolfram Sang,
	Yoshinori Sato

Some functions in tools/include/linux/bitmap.h declare nbits as int. In the
kernel nbits is declared as unsigned int.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
 tools/include/linux/bitmap.h | 8 ++++----
 tools/lib/bitmap.c           | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
index 477a1cae513f..7cbd23e56d48 100644
--- a/tools/include/linux/bitmap.h
+++ b/tools/include/linux/bitmap.h
@@ -30,7 +30,7 @@ void bitmap_clear(unsigned long *map, unsigned int start, int len);
 #define small_const_nbits(nbits) \
 	(__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG)
 
-static inline void bitmap_zero(unsigned long *dst, int nbits)
+static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
 {
 	if (small_const_nbits(nbits))
 		*dst = 0UL;
@@ -66,7 +66,7 @@ static inline int bitmap_full(const unsigned long *src, unsigned int nbits)
 	return find_first_zero_bit(src, nbits) == nbits;
 }
 
-static inline int bitmap_weight(const unsigned long *src, int nbits)
+static inline int bitmap_weight(const unsigned long *src, unsigned int nbits)
 {
 	if (small_const_nbits(nbits))
 		return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits));
@@ -74,7 +74,7 @@ static inline int bitmap_weight(const unsigned long *src, int nbits)
 }
 
 static inline void bitmap_or(unsigned long *dst, const unsigned long *src1,
-			     const unsigned long *src2, int nbits)
+			     const unsigned long *src2, unsigned int nbits)
 {
 	if (small_const_nbits(nbits))
 		*dst = *src1 | *src2;
@@ -141,7 +141,7 @@ static inline void bitmap_free(unsigned long *bitmap)
  * @buf: buffer to store output
  * @size: size of @buf
  */
-size_t bitmap_scnprintf(unsigned long *bitmap, int nbits,
+size_t bitmap_scnprintf(unsigned long *bitmap, unsigned int nbits,
 			char *buf, size_t size);
 
 /**
diff --git a/tools/lib/bitmap.c b/tools/lib/bitmap.c
index 5043747ef6c5..f4e914712b6f 100644
--- a/tools/lib/bitmap.c
+++ b/tools/lib/bitmap.c
@@ -28,11 +28,11 @@ void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
 		dst[k] = bitmap1[k] | bitmap2[k];
 }
 
-size_t bitmap_scnprintf(unsigned long *bitmap, int nbits,
+size_t bitmap_scnprintf(unsigned long *bitmap, unsigned int nbits,
 			char *buf, size_t size)
 {
 	/* current bit is 'cur', most recently seen range is [rbot, rtop] */
-	int cur, rbot, rtop;
+	unsigned int cur, rbot, rtop;
 	bool first = true;
 	size_t ret = 0;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 03/12] tools: sync BITMAP_LAST_WORD_MASK() macro with the kernel
  2021-04-01  0:31 [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Yury Norov
  2021-04-01  0:31 ` [PATCH 01/12] tools: disable -Wno-type-limits Yury Norov
  2021-04-01  0:31 ` [PATCH 02/12] tools: bitmap: sync function declarations with the kernel Yury Norov
@ 2021-04-01  0:31 ` Yury Norov
  2021-04-01  0:31 ` [PATCH 04/12] arch: rearrange headers inclusion order in asm/bitops for m68k and sh Yury Norov
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Yury Norov @ 2021-04-01  0:31 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton
  Cc: Yury Norov, linux-m68k, linux-arch, linux-sh, Alexey Klimov,
	Andy Shevchenko, Arnd Bergmann, David Sterba, Dennis Zhou,
	Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rasmus Villemoes,
	Rich Felker, Stefano Brivio, Wei Yang, Wolfram Sang,
	Yoshinori Sato

Kernel version generates better code.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
 tools/include/linux/bitmap.h | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
index 7cbd23e56d48..4aabc23ec747 100644
--- a/tools/include/linux/bitmap.h
+++ b/tools/include/linux/bitmap.h
@@ -20,12 +20,7 @@ int __bitmap_equal(const unsigned long *bitmap1,
 void bitmap_clear(unsigned long *map, unsigned int start, int len);
 
 #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
-
-#define BITMAP_LAST_WORD_MASK(nbits)					\
-(									\
-	((nbits) % BITS_PER_LONG) ?					\
-		(1UL<<((nbits) % BITS_PER_LONG))-1 : ~0UL		\
-)
+#define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
 
 #define small_const_nbits(nbits) \
 	(__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 04/12] arch: rearrange headers inclusion order in asm/bitops for m68k and sh
  2021-04-01  0:31 [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Yury Norov
                   ` (2 preceding siblings ...)
  2021-04-01  0:31 ` [PATCH 03/12] tools: sync BITMAP_LAST_WORD_MASK() macro " Yury Norov
@ 2021-04-01  0:31 ` Yury Norov
  2021-04-01  0:31 ` [PATCH 05/12] lib: extend the scope of small_const_nbits() macro Yury Norov
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Yury Norov @ 2021-04-01  0:31 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton
  Cc: Yury Norov, linux-m68k, linux-arch, linux-sh, Alexey Klimov,
	Andy Shevchenko, Arnd Bergmann, David Sterba, Dennis Zhou,
	Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rasmus Villemoes,
	Rich Felker, Stefano Brivio, Wei Yang, Wolfram Sang,
	Yoshinori Sato

m68k and sh include bitmap/{find,le}.h prior to ffs/fls headers. New
fast-path implementation in find.h requires ffs/fls. Reordering the
headers inclusion sequence helps to prevent compile-time implicit
function declaration error.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
 arch/m68k/include/asm/bitops.h | 6 +++---
 arch/sh/include/asm/bitops.h   | 5 +++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 10133a968c8e..7b414099e5fc 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -440,8 +440,6 @@ static inline unsigned long ffz(unsigned long word)
 
 #endif
 
-#include <asm-generic/bitops/find.h>
-
 #ifdef __KERNEL__
 
 #if defined(CONFIG_CPU_HAS_NO_BITFIELDS)
@@ -525,10 +523,12 @@ static inline int __fls(int x)
 #define __clear_bit_unlock	clear_bit_unlock
 
 #include <asm-generic/bitops/ext2-atomic.h>
-#include <asm-generic/bitops/le.h>
 #include <asm-generic/bitops/fls64.h>
 #include <asm-generic/bitops/sched.h>
 #include <asm-generic/bitops/hweight.h>
+#include <asm-generic/bitops/le.h>
 #endif /* __KERNEL__ */
 
+#include <asm-generic/bitops/find.h>
+
 #endif /* _M68K_BITOPS_H */
diff --git a/arch/sh/include/asm/bitops.h b/arch/sh/include/asm/bitops.h
index 450b5854d7c6..3b6c7b5b7ec9 100644
--- a/arch/sh/include/asm/bitops.h
+++ b/arch/sh/include/asm/bitops.h
@@ -58,15 +58,16 @@ static inline unsigned long __ffs(unsigned long word)
 	return result;
 }
 
-#include <asm-generic/bitops/find.h>
 #include <asm-generic/bitops/ffs.h>
 #include <asm-generic/bitops/hweight.h>
 #include <asm-generic/bitops/lock.h>
 #include <asm-generic/bitops/sched.h>
-#include <asm-generic/bitops/le.h>
 #include <asm-generic/bitops/ext2-atomic.h>
 #include <asm-generic/bitops/fls.h>
 #include <asm-generic/bitops/__fls.h>
 #include <asm-generic/bitops/fls64.h>
 
+#include <asm-generic/bitops/le.h>
+#include <asm-generic/bitops/find.h>
+
 #endif /* __ASM_SH_BITOPS_H */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 05/12] lib: extend the scope of small_const_nbits() macro
  2021-04-01  0:31 [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Yury Norov
                   ` (3 preceding siblings ...)
  2021-04-01  0:31 ` [PATCH 04/12] arch: rearrange headers inclusion order in asm/bitops for m68k and sh Yury Norov
@ 2021-04-01  0:31 ` Yury Norov
  2021-04-01  8:35   ` Andy Shevchenko
  2021-04-01  0:31 ` [PATCH 06/12] tools: sync small_const_nbits() macro with the kernel Yury Norov
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Yury Norov @ 2021-04-01  0:31 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton
  Cc: Yury Norov, linux-m68k, linux-arch, linux-sh, Alexey Klimov,
	Andy Shevchenko, Arnd Bergmann, David Sterba, Dennis Zhou,
	Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rasmus Villemoes,
	Rich Felker, Stefano Brivio, Wei Yang, Wolfram Sang,
	Yoshinori Sato

find_bit would also benefit from small_const_nbits() optimizations.
The detailed comment is provided by Rasmus Villemoes.

Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
 include/asm-generic/bitsperlong.h | 12 ++++++++++++
 include/linux/bitmap.h            |  8 --------
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/bitsperlong.h b/include/asm-generic/bitsperlong.h
index 3905c1c93dc2..1023e2a4bd37 100644
--- a/include/asm-generic/bitsperlong.h
+++ b/include/asm-generic/bitsperlong.h
@@ -23,4 +23,16 @@
 #define BITS_PER_LONG_LONG 64
 #endif
 
+/*
+ * small_const_nbits(n) is true precisely when it is known at compile-time
+ * that BITMAP_SIZE(n) is 1, i.e. 1 <= n <= BITS_PER_LONG. This allows
+ * various bit/bitmap APIs to provide a fast inline implementation. Bitmaps
+ * of size 0 are very rare, and a compile-time-known-size 0 is most likely
+ * a sign of error. They will be handled correctly by the bit/bitmap APIs,
+ * but using the out-of-line functions, so that the inline implementations
+ * can unconditionally dereference the pointer(s).
+ */
+#define small_const_nbits(nbits) \
+	(__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0)
+
 #endif /* __ASM_GENERIC_BITS_PER_LONG */
diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 2cb1d7cfe8f9..a36cfcec4e77 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -230,14 +230,6 @@ int bitmap_print_to_pagebuf(bool list, char *buf,
 #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
 #define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
 
-/*
- * The static inlines below do not handle constant nbits==0 correctly,
- * so make such users (should any ever turn up) call the out-of-line
- * versions.
- */
-#define small_const_nbits(nbits) \
-	(__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0)
-
 static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
 {
 	unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 06/12] tools: sync small_const_nbits() macro with the kernel
  2021-04-01  0:31 [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Yury Norov
                   ` (4 preceding siblings ...)
  2021-04-01  0:31 ` [PATCH 05/12] lib: extend the scope of small_const_nbits() macro Yury Norov
@ 2021-04-01  0:31 ` Yury Norov
  2021-04-01  0:31 ` [PATCH 07/12] lib: inline _find_next_bit() wrappers Yury Norov
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Yury Norov @ 2021-04-01  0:31 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton
  Cc: Yury Norov, linux-m68k, linux-arch, linux-sh, Alexey Klimov,
	Andy Shevchenko, Arnd Bergmann, David Sterba, Dennis Zhou,
	Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rasmus Villemoes,
	Rich Felker, Stefano Brivio, Wei Yang, Wolfram Sang,
	Yoshinori Sato

Sync implementation with the kernel and move the macro from
tools/include/linux/bitmap.h to tools/include/asm-generic/bitsperlong.h

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
 tools/include/asm-generic/bitsperlong.h | 3 +++
 tools/include/linux/bitmap.h            | 3 ---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/include/asm-generic/bitsperlong.h b/tools/include/asm-generic/bitsperlong.h
index 8f2283052333..2093d56ddd11 100644
--- a/tools/include/asm-generic/bitsperlong.h
+++ b/tools/include/asm-generic/bitsperlong.h
@@ -18,4 +18,7 @@
 #define BITS_PER_LONG_LONG 64
 #endif
 
+#define small_const_nbits(nbits) \
+	(__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0)
+
 #endif /* __ASM_GENERIC_BITS_PER_LONG */
diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
index 4aabc23ec747..330dbf7509cc 100644
--- a/tools/include/linux/bitmap.h
+++ b/tools/include/linux/bitmap.h
@@ -22,9 +22,6 @@ void bitmap_clear(unsigned long *map, unsigned int start, int len);
 #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
 #define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
 
-#define small_const_nbits(nbits) \
-	(__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG)
-
 static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
 {
 	if (small_const_nbits(nbits))
-- 
2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 07/12] lib: inline _find_next_bit() wrappers
  2021-04-01  0:31 [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Yury Norov
                   ` (5 preceding siblings ...)
  2021-04-01  0:31 ` [PATCH 06/12] tools: sync small_const_nbits() macro with the kernel Yury Norov
@ 2021-04-01  0:31 ` Yury Norov
  2021-04-01  8:37   ` Andy Shevchenko
  2021-04-01  0:31 ` [PATCH 08/12] tools: sync find_next_bit implementation Yury Norov
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Yury Norov @ 2021-04-01  0:31 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton
  Cc: Yury Norov, linux-m68k, linux-arch, linux-sh, Alexey Klimov,
	Andy Shevchenko, Arnd Bergmann, David Sterba, Dennis Zhou,
	Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rasmus Villemoes,
	Rich Felker, Stefano Brivio, Wei Yang, Wolfram Sang,
	Yoshinori Sato

lib/find_bit.c declares five single-line wrappers for _find_next_bit().
We may turn those wrappers to inline functions. It eliminates unneeded
function calls and opens room for compile-time optimizations.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
 include/asm-generic/bitops/find.h | 28 ++++++++++++----
 include/asm-generic/bitops/le.h   | 17 +++++++---
 lib/find_bit.c                    | 56 ++-----------------------------
 3 files changed, 37 insertions(+), 64 deletions(-)

diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h
index 9fdf21302fdf..7ad70dab8e93 100644
--- a/include/asm-generic/bitops/find.h
+++ b/include/asm-generic/bitops/find.h
@@ -2,6 +2,10 @@
 #ifndef _ASM_GENERIC_BITOPS_FIND_H_
 #define _ASM_GENERIC_BITOPS_FIND_H_
 
+extern unsigned long _find_next_bit(const unsigned long *addr1,
+		const unsigned long *addr2, unsigned long nbits,
+		unsigned long start, unsigned long invert, unsigned long le);
+
 #ifndef find_next_bit
 /**
  * find_next_bit - find the next set bit in a memory region
@@ -12,8 +16,12 @@
  * Returns the bit number for the next set bit
  * If no bits are set, returns @size.
  */
-extern unsigned long find_next_bit(const unsigned long *addr, unsigned long
-		size, unsigned long offset);
+static inline
+unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
+			    unsigned long offset)
+{
+	return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
+}
 #endif
 
 #ifndef find_next_and_bit
@@ -27,9 +35,13 @@ extern unsigned long find_next_bit(const unsigned long *addr, unsigned long
  * Returns the bit number for the next set bit
  * If no bits are set, returns @size.
  */
-extern unsigned long find_next_and_bit(const unsigned long *addr1,
+static inline
+unsigned long find_next_and_bit(const unsigned long *addr1,
 		const unsigned long *addr2, unsigned long size,
-		unsigned long offset);
+		unsigned long offset)
+{
+	return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
+}
 #endif
 
 #ifndef find_next_zero_bit
@@ -42,8 +54,12 @@ extern unsigned long find_next_and_bit(const unsigned long *addr1,
  * Returns the bit number of the next zero bit
  * If no bits are zero, returns @size.
  */
-extern unsigned long find_next_zero_bit(const unsigned long *addr, unsigned
-		long size, unsigned long offset);
+static inline
+unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
+				 unsigned long offset)
+{
+	return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
+}
 #endif
 
 #ifdef CONFIG_GENERIC_FIND_FIRST_BIT
diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
index 188d3eba3ace..21305f6cea0b 100644
--- a/include/asm-generic/bitops/le.h
+++ b/include/asm-generic/bitops/le.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_GENERIC_BITOPS_LE_H_
 #define _ASM_GENERIC_BITOPS_LE_H_
 
+#include <asm-generic/bitops/find.h>
 #include <asm/types.h>
 #include <asm/byteorder.h>
 
@@ -32,13 +33,21 @@ static inline unsigned long find_first_zero_bit_le(const void *addr,
 #define BITOP_LE_SWIZZLE	((BITS_PER_LONG-1) & ~0x7)
 
 #ifndef find_next_zero_bit_le
-extern unsigned long find_next_zero_bit_le(const void *addr,
-		unsigned long size, unsigned long offset);
+static inline
+unsigned long find_next_zero_bit_le(const void *addr, unsigned
+		long size, unsigned long offset)
+{
+	return _find_next_bit(addr, NULL, size, offset, ~0UL, 1);
+}
 #endif
 
 #ifndef find_next_bit_le
-extern unsigned long find_next_bit_le(const void *addr,
-		unsigned long size, unsigned long offset);
+static inline
+unsigned long find_next_bit_le(const void *addr, unsigned
+		long size, unsigned long offset)
+{
+	return _find_next_bit(addr, NULL, size, offset, 0UL, 1);
+}
 #endif
 
 #ifndef find_first_zero_bit_le
diff --git a/lib/find_bit.c b/lib/find_bit.c
index f67f86fd2f62..b03a101367f8 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -29,7 +29,7 @@
  *    searching it for one bits.
  *  - The optional "addr2", which is anded with "addr1" if present.
  */
-static unsigned long _find_next_bit(const unsigned long *addr1,
+unsigned long _find_next_bit(const unsigned long *addr1,
 		const unsigned long *addr2, unsigned long nbits,
 		unsigned long start, unsigned long invert, unsigned long le)
 {
@@ -68,37 +68,7 @@ static unsigned long _find_next_bit(const unsigned long *addr1,
 
 	return min(start + __ffs(tmp), nbits);
 }
-#endif
-
-#ifndef find_next_bit
-/*
- * Find the next set bit in a memory region.
- */
-unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
-			    unsigned long offset)
-{
-	return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
-}
-EXPORT_SYMBOL(find_next_bit);
-#endif
-
-#ifndef find_next_zero_bit
-unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
-				 unsigned long offset)
-{
-	return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
-}
-EXPORT_SYMBOL(find_next_zero_bit);
-#endif
-
-#if !defined(find_next_and_bit)
-unsigned long find_next_and_bit(const unsigned long *addr1,
-		const unsigned long *addr2, unsigned long size,
-		unsigned long offset)
-{
-	return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
-}
-EXPORT_SYMBOL(find_next_and_bit);
+EXPORT_SYMBOL(_find_next_bit);
 #endif
 
 #ifndef find_first_bit
@@ -157,28 +127,6 @@ unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
 EXPORT_SYMBOL(find_last_bit);
 #endif
 
-#ifdef __BIG_ENDIAN
-
-#ifndef find_next_zero_bit_le
-unsigned long find_next_zero_bit_le(const void *addr, unsigned
-		long size, unsigned long offset)
-{
-	return _find_next_bit(addr, NULL, size, offset, ~0UL, 1);
-}
-EXPORT_SYMBOL(find_next_zero_bit_le);
-#endif
-
-#ifndef find_next_bit_le
-unsigned long find_next_bit_le(const void *addr, unsigned
-		long size, unsigned long offset)
-{
-	return _find_next_bit(addr, NULL, size, offset, 0UL, 1);
-}
-EXPORT_SYMBOL(find_next_bit_le);
-#endif
-
-#endif /* __BIG_ENDIAN */
-
 unsigned long find_next_clump8(unsigned long *clump, const unsigned long *addr,
 			       unsigned long size, unsigned long offset)
 {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 08/12] tools: sync find_next_bit implementation
  2021-04-01  0:31 [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Yury Norov
                   ` (6 preceding siblings ...)
  2021-04-01  0:31 ` [PATCH 07/12] lib: inline _find_next_bit() wrappers Yury Norov
@ 2021-04-01  0:31 ` Yury Norov
  2021-04-01  0:31 ` [PATCH 09/12] lib: add fast path for find_next_*_bit() Yury Norov
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Yury Norov @ 2021-04-01  0:31 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton
  Cc: Yury Norov, linux-m68k, linux-arch, linux-sh, Alexey Klimov,
	Andy Shevchenko, Arnd Bergmann, David Sterba, Dennis Zhou,
	Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rasmus Villemoes,
	Rich Felker, Stefano Brivio, Wei Yang, Wolfram Sang,
	Yoshinori Sato

Sync the implementation with recent kernel changes.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
 tools/include/asm-generic/bitops/find.h | 27 ++++++++++---
 tools/lib/find_bit.c                    | 52 ++++++++++---------------
 2 files changed, 42 insertions(+), 37 deletions(-)

diff --git a/tools/include/asm-generic/bitops/find.h b/tools/include/asm-generic/bitops/find.h
index 16ed1982cb34..9fe62d10b084 100644
--- a/tools/include/asm-generic/bitops/find.h
+++ b/tools/include/asm-generic/bitops/find.h
@@ -2,6 +2,10 @@
 #ifndef _TOOLS_LINUX_ASM_GENERIC_BITOPS_FIND_H_
 #define _TOOLS_LINUX_ASM_GENERIC_BITOPS_FIND_H_
 
+extern unsigned long _find_next_bit(const unsigned long *addr1,
+		const unsigned long *addr2, unsigned long nbits,
+		unsigned long start, unsigned long invert, unsigned long le);
+
 #ifndef find_next_bit
 /**
  * find_next_bit - find the next set bit in a memory region
@@ -12,8 +16,12 @@
  * Returns the bit number for the next set bit
  * If no bits are set, returns @size.
  */
-extern unsigned long find_next_bit(const unsigned long *addr, unsigned long
-		size, unsigned long offset);
+static inline
+unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
+			    unsigned long offset)
+{
+	return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
+}
 #endif
 
 #ifndef find_next_and_bit
@@ -27,13 +35,16 @@ extern unsigned long find_next_bit(const unsigned long *addr, unsigned long
  * Returns the bit number for the next set bit
  * If no bits are set, returns @size.
  */
-extern unsigned long find_next_and_bit(const unsigned long *addr1,
+static inline
+unsigned long find_next_and_bit(const unsigned long *addr1,
 		const unsigned long *addr2, unsigned long size,
-		unsigned long offset);
+		unsigned long offset)
+{
+	return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
+}
 #endif
 
 #ifndef find_next_zero_bit
-
 /**
  * find_next_zero_bit - find the next cleared bit in a memory region
  * @addr: The address to base the search on
@@ -43,8 +54,12 @@ extern unsigned long find_next_and_bit(const unsigned long *addr1,
  * Returns the bit number of the next zero bit
  * If no bits are zero, returns @size.
  */
+static inline
 unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
-				 unsigned long offset);
+				 unsigned long offset)
+{
+	return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
+}
 #endif
 
 #ifndef find_first_bit
diff --git a/tools/lib/find_bit.c b/tools/lib/find_bit.c
index ac37022e9486..589fd2f26f94 100644
--- a/tools/lib/find_bit.c
+++ b/tools/lib/find_bit.c
@@ -28,11 +28,12 @@
  *    searching it for one bits.
  *  - The optional "addr2", which is anded with "addr1" if present.
  */
-static inline unsigned long _find_next_bit(const unsigned long *addr1,
+unsigned long _find_next_bit(const unsigned long *addr1,
 		const unsigned long *addr2, unsigned long nbits,
-		unsigned long start, unsigned long invert)
+		unsigned long start, unsigned long invert, unsigned long le)
 {
-	unsigned long tmp;
+	unsigned long tmp, mask;
+	(void) le;
 
 	if (unlikely(start >= nbits))
 		return nbits;
@@ -43,7 +44,19 @@ static inline unsigned long _find_next_bit(const unsigned long *addr1,
 	tmp ^= invert;
 
 	/* Handle 1st word. */
-	tmp &= BITMAP_FIRST_WORD_MASK(start);
+	mask = BITMAP_FIRST_WORD_MASK(start);
+
+	/*
+	 * Due to the lack of swab() in tools, and the fact that it doesn't
+	 * need little-endian support, just comment it out
+	 */
+#if (0)
+	if (le)
+		mask = swab(mask);
+#endif
+
+	tmp &= mask;
+
 	start = round_down(start, BITS_PER_LONG);
 
 	while (!tmp) {
@@ -57,18 +70,12 @@ static inline unsigned long _find_next_bit(const unsigned long *addr1,
 		tmp ^= invert;
 	}
 
-	return min(start + __ffs(tmp), nbits);
-}
+#if (0)
+	if (le)
+		tmp = swab(tmp);
 #endif
 
-#ifndef find_next_bit
-/*
- * Find the next set bit in a memory region.
- */
-unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
-			    unsigned long offset)
-{
-	return _find_next_bit(addr, NULL, size, offset, 0UL);
+	return min(start + __ffs(tmp), nbits);
 }
 #endif
 
@@ -105,20 +112,3 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 	return size;
 }
 #endif
-
-#ifndef find_next_zero_bit
-unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
-				 unsigned long offset)
-{
-	return _find_next_bit(addr, NULL, size, offset, ~0UL);
-}
-#endif
-
-#ifndef find_next_and_bit
-unsigned long find_next_and_bit(const unsigned long *addr1,
-		const unsigned long *addr2, unsigned long size,
-		unsigned long offset)
-{
-	return _find_next_bit(addr1, addr2, size, offset, 0UL);
-}
-#endif
-- 
2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 09/12] lib: add fast path for find_next_*_bit()
  2021-04-01  0:31 [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Yury Norov
                   ` (7 preceding siblings ...)
  2021-04-01  0:31 ` [PATCH 08/12] tools: sync find_next_bit implementation Yury Norov
@ 2021-04-01  0:31 ` Yury Norov
  2021-04-01  8:48   ` Andy Shevchenko
  2021-04-01  0:31 ` [PATCH 10/12] lib: add fast path for find_first_*_bit() and find_last_bit() Yury Norov
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Yury Norov @ 2021-04-01  0:31 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton
  Cc: Yury Norov, linux-m68k, linux-arch, linux-sh, Alexey Klimov,
	Andy Shevchenko, Arnd Bergmann, David Sterba, Dennis Zhou,
	Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rasmus Villemoes,
	Rich Felker, Stefano Brivio, Wei Yang, Wolfram Sang,
	Yoshinori Sato

Similarly to bitmap functions, find_next_*_bit() users will benefit
if we'll handle a case of bitmaps that fit into a single word inline.
In the very best case, the compiler may replace a function call with
a few instructions.

This is the quite typical find_next_bit() user:

	unsigned int cpumask_next(int n, const struct cpumask *srcp)
	{
		/* -1 is a legal arg here. */
		if (n != -1)
			cpumask_check(n);
		return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
	}
	EXPORT_SYMBOL(cpumask_next);

Currently, on ARM64 the generated code looks like this:
	0000000000000000 <cpumask_next>:
	   0:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
	   4:   11000402        add     w2, w0, #0x1
	   8:   aa0103e0        mov     x0, x1
	   c:   d2800401        mov     x1, #0x40                       // #64
	  10:   910003fd        mov     x29, sp
	  14:   93407c42        sxtw    x2, w2
	  18:   94000000        bl      0 <find_next_bit>
	  1c:   a8c17bfd        ldp     x29, x30, [sp], #16
	  20:   d65f03c0        ret
	  24:   d503201f        nop

After applying this patch:
	0000000000000140 <cpumask_next>:
	 140:   11000400        add     w0, w0, #0x1
	 144:   93407c00        sxtw    x0, w0
	 148:   f100fc1f        cmp     x0, #0x3f
	 14c:   54000168        b.hi    178 <cpumask_next+0x38>  // b.pmore
	 150:   f9400023        ldr     x3, [x1]
	 154:   92800001        mov     x1, #0xffffffffffffffff         // #-1
	 158:   9ac02020        lsl     x0, x1, x0
	 15c:   52800802        mov     w2, #0x40                       // #64
	 160:   8a030001        and     x1, x0, x3
	 164:   dac00020        rbit    x0, x1
	 168:   f100003f        cmp     x1, #0x0
	 16c:   dac01000        clz     x0, x0
	 170:   1a800040        csel    w0, w2, w0, eq  // eq = none
	 174:   d65f03c0        ret
	 178:   52800800        mov     w0, #0x40                       // #64
	 17c:   d65f03c0        ret

find_next_bit() call is replaced with 6 instructions. find_next_bit()
itself is 41 instructions plus function call overhead.

Despite inlining, the scripts/bloat-o-meter report smaller .text size
after applying the series:
	add/remove: 11/9 grow/shrink: 233/176 up/down: 5780/-6768 (-988)

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
 include/asm-generic/bitops/find.h | 30 ++++++++++++++++++++++++++++++
 include/asm-generic/bitops/le.h   | 21 +++++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h
index 7ad70dab8e93..4148c74a1e4d 100644
--- a/include/asm-generic/bitops/find.h
+++ b/include/asm-generic/bitops/find.h
@@ -20,6 +20,16 @@ static inline
 unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
 			    unsigned long offset)
 {
+	if (small_const_nbits(size)) {
+		unsigned long val;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = *addr & GENMASK(size - 1, offset);
+		return val ? __ffs(val) : size;
+	}
+
 	return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
 }
 #endif
@@ -40,6 +50,16 @@ unsigned long find_next_and_bit(const unsigned long *addr1,
 		const unsigned long *addr2, unsigned long size,
 		unsigned long offset)
 {
+	if (small_const_nbits(size)) {
+		unsigned long val;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = *addr1 & *addr2 & GENMASK(size - 1, offset);
+		return val ? __ffs(val) : size;
+	}
+
 	return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
 }
 #endif
@@ -58,6 +78,16 @@ static inline
 unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
 				 unsigned long offset)
 {
+	if (small_const_nbits(size)) {
+		unsigned long val;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = *addr | ~GENMASK(size - 1, offset);
+		return val == ~0UL ? size : ffz(val);
+	}
+
 	return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
 }
 #endif
diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
index 21305f6cea0b..5a28629cbf4d 100644
--- a/include/asm-generic/bitops/le.h
+++ b/include/asm-generic/bitops/le.h
@@ -5,6 +5,7 @@
 #include <asm-generic/bitops/find.h>
 #include <asm/types.h>
 #include <asm/byteorder.h>
+#include <linux/swab.h>
 
 #if defined(__LITTLE_ENDIAN)
 
@@ -37,6 +38,16 @@ static inline
 unsigned long find_next_zero_bit_le(const void *addr, unsigned
 		long size, unsigned long offset)
 {
+	if (small_const_nbits(size)) {
+		unsigned long val = *(const unsigned long *)addr;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = swab(val) | ~GENMASK(size - 1, offset);
+		return val == ~0UL ? size : ffz(val);
+	}
+
 	return _find_next_bit(addr, NULL, size, offset, ~0UL, 1);
 }
 #endif
@@ -46,6 +57,16 @@ static inline
 unsigned long find_next_bit_le(const void *addr, unsigned
 		long size, unsigned long offset)
 {
+	if (small_const_nbits(size)) {
+		unsigned long val = *(const unsigned long *)addr;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = swab(val) & GENMASK(size - 1, offset);
+		return val ? __ffs(val) : size;
+	}
+
 	return _find_next_bit(addr, NULL, size, offset, 0UL, 1);
 }
 #endif
-- 
2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 10/12] lib: add fast path for find_first_*_bit() and find_last_bit()
  2021-04-01  0:31 [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Yury Norov
                   ` (8 preceding siblings ...)
  2021-04-01  0:31 ` [PATCH 09/12] lib: add fast path for find_next_*_bit() Yury Norov
@ 2021-04-01  0:31 ` Yury Norov
  2021-04-01  8:58   ` Andy Shevchenko
  2021-04-01  0:31 ` [PATCH 11/12] tools: sync lib/find_bit implementation Yury Norov
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Yury Norov @ 2021-04-01  0:31 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton
  Cc: Yury Norov, linux-m68k, linux-arch, linux-sh, Alexey Klimov,
	Andy Shevchenko, Arnd Bergmann, David Sterba, Dennis Zhou,
	Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rasmus Villemoes,
	Rich Felker, Stefano Brivio, Wei Yang, Wolfram Sang,
	Yoshinori Sato

Similarly to bitmap functions, users would benefit if we'll handle
a case of small-size bitmaps that fit into a single word.

While here, move the find_last_bit() declaration to bitops/find.h
where other find_*_bit() functions sit.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
 include/asm-generic/bitops/find.h | 50 ++++++++++++++++++++++++++++---
 include/linux/bitops.h            | 12 --------
 lib/find_bit.c                    | 12 ++++----
 3 files changed, 52 insertions(+), 22 deletions(-)

diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h
index 4148c74a1e4d..0d132ee2a291 100644
--- a/include/asm-generic/bitops/find.h
+++ b/include/asm-generic/bitops/find.h
@@ -5,6 +5,9 @@
 extern unsigned long _find_next_bit(const unsigned long *addr1,
 		const unsigned long *addr2, unsigned long nbits,
 		unsigned long start, unsigned long invert, unsigned long le);
+extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
+extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
+extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
 
 #ifndef find_next_bit
 /**
@@ -102,8 +105,17 @@ unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
  * Returns the bit number of the first set bit.
  * If no bits are set, returns @size.
  */
-extern unsigned long find_first_bit(const unsigned long *addr,
-				    unsigned long size);
+static inline
+unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val = *addr & GENMASK(size - 1, 0);
+
+		return val ? __ffs(val) : size;
+	}
+
+	return _find_first_bit(addr, size);
+}
 
 /**
  * find_first_zero_bit - find the first cleared bit in a memory region
@@ -113,8 +125,17 @@ extern unsigned long find_first_bit(const unsigned long *addr,
  * Returns the bit number of the first cleared bit.
  * If no bits are zero, returns @size.
  */
-extern unsigned long find_first_zero_bit(const unsigned long *addr,
-					 unsigned long size);
+static inline
+unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val = *addr | ~GENMASK(size - 1, 0);
+
+		return val == ~0UL ? size : ffz(val);
+	}
+
+	return _find_first_zero_bit(addr, size);
+}
 #else /* CONFIG_GENERIC_FIND_FIRST_BIT */
 
 #ifndef find_first_bit
@@ -126,6 +147,27 @@ extern unsigned long find_first_zero_bit(const unsigned long *addr,
 
 #endif /* CONFIG_GENERIC_FIND_FIRST_BIT */
 
+#ifndef find_last_bit
+/**
+ * find_last_bit - find the last set bit in a memory region
+ * @addr: The address to start the search at
+ * @size: The number of bits to search
+ *
+ * Returns the bit number of the last set bit, or size.
+ */
+static inline
+unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val = *addr & GENMASK(size - 1, 0);
+
+		return val ? __fls(val) : size;
+	}
+
+	return _find_last_bit(addr, size);
+}
+#endif
+
 /**
  * find_next_clump8 - find next 8-bit clump with set bits in a memory region
  * @clump: location to store copy of found clump
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index a5a48303b0f1..26bf15e6cd35 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -286,17 +286,5 @@ static __always_inline void __assign_bit(long nr, volatile unsigned long *addr,
 })
 #endif
 
-#ifndef find_last_bit
-/**
- * find_last_bit - find the last set bit in a memory region
- * @addr: The address to start the search at
- * @size: The number of bits to search
- *
- * Returns the bit number of the last set bit, or size.
- */
-extern unsigned long find_last_bit(const unsigned long *addr,
-				   unsigned long size);
-#endif
-
 #endif /* __KERNEL__ */
 #endif
diff --git a/lib/find_bit.c b/lib/find_bit.c
index b03a101367f8..0f8e2e369b1d 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -75,7 +75,7 @@ EXPORT_SYMBOL(_find_next_bit);
 /*
  * Find the first set bit in a memory region.
  */
-unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
+unsigned long _find_first_bit(const unsigned long *addr, unsigned long size)
 {
 	unsigned long idx;
 
@@ -86,14 +86,14 @@ unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
 
 	return size;
 }
-EXPORT_SYMBOL(find_first_bit);
+EXPORT_SYMBOL(_find_first_bit);
 #endif
 
 #ifndef find_first_zero_bit
 /*
  * Find the first cleared bit in a memory region.
  */
-unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
+unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size)
 {
 	unsigned long idx;
 
@@ -104,11 +104,11 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 
 	return size;
 }
-EXPORT_SYMBOL(find_first_zero_bit);
+EXPORT_SYMBOL(_find_first_zero_bit);
 #endif
 
 #ifndef find_last_bit
-unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
+unsigned long _find_last_bit(const unsigned long *addr, unsigned long size)
 {
 	if (size) {
 		unsigned long val = BITMAP_LAST_WORD_MASK(size);
@@ -124,7 +124,7 @@ unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
 	}
 	return size;
 }
-EXPORT_SYMBOL(find_last_bit);
+EXPORT_SYMBOL(_find_last_bit);
 #endif
 
 unsigned long find_next_clump8(unsigned long *clump, const unsigned long *addr,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 11/12] tools: sync lib/find_bit implementation
  2021-04-01  0:31 [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Yury Norov
                   ` (9 preceding siblings ...)
  2021-04-01  0:31 ` [PATCH 10/12] lib: add fast path for find_first_*_bit() and find_last_bit() Yury Norov
@ 2021-04-01  0:31 ` Yury Norov
  2021-05-10 15:27   ` Tetsuo Handa
  2021-04-01  0:31 ` [PATCH 12/12] MAINTAINERS: Add entry for the bitmap API Yury Norov
  2021-04-01  9:14 ` [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Andy Shevchenko
  12 siblings, 1 reply; 34+ messages in thread
From: Yury Norov @ 2021-04-01  0:31 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton
  Cc: Yury Norov, linux-m68k, linux-arch, linux-sh, Alexey Klimov,
	Andy Shevchenko, Arnd Bergmann, David Sterba, Dennis Zhou,
	Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rasmus Villemoes,
	Rich Felker, Stefano Brivio, Wei Yang, Wolfram Sang,
	Yoshinori Sato

Add fast paths to find_*_bit() functions as per kernel implementation.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
 tools/include/asm-generic/bitops/find.h | 58 +++++++++++++++++++++++--
 tools/lib/find_bit.c                    |  4 +-
 2 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/tools/include/asm-generic/bitops/find.h b/tools/include/asm-generic/bitops/find.h
index 9fe62d10b084..6481fd11012a 100644
--- a/tools/include/asm-generic/bitops/find.h
+++ b/tools/include/asm-generic/bitops/find.h
@@ -5,6 +5,9 @@
 extern unsigned long _find_next_bit(const unsigned long *addr1,
 		const unsigned long *addr2, unsigned long nbits,
 		unsigned long start, unsigned long invert, unsigned long le);
+extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
+extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
+extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
 
 #ifndef find_next_bit
 /**
@@ -20,6 +23,16 @@ static inline
 unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
 			    unsigned long offset)
 {
+	if (small_const_nbits(size)) {
+		unsigned long val;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = *addr & GENMASK(size - 1, offset);
+		return val ? __ffs(val) : size;
+	}
+
 	return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
 }
 #endif
@@ -40,6 +53,16 @@ unsigned long find_next_and_bit(const unsigned long *addr1,
 		const unsigned long *addr2, unsigned long size,
 		unsigned long offset)
 {
+	if (small_const_nbits(size)) {
+		unsigned long val;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = *addr1 & *addr2 & GENMASK(size - 1, offset);
+		return val ? __ffs(val) : size;
+	}
+
 	return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
 }
 #endif
@@ -58,6 +81,16 @@ static inline
 unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
 				 unsigned long offset)
 {
+	if (small_const_nbits(size)) {
+		unsigned long val;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = *addr | ~GENMASK(size - 1, offset);
+		return val == ~0UL ? size : ffz(val);
+	}
+
 	return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
 }
 #endif
@@ -72,8 +105,17 @@ unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
  * Returns the bit number of the first set bit.
  * If no bits are set, returns @size.
  */
-extern unsigned long find_first_bit(const unsigned long *addr,
-				    unsigned long size);
+static inline
+unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val = *addr & GENMASK(size - 1, 0);
+
+		return val ? __ffs(val) : size;
+	}
+
+	return _find_first_bit(addr, size);
+}
 
 #endif /* find_first_bit */
 
@@ -87,7 +129,17 @@ extern unsigned long find_first_bit(const unsigned long *addr,
  * Returns the bit number of the first cleared bit.
  * If no bits are zero, returns @size.
  */
-unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size);
+static inline
+unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val = *addr | ~GENMASK(size - 1, 0);
+
+		return val == ~0UL ? size : ffz(val);
+	}
+
+	return _find_first_zero_bit(addr, size);
+}
 #endif
 
 #endif /*_TOOLS_LINUX_ASM_GENERIC_BITOPS_FIND_H_ */
diff --git a/tools/lib/find_bit.c b/tools/lib/find_bit.c
index 589fd2f26f94..109aa7ffcf97 100644
--- a/tools/lib/find_bit.c
+++ b/tools/lib/find_bit.c
@@ -83,7 +83,7 @@ unsigned long _find_next_bit(const unsigned long *addr1,
 /*
  * Find the first set bit in a memory region.
  */
-unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
+unsigned long _find_first_bit(const unsigned long *addr, unsigned long size)
 {
 	unsigned long idx;
 
@@ -100,7 +100,7 @@ unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
 /*
  * Find the first cleared bit in a memory region.
  */
-unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
+unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size)
 {
 	unsigned long idx;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 12/12] MAINTAINERS: Add entry for the bitmap API
  2021-04-01  0:31 [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Yury Norov
                   ` (10 preceding siblings ...)
  2021-04-01  0:31 ` [PATCH 11/12] tools: sync lib/find_bit implementation Yury Norov
@ 2021-04-01  0:31 ` Yury Norov
  2021-04-01  9:14 ` [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Andy Shevchenko
  12 siblings, 0 replies; 34+ messages in thread
From: Yury Norov @ 2021-04-01  0:31 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton
  Cc: Yury Norov, linux-m68k, linux-arch, linux-sh, Alexey Klimov,
	Andy Shevchenko, Arnd Bergmann, David Sterba, Dennis Zhou,
	Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rasmus Villemoes,
	Rich Felker, Stefano Brivio, Wei Yang, Wolfram Sang,
	Yoshinori Sato

Add myself as maintainer for bitmap API and Andy and Rasmus as reviewers.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
 MAINTAINERS | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index bbd4484b5ba6..fcf89f8cbb6a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3172,6 +3172,22 @@ F:	Documentation/filesystems/bfs.rst
 F:	fs/bfs/
 F:	include/uapi/linux/bfs_fs.h
 
+BITMAP API
+M:	Yury Norov <yury.norov@gmail.com>
+R:	Andy Shevchenko <andriy.shevchenko@linux.intel.com>
+R:	Rasmus Villemoes <linux@rasmusvillemoes.dk>
+S:	Maintained
+F:	include/asm-generic/bitops/find.h
+F:	include/linux/bitmap.h
+F:	lib/bitmap.c
+F:	lib/find_bit.c
+F:	lib/find_bit_benchmark.c
+F:	lib/test_bitmap.c
+F:	tools/include/asm-generic/bitops/find.h
+F:	tools/include/linux/bitmap.h
+F:	tools/lib/bitmap.c
+F:	tools/lib/find_bit.c
+
 BLINKM RGB LED DRIVER
 M:	Jan-Simon Moeller <jansimon.moeller@gmx.de>
 S:	Maintained
-- 
2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 05/12] lib: extend the scope of small_const_nbits() macro
  2021-04-01  0:31 ` [PATCH 05/12] lib: extend the scope of small_const_nbits() macro Yury Norov
@ 2021-04-01  8:35   ` Andy Shevchenko
  0 siblings, 0 replies; 34+ messages in thread
From: Andy Shevchenko @ 2021-04-01  8:35 UTC (permalink / raw)
  To: Yury Norov
  Cc: Linux Kernel Mailing List, Andrew Morton, linux-m68k, Linux-Arch,
	Linux-SH, Alexey Klimov, Andy Shevchenko, Arnd Bergmann,
	David Sterba, Dennis Zhou, Geert Uytterhoeven, Jianpeng Ma,
	Joe Perches, John Paul Adrian Glaubitz, Josh Poimboeuf,
	Rasmus Villemoes, Rich Felker, Stefano Brivio, Wei Yang,
	Wolfram Sang, Yoshinori Sato

On Thu, Apr 1, 2021 at 3:41 AM Yury Norov <yury.norov@gmail.com> wrote:
>
> find_bit would also benefit from small_const_nbits() optimizations.
> The detailed comment is provided by Rasmus Villemoes.

Thanks, now it looks good!
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>

> Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> ---
>  include/asm-generic/bitsperlong.h | 12 ++++++++++++
>  include/linux/bitmap.h            |  8 --------
>  2 files changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/include/asm-generic/bitsperlong.h b/include/asm-generic/bitsperlong.h
> index 3905c1c93dc2..1023e2a4bd37 100644
> --- a/include/asm-generic/bitsperlong.h
> +++ b/include/asm-generic/bitsperlong.h
> @@ -23,4 +23,16 @@
>  #define BITS_PER_LONG_LONG 64
>  #endif
>
> +/*
> + * small_const_nbits(n) is true precisely when it is known at compile-time
> + * that BITMAP_SIZE(n) is 1, i.e. 1 <= n <= BITS_PER_LONG. This allows
> + * various bit/bitmap APIs to provide a fast inline implementation. Bitmaps
> + * of size 0 are very rare, and a compile-time-known-size 0 is most likely
> + * a sign of error. They will be handled correctly by the bit/bitmap APIs,
> + * but using the out-of-line functions, so that the inline implementations
> + * can unconditionally dereference the pointer(s).
> + */
> +#define small_const_nbits(nbits) \
> +       (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0)
> +
>  #endif /* __ASM_GENERIC_BITS_PER_LONG */
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index 2cb1d7cfe8f9..a36cfcec4e77 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -230,14 +230,6 @@ int bitmap_print_to_pagebuf(bool list, char *buf,
>  #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
>  #define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
>
> -/*
> - * The static inlines below do not handle constant nbits==0 correctly,
> - * so make such users (should any ever turn up) call the out-of-line
> - * versions.
> - */
> -#define small_const_nbits(nbits) \
> -       (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0)
> -
>  static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
>  {
>         unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
> --
> 2.25.1
>


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 07/12] lib: inline _find_next_bit() wrappers
  2021-04-01  0:31 ` [PATCH 07/12] lib: inline _find_next_bit() wrappers Yury Norov
@ 2021-04-01  8:37   ` Andy Shevchenko
  0 siblings, 0 replies; 34+ messages in thread
From: Andy Shevchenko @ 2021-04-01  8:37 UTC (permalink / raw)
  To: Yury Norov
  Cc: Linux Kernel Mailing List, Andrew Morton, linux-m68k, Linux-Arch,
	Linux-SH, Alexey Klimov, Andy Shevchenko, Arnd Bergmann,
	David Sterba, Dennis Zhou, Geert Uytterhoeven, Jianpeng Ma,
	Joe Perches, John Paul Adrian Glaubitz, Josh Poimboeuf,
	Rasmus Villemoes, Rich Felker, Stefano Brivio, Wei Yang,
	Wolfram Sang, Yoshinori Sato

On Thu, Apr 1, 2021 at 3:42 AM Yury Norov <yury.norov@gmail.com> wrote:
>
> lib/find_bit.c declares five single-line wrappers for _find_next_bit().
> We may turn those wrappers to inline functions. It eliminates unneeded
> function calls and opens room for compile-time optimizations.

Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>

> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> ---
>  include/asm-generic/bitops/find.h | 28 ++++++++++++----
>  include/asm-generic/bitops/le.h   | 17 +++++++---
>  lib/find_bit.c                    | 56 ++-----------------------------
>  3 files changed, 37 insertions(+), 64 deletions(-)
>
> diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h
> index 9fdf21302fdf..7ad70dab8e93 100644
> --- a/include/asm-generic/bitops/find.h
> +++ b/include/asm-generic/bitops/find.h
> @@ -2,6 +2,10 @@
>  #ifndef _ASM_GENERIC_BITOPS_FIND_H_
>  #define _ASM_GENERIC_BITOPS_FIND_H_
>
> +extern unsigned long _find_next_bit(const unsigned long *addr1,
> +               const unsigned long *addr2, unsigned long nbits,
> +               unsigned long start, unsigned long invert, unsigned long le);
> +
>  #ifndef find_next_bit
>  /**
>   * find_next_bit - find the next set bit in a memory region
> @@ -12,8 +16,12 @@
>   * Returns the bit number for the next set bit
>   * If no bits are set, returns @size.
>   */
> -extern unsigned long find_next_bit(const unsigned long *addr, unsigned long
> -               size, unsigned long offset);
> +static inline
> +unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
> +                           unsigned long offset)
> +{
> +       return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
> +}
>  #endif
>
>  #ifndef find_next_and_bit
> @@ -27,9 +35,13 @@ extern unsigned long find_next_bit(const unsigned long *addr, unsigned long
>   * Returns the bit number for the next set bit
>   * If no bits are set, returns @size.
>   */
> -extern unsigned long find_next_and_bit(const unsigned long *addr1,
> +static inline
> +unsigned long find_next_and_bit(const unsigned long *addr1,
>                 const unsigned long *addr2, unsigned long size,
> -               unsigned long offset);
> +               unsigned long offset)
> +{
> +       return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
> +}
>  #endif
>
>  #ifndef find_next_zero_bit
> @@ -42,8 +54,12 @@ extern unsigned long find_next_and_bit(const unsigned long *addr1,
>   * Returns the bit number of the next zero bit
>   * If no bits are zero, returns @size.
>   */
> -extern unsigned long find_next_zero_bit(const unsigned long *addr, unsigned
> -               long size, unsigned long offset);
> +static inline
> +unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
> +                                unsigned long offset)
> +{
> +       return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
> +}
>  #endif
>
>  #ifdef CONFIG_GENERIC_FIND_FIRST_BIT
> diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
> index 188d3eba3ace..21305f6cea0b 100644
> --- a/include/asm-generic/bitops/le.h
> +++ b/include/asm-generic/bitops/le.h
> @@ -2,6 +2,7 @@
>  #ifndef _ASM_GENERIC_BITOPS_LE_H_
>  #define _ASM_GENERIC_BITOPS_LE_H_
>
> +#include <asm-generic/bitops/find.h>
>  #include <asm/types.h>
>  #include <asm/byteorder.h>
>
> @@ -32,13 +33,21 @@ static inline unsigned long find_first_zero_bit_le(const void *addr,
>  #define BITOP_LE_SWIZZLE       ((BITS_PER_LONG-1) & ~0x7)
>
>  #ifndef find_next_zero_bit_le
> -extern unsigned long find_next_zero_bit_le(const void *addr,
> -               unsigned long size, unsigned long offset);
> +static inline
> +unsigned long find_next_zero_bit_le(const void *addr, unsigned
> +               long size, unsigned long offset)
> +{
> +       return _find_next_bit(addr, NULL, size, offset, ~0UL, 1);
> +}
>  #endif
>
>  #ifndef find_next_bit_le
> -extern unsigned long find_next_bit_le(const void *addr,
> -               unsigned long size, unsigned long offset);
> +static inline
> +unsigned long find_next_bit_le(const void *addr, unsigned
> +               long size, unsigned long offset)
> +{
> +       return _find_next_bit(addr, NULL, size, offset, 0UL, 1);
> +}
>  #endif
>
>  #ifndef find_first_zero_bit_le
> diff --git a/lib/find_bit.c b/lib/find_bit.c
> index f67f86fd2f62..b03a101367f8 100644
> --- a/lib/find_bit.c
> +++ b/lib/find_bit.c
> @@ -29,7 +29,7 @@
>   *    searching it for one bits.
>   *  - The optional "addr2", which is anded with "addr1" if present.
>   */
> -static unsigned long _find_next_bit(const unsigned long *addr1,
> +unsigned long _find_next_bit(const unsigned long *addr1,
>                 const unsigned long *addr2, unsigned long nbits,
>                 unsigned long start, unsigned long invert, unsigned long le)
>  {
> @@ -68,37 +68,7 @@ static unsigned long _find_next_bit(const unsigned long *addr1,
>
>         return min(start + __ffs(tmp), nbits);
>  }
> -#endif
> -
> -#ifndef find_next_bit
> -/*
> - * Find the next set bit in a memory region.
> - */
> -unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
> -                           unsigned long offset)
> -{
> -       return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
> -}
> -EXPORT_SYMBOL(find_next_bit);
> -#endif
> -
> -#ifndef find_next_zero_bit
> -unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
> -                                unsigned long offset)
> -{
> -       return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
> -}
> -EXPORT_SYMBOL(find_next_zero_bit);
> -#endif
> -
> -#if !defined(find_next_and_bit)
> -unsigned long find_next_and_bit(const unsigned long *addr1,
> -               const unsigned long *addr2, unsigned long size,
> -               unsigned long offset)
> -{
> -       return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
> -}
> -EXPORT_SYMBOL(find_next_and_bit);
> +EXPORT_SYMBOL(_find_next_bit);
>  #endif
>
>  #ifndef find_first_bit
> @@ -157,28 +127,6 @@ unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
>  EXPORT_SYMBOL(find_last_bit);
>  #endif
>
> -#ifdef __BIG_ENDIAN
> -
> -#ifndef find_next_zero_bit_le
> -unsigned long find_next_zero_bit_le(const void *addr, unsigned
> -               long size, unsigned long offset)
> -{
> -       return _find_next_bit(addr, NULL, size, offset, ~0UL, 1);
> -}
> -EXPORT_SYMBOL(find_next_zero_bit_le);
> -#endif
> -
> -#ifndef find_next_bit_le
> -unsigned long find_next_bit_le(const void *addr, unsigned
> -               long size, unsigned long offset)
> -{
> -       return _find_next_bit(addr, NULL, size, offset, 0UL, 1);
> -}
> -EXPORT_SYMBOL(find_next_bit_le);
> -#endif
> -
> -#endif /* __BIG_ENDIAN */
> -
>  unsigned long find_next_clump8(unsigned long *clump, const unsigned long *addr,
>                                unsigned long size, unsigned long offset)
>  {
> --
> 2.25.1
>


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 09/12] lib: add fast path for find_next_*_bit()
  2021-04-01  0:31 ` [PATCH 09/12] lib: add fast path for find_next_*_bit() Yury Norov
@ 2021-04-01  8:48   ` Andy Shevchenko
  0 siblings, 0 replies; 34+ messages in thread
From: Andy Shevchenko @ 2021-04-01  8:48 UTC (permalink / raw)
  To: Yury Norov
  Cc: Linux Kernel Mailing List, Andrew Morton, linux-m68k, Linux-Arch,
	Linux-SH, Alexey Klimov, Andy Shevchenko, Arnd Bergmann,
	David Sterba, Dennis Zhou, Geert Uytterhoeven, Jianpeng Ma,
	Joe Perches, John Paul Adrian Glaubitz, Josh Poimboeuf,
	Rasmus Villemoes, Rich Felker, Stefano Brivio, Wei Yang,
	Wolfram Sang, Yoshinori Sato

On Thu, Apr 1, 2021 at 3:41 AM Yury Norov <yury.norov@gmail.com> wrote:
>
> Similarly to bitmap functions, find_next_*_bit() users will benefit
> if we'll handle a case of bitmaps that fit into a single word inline.
> In the very best case, the compiler may replace a function call with
> a few instructions.
>
> This is the quite typical find_next_bit() user:
>
>         unsigned int cpumask_next(int n, const struct cpumask *srcp)
>         {
>                 /* -1 is a legal arg here. */
>                 if (n != -1)
>                         cpumask_check(n);
>                 return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
>         }
>         EXPORT_SYMBOL(cpumask_next);
>
> Currently, on ARM64 the generated code looks like this:
>         0000000000000000 <cpumask_next>:
>            0:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
>            4:   11000402        add     w2, w0, #0x1
>            8:   aa0103e0        mov     x0, x1
>            c:   d2800401        mov     x1, #0x40                       // #64
>           10:   910003fd        mov     x29, sp
>           14:   93407c42        sxtw    x2, w2
>           18:   94000000        bl      0 <find_next_bit>
>           1c:   a8c17bfd        ldp     x29, x30, [sp], #16
>           20:   d65f03c0        ret
>           24:   d503201f        nop
>
> After applying this patch:
>         0000000000000140 <cpumask_next>:
>          140:   11000400        add     w0, w0, #0x1
>          144:   93407c00        sxtw    x0, w0
>          148:   f100fc1f        cmp     x0, #0x3f
>          14c:   54000168        b.hi    178 <cpumask_next+0x38>  // b.pmore
>          150:   f9400023        ldr     x3, [x1]
>          154:   92800001        mov     x1, #0xffffffffffffffff         // #-1
>          158:   9ac02020        lsl     x0, x1, x0
>          15c:   52800802        mov     w2, #0x40                       // #64
>          160:   8a030001        and     x1, x0, x3
>          164:   dac00020        rbit    x0, x1
>          168:   f100003f        cmp     x1, #0x0
>          16c:   dac01000        clz     x0, x0
>          170:   1a800040        csel    w0, w2, w0, eq  // eq = none
>          174:   d65f03c0        ret
>          178:   52800800        mov     w0, #0x40                       // #64
>          17c:   d65f03c0        ret
>
> find_next_bit() call is replaced with 6 instructions. find_next_bit()
> itself is 41 instructions plus function call overhead.
>
> Despite inlining, the scripts/bloat-o-meter report smaller .text size
> after applying the series:
>         add/remove: 11/9 grow/shrink: 233/176 up/down: 5780/-6768 (-988)

Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>

> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> ---
>  include/asm-generic/bitops/find.h | 30 ++++++++++++++++++++++++++++++
>  include/asm-generic/bitops/le.h   | 21 +++++++++++++++++++++
>  2 files changed, 51 insertions(+)
>
> diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h
> index 7ad70dab8e93..4148c74a1e4d 100644
> --- a/include/asm-generic/bitops/find.h
> +++ b/include/asm-generic/bitops/find.h
> @@ -20,6 +20,16 @@ static inline
>  unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>                             unsigned long offset)
>  {
> +       if (small_const_nbits(size)) {
> +               unsigned long val;
> +
> +               if (unlikely(offset >= size))
> +                       return size;
> +
> +               val = *addr & GENMASK(size - 1, offset);
> +               return val ? __ffs(val) : size;
> +       }
> +
>         return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
>  }
>  #endif
> @@ -40,6 +50,16 @@ unsigned long find_next_and_bit(const unsigned long *addr1,
>                 const unsigned long *addr2, unsigned long size,
>                 unsigned long offset)
>  {
> +       if (small_const_nbits(size)) {
> +               unsigned long val;
> +
> +               if (unlikely(offset >= size))
> +                       return size;
> +
> +               val = *addr1 & *addr2 & GENMASK(size - 1, offset);
> +               return val ? __ffs(val) : size;
> +       }
> +
>         return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
>  }
>  #endif
> @@ -58,6 +78,16 @@ static inline
>  unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
>                                  unsigned long offset)
>  {
> +       if (small_const_nbits(size)) {
> +               unsigned long val;
> +
> +               if (unlikely(offset >= size))
> +                       return size;
> +
> +               val = *addr | ~GENMASK(size - 1, offset);
> +               return val == ~0UL ? size : ffz(val);
> +       }
> +
>         return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
>  }
>  #endif
> diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
> index 21305f6cea0b..5a28629cbf4d 100644
> --- a/include/asm-generic/bitops/le.h
> +++ b/include/asm-generic/bitops/le.h
> @@ -5,6 +5,7 @@
>  #include <asm-generic/bitops/find.h>
>  #include <asm/types.h>
>  #include <asm/byteorder.h>
> +#include <linux/swab.h>
>
>  #if defined(__LITTLE_ENDIAN)
>
> @@ -37,6 +38,16 @@ static inline
>  unsigned long find_next_zero_bit_le(const void *addr, unsigned
>                 long size, unsigned long offset)
>  {
> +       if (small_const_nbits(size)) {
> +               unsigned long val = *(const unsigned long *)addr;
> +
> +               if (unlikely(offset >= size))
> +                       return size;
> +
> +               val = swab(val) | ~GENMASK(size - 1, offset);
> +               return val == ~0UL ? size : ffz(val);
> +       }
> +
>         return _find_next_bit(addr, NULL, size, offset, ~0UL, 1);
>  }
>  #endif
> @@ -46,6 +57,16 @@ static inline
>  unsigned long find_next_bit_le(const void *addr, unsigned
>                 long size, unsigned long offset)
>  {
> +       if (small_const_nbits(size)) {
> +               unsigned long val = *(const unsigned long *)addr;
> +
> +               if (unlikely(offset >= size))
> +                       return size;
> +
> +               val = swab(val) & GENMASK(size - 1, offset);
> +               return val ? __ffs(val) : size;
> +       }
> +
>         return _find_next_bit(addr, NULL, size, offset, 0UL, 1);
>  }
>  #endif
> --
> 2.25.1
>


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 10/12] lib: add fast path for find_first_*_bit() and find_last_bit()
  2021-04-01  0:31 ` [PATCH 10/12] lib: add fast path for find_first_*_bit() and find_last_bit() Yury Norov
@ 2021-04-01  8:58   ` Andy Shevchenko
  0 siblings, 0 replies; 34+ messages in thread
From: Andy Shevchenko @ 2021-04-01  8:58 UTC (permalink / raw)
  To: Yury Norov
  Cc: Linux Kernel Mailing List, Andrew Morton, linux-m68k, Linux-Arch,
	Linux-SH, Alexey Klimov, Andy Shevchenko, Arnd Bergmann,
	David Sterba, Dennis Zhou, Geert Uytterhoeven, Jianpeng Ma,
	Joe Perches, John Paul Adrian Glaubitz, Josh Poimboeuf,
	Rasmus Villemoes, Rich Felker, Stefano Brivio, Wei Yang,
	Wolfram Sang, Yoshinori Sato

On Thu, Apr 1, 2021 at 3:41 AM Yury Norov <yury.norov@gmail.com> wrote:
>
> Similarly to bitmap functions, users would benefit if we'll handle
> a case of small-size bitmaps that fit into a single word.
>
> While here, move the find_last_bit() declaration to bitops/find.h
> where other find_*_bit() functions sit.

Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>

> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> ---
>  include/asm-generic/bitops/find.h | 50 ++++++++++++++++++++++++++++---
>  include/linux/bitops.h            | 12 --------
>  lib/find_bit.c                    | 12 ++++----
>  3 files changed, 52 insertions(+), 22 deletions(-)
>
> diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h
> index 4148c74a1e4d..0d132ee2a291 100644
> --- a/include/asm-generic/bitops/find.h
> +++ b/include/asm-generic/bitops/find.h
> @@ -5,6 +5,9 @@
>  extern unsigned long _find_next_bit(const unsigned long *addr1,
>                 const unsigned long *addr2, unsigned long nbits,
>                 unsigned long start, unsigned long invert, unsigned long le);
> +extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
> +extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
> +extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
>
>  #ifndef find_next_bit
>  /**
> @@ -102,8 +105,17 @@ unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
>   * Returns the bit number of the first set bit.
>   * If no bits are set, returns @size.
>   */
> -extern unsigned long find_first_bit(const unsigned long *addr,
> -                                   unsigned long size);
> +static inline
> +unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
> +{
> +       if (small_const_nbits(size)) {
> +               unsigned long val = *addr & GENMASK(size - 1, 0);
> +
> +               return val ? __ffs(val) : size;
> +       }
> +
> +       return _find_first_bit(addr, size);
> +}
>
>  /**
>   * find_first_zero_bit - find the first cleared bit in a memory region
> @@ -113,8 +125,17 @@ extern unsigned long find_first_bit(const unsigned long *addr,
>   * Returns the bit number of the first cleared bit.
>   * If no bits are zero, returns @size.
>   */
> -extern unsigned long find_first_zero_bit(const unsigned long *addr,
> -                                        unsigned long size);
> +static inline
> +unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
> +{
> +       if (small_const_nbits(size)) {
> +               unsigned long val = *addr | ~GENMASK(size - 1, 0);
> +
> +               return val == ~0UL ? size : ffz(val);
> +       }
> +
> +       return _find_first_zero_bit(addr, size);
> +}
>  #else /* CONFIG_GENERIC_FIND_FIRST_BIT */
>
>  #ifndef find_first_bit
> @@ -126,6 +147,27 @@ extern unsigned long find_first_zero_bit(const unsigned long *addr,
>
>  #endif /* CONFIG_GENERIC_FIND_FIRST_BIT */
>
> +#ifndef find_last_bit
> +/**
> + * find_last_bit - find the last set bit in a memory region
> + * @addr: The address to start the search at
> + * @size: The number of bits to search
> + *
> + * Returns the bit number of the last set bit, or size.
> + */
> +static inline
> +unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
> +{
> +       if (small_const_nbits(size)) {
> +               unsigned long val = *addr & GENMASK(size - 1, 0);
> +
> +               return val ? __fls(val) : size;
> +       }
> +
> +       return _find_last_bit(addr, size);
> +}
> +#endif
> +
>  /**
>   * find_next_clump8 - find next 8-bit clump with set bits in a memory region
>   * @clump: location to store copy of found clump
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index a5a48303b0f1..26bf15e6cd35 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -286,17 +286,5 @@ static __always_inline void __assign_bit(long nr, volatile unsigned long *addr,
>  })
>  #endif
>
> -#ifndef find_last_bit
> -/**
> - * find_last_bit - find the last set bit in a memory region
> - * @addr: The address to start the search at
> - * @size: The number of bits to search
> - *
> - * Returns the bit number of the last set bit, or size.
> - */
> -extern unsigned long find_last_bit(const unsigned long *addr,
> -                                  unsigned long size);
> -#endif
> -
>  #endif /* __KERNEL__ */
>  #endif
> diff --git a/lib/find_bit.c b/lib/find_bit.c
> index b03a101367f8..0f8e2e369b1d 100644
> --- a/lib/find_bit.c
> +++ b/lib/find_bit.c
> @@ -75,7 +75,7 @@ EXPORT_SYMBOL(_find_next_bit);
>  /*
>   * Find the first set bit in a memory region.
>   */
> -unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
> +unsigned long _find_first_bit(const unsigned long *addr, unsigned long size)
>  {
>         unsigned long idx;
>
> @@ -86,14 +86,14 @@ unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
>
>         return size;
>  }
> -EXPORT_SYMBOL(find_first_bit);
> +EXPORT_SYMBOL(_find_first_bit);
>  #endif
>
>  #ifndef find_first_zero_bit
>  /*
>   * Find the first cleared bit in a memory region.
>   */
> -unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
> +unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size)
>  {
>         unsigned long idx;
>
> @@ -104,11 +104,11 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
>
>         return size;
>  }
> -EXPORT_SYMBOL(find_first_zero_bit);
> +EXPORT_SYMBOL(_find_first_zero_bit);
>  #endif
>
>  #ifndef find_last_bit
> -unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
> +unsigned long _find_last_bit(const unsigned long *addr, unsigned long size)
>  {
>         if (size) {
>                 unsigned long val = BITMAP_LAST_WORD_MASK(size);
> @@ -124,7 +124,7 @@ unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
>         }
>         return size;
>  }
> -EXPORT_SYMBOL(find_last_bit);
> +EXPORT_SYMBOL(_find_last_bit);
>  #endif
>
>  unsigned long find_next_clump8(unsigned long *clump, const unsigned long *addr,
> --
> 2.25.1
>


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps
  2021-04-01  0:31 [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Yury Norov
                   ` (11 preceding siblings ...)
  2021-04-01  0:31 ` [PATCH 12/12] MAINTAINERS: Add entry for the bitmap API Yury Norov
@ 2021-04-01  9:14 ` Andy Shevchenko
  2021-04-01  9:28   ` Arnd Bergmann
  12 siblings, 1 reply; 34+ messages in thread
From: Andy Shevchenko @ 2021-04-01  9:14 UTC (permalink / raw)
  To: Yury Norov
  Cc: Linux Kernel Mailing List, Andrew Morton, linux-m68k, Linux-Arch,
	Linux-SH, Alexey Klimov, Andy Shevchenko, Arnd Bergmann,
	David Sterba, Dennis Zhou, Geert Uytterhoeven, Jianpeng Ma,
	Joe Perches, John Paul Adrian Glaubitz, Josh Poimboeuf,
	Rasmus Villemoes, Rich Felker, Stefano Brivio, Wei Yang,
	Wolfram Sang, Yoshinori Sato

On Thu, Apr 1, 2021 at 3:36 AM Yury Norov <yury.norov@gmail.com> wrote:
>
> Bitmap operations are much simpler and faster in case of small bitmaps
> which fit into a single word. In linux/bitmap.c we have a machinery that
> allows compiler to replace actual function call with a few instructions
> if bitmaps passed into the function are small and their size is known at
> compile time.
>
> find_*_bit() API lacks this functionality; but users will benefit from it
> a lot. One important example is cpumask subsystem when
> NR_CPUS <= BITS_PER_LONG.

Cool, thanks!

I guess it's assumed to go via Andrew's tree.

But after that since you are about to be a maintainer of this, I think
it would make sense to send PRs directly to Linus. I would recommend
creating an official tree (followed by an update in the MAINTAINERS)
and connecting it to Linux next (usually done by email to Stephen).


> v6 is mostly a resend. The only change comparing to v5 is a fix of
> small_const_nbits() synchronization patch.
>
> v1: https://www.spinics.net/lists/kernel/msg3804727.html
> v2: https://www.spinics.net/lists/linux-m68k/msg16945.html
> v3: https://www.spinics.net/lists/kernel/msg3837020.html
> v4: https://patchwork.kernel.org/project/linux-sh/cover/20210316015424.1999082-1-yury.norov@gmail.com/
> v5: https://lore.kernel.org/linux-arch/20210321215457.588554-1-yury.norov@gmail.com/T/
> v6: - sync small_const_nbits() properly (patch 6).
>     - Rasmus' ack added.
>
> Yury Norov (12):
>   tools: disable -Wno-type-limits
>   tools: bitmap: sync function declarations with the kernel
>   tools: sync BITMAP_LAST_WORD_MASK() macro with the kernel
>   arch: rearrange headers inclusion order in asm/bitops for m68k and sh
>   lib: extend the scope of small_const_nbits() macro
>   tools: sync small_const_nbits() macro with the kernel
>   lib: inline _find_next_bit() wrappers
>   tools: sync find_next_bit implementation
>   lib: add fast path for find_next_*_bit()
>   lib: add fast path for find_first_*_bit() and find_last_bit()
>   tools: sync lib/find_bit implementation
>   MAINTAINERS: Add entry for the bitmap API
>
>  MAINTAINERS                             |  16 ++++
>  arch/m68k/include/asm/bitops.h          |   6 +-
>  arch/sh/include/asm/bitops.h            |   5 +-
>  include/asm-generic/bitops/find.h       | 108 +++++++++++++++++++++---
>  include/asm-generic/bitops/le.h         |  38 ++++++++-
>  include/asm-generic/bitsperlong.h       |  12 +++
>  include/linux/bitmap.h                  |   8 --
>  include/linux/bitops.h                  |  12 ---
>  lib/find_bit.c                          |  68 ++-------------
>  tools/include/asm-generic/bitops/find.h |  85 +++++++++++++++++--
>  tools/include/asm-generic/bitsperlong.h |   3 +
>  tools/include/linux/bitmap.h            |  18 ++--
>  tools/lib/bitmap.c                      |   4 +-
>  tools/lib/find_bit.c                    |  56 +++++-------
>  tools/scripts/Makefile.include          |   1 +
>  15 files changed, 284 insertions(+), 156 deletions(-)
>
> --
> 2.25.1
>


--
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps
  2021-04-01  9:14 ` [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Andy Shevchenko
@ 2021-04-01  9:28   ` Arnd Bergmann
  2021-04-01  9:50     ` Andy Shevchenko
  0 siblings, 1 reply; 34+ messages in thread
From: Arnd Bergmann @ 2021-04-01  9:28 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Yury Norov, Linux Kernel Mailing List, Andrew Morton, linux-m68k,
	Linux-Arch, Linux-SH, Alexey Klimov, Andy Shevchenko,
	David Sterba, Dennis Zhou, Geert Uytterhoeven, Jianpeng Ma,
	Joe Perches, John Paul Adrian Glaubitz, Josh Poimboeuf,
	Rasmus Villemoes, Rich Felker, Stefano Brivio, Wei Yang,
	Wolfram Sang, Yoshinori Sato

On Thu, Apr 1, 2021 at 11:16 AM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
>
> On Thu, Apr 1, 2021 at 3:36 AM Yury Norov <yury.norov@gmail.com> wrote:
> >
> > Bitmap operations are much simpler and faster in case of small bitmaps
> > which fit into a single word. In linux/bitmap.c we have a machinery that
> > allows compiler to replace actual function call with a few instructions
> > if bitmaps passed into the function are small and their size is known at
> > compile time.
> >
> > find_*_bit() API lacks this functionality; but users will benefit from it
> > a lot. One important example is cpumask subsystem when
> > NR_CPUS <= BITS_PER_LONG.
>
> Cool, thanks!
>
> I guess it's assumed to go via Andrew's tree.
>
> But after that since you are about to be a maintainer of this, I think
> it would make sense to send PRs directly to Linus. I would recommend
> creating an official tree (followed by an update in the MAINTAINERS)
> and connecting it to Linux next (usually done by email to Stephen).

It depends on how often we expect to see updates to this. I have not
followed the changes as closely as I should have, but I can also
merge them through the asm-generic tree for this time so Andrew
has to carry fewer patches for this.

I normally don't have a lot of material for asm-generic either, half
the time there are no pull requests at all for a given release. I would
expect future changes to the bitmap implementation to only need
an occasional bugfix, which could go through either the asm-generic
tree or through mm and doesn't need another separate pull request.

If it turns out to be a tree that needs regular updates every time,
then having a top level repository in linux-next would be appropriate.

        Arnd

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps
  2021-04-01  9:28   ` Arnd Bergmann
@ 2021-04-01  9:50     ` Andy Shevchenko
  2021-04-02  0:32       ` Andrew Morton
  0 siblings, 1 reply; 34+ messages in thread
From: Andy Shevchenko @ 2021-04-01  9:50 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Yury Norov, Linux Kernel Mailing List, Andrew Morton, linux-m68k,
	Linux-Arch, Linux-SH, Alexey Klimov, Andy Shevchenko,
	David Sterba, Dennis Zhou, Geert Uytterhoeven, Jianpeng Ma,
	Joe Perches, John Paul Adrian Glaubitz, Josh Poimboeuf,
	Rasmus Villemoes, Rich Felker, Stefano Brivio, Wei Yang,
	Wolfram Sang, Yoshinori Sato

On Thu, Apr 1, 2021 at 12:29 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Thu, Apr 1, 2021 at 11:16 AM Andy Shevchenko
> <andy.shevchenko@gmail.com> wrote:
> >
> > On Thu, Apr 1, 2021 at 3:36 AM Yury Norov <yury.norov@gmail.com> wrote:
> > >
> > > Bitmap operations are much simpler and faster in case of small bitmaps
> > > which fit into a single word. In linux/bitmap.c we have a machinery that
> > > allows compiler to replace actual function call with a few instructions
> > > if bitmaps passed into the function are small and their size is known at
> > > compile time.
> > >
> > > find_*_bit() API lacks this functionality; but users will benefit from it
> > > a lot. One important example is cpumask subsystem when
> > > NR_CPUS <= BITS_PER_LONG.
> >
> > Cool, thanks!
> >
> > I guess it's assumed to go via Andrew's tree.
> >
> > But after that since you are about to be a maintainer of this, I think
> > it would make sense to send PRs directly to Linus. I would recommend
> > creating an official tree (followed by an update in the MAINTAINERS)
> > and connecting it to Linux next (usually done by email to Stephen).
>
> It depends on how often we expect to see updates to this. I have not
> followed the changes as closely as I should have, but I can also
> merge them through the asm-generic tree for this time so Andrew
> has to carry fewer patches for this.
>
> I normally don't have a lot of material for asm-generic either, half
> the time there are no pull requests at all for a given release. I would
> expect future changes to the bitmap implementation to only need
> an occasional bugfix, which could go through either the asm-generic
> tree or through mm and doesn't need another separate pull request.
>
> If it turns out to be a tree that needs regular updates every time,
> then having a top level repository in linux-next would be appropriate.

Agree. asm-generic may serve for this. My worries are solely about how
much burden we add on Andrew's shoulders.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps
  2021-04-01  9:50     ` Andy Shevchenko
@ 2021-04-02  0:32       ` Andrew Morton
  0 siblings, 0 replies; 34+ messages in thread
From: Andrew Morton @ 2021-04-02  0:32 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Arnd Bergmann, Yury Norov, Linux Kernel Mailing List, linux-m68k,
	Linux-Arch, Linux-SH, Alexey Klimov, Andy Shevchenko,
	David Sterba, Dennis Zhou, Geert Uytterhoeven, Jianpeng Ma,
	Joe Perches, John Paul Adrian Glaubitz, Josh Poimboeuf,
	Rasmus Villemoes, Rich Felker, Stefano Brivio, Wei Yang,
	Wolfram Sang, Yoshinori Sato

On Thu, 1 Apr 2021 12:50:31 +0300 Andy Shevchenko <andy.shevchenko@gmail.com> wrote:

> > I normally don't have a lot of material for asm-generic either, half
> > the time there are no pull requests at all for a given release. I would
> > expect future changes to the bitmap implementation to only need
> > an occasional bugfix, which could go through either the asm-generic
> > tree or through mm and doesn't need another separate pull request.
> >
> > If it turns out to be a tree that needs regular updates every time,
> > then having a top level repository in linux-next would be appropriate.
> 
> Agree. asm-generic may serve for this. My worries are solely about how
> much burden we add on Andrew's shoulders.

Is fine.  Saving other developers from having to maintain tiny trees is
a thing I do.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/12] tools: sync lib/find_bit implementation
  2021-04-01  0:31 ` [PATCH 11/12] tools: sync lib/find_bit implementation Yury Norov
@ 2021-05-10 15:27   ` Tetsuo Handa
  2021-05-10 15:44     ` Andy Shevchenko
  0 siblings, 1 reply; 34+ messages in thread
From: Tetsuo Handa @ 2021-05-10 15:27 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Andrew Morton, Rasmus Villemoes
  Cc: linux-m68k, linux-arch, linux-sh, Alexey Klimov, Andy Shevchenko,
	Arnd Bergmann, David Sterba, Dennis Zhou, Geert Uytterhoeven,
	Jianpeng Ma, Joe Perches, John Paul Adrian Glaubitz,
	Josh Poimboeuf, Rich Felker, Stefano Brivio, Wei Yang,
	Wolfram Sang, Yoshinori Sato

Commit eaae7841ba83bb42 ("tools: sync lib/find_bit implementation") broke
build of 5.13-rc1 using gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3).

  DESCEND  objtool
  CC       /usr/src/linux/tools/objtool/exec-cmd.o
  CC       /usr/src/linux/tools/objtool/help.o
  CC       /usr/src/linux/tools/objtool/pager.o
  CC       /usr/src/linux/tools/objtool/parse-options.o
  CC       /usr/src/linux/tools/objtool/run-command.o
  CC       /usr/src/linux/tools/objtool/sigchain.o
  CC       /usr/src/linux/tools/objtool/subcmd-config.o
  LD       /usr/src/linux/tools/objtool/libsubcmd-in.o
  AR       /usr/src/linux/tools/objtool/libsubcmd.a
  CC       /usr/src/linux/tools/objtool/arch/x86/special.o
In file included from /usr/src/linux/tools/include/linux/kernel.h:8:0,
                 from /usr/src/linux/tools/include/linux/list.h:7,
                 from /usr/src/linux/tools/objtool/include/objtool/arch.h:10,
                 from /usr/src/linux/tools/objtool/include/objtool/check.h:11,
                 from /usr/src/linux/tools/objtool/include/objtool/special.h:10,
                 from arch/x86/special.c:4:
/usr/src/linux/tools/include/asm-generic/bitops/find.h: In function 'find_next_bit':
/usr/src/linux/tools/include/linux/bits.h:24:21: error: first argument to '__builtin_choose_expr' not a constant
  (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
                     ^
/usr/src/linux/tools/include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
 #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
                                                              ^
/usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
   ^
/usr/src/linux/tools/include/asm-generic/bitops/find.h:32:17: note: in expansion of macro 'GENMASK'
   val = *addr & GENMASK(size - 1, offset);
                 ^
/usr/src/linux/tools/include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
 #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
                                                   ^
/usr/src/linux/tools/include/linux/bits.h:24:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
  (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
   ^
/usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
   ^
/usr/src/linux/tools/include/asm-generic/bitops/find.h:32:17: note: in expansion of macro 'GENMASK'
   val = *addr & GENMASK(size - 1, offset);
                 ^
/usr/src/linux/tools/include/asm-generic/bitops/find.h: In function 'find_next_and_bit':
/usr/src/linux/tools/include/linux/bits.h:24:21: error: first argument to '__builtin_choose_expr' not a constant
  (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
                     ^
/usr/src/linux/tools/include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
 #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
                                                              ^
/usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
   ^
/usr/src/linux/tools/include/asm-generic/bitops/find.h:62:27: note: in expansion of macro 'GENMASK'
   val = *addr1 & *addr2 & GENMASK(size - 1, offset);
                           ^
/usr/src/linux/tools/include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
 #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
                                                   ^
/usr/src/linux/tools/include/linux/bits.h:24:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
  (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
   ^
/usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
   ^
/usr/src/linux/tools/include/asm-generic/bitops/find.h:62:27: note: in expansion of macro 'GENMASK'
   val = *addr1 & *addr2 & GENMASK(size - 1, offset);
                           ^
/usr/src/linux/tools/include/asm-generic/bitops/find.h: In function 'find_next_zero_bit':
/usr/src/linux/tools/include/linux/bits.h:24:21: error: first argument to '__builtin_choose_expr' not a constant
  (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
                     ^
/usr/src/linux/tools/include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
 #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
                                                              ^
/usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
   ^
/usr/src/linux/tools/include/asm-generic/bitops/find.h:90:18: note: in expansion of macro 'GENMASK'
   val = *addr | ~GENMASK(size - 1, offset);
                  ^
/usr/src/linux/tools/include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
 #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
                                                   ^
/usr/src/linux/tools/include/linux/bits.h:24:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
  (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
   ^
/usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
   ^
/usr/src/linux/tools/include/asm-generic/bitops/find.h:90:18: note: in expansion of macro 'GENMASK'
   val = *addr | ~GENMASK(size - 1, offset);
                  ^
make[5]: *** [/usr/src/linux/tools/objtool/arch/x86/special.o] Error 1
make[4]: *** [arch/x86] Error 2
make[3]: *** [/usr/src/linux/tools/objtool/objtool-in.o] Error 2
make[2]: *** [objtool] Error 2
make[1]: *** [tools/objtool] Error 2
make: *** [__sub-make] Error 2



Applying below diff seems to solve the build failure.
Do we need to use BUILD_BUG_ON_ZERO() here?

diff --git a/tools/include/linux/bits.h b/tools/include/linux/bits.h
index 7f475d59a097..0aba9294f29d 100644
--- a/tools/include/linux/bits.h
+++ b/tools/include/linux/bits.h
@@ -21,8 +21,7 @@
 #if !defined(__ASSEMBLY__)
 #include <linux/build_bug.h>
 #define GENMASK_INPUT_CHECK(h, l) \
-	(BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
-		__builtin_constant_p((l) > (h)), (l) > (h), 0)))
+	({ BUILD_BUG_ON(__builtin_constant_p((l) > (h)) && ((l) > (h))); 0; })
 #else
 /*
  * BUILD_BUG_ON_ZERO is not available in h files included from asm files,



Also, why the fast path of find_*_bit() functions does not check
__builtin_constant_p(offset) as well as small_const_nbits(size), for the fast
path fails to catch BUILD_BUG_ON_ZERO() when offset argument is not a constant.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/12] tools: sync lib/find_bit implementation
  2021-05-10 15:27   ` Tetsuo Handa
@ 2021-05-10 15:44     ` Andy Shevchenko
  2021-05-10 17:21       ` Yury Norov
  2021-05-10 22:51       ` Rikard Falkeborn
  0 siblings, 2 replies; 34+ messages in thread
From: Andy Shevchenko @ 2021-05-10 15:44 UTC (permalink / raw)
  To: Tetsuo Handa, Rikard Falkeborn
  Cc: Yury Norov, Linux Kernel Mailing List, Andrew Morton,
	Rasmus Villemoes, linux-m68k, Linux-Arch, Linux-SH,
	Alexey Klimov, Andy Shevchenko, Arnd Bergmann, David Sterba,
	Dennis Zhou, Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rich Felker,
	Stefano Brivio, Wei Yang, Wolfram Sang, Yoshinori Sato

+Cc: Rikard

On Mon, May 10, 2021 at 6:31 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> Commit eaae7841ba83bb42 ("tools: sync lib/find_bit implementation") broke
> build of 5.13-rc1 using gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3).
>
>   DESCEND  objtool
>   CC       /usr/src/linux/tools/objtool/exec-cmd.o
>   CC       /usr/src/linux/tools/objtool/help.o
>   CC       /usr/src/linux/tools/objtool/pager.o
>   CC       /usr/src/linux/tools/objtool/parse-options.o
>   CC       /usr/src/linux/tools/objtool/run-command.o
>   CC       /usr/src/linux/tools/objtool/sigchain.o
>   CC       /usr/src/linux/tools/objtool/subcmd-config.o
>   LD       /usr/src/linux/tools/objtool/libsubcmd-in.o
>   AR       /usr/src/linux/tools/objtool/libsubcmd.a
>   CC       /usr/src/linux/tools/objtool/arch/x86/special.o
> In file included from /usr/src/linux/tools/include/linux/kernel.h:8:0,
>                  from /usr/src/linux/tools/include/linux/list.h:7,
>                  from /usr/src/linux/tools/objtool/include/objtool/arch.h:10,
>                  from /usr/src/linux/tools/objtool/include/objtool/check.h:11,
>                  from /usr/src/linux/tools/objtool/include/objtool/special.h:10,
>                  from arch/x86/special.c:4:
> /usr/src/linux/tools/include/asm-generic/bitops/find.h: In function 'find_next_bit':
> /usr/src/linux/tools/include/linux/bits.h:24:21: error: first argument to '__builtin_choose_expr' not a constant
>   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
>                      ^
> /usr/src/linux/tools/include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
>  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
>                                                               ^
> /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
>   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
>    ^
> /usr/src/linux/tools/include/asm-generic/bitops/find.h:32:17: note: in expansion of macro 'GENMASK'
>    val = *addr & GENMASK(size - 1, offset);
>                  ^
> /usr/src/linux/tools/include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
>  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
>                                                    ^
> /usr/src/linux/tools/include/linux/bits.h:24:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
>   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
>    ^
> /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
>   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
>    ^
> /usr/src/linux/tools/include/asm-generic/bitops/find.h:32:17: note: in expansion of macro 'GENMASK'
>    val = *addr & GENMASK(size - 1, offset);
>                  ^
> /usr/src/linux/tools/include/asm-generic/bitops/find.h: In function 'find_next_and_bit':
> /usr/src/linux/tools/include/linux/bits.h:24:21: error: first argument to '__builtin_choose_expr' not a constant
>   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
>                      ^
> /usr/src/linux/tools/include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
>  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
>                                                               ^
> /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
>   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
>    ^
> /usr/src/linux/tools/include/asm-generic/bitops/find.h:62:27: note: in expansion of macro 'GENMASK'
>    val = *addr1 & *addr2 & GENMASK(size - 1, offset);
>                            ^
> /usr/src/linux/tools/include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
>  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
>                                                    ^
> /usr/src/linux/tools/include/linux/bits.h:24:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
>   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
>    ^
> /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
>   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
>    ^
> /usr/src/linux/tools/include/asm-generic/bitops/find.h:62:27: note: in expansion of macro 'GENMASK'
>    val = *addr1 & *addr2 & GENMASK(size - 1, offset);
>                            ^
> /usr/src/linux/tools/include/asm-generic/bitops/find.h: In function 'find_next_zero_bit':
> /usr/src/linux/tools/include/linux/bits.h:24:21: error: first argument to '__builtin_choose_expr' not a constant
>   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
>                      ^
> /usr/src/linux/tools/include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
>  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
>                                                               ^
> /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
>   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
>    ^
> /usr/src/linux/tools/include/asm-generic/bitops/find.h:90:18: note: in expansion of macro 'GENMASK'
>    val = *addr | ~GENMASK(size - 1, offset);
>                   ^
> /usr/src/linux/tools/include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
>  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
>                                                    ^
> /usr/src/linux/tools/include/linux/bits.h:24:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
>   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
>    ^
> /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
>   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
>    ^
> /usr/src/linux/tools/include/asm-generic/bitops/find.h:90:18: note: in expansion of macro 'GENMASK'
>    val = *addr | ~GENMASK(size - 1, offset);
>                   ^
> make[5]: *** [/usr/src/linux/tools/objtool/arch/x86/special.o] Error 1
> make[4]: *** [arch/x86] Error 2
> make[3]: *** [/usr/src/linux/tools/objtool/objtool-in.o] Error 2
> make[2]: *** [objtool] Error 2
> make[1]: *** [tools/objtool] Error 2
> make: *** [__sub-make] Error 2
>
>
>
> Applying below diff seems to solve the build failure.

It will desynchronize this implementation with the mother's one (i.e.
in bits.h).

> Do we need to use BUILD_BUG_ON_ZERO() here?

Rikard?

>
> diff --git a/tools/include/linux/bits.h b/tools/include/linux/bits.h
> index 7f475d59a097..0aba9294f29d 100644
> --- a/tools/include/linux/bits.h
> +++ b/tools/include/linux/bits.h
> @@ -21,8 +21,7 @@
>  #if !defined(__ASSEMBLY__)
>  #include <linux/build_bug.h>
>  #define GENMASK_INPUT_CHECK(h, l) \
> -       (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> -               __builtin_constant_p((l) > (h)), (l) > (h), 0)))
> +       ({ BUILD_BUG_ON(__builtin_constant_p((l) > (h)) && ((l) > (h))); 0; })
>  #else
>  /*
>   * BUILD_BUG_ON_ZERO is not available in h files included from asm files,


> Also, why the fast path of find_*_bit() functions does not check
> __builtin_constant_p(offset) as well as small_const_nbits(size), for the fast
> path fails to catch BUILD_BUG_ON_ZERO() when offset argument is not a constant.

How would this help anything?

If you ask a bit from a bitmap behind the size, what do you expect to get?

And I'm a bit lost here, because I can't imagine the offset being
constant along with a size of bitmap. What do we want to achieve by
this? Any examples to better understand the case?

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/12] tools: sync lib/find_bit implementation
  2021-05-10 15:44     ` Andy Shevchenko
@ 2021-05-10 17:21       ` Yury Norov
  2021-05-10 22:51       ` Rikard Falkeborn
  1 sibling, 0 replies; 34+ messages in thread
From: Yury Norov @ 2021-05-10 17:21 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Tetsuo Handa, Rikard Falkeborn, Linux Kernel Mailing List,
	Andrew Morton, Rasmus Villemoes, linux-m68k, Linux-Arch,
	Linux-SH, Alexey Klimov, Andy Shevchenko, Arnd Bergmann,
	David Sterba, Dennis Zhou, Geert Uytterhoeven, Jianpeng Ma,
	Joe Perches, John Paul Adrian Glaubitz, Josh Poimboeuf,
	Rich Felker, Stefano Brivio, Wei Yang, Wolfram Sang,
	Yoshinori Sato

On Mon, May 10, 2021 at 06:44:44PM +0300, Andy Shevchenko wrote:
> +Cc: Rikard
> 
> On Mon, May 10, 2021 at 6:31 PM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >
> > Commit eaae7841ba83bb42 ("tools: sync lib/find_bit implementation") broke
> > build of 5.13-rc1 using gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3).
> >
> >   DESCEND  objtool
> >   CC       /usr/src/linux/tools/objtool/exec-cmd.o
> >   CC       /usr/src/linux/tools/objtool/help.o
> >   CC       /usr/src/linux/tools/objtool/pager.o
> >   CC       /usr/src/linux/tools/objtool/parse-options.o
> >   CC       /usr/src/linux/tools/objtool/run-command.o
> >   CC       /usr/src/linux/tools/objtool/sigchain.o
> >   CC       /usr/src/linux/tools/objtool/subcmd-config.o
> >   LD       /usr/src/linux/tools/objtool/libsubcmd-in.o
> >   AR       /usr/src/linux/tools/objtool/libsubcmd.a
> >   CC       /usr/src/linux/tools/objtool/arch/x86/special.o
> > In file included from /usr/src/linux/tools/include/linux/kernel.h:8:0,
> >                  from /usr/src/linux/tools/include/linux/list.h:7,
> >                  from /usr/src/linux/tools/objtool/include/objtool/arch.h:10,
> >                  from /usr/src/linux/tools/objtool/include/objtool/check.h:11,
> >                  from /usr/src/linux/tools/objtool/include/objtool/special.h:10,
> >                  from arch/x86/special.c:4:
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h: In function 'find_next_bit':
> > /usr/src/linux/tools/include/linux/bits.h:24:21: error: first argument to '__builtin_choose_expr' not a constant
> >   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> >                      ^
> > /usr/src/linux/tools/include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
> >  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
> >                                                               ^
> > /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
> >   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
> >    ^
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h:32:17: note: in expansion of macro 'GENMASK'
> >    val = *addr & GENMASK(size - 1, offset);
> >                  ^
> > /usr/src/linux/tools/include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
> >  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
> >                                                    ^
> > /usr/src/linux/tools/include/linux/bits.h:24:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
> >   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> >    ^
> > /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
> >   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
> >    ^
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h:32:17: note: in expansion of macro 'GENMASK'
> >    val = *addr & GENMASK(size - 1, offset);
> >                  ^
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h: In function 'find_next_and_bit':
> > /usr/src/linux/tools/include/linux/bits.h:24:21: error: first argument to '__builtin_choose_expr' not a constant
> >   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> >                      ^
> > /usr/src/linux/tools/include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
> >  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
> >                                                               ^
> > /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
> >   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
> >    ^
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h:62:27: note: in expansion of macro 'GENMASK'
> >    val = *addr1 & *addr2 & GENMASK(size - 1, offset);
> >                            ^
> > /usr/src/linux/tools/include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
> >  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
> >                                                    ^
> > /usr/src/linux/tools/include/linux/bits.h:24:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
> >   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> >    ^
> > /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
> >   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
> >    ^
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h:62:27: note: in expansion of macro 'GENMASK'
> >    val = *addr1 & *addr2 & GENMASK(size - 1, offset);
> >                            ^
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h: In function 'find_next_zero_bit':
> > /usr/src/linux/tools/include/linux/bits.h:24:21: error: first argument to '__builtin_choose_expr' not a constant
> >   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> >                      ^
> > /usr/src/linux/tools/include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
> >  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
> >                                                               ^
> > /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
> >   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
> >    ^
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h:90:18: note: in expansion of macro 'GENMASK'
> >    val = *addr | ~GENMASK(size - 1, offset);
> >                   ^
> > /usr/src/linux/tools/include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
> >  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
> >                                                    ^
> > /usr/src/linux/tools/include/linux/bits.h:24:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
> >   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> >    ^
> > /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
> >   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
> >    ^
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h:90:18: note: in expansion of macro 'GENMASK'
> >    val = *addr | ~GENMASK(size - 1, offset);
> >                   ^
> > make[5]: *** [/usr/src/linux/tools/objtool/arch/x86/special.o] Error 1
> > make[4]: *** [arch/x86] Error 2
> > make[3]: *** [/usr/src/linux/tools/objtool/objtool-in.o] Error 2
> > make[2]: *** [objtool] Error 2
> > make[1]: *** [tools/objtool] Error 2
> > make: *** [__sub-make] Error 2
> >
> >
> > Applying below diff seems to solve the build failure.
> >
> It will desynchronize this implementation with the mother's one (i.e.
> in bits.h).
> 
> > Do we need to use BUILD_BUG_ON_ZERO() here?
> 
> Rikard?
>
> >
> > diff --git a/tools/include/linux/bits.h b/tools/include/linux/bits.h
> > index 7f475d59a097..0aba9294f29d 100644
> > --- a/tools/include/linux/bits.h
> > +++ b/tools/include/linux/bits.h
> > @@ -21,8 +21,7 @@
> >  #if !defined(__ASSEMBLY__)
> >  #include <linux/build_bug.h>
> >  #define GENMASK_INPUT_CHECK(h, l) \
> > -       (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> > -               __builtin_constant_p((l) > (h)), (l) > (h), 0)))
> > +       ({ BUILD_BUG_ON(__builtin_constant_p((l) > (h)) && ((l) > (h))); 0; })
> >  #else
> >  /*
> >   * BUILD_BUG_ON_ZERO is not available in h files included from asm files,
 
As Andy said, we need to sync tools/GENMASK_INPUT_CHECK() with the
kernel version, and if I do this, I have many build failures:

https://pastebin.com/Fan1VLVF

Maybe in this case we should use __GENMASK() to bypass GENMASK_INPUT_CHECK()...
I'll check everything carefully this evening.
 
> > Also, why the fast path of find_*_bit() functions does not check
> > __builtin_constant_p(offset) as well as small_const_nbits(size), for the fast
> > path fails to catch BUILD_BUG_ON_ZERO() when offset argument is not a constant.
>
> How would this help anything?
> 
> If you ask a bit from a bitmap behind the size, what do you expect to get?
> 
> And I'm a bit lost here, because I can't imagine the offset being
> constant along with a size of bitmap. What do we want to achieve by
> this? Any examples to better understand the case?

If offset is constant, the existing fast path optimization would work
even better, without any modifications. (Not sure there's an example of
it in the existing codebase.) But even if the offset is not constant,
fast path works quite well - it saves ~1K of .text and improves on
performance:

https://lore.kernel.org/linux-m68k/20210321215457.588554-10-yury.norov@gmail.com/

We don't need to disable the optimization for non-constant offsets,
for sure. If asserts in GENMASK() break build, we should use __GENMASK().

Thanks,
Yury

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/12] tools: sync lib/find_bit implementation
  2021-05-10 15:44     ` Andy Shevchenko
  2021-05-10 17:21       ` Yury Norov
@ 2021-05-10 22:51       ` Rikard Falkeborn
  2021-05-11  7:28         ` Andy Shevchenko
  1 sibling, 1 reply; 34+ messages in thread
From: Rikard Falkeborn @ 2021-05-10 22:51 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Tetsuo Handa, Rikard Falkeborn, Yury Norov,
	Linux Kernel Mailing List, Andrew Morton, Rasmus Villemoes,
	linux-m68k, Linux-Arch, Linux-SH, Alexey Klimov, Andy Shevchenko,
	Arnd Bergmann, David Sterba, Dennis Zhou, Geert Uytterhoeven,
	Jianpeng Ma, Joe Perches, John Paul Adrian Glaubitz,
	Josh Poimboeuf, Rich Felker, Stefano Brivio, Wei Yang,
	Wolfram Sang, Yoshinori Sato

On Mon, May 10, 2021 at 06:44:44PM +0300, Andy Shevchenko wrote:
> +Cc: Rikard
> 
> On Mon, May 10, 2021 at 6:31 PM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >
> > Commit eaae7841ba83bb42 ("tools: sync lib/find_bit implementation") broke
> > build of 5.13-rc1 using gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3).
> >
> >   DESCEND  objtool
> >   CC       /usr/src/linux/tools/objtool/exec-cmd.o
> >   CC       /usr/src/linux/tools/objtool/help.o
> >   CC       /usr/src/linux/tools/objtool/pager.o
> >   CC       /usr/src/linux/tools/objtool/parse-options.o
> >   CC       /usr/src/linux/tools/objtool/run-command.o
> >   CC       /usr/src/linux/tools/objtool/sigchain.o
> >   CC       /usr/src/linux/tools/objtool/subcmd-config.o
> >   LD       /usr/src/linux/tools/objtool/libsubcmd-in.o
> >   AR       /usr/src/linux/tools/objtool/libsubcmd.a
> >   CC       /usr/src/linux/tools/objtool/arch/x86/special.o
> > In file included from /usr/src/linux/tools/include/linux/kernel.h:8:0,
> >                  from /usr/src/linux/tools/include/linux/list.h:7,
> >                  from /usr/src/linux/tools/objtool/include/objtool/arch.h:10,
> >                  from /usr/src/linux/tools/objtool/include/objtool/check.h:11,
> >                  from /usr/src/linux/tools/objtool/include/objtool/special.h:10,
> >                  from arch/x86/special.c:4:
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h: In function 'find_next_bit':
> > /usr/src/linux/tools/include/linux/bits.h:24:21: error: first argument to '__builtin_choose_expr' not a constant
> >   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> >                      ^
> > /usr/src/linux/tools/include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
> >  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
> >                                                               ^
> > /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
> >   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
> >    ^
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h:32:17: note: in expansion of macro 'GENMASK'
> >    val = *addr & GENMASK(size - 1, offset);
> >                  ^
> > /usr/src/linux/tools/include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
> >  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
> >                                                    ^
> > /usr/src/linux/tools/include/linux/bits.h:24:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
> >   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> >    ^
> > /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
> >   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
> >    ^
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h:32:17: note: in expansion of macro 'GENMASK'
> >    val = *addr & GENMASK(size - 1, offset);
> >                  ^
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h: In function 'find_next_and_bit':
> > /usr/src/linux/tools/include/linux/bits.h:24:21: error: first argument to '__builtin_choose_expr' not a constant
> >   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> >                      ^
> > /usr/src/linux/tools/include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
> >  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
> >                                                               ^
> > /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
> >   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
> >    ^
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h:62:27: note: in expansion of macro 'GENMASK'
> >    val = *addr1 & *addr2 & GENMASK(size - 1, offset);
> >                            ^
> > /usr/src/linux/tools/include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
> >  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
> >                                                    ^
> > /usr/src/linux/tools/include/linux/bits.h:24:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
> >   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> >    ^
> > /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
> >   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
> >    ^
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h:62:27: note: in expansion of macro 'GENMASK'
> >    val = *addr1 & *addr2 & GENMASK(size - 1, offset);
> >                            ^
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h: In function 'find_next_zero_bit':
> > /usr/src/linux/tools/include/linux/bits.h:24:21: error: first argument to '__builtin_choose_expr' not a constant
> >   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> >                      ^
> > /usr/src/linux/tools/include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
> >  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
> >                                                               ^
> > /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
> >   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
> >    ^
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h:90:18: note: in expansion of macro 'GENMASK'
> >    val = *addr | ~GENMASK(size - 1, offset);
> >                   ^
> > /usr/src/linux/tools/include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
> >  #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
> >                                                    ^
> > /usr/src/linux/tools/include/linux/bits.h:24:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
> >   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> >    ^
> > /usr/src/linux/tools/include/linux/bits.h:38:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
> >   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
> >    ^
> > /usr/src/linux/tools/include/asm-generic/bitops/find.h:90:18: note: in expansion of macro 'GENMASK'
> >    val = *addr | ~GENMASK(size - 1, offset);
> >                   ^
> > make[5]: *** [/usr/src/linux/tools/objtool/arch/x86/special.o] Error 1
> > make[4]: *** [arch/x86] Error 2
> > make[3]: *** [/usr/src/linux/tools/objtool/objtool-in.o] Error 2
> > make[2]: *** [objtool] Error 2
> > make[1]: *** [tools/objtool] Error 2
> > make: *** [__sub-make] Error 2
> >
> >

Some background: when I added the input check to GENMASK there was
originally a check that disabled the input check if gcc versions earlier
than 4.9, since for old compilers, there was a bug [1] where for some
constructs, __builtin_constant_p() did not evaluate to a constant. This
was supposedly fixed in gcc 4.9, but apparently there are newer gcc
versions which suffer from this too.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19449

> >
> > Applying below diff seems to solve the build failure.
> 
> It will desynchronize this implementation with the mother's one (i.e.
> in bits.h).
> 
> > Do we need to use BUILD_BUG_ON_ZERO() here?
> 
> Rikard?

Yes, (as Yury pointed out), GENMASK is used in for example structure
initializers where  BUIlD_BUG() can not be used.
> 
> >
> > diff --git a/tools/include/linux/bits.h b/tools/include/linux/bits.h
> > index 7f475d59a097..0aba9294f29d 100644
> > --- a/tools/include/linux/bits.h
> > +++ b/tools/include/linux/bits.h
> > @@ -21,8 +21,7 @@
> >  #if !defined(__ASSEMBLY__)
> >  #include <linux/build_bug.h>
> >  #define GENMASK_INPUT_CHECK(h, l) \
> > -       (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> > -               __builtin_constant_p((l) > (h)), (l) > (h), 0)))
> > +       ({ BUILD_BUG_ON(__builtin_constant_p((l) > (h)) && ((l) > (h))); 0; })
> >  #else
> >  /*
> >   * BUILD_BUG_ON_ZERO is not available in h files included from asm files,

Does the following work for you? For simplicity, I copied__is_constexpr from
include/linux/minmax.h (which isn't available in tools/). A proper patch
would reuse __is_constexpr (possibly refactoring it to a separate
header since bits.h including minmax.h for that only seems smelly) and fix
bits.h in the kernel header as well, to keep the files in sync.

diff --git a/tools/include/linux/bits.h b/tools/include/linux/bits.h
index 7f475d59a097..7bc4c31a7df0 100644
--- a/tools/include/linux/bits.h
+++ b/tools/include/linux/bits.h
@@ -19,10 +19,13 @@
  * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
  */
 #if !defined(__ASSEMBLY__)
+
+#define __is_constexpr(x) \
+       (sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
 #include <linux/build_bug.h>
 #define GENMASK_INPUT_CHECK(h, l) \
        (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
-               __builtin_constant_p((l) > (h)), (l) > (h), 0)))
+               __is_constexpr((l) > (h)), (l) > (h), 0)))
 #else
 /*
  * BUILD_BUG_ON_ZERO is not available in h files included from asm files,

I think any solution which results in using __GENMASK to avoid the input
check will just result in similar reports in the future.

Rikard


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/12] tools: sync lib/find_bit implementation
  2021-05-10 22:51       ` Rikard Falkeborn
@ 2021-05-11  7:28         ` Andy Shevchenko
  2021-05-11 10:36           ` Rikard Falkeborn
  0 siblings, 1 reply; 34+ messages in thread
From: Andy Shevchenko @ 2021-05-11  7:28 UTC (permalink / raw)
  To: Rikard Falkeborn
  Cc: Tetsuo Handa, Yury Norov, Linux Kernel Mailing List,
	Andrew Morton, Rasmus Villemoes, linux-m68k, Linux-Arch,
	Linux-SH, Alexey Klimov, Arnd Bergmann, David Sterba,
	Dennis Zhou, Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rich Felker,
	Stefano Brivio, Wei Yang, Wolfram Sang, Yoshinori Sato

On Tue, May 11, 2021 at 12:51:58AM +0200, Rikard Falkeborn wrote:
> On Mon, May 10, 2021 at 06:44:44PM +0300, Andy Shevchenko wrote:

...

> Does the following work for you? For simplicity, I copied__is_constexpr from
> include/linux/minmax.h (which isn't available in tools/). A proper patch
> would reuse __is_constexpr (possibly refactoring it to a separate
> header since bits.h including minmax.h for that only seems smelly)

I think we need to have it in something like compiler.h (top level). Under
'top level' I meant something with the function as of compiler.h but with
Linuxisms rather than compiler attributes or so.

Separate header for the (single) macro is too much...

> and fix
> bits.h in the kernel header as well, to keep the files in sync.

Right.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/12] tools: sync lib/find_bit implementation
  2021-05-11  7:28         ` Andy Shevchenko
@ 2021-05-11 10:36           ` Rikard Falkeborn
  2021-05-11 11:53             ` Tetsuo Handa
  2021-05-11 12:17             ` Andy Shevchenko
  0 siblings, 2 replies; 34+ messages in thread
From: Rikard Falkeborn @ 2021-05-11 10:36 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Rikard Falkeborn, Tetsuo Handa, Yury Norov,
	Linux Kernel Mailing List, Andrew Morton, Rasmus Villemoes,
	linux-m68k, Linux-Arch, Linux-SH, Alexey Klimov, Arnd Bergmann,
	David Sterba, Dennis Zhou, Geert Uytterhoeven, Jianpeng Ma,
	Joe Perches, John Paul Adrian Glaubitz, Josh Poimboeuf,
	Rich Felker, Stefano Brivio, Wei Yang, Wolfram Sang,
	Yoshinori Sato

On Tue, May 11, 2021 at 10:28:50AM +0300, Andy Shevchenko wrote:
> On Tue, May 11, 2021 at 12:51:58AM +0200, Rikard Falkeborn wrote:
> > On Mon, May 10, 2021 at 06:44:44PM +0300, Andy Shevchenko wrote:
> 
> ...
> 
> > Does the following work for you? For simplicity, I copied__is_constexpr from
> > include/linux/minmax.h (which isn't available in tools/). A proper patch
> > would reuse __is_constexpr (possibly refactoring it to a separate
> > header since bits.h including minmax.h for that only seems smelly)
> 
> I think we need to have it in something like compiler.h (top level). Under
> 'top level' I meant something with the function as of compiler.h but with
> Linuxisms rather than compiler attributes or so.

Right. Will you send a patch, or do you want me to?

It would be good to get confirmation that __is_constexpr solves the
build failure.

> 
> Separate header for the (single) macro is too much...
> 

Agreed.

> > and fix
> > bits.h in the kernel header as well, to keep the files in sync.
> 
> Right.
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 

With Best Regards,
Rikard

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/12] tools: sync lib/find_bit implementation
  2021-05-11 10:36           ` Rikard Falkeborn
@ 2021-05-11 11:53             ` Tetsuo Handa
  2021-05-11 20:37               ` Rikard Falkeborn
  2021-05-11 12:17             ` Andy Shevchenko
  1 sibling, 1 reply; 34+ messages in thread
From: Tetsuo Handa @ 2021-05-11 11:53 UTC (permalink / raw)
  To: Rikard Falkeborn, Andy Shevchenko
  Cc: Yury Norov, Linux Kernel Mailing List, Andrew Morton,
	Rasmus Villemoes, linux-m68k, Linux-Arch, Linux-SH,
	Alexey Klimov, Arnd Bergmann, David Sterba, Dennis Zhou,
	Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rich Felker,
	Stefano Brivio, Wei Yang, Wolfram Sang, Yoshinori Sato

On 2021/05/11 0:44, Andy Shevchenko wrote:
> And I'm a bit lost here, because I can't imagine the offset being
> constant along with a size of bitmap. What do we want to achieve by
> this? Any examples to better understand the case?

Because I feel that the GENMASK() macro cannot be evaluated without
both arguments being a constant.

The usage is

 unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
                            unsigned long offset)
 {
+       if (small_const_nbits(size)) {
+               unsigned long val;
+
+               if (unlikely(offset >= size))
+                       return size;
+
+               val = *addr & GENMASK(size - 1, offset);
+               return val ? __ffs(val) : size;
+       }
+
        return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
 }

where GENMASK() might be called even if "offset" is not a constant.

#define GENMASK(h, l) \
     (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))

#define __GENMASK(h, l) \
     (((~UL(0)) - (UL(1) << (l)) + 1) & \
       (~UL(0) >> (BITS_PER_LONG - 1 - (h))))

#define GENMASK_INPUT_CHECK(h, l) \
     (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
          __builtin_constant_p((l) > (h)), (l) > (h), 0)))

__GENMASK() does not need "h" and "l" being a constant.

Yes, small_const_nbits(size) in find_next_bit() can guarantee that "size" is a
constant and hence "h" argument in GENMASK_INPUT_CHECK() call is also a constant.
But nothing can guarantee that "offset" is a constant, and hence nothing can
guarantee that "l" argument in GENMASK_INPUT_CHECK() call is also a constant.

Then, how can (l) > (h) in __builtin_constant_p((l) > (h)) be evaluated at build time
if either l or h (i.e. "offset" and "size - 1" in find_next_bit()) lacks a guarantee of
being a constant?

But what a surprise,

On 2021/05/11 7:51, Rikard Falkeborn wrote:
> Does the following work for you? For simplicity, I copied__is_constexpr from
> include/linux/minmax.h (which isn't available in tools/). A proper patch
> would reuse __is_constexpr (possibly refactoring it to a separate
> header since bits.h including minmax.h for that only seems smelly) and fix
> bits.h in the kernel header as well, to keep the files in sync.

this works for me.

> 
> diff --git a/tools/include/linux/bits.h b/tools/include/linux/bits.h
> index 7f475d59a097..7bc4c31a7df0 100644
> --- a/tools/include/linux/bits.h
> +++ b/tools/include/linux/bits.h
> @@ -19,10 +19,13 @@
>   * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
>   */
>  #if !defined(__ASSEMBLY__)
> +
> +#define __is_constexpr(x) \
> +       (sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
>  #include <linux/build_bug.h>
>  #define GENMASK_INPUT_CHECK(h, l) \
>         (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> -               __builtin_constant_p((l) > (h)), (l) > (h), 0)))
> +               __is_constexpr((l) > (h)), (l) > (h), 0)))
>  #else
>  /*
>   * BUILD_BUG_ON_ZERO is not available in h files included from asm files,
> 



On 2021/05/11 7:52, Yury Norov wrote:
> I tested the objtool build with the 8.4.0 and 7.5.0 compilers from
> ubuntu 21 distro, and it looks working. Can you please share more
> details about your system? 

Nothing special. A plain x86_64 CentOS 7.9 system with devtoolset-8.

$ /opt/rh/devtoolset-8/root/bin/gcc --version
gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ rpm -qi devtoolset-8-gcc
Name        : devtoolset-8-gcc
Version     : 8.3.1
Release     : 3.2.el7
Architecture: x86_64
Install Date: Wed Apr 22 07:58:16 2020
Group       : Development/Languages
Size        : 74838011
License     : GPLv3+ and GPLv3+ with exceptions and GPLv2+ with exceptions and LGPLv2+ and BSD
Signature   : RSA/SHA1, Thu Apr 16 19:44:43 2020, Key ID 4eb84e71f2ee9d55
Source RPM  : devtoolset-8-gcc-8.3.1-3.2.el7.src.rpm
Build Date  : Sat Mar 28 00:06:45 2020
Build Host  : c1be.rdu2.centos.org
Relocations : (not relocatable)
Packager    : CBS <cbs@centos.org>
Vendor      : CentOS
URL         : http://gcc.gnu.org
Summary     : GCC version 8
Description :
The devtoolset-8-gcc package contains the GNU Compiler Collection version 7.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/12] tools: sync lib/find_bit implementation
  2021-05-11 10:36           ` Rikard Falkeborn
  2021-05-11 11:53             ` Tetsuo Handa
@ 2021-05-11 12:17             ` Andy Shevchenko
  1 sibling, 0 replies; 34+ messages in thread
From: Andy Shevchenko @ 2021-05-11 12:17 UTC (permalink / raw)
  To: Rikard Falkeborn
  Cc: Tetsuo Handa, Yury Norov, Linux Kernel Mailing List,
	Andrew Morton, Rasmus Villemoes, linux-m68k, Linux-Arch,
	Linux-SH, Alexey Klimov, Arnd Bergmann, David Sterba,
	Dennis Zhou, Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rich Felker,
	Stefano Brivio, Wei Yang, Wolfram Sang, Yoshinori Sato

On Tue, May 11, 2021 at 1:36 PM Rikard Falkeborn
<rikard.falkeborn@gmail.com> wrote:
>
> On Tue, May 11, 2021 at 10:28:50AM +0300, Andy Shevchenko wrote:
> > On Tue, May 11, 2021 at 12:51:58AM +0200, Rikard Falkeborn wrote:
> > > On Mon, May 10, 2021 at 06:44:44PM +0300, Andy Shevchenko wrote:
> >
> > ...
> >
> > > Does the following work for you? For simplicity, I copied__is_constexpr from
> > > include/linux/minmax.h (which isn't available in tools/). A proper patch
> > > would reuse __is_constexpr (possibly refactoring it to a separate
> > > header since bits.h including minmax.h for that only seems smelly)
> >
> > I think we need to have it in something like compiler.h (top level). Under
> > 'top level' I meant something with the function as of compiler.h but with
> > Linuxisms rather than compiler attributes or so.
>
> Right. Will you send a patch, or do you want me to?

Please, go ahead!
I'm in a vacation mood (tomorrow it will start)

> It would be good to get confirmation that __is_constexpr solves the
> build failure.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/12] tools: sync lib/find_bit implementation
  2021-05-11 11:53             ` Tetsuo Handa
@ 2021-05-11 20:37               ` Rikard Falkeborn
  2021-05-12  7:48                 ` Arnd Bergmann
  0 siblings, 1 reply; 34+ messages in thread
From: Rikard Falkeborn @ 2021-05-11 20:37 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Rikard Falkeborn, Andy Shevchenko, Yury Norov,
	Linux Kernel Mailing List, Andrew Morton, Rasmus Villemoes,
	linux-m68k, Linux-Arch, Linux-SH, Alexey Klimov, Arnd Bergmann,
	David Sterba, Dennis Zhou, Geert Uytterhoeven, Jianpeng Ma,
	Joe Perches, John Paul Adrian Glaubitz, Josh Poimboeuf,
	Rich Felker, Stefano Brivio, Wei Yang, Wolfram Sang,
	Yoshinori Sato

On Tue, May 11, 2021 at 08:53:53PM +0900, Tetsuo Handa wrote:
> On 2021/05/11 0:44, Andy Shevchenko wrote:
> > And I'm a bit lost here, because I can't imagine the offset being
> > constant along with a size of bitmap. What do we want to achieve by
> > this? Any examples to better understand the case?
> 
> Because I feel that the GENMASK() macro cannot be evaluated without
> both arguments being a constant.
> 
> The usage is
> 
>  unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>                             unsigned long offset)
>  {
> +       if (small_const_nbits(size)) {
> +               unsigned long val;
> +
> +               if (unlikely(offset >= size))
> +                       return size;
> +
> +               val = *addr & GENMASK(size - 1, offset);
> +               return val ? __ffs(val) : size;
> +       }
> +
>         return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
>  }
> 
> where GENMASK() might be called even if "offset" is not a constant.
> 
> #define GENMASK(h, l) \
>      (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
> 
> #define __GENMASK(h, l) \
>      (((~UL(0)) - (UL(1) << (l)) + 1) & \
>        (~UL(0) >> (BITS_PER_LONG - 1 - (h))))
> 
> #define GENMASK_INPUT_CHECK(h, l) \
>      (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
>           __builtin_constant_p((l) > (h)), (l) > (h), 0)))
> 
> __GENMASK() does not need "h" and "l" being a constant.
> 
> Yes, small_const_nbits(size) in find_next_bit() can guarantee that "size" is a
> constant and hence "h" argument in GENMASK_INPUT_CHECK() call is also a constant.
> But nothing can guarantee that "offset" is a constant, and hence nothing can
> guarantee that "l" argument in GENMASK_INPUT_CHECK() call is also a constant.
> 
> Then, how can (l) > (h) in __builtin_constant_p((l) > (h)) be evaluated at build time
> if either l or h (i.e. "offset" and "size - 1" in find_next_bit()) lacks a guarantee of
> being a constant?
> 

So the idea is that if (l > h) is constant, __builtin_constant_p should
evaluate that, and if it is not it should use zero instead as input to
__builtin_chose_expr(). This works with non-const inputs in many other
places in the kernel, but apparently in this case with a certain
compiler, it doesn't so I guess we need to work around it.

> But what a surprise,
> 
> On 2021/05/11 7:51, Rikard Falkeborn wrote:
> > Does the following work for you? For simplicity, I copied__is_constexpr from
> > include/linux/minmax.h (which isn't available in tools/). A proper patch
> > would reuse __is_constexpr (possibly refactoring it to a separate
> > header since bits.h including minmax.h for that only seems smelly) and fix
> > bits.h in the kernel header as well, to keep the files in sync.
> 
> this works for me.
> 

Great, thanks for testing!

I sent a patch for this here:
https://lore.kernel.org/lkml/20210511203716.117010-1-rikard.falkeborn@gmail.com/

Rikard

> > 
> > diff --git a/tools/include/linux/bits.h b/tools/include/linux/bits.h
> > index 7f475d59a097..7bc4c31a7df0 100644
> > --- a/tools/include/linux/bits.h
> > +++ b/tools/include/linux/bits.h
> > @@ -19,10 +19,13 @@
> >   * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
> >   */
> >  #if !defined(__ASSEMBLY__)
> > +
> > +#define __is_constexpr(x) \
> > +       (sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
> >  #include <linux/build_bug.h>
> >  #define GENMASK_INPUT_CHECK(h, l) \
> >         (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> > -               __builtin_constant_p((l) > (h)), (l) > (h), 0)))
> > +               __is_constexpr((l) > (h)), (l) > (h), 0)))
> >  #else
> >  /*
> >   * BUILD_BUG_ON_ZERO is not available in h files included from asm files,
> > 
> 
> 
> 
> On 2021/05/11 7:52, Yury Norov wrote:
> > I tested the objtool build with the 8.4.0 and 7.5.0 compilers from
> > ubuntu 21 distro, and it looks working. Can you please share more
> > details about your system? 
> 
> Nothing special. A plain x86_64 CentOS 7.9 system with devtoolset-8.
> 
> $ /opt/rh/devtoolset-8/root/bin/gcc --version
> gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
> Copyright (C) 2018 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> 
> $ rpm -qi devtoolset-8-gcc
> Name        : devtoolset-8-gcc
> Version     : 8.3.1
> Release     : 3.2.el7
> Architecture: x86_64
> Install Date: Wed Apr 22 07:58:16 2020
> Group       : Development/Languages
> Size        : 74838011
> License     : GPLv3+ and GPLv3+ with exceptions and GPLv2+ with exceptions and LGPLv2+ and BSD
> Signature   : RSA/SHA1, Thu Apr 16 19:44:43 2020, Key ID 4eb84e71f2ee9d55
> Source RPM  : devtoolset-8-gcc-8.3.1-3.2.el7.src.rpm
> Build Date  : Sat Mar 28 00:06:45 2020
> Build Host  : c1be.rdu2.centos.org
> Relocations : (not relocatable)
> Packager    : CBS <cbs@centos.org>
> Vendor      : CentOS
> URL         : http://gcc.gnu.org
> Summary     : GCC version 8
> Description :
> The devtoolset-8-gcc package contains the GNU Compiler Collection version 7.
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/12] tools: sync lib/find_bit implementation
  2021-05-11 20:37               ` Rikard Falkeborn
@ 2021-05-12  7:48                 ` Arnd Bergmann
  2021-05-12  8:15                   ` Rasmus Villemoes
  0 siblings, 1 reply; 34+ messages in thread
From: Arnd Bergmann @ 2021-05-12  7:48 UTC (permalink / raw)
  To: Rikard Falkeborn
  Cc: Tetsuo Handa, Andy Shevchenko, Yury Norov,
	Linux Kernel Mailing List, Andrew Morton, Rasmus Villemoes,
	linux-m68k, Linux-Arch, Linux-SH, Alexey Klimov, David Sterba,
	Dennis Zhou, Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rich Felker,
	Stefano Brivio, Wei Yang, Wolfram Sang, Yoshinori Sato

On Tue, May 11, 2021 at 10:39 PM Rikard Falkeborn
<rikard.falkeborn@gmail.com> wrote:
> On Tue, May 11, 2021 at 08:53:53PM +0900, Tetsuo Handa wrote:

> > #define GENMASK_INPUT_CHECK(h, l) \
> >      (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> >           __builtin_constant_p((l) > (h)), (l) > (h), 0)))
> >
> > __GENMASK() does not need "h" and "l" being a constant.
> >
> > Yes, small_const_nbits(size) in find_next_bit() can guarantee that "size" is a
> > constant and hence "h" argument in GENMASK_INPUT_CHECK() call is also a constant.
> > But nothing can guarantee that "offset" is a constant, and hence nothing can
> > guarantee that "l" argument in GENMASK_INPUT_CHECK() call is also a constant.
> >
> > Then, how can (l) > (h) in __builtin_constant_p((l) > (h)) be evaluated at build time
> > if either l or h (i.e. "offset" and "size - 1" in find_next_bit()) lacks a guarantee of
> > being a constant?
> >
>
> So the idea is that if (l > h) is constant, __builtin_constant_p should
> evaluate that, and if it is not it should use zero instead as input to
> __builtin_chose_expr(). This works with non-const inputs in many other
> places in the kernel, but apparently in this case with a certain
> compiler, it doesn't so I guess we need to work around it.

I have a vague memory that __builtin_constant_p() inside of
__builtin_choose_expr()
always evaluates to false because of the order in which the compiler processes
those: If constant-folding only happens after __builtin_choose_expr(), then
__builtin_constant_p() has to be false.

        Arnd

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/12] tools: sync lib/find_bit implementation
  2021-05-12  7:48                 ` Arnd Bergmann
@ 2021-05-12  8:15                   ` Rasmus Villemoes
  2021-05-12  8:33                     ` Arnd Bergmann
  0 siblings, 1 reply; 34+ messages in thread
From: Rasmus Villemoes @ 2021-05-12  8:15 UTC (permalink / raw)
  To: Arnd Bergmann, Rikard Falkeborn
  Cc: Tetsuo Handa, Andy Shevchenko, Yury Norov,
	Linux Kernel Mailing List, Andrew Morton, linux-m68k, Linux-Arch,
	Linux-SH, Alexey Klimov, David Sterba, Dennis Zhou,
	Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rich Felker,
	Stefano Brivio, Wei Yang, Wolfram Sang, Yoshinori Sato

On 12/05/2021 09.48, Arnd Bergmann wrote:
> On Tue, May 11, 2021 at 10:39 PM Rikard Falkeborn
> <rikard.falkeborn@gmail.com> wrote:
>> On Tue, May 11, 2021 at 08:53:53PM +0900, Tetsuo Handa wrote:
> 
>>> #define GENMASK_INPUT_CHECK(h, l) \
>>>      (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
>>>           __builtin_constant_p((l) > (h)), (l) > (h), 0)))
>>>
>>> __GENMASK() does not need "h" and "l" being a constant.
>>>
>>> Yes, small_const_nbits(size) in find_next_bit() can guarantee that "size" is a
>>> constant and hence "h" argument in GENMASK_INPUT_CHECK() call is also a constant.
>>> But nothing can guarantee that "offset" is a constant, and hence nothing can
>>> guarantee that "l" argument in GENMASK_INPUT_CHECK() call is also a constant.
>>>
>>> Then, how can (l) > (h) in __builtin_constant_p((l) > (h)) be evaluated at build time
>>> if either l or h (i.e. "offset" and "size - 1" in find_next_bit()) lacks a guarantee of
>>> being a constant?
>>>
>>
>> So the idea is that if (l > h) is constant, __builtin_constant_p should
>> evaluate that, and if it is not it should use zero instead as input to
>> __builtin_chose_expr(). This works with non-const inputs in many other
>> places in the kernel, but apparently in this case with a certain
>> compiler, it doesn't so I guess we need to work around it.
> 
> I have a vague memory that __builtin_constant_p() inside of
> __builtin_choose_expr()
> always evaluates to false because of the order in which the compiler processes
> those: If constant-folding only happens after __builtin_choose_expr(), then
> __builtin_constant_p() has to be false.

It's more complicated than that. __builtin_constant_p on something which
is a bona-fide Integer Constant Expression (ICE) gets folded early to a
1. And then it turns out that such a __builtin_constant_p() that folds
early to a 1 can be "stronger" than a literal 1, in the sense that when
used as the controlling expression of a ?: with nonsense in the false
branch, the former is OK but the latter fails:

https://lore.kernel.org/lkml/c68a0f46-346c-70a0-a9b8-31747888f05f@rasmusvillemoes.dk/

Now what happens when the argument to __builtin_constant_p is not an ICE
is a lot more complicated. The argument _may_ be so obviously
non-constant that it can be folded early to a 0, hence still be suitable
as first argument to __b_c_e. But it is also possible that the compiler
leaves it unevaluated, in the "hope" that a later optimization stage
could prove the argument constant. And that's the case where __b_c_e
will then break, because that can't be left unevaluated for very long -
the very _type_ of the result depends on which branch is chosen.

tl;dr: there's no "order in which the compiler processes those", __b_c_p
can get evaluated (folded) early, before __b_c_e inspects it, or be left
for later stages.

Rasmus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/12] tools: sync lib/find_bit implementation
  2021-05-12  8:15                   ` Rasmus Villemoes
@ 2021-05-12  8:33                     ` Arnd Bergmann
  0 siblings, 0 replies; 34+ messages in thread
From: Arnd Bergmann @ 2021-05-12  8:33 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Rikard Falkeborn, Tetsuo Handa, Andy Shevchenko, Yury Norov,
	Linux Kernel Mailing List, Andrew Morton, linux-m68k, Linux-Arch,
	Linux-SH, Alexey Klimov, David Sterba, Dennis Zhou,
	Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rich Felker,
	Stefano Brivio, Wei Yang, Wolfram Sang, Yoshinori Sato

On Wed, May 12, 2021 at 10:16 AM Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
> It's more complicated than that. __builtin_constant_p on something which
> is a bona-fide Integer Constant Expression (ICE) gets folded early to a
> 1. And then it turns out that such a __builtin_constant_p() that folds
> early to a 1 can be "stronger" than a literal 1, in the sense that when
> used as the controlling expression of a ?: with nonsense in the false
> branch, the former is OK but the latter fails:
>
> https://lore.kernel.org/lkml/c68a0f46-346c-70a0-a9b8-31747888f05f@rasmusvillemoes.dk/
>
> Now what happens when the argument to __builtin_constant_p is not an ICE
> is a lot more complicated. The argument _may_ be so obviously
> non-constant that it can be folded early to a 0, hence still be suitable
> as first argument to __b_c_e. But it is also possible that the compiler
> leaves it unevaluated, in the "hope" that a later optimization stage
> could prove the argument constant. And that's the case where __b_c_e
> will then break, because that can't be left unevaluated for very long -
> the very _type_ of the result depends on which branch is chosen.
>
> tl;dr: there's no "order in which the compiler processes those", __b_c_p
> can get evaluated (folded) early, before __b_c_e inspects it, or be left
> for later stages.

Thanks for the detailed explanation. Checking the actual behavior of
a trivial example, I find that

int f(void)
{
    const int i = 1;
    return __builtin_choose_expr(__builtin_constant_p(i), 1, 2);
}

used to return '2' with gcc-7, which is what I remembered.
With gcc-8 and up as well as any version of clang, it returns '1' now:
https://godbolt.org/z/7eKjbMocb

I have also seen a couple of cases where __builtin_constant_p()
without a __builtin_choose_expr() ended up unexpectedly
returning true when gcc found a code path that it would be constant
(e.g. conditionally initializing a variable to one of two possible
ICEs), but then later turning that back into a non-constant
expression in a later optimization stage. There is probably also
a much more detailed explanation behind those.


        Arnd

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 11/12] tools: sync lib/find_bit implementation
  2021-03-21 21:54 [PATCH v5 " Yury Norov
@ 2021-03-21 21:54 ` Yury Norov
  0 siblings, 0 replies; 34+ messages in thread
From: Yury Norov @ 2021-03-21 21:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Yury Norov, linux-m68k, linux-arch, linux-sh, Alexey Klimov,
	Andrew Morton, Andy Shevchenko, Arnd Bergmann, David Sterba,
	Dennis Zhou, Geert Uytterhoeven, Jianpeng Ma, Joe Perches,
	John Paul Adrian Glaubitz, Josh Poimboeuf, Rasmus Villemoes,
	Rich Felker, Stefano Brivio, Wei Yang, Wolfram Sang,
	Yoshinori Sato

Add fast paths to find_*_bit() functions as per kernel implementation.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 tools/include/asm-generic/bitops/find.h | 58 +++++++++++++++++++++++--
 tools/lib/find_bit.c                    |  4 +-
 2 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/tools/include/asm-generic/bitops/find.h b/tools/include/asm-generic/bitops/find.h
index 9fe62d10b084..6481fd11012a 100644
--- a/tools/include/asm-generic/bitops/find.h
+++ b/tools/include/asm-generic/bitops/find.h
@@ -5,6 +5,9 @@
 extern unsigned long _find_next_bit(const unsigned long *addr1,
 		const unsigned long *addr2, unsigned long nbits,
 		unsigned long start, unsigned long invert, unsigned long le);
+extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
+extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
+extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
 
 #ifndef find_next_bit
 /**
@@ -20,6 +23,16 @@ static inline
 unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
 			    unsigned long offset)
 {
+	if (small_const_nbits(size)) {
+		unsigned long val;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = *addr & GENMASK(size - 1, offset);
+		return val ? __ffs(val) : size;
+	}
+
 	return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
 }
 #endif
@@ -40,6 +53,16 @@ unsigned long find_next_and_bit(const unsigned long *addr1,
 		const unsigned long *addr2, unsigned long size,
 		unsigned long offset)
 {
+	if (small_const_nbits(size)) {
+		unsigned long val;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = *addr1 & *addr2 & GENMASK(size - 1, offset);
+		return val ? __ffs(val) : size;
+	}
+
 	return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
 }
 #endif
@@ -58,6 +81,16 @@ static inline
 unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
 				 unsigned long offset)
 {
+	if (small_const_nbits(size)) {
+		unsigned long val;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = *addr | ~GENMASK(size - 1, offset);
+		return val == ~0UL ? size : ffz(val);
+	}
+
 	return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
 }
 #endif
@@ -72,8 +105,17 @@ unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
  * Returns the bit number of the first set bit.
  * If no bits are set, returns @size.
  */
-extern unsigned long find_first_bit(const unsigned long *addr,
-				    unsigned long size);
+static inline
+unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val = *addr & GENMASK(size - 1, 0);
+
+		return val ? __ffs(val) : size;
+	}
+
+	return _find_first_bit(addr, size);
+}
 
 #endif /* find_first_bit */
 
@@ -87,7 +129,17 @@ extern unsigned long find_first_bit(const unsigned long *addr,
  * Returns the bit number of the first cleared bit.
  * If no bits are zero, returns @size.
  */
-unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size);
+static inline
+unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val = *addr | ~GENMASK(size - 1, 0);
+
+		return val == ~0UL ? size : ffz(val);
+	}
+
+	return _find_first_zero_bit(addr, size);
+}
 #endif
 
 #endif /*_TOOLS_LINUX_ASM_GENERIC_BITOPS_FIND_H_ */
diff --git a/tools/lib/find_bit.c b/tools/lib/find_bit.c
index 589fd2f26f94..109aa7ffcf97 100644
--- a/tools/lib/find_bit.c
+++ b/tools/lib/find_bit.c
@@ -83,7 +83,7 @@ unsigned long _find_next_bit(const unsigned long *addr1,
 /*
  * Find the first set bit in a memory region.
  */
-unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
+unsigned long _find_first_bit(const unsigned long *addr, unsigned long size)
 {
 	unsigned long idx;
 
@@ -100,7 +100,7 @@ unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
 /*
  * Find the first cleared bit in a memory region.
  */
-unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
+unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size)
 {
 	unsigned long idx;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2021-05-12  8:34 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-01  0:31 [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Yury Norov
2021-04-01  0:31 ` [PATCH 01/12] tools: disable -Wno-type-limits Yury Norov
2021-04-01  0:31 ` [PATCH 02/12] tools: bitmap: sync function declarations with the kernel Yury Norov
2021-04-01  0:31 ` [PATCH 03/12] tools: sync BITMAP_LAST_WORD_MASK() macro " Yury Norov
2021-04-01  0:31 ` [PATCH 04/12] arch: rearrange headers inclusion order in asm/bitops for m68k and sh Yury Norov
2021-04-01  0:31 ` [PATCH 05/12] lib: extend the scope of small_const_nbits() macro Yury Norov
2021-04-01  8:35   ` Andy Shevchenko
2021-04-01  0:31 ` [PATCH 06/12] tools: sync small_const_nbits() macro with the kernel Yury Norov
2021-04-01  0:31 ` [PATCH 07/12] lib: inline _find_next_bit() wrappers Yury Norov
2021-04-01  8:37   ` Andy Shevchenko
2021-04-01  0:31 ` [PATCH 08/12] tools: sync find_next_bit implementation Yury Norov
2021-04-01  0:31 ` [PATCH 09/12] lib: add fast path for find_next_*_bit() Yury Norov
2021-04-01  8:48   ` Andy Shevchenko
2021-04-01  0:31 ` [PATCH 10/12] lib: add fast path for find_first_*_bit() and find_last_bit() Yury Norov
2021-04-01  8:58   ` Andy Shevchenko
2021-04-01  0:31 ` [PATCH 11/12] tools: sync lib/find_bit implementation Yury Norov
2021-05-10 15:27   ` Tetsuo Handa
2021-05-10 15:44     ` Andy Shevchenko
2021-05-10 17:21       ` Yury Norov
2021-05-10 22:51       ` Rikard Falkeborn
2021-05-11  7:28         ` Andy Shevchenko
2021-05-11 10:36           ` Rikard Falkeborn
2021-05-11 11:53             ` Tetsuo Handa
2021-05-11 20:37               ` Rikard Falkeborn
2021-05-12  7:48                 ` Arnd Bergmann
2021-05-12  8:15                   ` Rasmus Villemoes
2021-05-12  8:33                     ` Arnd Bergmann
2021-05-11 12:17             ` Andy Shevchenko
2021-04-01  0:31 ` [PATCH 12/12] MAINTAINERS: Add entry for the bitmap API Yury Norov
2021-04-01  9:14 ` [PATCH v6 00/12] lib/find_bit: fast path for small bitmaps Andy Shevchenko
2021-04-01  9:28   ` Arnd Bergmann
2021-04-01  9:50     ` Andy Shevchenko
2021-04-02  0:32       ` Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2021-03-21 21:54 [PATCH v5 " Yury Norov
2021-03-21 21:54 ` [PATCH 11/12] tools: sync lib/find_bit implementation Yury Norov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.