* [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps
@ 2021-01-30 19:17 Yury Norov
2021-01-30 19:17 ` [PATCH 1/8] tools: disable -Wno-type-limits Yury Norov
` (8 more replies)
0 siblings, 9 replies; 23+ messages in thread
From: Yury Norov @ 2021-01-30 19:17 UTC (permalink / raw)
To: linux-m68k, linux-kernel, linux-sh, linux-arch
Cc: Yury Norov, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
Arnd Bergmann, Dennis Zhou, Andrew Morton, Wolfram Sang,
David Sterba, Andy Shevchenko, Stefano Brivio, Ma, Jianpeng,
Wei Yang, Josh Poimboeuf, John Paul Adrian Glaubitz, Joe Perches
Bitmap operations are much simpler and faster for small bitmaps that fit
into a single word. In linux/bitmap.h we have machinery that allows the
compiler to replace an actual function call with a few instructions
if the bitmaps passed into the function are small and their size is known
at compile time.
The find_*_bit() API lacks this functionality, even though its users would
benefit from it a lot. One important example is the cpumask subsystem when
NR_CPUS <= BITS_PER_LONG. In the best case, the compiler may replace
a find_*_bit() call for such a bitmap with a single ffs or ffz instruction.
Tools are synchronized with the new implementation where needed.
v1: https://www.spinics.net/lists/kernel/msg3804727.html
v2: - employ GENMASK() for bitmaps;
- unify find_bit inliners in;
- address comments to v1;
Yury Norov (8):
tools: disable -Wno-type-limits
tools: bitmap: sync function declarations with linux kernel
arch: rearrange headers inclusion order in asm/bitops for m68k and sh
lib: introduce BITS_{FIRST,LAST} macro
bitsperlong.h: introduce SMALL_CONST() macro
lib: inline _find_next_bit() wrappers
lib: add fast path for find_next_*_bit()
lib: add fast path for find_first_*_bit() and find_last_bit()
arch/m68k/include/asm/bitops.h | 4 +-
arch/sh/include/asm/bitops.h | 3 +-
include/asm-generic/bitops/find.h | 108 +++++++++++++++++++++---
include/asm-generic/bitops/le.h | 38 ++++++++-
include/asm-generic/bitsperlong.h | 2 +
include/linux/bitmap.h | 60 ++++++-------
include/linux/bitops.h | 12 ---
include/linux/bits.h | 6 ++
include/linux/cpumask.h | 8 +-
include/linux/netdev_features.h | 2 +-
include/linux/nodemask.h | 2 +-
lib/bitmap.c | 26 +++---
lib/find_bit.c | 72 +++-------------
lib/genalloc.c | 8 +-
tools/include/asm-generic/bitops/find.h | 85 +++++++++++++++++--
tools/include/asm-generic/bitsperlong.h | 2 +
tools/include/linux/bitmap.h | 47 ++++-------
tools/include/linux/bits.h | 6 ++
tools/lib/bitmap.c | 10 +--
tools/lib/find_bit.c | 56 +++++-------
tools/scripts/Makefile.include | 1 +
tools/testing/radix-tree/bitmap.c | 4 +-
22 files changed, 337 insertions(+), 225 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/8] tools: disable -Wno-type-limits
2021-01-30 19:17 [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps Yury Norov
@ 2021-01-30 19:17 ` Yury Norov
2021-01-30 19:17 ` [PATCH 2/8] tools: bitmap: sync function declarations with linux kernel Yury Norov
` (7 subsequent siblings)
8 siblings, 0 replies; 23+ messages in thread
From: Yury Norov @ 2021-01-30 19:17 UTC (permalink / raw)
To: linux-m68k, linux-kernel, linux-sh, linux-arch
Cc: Yury Norov, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
Arnd Bergmann, Dennis Zhou, Andrew Morton, Wolfram Sang,
David Sterba, Andy Shevchenko, Stefano Brivio, Ma, Jianpeng,
Wei Yang, Josh Poimboeuf, John Paul Adrian Glaubitz, Joe Perches
GENMASK(h, l) may be passed arguments of unsigned types. In such a case,
the -Wtype-limits warning is generated, for example for GENMASK(h, 0),
because the macro's compile-time input check compares an unsigned value
against a limit it can never cross.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
tools/scripts/Makefile.include | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/scripts/Makefile.include b/tools/scripts/Makefile.include
index 1358e89cdf7d..58bb319624f6 100644
--- a/tools/scripts/Makefile.include
+++ b/tools/scripts/Makefile.include
@@ -38,6 +38,7 @@ EXTRA_WARNINGS += -Wswitch-enum
EXTRA_WARNINGS += -Wundef
EXTRA_WARNINGS += -Wwrite-strings
EXTRA_WARNINGS += -Wformat
+EXTRA_WARNINGS += -Wno-type-limits
CC_NO_CLANG := $(shell $(CC) -dM -E -x c /dev/null | grep -Fq "__clang__"; echo $$?)
--
2.25.1
* [PATCH 2/8] tools: bitmap: sync function declarations with linux kernel
2021-01-30 19:17 [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps Yury Norov
2021-01-30 19:17 ` [PATCH 1/8] tools: disable -Wno-type-limits Yury Norov
@ 2021-01-30 19:17 ` Yury Norov
2021-01-30 19:17 ` [PATCH 3/8] arch: rearrange headers inclusion order in asm/bitops for m68k and sh Yury Norov
` (6 subsequent siblings)
8 siblings, 0 replies; 23+ messages in thread
From: Yury Norov @ 2021-01-30 19:17 UTC (permalink / raw)
To: linux-m68k, linux-kernel, linux-sh, linux-arch
Cc: Yury Norov, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
Arnd Bergmann, Dennis Zhou, Andrew Morton, Wolfram Sang,
David Sterba, Andy Shevchenko, Stefano Brivio, Ma, Jianpeng,
Wei Yang, Josh Poimboeuf, John Paul Adrian Glaubitz, Joe Perches
Some functions in tools/include/linux/bitmap.h declare nbits as int,
while in the kernel nbits is declared as unsigned int. Sync the
declarations with the kernel.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
tools/include/linux/bitmap.h | 8 ++++----
tools/lib/bitmap.c | 4 ++--
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
index 477a1cae513f..7cbd23e56d48 100644
--- a/tools/include/linux/bitmap.h
+++ b/tools/include/linux/bitmap.h
@@ -30,7 +30,7 @@ void bitmap_clear(unsigned long *map, unsigned int start, int len);
#define small_const_nbits(nbits) \
(__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG)
-static inline void bitmap_zero(unsigned long *dst, int nbits)
+static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
{
if (small_const_nbits(nbits))
*dst = 0UL;
@@ -66,7 +66,7 @@ static inline int bitmap_full(const unsigned long *src, unsigned int nbits)
return find_first_zero_bit(src, nbits) == nbits;
}
-static inline int bitmap_weight(const unsigned long *src, int nbits)
+static inline int bitmap_weight(const unsigned long *src, unsigned int nbits)
{
if (small_const_nbits(nbits))
return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits));
@@ -74,7 +74,7 @@ static inline int bitmap_weight(const unsigned long *src, int nbits)
}
static inline void bitmap_or(unsigned long *dst, const unsigned long *src1,
- const unsigned long *src2, int nbits)
+ const unsigned long *src2, unsigned int nbits)
{
if (small_const_nbits(nbits))
*dst = *src1 | *src2;
@@ -141,7 +141,7 @@ static inline void bitmap_free(unsigned long *bitmap)
* @buf: buffer to store output
* @size: size of @buf
*/
-size_t bitmap_scnprintf(unsigned long *bitmap, int nbits,
+size_t bitmap_scnprintf(unsigned long *bitmap, unsigned int nbits,
char *buf, size_t size);
/**
diff --git a/tools/lib/bitmap.c b/tools/lib/bitmap.c
index 5043747ef6c5..f4e914712b6f 100644
--- a/tools/lib/bitmap.c
+++ b/tools/lib/bitmap.c
@@ -28,11 +28,11 @@ void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
dst[k] = bitmap1[k] | bitmap2[k];
}
-size_t bitmap_scnprintf(unsigned long *bitmap, int nbits,
+size_t bitmap_scnprintf(unsigned long *bitmap, unsigned int nbits,
char *buf, size_t size)
{
/* current bit is 'cur', most recently seen range is [rbot, rtop] */
- int cur, rbot, rtop;
+ unsigned int cur, rbot, rtop;
bool first = true;
size_t ret = 0;
--
2.25.1
* [PATCH 3/8] arch: rearrange headers inclusion order in asm/bitops for m68k and sh
2021-01-30 19:17 [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps Yury Norov
2021-01-30 19:17 ` [PATCH 1/8] tools: disable -Wno-type-limits Yury Norov
2021-01-30 19:17 ` [PATCH 2/8] tools: bitmap: sync function declarations with linux kernel Yury Norov
@ 2021-01-30 19:17 ` Yury Norov
2021-01-30 19:17 ` [PATCH 4/8] lib: introduce BITS_{FIRST,LAST} macro Yury Norov
` (5 subsequent siblings)
8 siblings, 0 replies; 23+ messages in thread
From: Yury Norov @ 2021-01-30 19:17 UTC (permalink / raw)
To: linux-m68k, linux-kernel, linux-sh, linux-arch
Cc: Yury Norov, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
Arnd Bergmann, Dennis Zhou, Andrew Morton, Wolfram Sang,
David Sterba, Andy Shevchenko, Stefano Brivio, Ma, Jianpeng,
Wei Yang, Josh Poimboeuf, John Paul Adrian Glaubitz, Joe Perches
m68k and sh include asm-generic/bitops/find.h before the ffs/fls
headers. The new fast-path implementation in find.h requires ffs/fls,
so reorder the header inclusion sequence to prevent a compile-time
implicit-function-declaration error.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
---
arch/m68k/include/asm/bitops.h | 4 ++--
arch/sh/include/asm/bitops.h | 3 ++-
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 10133a968c8e..093590c9e70f 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -440,8 +440,6 @@ static inline unsigned long ffz(unsigned long word)
#endif
-#include <asm-generic/bitops/find.h>
-
#ifdef __KERNEL__
#if defined(CONFIG_CPU_HAS_NO_BITFIELDS)
@@ -531,4 +529,6 @@ static inline int __fls(int x)
#include <asm-generic/bitops/hweight.h>
#endif /* __KERNEL__ */
+#include <asm-generic/bitops/find.h>
+
#endif /* _M68K_BITOPS_H */
diff --git a/arch/sh/include/asm/bitops.h b/arch/sh/include/asm/bitops.h
index 450b5854d7c6..792bbe1237dc 100644
--- a/arch/sh/include/asm/bitops.h
+++ b/arch/sh/include/asm/bitops.h
@@ -58,7 +58,6 @@ static inline unsigned long __ffs(unsigned long word)
return result;
}
-#include <asm-generic/bitops/find.h>
#include <asm-generic/bitops/ffs.h>
#include <asm-generic/bitops/hweight.h>
#include <asm-generic/bitops/lock.h>
@@ -69,4 +68,6 @@ static inline unsigned long __ffs(unsigned long word)
#include <asm-generic/bitops/__fls.h>
#include <asm-generic/bitops/fls64.h>
+#include <asm-generic/bitops/find.h>
+
#endif /* __ASM_SH_BITOPS_H */
--
2.25.1
* [PATCH 4/8] lib: introduce BITS_{FIRST,LAST} macro
2021-01-30 19:17 [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps Yury Norov
` (2 preceding siblings ...)
2021-01-30 19:17 ` [PATCH 3/8] arch: rearrange headers inclusion order in asm/bitops for m68k and sh Yury Norov
@ 2021-01-30 19:17 ` Yury Norov
2021-02-01 13:42 ` Andy Shevchenko
2021-01-30 19:17 ` [PATCH 5/8] bitsperlong.h: introduce SMALL_CONST() macro Yury Norov
` (4 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Yury Norov @ 2021-01-30 19:17 UTC (permalink / raw)
To: linux-m68k, linux-kernel, linux-sh, linux-arch
Cc: Yury Norov, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
Arnd Bergmann, Dennis Zhou, Andrew Morton, Wolfram Sang,
David Sterba, Andy Shevchenko, Stefano Brivio, Ma, Jianpeng,
Wei Yang, Josh Poimboeuf, John Paul Adrian Glaubitz, Joe Perches
BITMAP_{LAST,FIRST}_WORD_MASK() in linux/bitmap.h duplicates the
functionality of GENMASK(), and the scope of these macros is wider than
just bitmaps. This patch defines 4 new macros in linux/bits.h on top of
GENMASK(): BITS_FIRST(), BITS_LAST(), BITS_FIRST_MASK() and
BITS_LAST_MASK(). It replaces BITMAP_{LAST,FIRST}_WORD_MASK() with them
to avoid duplication and to widen the macros' scope.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
include/linux/bitmap.h | 27 ++++++++++++---------------
include/linux/bits.h | 6 ++++++
include/linux/cpumask.h | 8 ++++----
include/linux/netdev_features.h | 2 +-
include/linux/nodemask.h | 2 +-
lib/bitmap.c | 26 +++++++++++++-------------
lib/find_bit.c | 4 ++--
lib/genalloc.c | 8 ++++----
tools/include/linux/bitmap.h | 20 ++++++--------------
tools/include/linux/bits.h | 6 ++++++
tools/lib/bitmap.c | 6 +++---
tools/lib/find_bit.c | 2 +-
tools/testing/radix-tree/bitmap.c | 4 ++--
13 files changed, 61 insertions(+), 60 deletions(-)
diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 70a932470b2d..adf7bd9f0467 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -219,9 +219,6 @@ extern unsigned int bitmap_ord_to_pos(const unsigned long *bitmap, unsigned int
extern int bitmap_print_to_pagebuf(bool list, char *buf,
const unsigned long *maskp, int nmaskbits);
-#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
-#define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
-
/*
* The static inlines below do not handle constant nbits==0 correctly,
* so make such users (should any ever turn up) call the out-of-line
@@ -257,7 +254,7 @@ static inline void bitmap_copy_clear_tail(unsigned long *dst,
{
bitmap_copy(dst, src, nbits);
if (nbits % BITS_PER_LONG)
- dst[nbits / BITS_PER_LONG] &= BITMAP_LAST_WORD_MASK(nbits);
+ dst[nbits / BITS_PER_LONG] &= BITS_FIRST_MASK(nbits - 1);
}
/*
@@ -282,7 +279,7 @@ static inline int bitmap_and(unsigned long *dst, const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
if (small_const_nbits(nbits))
- return (*dst = *src1 & *src2 & BITMAP_LAST_WORD_MASK(nbits)) != 0;
+ return (*dst = *src1 & *src2 & BITS_FIRST(nbits - 1)) != 0;
return __bitmap_and(dst, src1, src2, nbits);
}
@@ -308,7 +305,7 @@ static inline int bitmap_andnot(unsigned long *dst, const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
if (small_const_nbits(nbits))
- return (*dst = *src1 & ~(*src2) & BITMAP_LAST_WORD_MASK(nbits)) != 0;
+ return (*dst = *src1 & ~(*src2) & BITS_FIRST(nbits - 1)) != 0;
return __bitmap_andnot(dst, src1, src2, nbits);
}
@@ -332,7 +329,7 @@ static inline int bitmap_equal(const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
if (small_const_nbits(nbits))
- return !((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits));
+ return !((*src1 ^ *src2) & BITS_FIRST(nbits - 1));
if (__builtin_constant_p(nbits & BITMAP_MEM_MASK) &&
IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT))
return !memcmp(src1, src2, nbits / 8);
@@ -356,14 +353,14 @@ static inline bool bitmap_or_equal(const unsigned long *src1,
if (!small_const_nbits(nbits))
return __bitmap_or_equal(src1, src2, src3, nbits);
- return !(((*src1 | *src2) ^ *src3) & BITMAP_LAST_WORD_MASK(nbits));
+ return !(((*src1 | *src2) ^ *src3) & BITS_FIRST(nbits - 1));
}
static inline int bitmap_intersects(const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
if (small_const_nbits(nbits))
- return ((*src1 & *src2) & BITMAP_LAST_WORD_MASK(nbits)) != 0;
+ return ((*src1 & *src2) & BITS_FIRST(nbits - 1)) != 0;
else
return __bitmap_intersects(src1, src2, nbits);
}
@@ -372,7 +369,7 @@ static inline int bitmap_subset(const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
if (small_const_nbits(nbits))
- return ! ((*src1 & ~(*src2)) & BITMAP_LAST_WORD_MASK(nbits));
+ return !((*src1 & ~(*src2)) & BITS_FIRST(nbits - 1));
else
return __bitmap_subset(src1, src2, nbits);
}
@@ -380,7 +377,7 @@ static inline int bitmap_subset(const unsigned long *src1,
static inline bool bitmap_empty(const unsigned long *src, unsigned nbits)
{
if (small_const_nbits(nbits))
- return ! (*src & BITMAP_LAST_WORD_MASK(nbits));
+ return !(*src & BITS_FIRST(nbits - 1));
return find_first_bit(src, nbits) == nbits;
}
@@ -388,7 +385,7 @@ static inline bool bitmap_empty(const unsigned long *src, unsigned nbits)
static inline bool bitmap_full(const unsigned long *src, unsigned int nbits)
{
if (small_const_nbits(nbits))
- return ! (~(*src) & BITMAP_LAST_WORD_MASK(nbits));
+ return !(~(*src) & BITS_FIRST(nbits - 1));
return find_first_zero_bit(src, nbits) == nbits;
}
@@ -396,7 +393,7 @@ static inline bool bitmap_full(const unsigned long *src, unsigned int nbits)
static __always_inline int bitmap_weight(const unsigned long *src, unsigned int nbits)
{
if (small_const_nbits(nbits))
- return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits));
+ return hweight_long(*src & BITS_FIRST(nbits - 1));
return __bitmap_weight(src, nbits);
}
@@ -432,7 +429,7 @@ static inline void bitmap_shift_right(unsigned long *dst, const unsigned long *s
unsigned int shift, unsigned int nbits)
{
if (small_const_nbits(nbits))
- *dst = (*src & BITMAP_LAST_WORD_MASK(nbits)) >> shift;
+ *dst = (*src & BITS_FIRST(nbits - 1)) >> shift;
else
__bitmap_shift_right(dst, src, shift, nbits);
}
@@ -441,7 +438,7 @@ static inline void bitmap_shift_left(unsigned long *dst, const unsigned long *sr
unsigned int shift, unsigned int nbits)
{
if (small_const_nbits(nbits))
- *dst = (*src << shift) & BITMAP_LAST_WORD_MASK(nbits);
+ *dst = (*src << shift) & BITS_FIRST(nbits - 1);
else
__bitmap_shift_left(dst, src, shift, nbits);
}
diff --git a/include/linux/bits.h b/include/linux/bits.h
index 7f475d59a097..8c191c29506e 100644
--- a/include/linux/bits.h
+++ b/include/linux/bits.h
@@ -37,6 +37,12 @@
#define GENMASK(h, l) \
(GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
+#define BITS_FIRST(nr) GENMASK((nr), 0)
+#define BITS_LAST(nr) GENMASK(BITS_PER_LONG - 1, (nr))
+
+#define BITS_FIRST_MASK(nr) BITS_FIRST((nr) % BITS_PER_LONG)
+#define BITS_LAST_MASK(nr) BITS_LAST((nr) % BITS_PER_LONG)
+
#define __GENMASK_ULL(h, l) \
(((~ULL(0)) - (ULL(1) << (l)) + 1) & \
(~ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h))))
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index ab0c2a39bfb4..02bbf7319e39 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -899,7 +899,7 @@ static inline const struct cpumask *get_cpu_mask(unsigned int cpu)
#if NR_CPUS <= BITS_PER_LONG
#define CPU_BITS_ALL \
{ \
- [BITS_TO_LONGS(NR_CPUS)-1] = BITMAP_LAST_WORD_MASK(NR_CPUS) \
+ [BITS_TO_LONGS(NR_CPUS)-1] = BITS_FIRST_MASK(NR_CPUS - 1) \
}
#else /* NR_CPUS > BITS_PER_LONG */
@@ -907,7 +907,7 @@ static inline const struct cpumask *get_cpu_mask(unsigned int cpu)
#define CPU_BITS_ALL \
{ \
[0 ... BITS_TO_LONGS(NR_CPUS)-2] = ~0UL, \
- [BITS_TO_LONGS(NR_CPUS)-1] = BITMAP_LAST_WORD_MASK(NR_CPUS) \
+ [BITS_TO_LONGS(NR_CPUS)-1] = BITS_FIRST_MASK(NR_CPUS - 1) \
}
#endif /* NR_CPUS > BITS_PER_LONG */
@@ -931,13 +931,13 @@ cpumap_print_to_pagebuf(bool list, char *buf, const struct cpumask *mask)
#if NR_CPUS <= BITS_PER_LONG
#define CPU_MASK_ALL \
(cpumask_t) { { \
- [BITS_TO_LONGS(NR_CPUS)-1] = BITMAP_LAST_WORD_MASK(NR_CPUS) \
+ [BITS_TO_LONGS(NR_CPUS)-1] = BITS_FIRST_MASK(NR_CPUS - 1) \
} }
#else
#define CPU_MASK_ALL \
(cpumask_t) { { \
[0 ... BITS_TO_LONGS(NR_CPUS)-2] = ~0UL, \
- [BITS_TO_LONGS(NR_CPUS)-1] = BITMAP_LAST_WORD_MASK(NR_CPUS) \
+ [BITS_TO_LONGS(NR_CPUS)-1] = BITS_FIRST_MASK(NR_CPUS - 1) \
} }
#endif /* NR_CPUS > BITS_PER_LONG */
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 934de56644e7..76ac20af3da5 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -162,7 +162,7 @@ enum {
*/
static inline int find_next_netdev_feature(u64 feature, unsigned long start)
{
- /* like BITMAP_LAST_WORD_MASK() for u64
+ /* like BITS_FIRST_MASK() for u64
* this sets the most significant 64 - start to 0.
*/
feature &= ~0ULL >> (-start & ((sizeof(feature) * 8) - 1));
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index ac398e143c9a..2df0787c9155 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -302,7 +302,7 @@ static inline int __first_unset_node(const nodemask_t *maskp)
find_first_zero_bit(maskp->bits, MAX_NUMNODES));
}
-#define NODE_MASK_LAST_WORD BITMAP_LAST_WORD_MASK(MAX_NUMNODES)
+#define NODE_MASK_LAST_WORD BITS_FIRST_MASK(MAX_NUMNODES - 1)
#if MAX_NUMNODES <= BITS_PER_LONG
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 75006c4036e9..2507fef664ad 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -52,7 +52,7 @@ int __bitmap_equal(const unsigned long *bitmap1,
return 0;
if (bits % BITS_PER_LONG)
- if ((bitmap1[k] ^ bitmap2[k]) & BITMAP_LAST_WORD_MASK(bits))
+ if ((bitmap1[k] ^ bitmap2[k]) & BITS_FIRST_MASK(bits - 1))
return 0;
return 1;
@@ -76,7 +76,7 @@ bool __bitmap_or_equal(const unsigned long *bitmap1,
return true;
tmp = (bitmap1[k] | bitmap2[k]) ^ bitmap3[k];
- return (tmp & BITMAP_LAST_WORD_MASK(bits)) == 0;
+ return (tmp & BITS_FIRST_MASK(bits - 1)) == 0;
}
void __bitmap_complement(unsigned long *dst, const unsigned long *src, unsigned int bits)
@@ -103,7 +103,7 @@ void __bitmap_shift_right(unsigned long *dst, const unsigned long *src,
{
unsigned k, lim = BITS_TO_LONGS(nbits);
unsigned off = shift/BITS_PER_LONG, rem = shift % BITS_PER_LONG;
- unsigned long mask = BITMAP_LAST_WORD_MASK(nbits);
+ unsigned long mask = BITS_FIRST_MASK(nbits - 1);
for (k = 0; off + k < lim; ++k) {
unsigned long upper, lower;
@@ -246,7 +246,7 @@ int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1,
result |= (dst[k] = bitmap1[k] & bitmap2[k]);
if (bits % BITS_PER_LONG)
result |= (dst[k] = bitmap1[k] & bitmap2[k] &
- BITMAP_LAST_WORD_MASK(bits));
+ BITS_FIRST_MASK(bits - 1));
return result != 0;
}
EXPORT_SYMBOL(__bitmap_and);
@@ -284,7 +284,7 @@ int __bitmap_andnot(unsigned long *dst, const unsigned long *bitmap1,
result |= (dst[k] = bitmap1[k] & ~bitmap2[k]);
if (bits % BITS_PER_LONG)
result |= (dst[k] = bitmap1[k] & ~bitmap2[k] &
- BITMAP_LAST_WORD_MASK(bits));
+ BITS_FIRST_MASK(bits - 1));
return result != 0;
}
EXPORT_SYMBOL(__bitmap_andnot);
@@ -310,7 +310,7 @@ int __bitmap_intersects(const unsigned long *bitmap1,
return 1;
if (bits % BITS_PER_LONG)
- if ((bitmap1[k] & bitmap2[k]) & BITMAP_LAST_WORD_MASK(bits))
+ if ((bitmap1[k] & bitmap2[k]) & BITS_FIRST_MASK(bits - 1))
return 1;
return 0;
}
@@ -325,7 +325,7 @@ int __bitmap_subset(const unsigned long *bitmap1,
return 0;
if (bits % BITS_PER_LONG)
- if ((bitmap1[k] & ~bitmap2[k]) & BITMAP_LAST_WORD_MASK(bits))
+ if ((bitmap1[k] & ~bitmap2[k]) & BITS_FIRST_MASK(bits - 1))
return 0;
return 1;
}
@@ -340,7 +340,7 @@ int __bitmap_weight(const unsigned long *bitmap, unsigned int bits)
w += hweight_long(bitmap[k]);
if (bits % BITS_PER_LONG)
- w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
+ w += hweight_long(bitmap[k] & BITS_FIRST_MASK(bits - 1));
return w;
}
@@ -351,7 +351,7 @@ void __bitmap_set(unsigned long *map, unsigned int start, int len)
unsigned long *p = map + BIT_WORD(start);
const unsigned int size = start + len;
int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
- unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
+ unsigned long mask_to_set = BITS_LAST_MASK(start);
while (len - bits_to_set >= 0) {
*p |= mask_to_set;
@@ -361,7 +361,7 @@ void __bitmap_set(unsigned long *map, unsigned int start, int len)
p++;
}
if (len) {
- mask_to_set &= BITMAP_LAST_WORD_MASK(size);
+ mask_to_set &= BITS_FIRST_MASK(size - 1);
*p |= mask_to_set;
}
}
@@ -372,7 +372,7 @@ void __bitmap_clear(unsigned long *map, unsigned int start, int len)
unsigned long *p = map + BIT_WORD(start);
const unsigned int size = start + len;
int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
- unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
+ unsigned long mask_to_clear = BITS_LAST_MASK(start);
while (len - bits_to_clear >= 0) {
*p &= ~mask_to_clear;
@@ -382,7 +382,7 @@ void __bitmap_clear(unsigned long *map, unsigned int start, int len)
p++;
}
if (len) {
- mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
+ mask_to_clear &= BITS_FIRST_MASK(size - 1);
*p &= ~mask_to_clear;
}
}
@@ -1282,7 +1282,7 @@ void bitmap_from_arr32(unsigned long *bitmap, const u32 *buf, unsigned int nbits
/* Clear tail bits in last word beyond nbits. */
if (nbits % BITS_PER_LONG)
- bitmap[(halfwords - 1) / 2] &= BITMAP_LAST_WORD_MASK(nbits);
+ bitmap[(halfwords - 1) / 2] &= BITS_FIRST_MASK(nbits - 1);
}
EXPORT_SYMBOL(bitmap_from_arr32);
diff --git a/lib/find_bit.c b/lib/find_bit.c
index f67f86fd2f62..8c2a71a18793 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -44,7 +44,7 @@ static unsigned long _find_next_bit(const unsigned long *addr1,
tmp ^= invert;
/* Handle 1st word. */
- mask = BITMAP_FIRST_WORD_MASK(start);
+ mask = BITS_LAST_MASK(start);
if (le)
mask = swab(mask);
@@ -141,7 +141,7 @@ EXPORT_SYMBOL(find_first_zero_bit);
unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
if (size) {
- unsigned long val = BITMAP_LAST_WORD_MASK(size);
+ unsigned long val = BITS_FIRST_MASK(size - 1);
unsigned long idx = (size-1) / BITS_PER_LONG;
do {
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 5dcf9cdcbc46..0af7275517ff 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -87,7 +87,7 @@ bitmap_set_ll(unsigned long *map, unsigned long start, unsigned long nr)
unsigned long *p = map + BIT_WORD(start);
const unsigned long size = start + nr;
int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
- unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
+ unsigned long mask_to_set = BITS_LAST_MASK(start);
while (nr >= bits_to_set) {
if (set_bits_ll(p, mask_to_set))
@@ -98,7 +98,7 @@ bitmap_set_ll(unsigned long *map, unsigned long start, unsigned long nr)
p++;
}
if (nr) {
- mask_to_set &= BITMAP_LAST_WORD_MASK(size);
+ mask_to_set &= BITS_FIRST_MASK(size - 1);
if (set_bits_ll(p, mask_to_set))
return nr;
}
@@ -123,7 +123,7 @@ bitmap_clear_ll(unsigned long *map, unsigned long start, unsigned long nr)
unsigned long *p = map + BIT_WORD(start);
const unsigned long size = start + nr;
int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
- unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
+ unsigned long mask_to_clear = BITS_LAST_MASK(start);
while (nr >= bits_to_clear) {
if (clear_bits_ll(p, mask_to_clear))
@@ -134,7 +134,7 @@ bitmap_clear_ll(unsigned long *map, unsigned long start, unsigned long nr)
p++;
}
if (nr) {
- mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
+ mask_to_clear &= BITS_FIRST_MASK(size - 1);
if (clear_bits_ll(p, mask_to_clear))
return nr;
}
diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
index 7cbd23e56d48..b6e8430c8bc9 100644
--- a/tools/include/linux/bitmap.h
+++ b/tools/include/linux/bitmap.h
@@ -19,14 +19,6 @@ int __bitmap_equal(const unsigned long *bitmap1,
const unsigned long *bitmap2, unsigned int bits);
void bitmap_clear(unsigned long *map, unsigned int start, int len);
-#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
-
-#define BITMAP_LAST_WORD_MASK(nbits) \
-( \
- ((nbits) % BITS_PER_LONG) ? \
- (1UL<<((nbits) % BITS_PER_LONG))-1 : ~0UL \
-)
-
#define small_const_nbits(nbits) \
(__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG)
@@ -47,13 +39,13 @@ static inline void bitmap_fill(unsigned long *dst, unsigned int nbits)
unsigned int len = (nlongs - 1) * sizeof(unsigned long);
memset(dst, 0xff, len);
}
- dst[nlongs - 1] = BITMAP_LAST_WORD_MASK(nbits);
+ dst[nlongs - 1] = BITS_FIRST(nbits - 1);
}
static inline int bitmap_empty(const unsigned long *src, unsigned nbits)
{
if (small_const_nbits(nbits))
- return ! (*src & BITMAP_LAST_WORD_MASK(nbits));
+ return !(*src & BITS_FIRST(nbits - 1));
return find_first_bit(src, nbits) == nbits;
}
@@ -61,7 +53,7 @@ static inline int bitmap_empty(const unsigned long *src, unsigned nbits)
static inline int bitmap_full(const unsigned long *src, unsigned int nbits)
{
if (small_const_nbits(nbits))
- return ! (~(*src) & BITMAP_LAST_WORD_MASK(nbits));
+ return !(~(*src) & BITS_FIRST(nbits - 1));
return find_first_zero_bit(src, nbits) == nbits;
}
@@ -69,7 +61,7 @@ static inline int bitmap_full(const unsigned long *src, unsigned int nbits)
static inline int bitmap_weight(const unsigned long *src, unsigned int nbits)
{
if (small_const_nbits(nbits))
- return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits));
+ return hweight_long(*src & BITS_FIRST(nbits - 1));
return __bitmap_weight(src, nbits);
}
@@ -155,7 +147,7 @@ static inline int bitmap_and(unsigned long *dst, const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
if (small_const_nbits(nbits))
- return (*dst = *src1 & *src2 & BITMAP_LAST_WORD_MASK(nbits)) != 0;
+ return (*dst = *src1 & *src2 & BITS_FIRST(nbits - 1)) != 0;
return __bitmap_and(dst, src1, src2, nbits);
}
@@ -171,7 +163,7 @@ static inline int bitmap_equal(const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
if (small_const_nbits(nbits))
- return !((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits));
+ return !((*src1 ^ *src2) & BITS_FIRST(nbits - 1));
if (__builtin_constant_p(nbits & BITMAP_MEM_MASK) &&
IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT))
return !memcmp(src1, src2, nbits / 8);
diff --git a/tools/include/linux/bits.h b/tools/include/linux/bits.h
index 7f475d59a097..8c191c29506e 100644
--- a/tools/include/linux/bits.h
+++ b/tools/include/linux/bits.h
@@ -37,6 +37,12 @@
#define GENMASK(h, l) \
(GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
+#define BITS_FIRST(nr) GENMASK((nr), 0)
+#define BITS_LAST(nr) GENMASK(BITS_PER_LONG - 1, (nr))
+
+#define BITS_FIRST_MASK(nr) BITS_FIRST((nr) % BITS_PER_LONG)
+#define BITS_LAST_MASK(nr) BITS_LAST((nr) % BITS_PER_LONG)
+
#define __GENMASK_ULL(h, l) \
(((~ULL(0)) - (ULL(1) << (l)) + 1) & \
(~ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h))))
diff --git a/tools/lib/bitmap.c b/tools/lib/bitmap.c
index f4e914712b6f..8cffad2d1f77 100644
--- a/tools/lib/bitmap.c
+++ b/tools/lib/bitmap.c
@@ -13,7 +13,7 @@ int __bitmap_weight(const unsigned long *bitmap, int bits)
w += hweight_long(bitmap[k]);
if (bits % BITS_PER_LONG)
- w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
+ w += hweight_long(bitmap[k] & BITS_FIRST_MASK(bits - 1));
return w;
}
@@ -68,7 +68,7 @@ int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1,
result |= (dst[k] = bitmap1[k] & bitmap2[k]);
if (bits % BITS_PER_LONG)
result |= (dst[k] = bitmap1[k] & bitmap2[k] &
- BITMAP_LAST_WORD_MASK(bits));
+ BITS_FIRST_MASK(bits - 1));
return result != 0;
}
@@ -81,7 +81,7 @@ int __bitmap_equal(const unsigned long *bitmap1,
return 0;
if (bits % BITS_PER_LONG)
- if ((bitmap1[k] ^ bitmap2[k]) & BITMAP_LAST_WORD_MASK(bits))
+ if ((bitmap1[k] ^ bitmap2[k]) & BITS_FIRST_MASK(bits - 1))
return 0;
return 1;
diff --git a/tools/lib/find_bit.c b/tools/lib/find_bit.c
index ac37022e9486..49abb18549cc 100644
--- a/tools/lib/find_bit.c
+++ b/tools/lib/find_bit.c
@@ -43,7 +43,7 @@ static inline unsigned long _find_next_bit(const unsigned long *addr1,
tmp ^= invert;
/* Handle 1st word. */
- tmp &= BITMAP_FIRST_WORD_MASK(start);
+ tmp &= BITS_LAST_MASK(start);
start = round_down(start, BITS_PER_LONG);
while (!tmp) {
diff --git a/tools/testing/radix-tree/bitmap.c b/tools/testing/radix-tree/bitmap.c
index 66ec4a24a203..aedc15461f78 100644
--- a/tools/testing/radix-tree/bitmap.c
+++ b/tools/testing/radix-tree/bitmap.c
@@ -7,7 +7,7 @@ void bitmap_clear(unsigned long *map, unsigned int start, int len)
unsigned long *p = map + BIT_WORD(start);
const unsigned int size = start + len;
int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
- unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
+ unsigned long mask_to_clear = BITS_LAST_MASK(start);
while (len - bits_to_clear >= 0) {
*p &= ~mask_to_clear;
@@ -17,7 +17,7 @@ void bitmap_clear(unsigned long *map, unsigned int start, int len)
p++;
}
if (len) {
- mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
+ mask_to_clear &= BITS_FIRST_MASK(size - 1);
*p &= ~mask_to_clear;
}
}
--
2.25.1
* [PATCH 5/8] bitsperlong.h: introduce SMALL_CONST() macro
2021-01-30 19:17 [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps Yury Norov
` (3 preceding siblings ...)
2021-01-30 19:17 ` [PATCH 4/8] lib: introduce BITS_{FIRST,LAST} macro Yury Norov
@ 2021-01-30 19:17 ` Yury Norov
2021-02-01 13:45 ` Andy Shevchenko
2021-01-30 19:17 ` [PATCH 6/8] lib: inline _find_next_bit() wrappers Yury Norov
` (3 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Yury Norov @ 2021-01-30 19:17 UTC (permalink / raw)
To: linux-m68k, linux-kernel, linux-sh, linux-arch
Cc: Yury Norov, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
Arnd Bergmann, Dennis Zhou, Andrew Morton, Wolfram Sang,
David Sterba, Andy Shevchenko, Stefano Brivio, Ma, Jianpeng,
Wei Yang, Josh Poimboeuf, John Paul Adrian Glaubitz, Joe Perches
Many algorithms become simpler when their input values are known to be
relatively small. One example is bitmap operations when the whole bitmap
fits into one word. To implement such simplifications, linux/bitmap.h
declares the small_const_nbits() macro.
Other subsystems may also benefit from optimizations of this sort, such
as the find_bit API in the following patches. So it is helpful to
generalize the macro and extend its visibility.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
include/asm-generic/bitsperlong.h | 2 ++
include/linux/bitmap.h | 33 +++++++++++--------------
tools/include/asm-generic/bitsperlong.h | 2 ++
tools/include/linux/bitmap.h | 19 ++++++--------
4 files changed, 27 insertions(+), 29 deletions(-)
diff --git a/include/asm-generic/bitsperlong.h b/include/asm-generic/bitsperlong.h
index 3905c1c93dc2..0eeb77544f1d 100644
--- a/include/asm-generic/bitsperlong.h
+++ b/include/asm-generic/bitsperlong.h
@@ -23,4 +23,6 @@
#define BITS_PER_LONG_LONG 64
#endif
+#define SMALL_CONST(n) (__builtin_constant_p(n) && (unsigned long)(n) < BITS_PER_LONG)
+
#endif /* __ASM_GENERIC_BITS_PER_LONG */
diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index adf7bd9f0467..e89f1dace846 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -224,9 +224,6 @@ extern int bitmap_print_to_pagebuf(bool list, char *buf,
* so make such users (should any ever turn up) call the out-of-line
* versions.
*/
-#define small_const_nbits(nbits) \
- (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0)
-
static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
{
unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
@@ -278,7 +275,7 @@ extern void bitmap_to_arr32(u32 *buf, const unsigned long *bitmap,
static inline int bitmap_and(unsigned long *dst, const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
return (*dst = *src1 & *src2 & BITS_FIRST(nbits - 1)) != 0;
return __bitmap_and(dst, src1, src2, nbits);
}
@@ -286,7 +283,7 @@ static inline int bitmap_and(unsigned long *dst, const unsigned long *src1,
static inline void bitmap_or(unsigned long *dst, const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
*dst = *src1 | *src2;
else
__bitmap_or(dst, src1, src2, nbits);
@@ -295,7 +292,7 @@ static inline void bitmap_or(unsigned long *dst, const unsigned long *src1,
static inline void bitmap_xor(unsigned long *dst, const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
*dst = *src1 ^ *src2;
else
__bitmap_xor(dst, src1, src2, nbits);
@@ -304,7 +301,7 @@ static inline void bitmap_xor(unsigned long *dst, const unsigned long *src1,
static inline int bitmap_andnot(unsigned long *dst, const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
return (*dst = *src1 & ~(*src2) & BITS_FIRST(nbits - 1)) != 0;
return __bitmap_andnot(dst, src1, src2, nbits);
}
@@ -312,7 +309,7 @@ static inline int bitmap_andnot(unsigned long *dst, const unsigned long *src1,
static inline void bitmap_complement(unsigned long *dst, const unsigned long *src,
unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
*dst = ~(*src);
else
__bitmap_complement(dst, src, nbits);
@@ -328,7 +325,7 @@ static inline void bitmap_complement(unsigned long *dst, const unsigned long *sr
static inline int bitmap_equal(const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
return !((*src1 ^ *src2) & BITS_FIRST(nbits - 1));
if (__builtin_constant_p(nbits & BITMAP_MEM_MASK) &&
IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT))
@@ -350,7 +347,7 @@ static inline bool bitmap_or_equal(const unsigned long *src1,
const unsigned long *src3,
unsigned int nbits)
{
- if (!small_const_nbits(nbits))
+ if (!SMALL_CONST(nbits - 1))
return __bitmap_or_equal(src1, src2, src3, nbits);
return !(((*src1 | *src2) ^ *src3) & BITS_FIRST(nbits - 1));
@@ -359,7 +356,7 @@ static inline bool bitmap_or_equal(const unsigned long *src1,
static inline int bitmap_intersects(const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
return ((*src1 & *src2) & BITS_FIRST(nbits - 1)) != 0;
else
return __bitmap_intersects(src1, src2, nbits);
@@ -368,7 +365,7 @@ static inline int bitmap_intersects(const unsigned long *src1,
static inline int bitmap_subset(const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
return !((*src1 & ~(*src2)) & BITS_FIRST(nbits - 1));
else
return __bitmap_subset(src1, src2, nbits);
@@ -376,7 +373,7 @@ static inline int bitmap_subset(const unsigned long *src1,
static inline bool bitmap_empty(const unsigned long *src, unsigned nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
return !(*src & BITS_FIRST(nbits - 1));
return find_first_bit(src, nbits) == nbits;
@@ -384,7 +381,7 @@ static inline bool bitmap_empty(const unsigned long *src, unsigned nbits)
static inline bool bitmap_full(const unsigned long *src, unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
return !(~(*src) & BITS_FIRST(nbits - 1));
return find_first_zero_bit(src, nbits) == nbits;
@@ -392,7 +389,7 @@ static inline bool bitmap_full(const unsigned long *src, unsigned int nbits)
static __always_inline int bitmap_weight(const unsigned long *src, unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
return hweight_long(*src & BITS_FIRST(nbits - 1));
return __bitmap_weight(src, nbits);
}
@@ -428,7 +425,7 @@ static __always_inline void bitmap_clear(unsigned long *map, unsigned int start,
static inline void bitmap_shift_right(unsigned long *dst, const unsigned long *src,
unsigned int shift, unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
*dst = (*src & BITS_FIRST(nbits - 1)) >> shift;
else
__bitmap_shift_right(dst, src, shift, nbits);
@@ -437,7 +434,7 @@ static inline void bitmap_shift_right(unsigned long *dst, const unsigned long *s
static inline void bitmap_shift_left(unsigned long *dst, const unsigned long *src,
unsigned int shift, unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
*dst = (*src << shift) & BITS_FIRST(nbits - 1);
else
__bitmap_shift_left(dst, src, shift, nbits);
@@ -449,7 +446,7 @@ static inline void bitmap_replace(unsigned long *dst,
const unsigned long *mask,
unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
*dst = (*old & ~(*mask)) | (*new & *mask);
else
__bitmap_replace(dst, old, new, mask, nbits);
diff --git a/tools/include/asm-generic/bitsperlong.h b/tools/include/asm-generic/bitsperlong.h
index 8f2283052333..432d272baf27 100644
--- a/tools/include/asm-generic/bitsperlong.h
+++ b/tools/include/asm-generic/bitsperlong.h
@@ -18,4 +18,6 @@
#define BITS_PER_LONG_LONG 64
#endif
+#define SMALL_CONST(n) (__builtin_constant_p(n) && (unsigned long)(n) < BITS_PER_LONG)
+
#endif /* __ASM_GENERIC_BITS_PER_LONG */
diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
index b6e8430c8bc9..fdc0b64bbdbf 100644
--- a/tools/include/linux/bitmap.h
+++ b/tools/include/linux/bitmap.h
@@ -19,12 +19,9 @@ int __bitmap_equal(const unsigned long *bitmap1,
const unsigned long *bitmap2, unsigned int bits);
void bitmap_clear(unsigned long *map, unsigned int start, int len);
-#define small_const_nbits(nbits) \
- (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG)
-
static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
*dst = 0UL;
else {
int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
@@ -35,7 +32,7 @@ static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
static inline void bitmap_fill(unsigned long *dst, unsigned int nbits)
{
unsigned int nlongs = BITS_TO_LONGS(nbits);
- if (!small_const_nbits(nbits)) {
+ if (!SMALL_CONST(nbits - 1)) {
unsigned int len = (nlongs - 1) * sizeof(unsigned long);
memset(dst, 0xff, len);
}
@@ -44,7 +41,7 @@ static inline void bitmap_fill(unsigned long *dst, unsigned int nbits)
static inline int bitmap_empty(const unsigned long *src, unsigned nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
return !(*src & BITS_FIRST(nbits - 1));
return find_first_bit(src, nbits) == nbits;
@@ -52,7 +49,7 @@ static inline int bitmap_empty(const unsigned long *src, unsigned nbits)
static inline int bitmap_full(const unsigned long *src, unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
return !(~(*src) & BITS_FIRST(nbits - 1));
return find_first_zero_bit(src, nbits) == nbits;
@@ -60,7 +57,7 @@ static inline int bitmap_full(const unsigned long *src, unsigned int nbits)
static inline int bitmap_weight(const unsigned long *src, unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
return hweight_long(*src & BITS_FIRST(nbits - 1));
return __bitmap_weight(src, nbits);
}
@@ -68,7 +65,7 @@ static inline int bitmap_weight(const unsigned long *src, unsigned int nbits)
static inline void bitmap_or(unsigned long *dst, const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
*dst = *src1 | *src2;
else
__bitmap_or(dst, src1, src2, nbits);
@@ -146,7 +143,7 @@ size_t bitmap_scnprintf(unsigned long *bitmap, unsigned int nbits,
static inline int bitmap_and(unsigned long *dst, const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
return (*dst = *src1 & *src2 & BITS_FIRST(nbits - 1)) != 0;
return __bitmap_and(dst, src1, src2, nbits);
}
@@ -162,7 +159,7 @@ static inline int bitmap_and(unsigned long *dst, const unsigned long *src1,
static inline int bitmap_equal(const unsigned long *src1,
const unsigned long *src2, unsigned int nbits)
{
- if (small_const_nbits(nbits))
+ if (SMALL_CONST(nbits - 1))
return !((*src1 ^ *src2) & BITS_FIRST(nbits - 1));
if (__builtin_constant_p(nbits & BITMAP_MEM_MASK) &&
IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT))
--
2.25.1
* [PATCH 6/8] lib: inline _find_next_bit() wrappers
2021-01-30 19:17 [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps Yury Norov
` (4 preceding siblings ...)
2021-01-30 19:17 ` [PATCH 5/8] bitsperlong.h: introduce SMALL_CONST() macro Yury Norov
@ 2021-01-30 19:17 ` Yury Norov
2021-02-01 13:47 ` Andy Shevchenko
2021-01-30 19:17 ` [PATCH 7/8] lib: add fast path for find_next_*_bit() Yury Norov
` (2 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Yury Norov @ 2021-01-30 19:17 UTC (permalink / raw)
To: linux-m68k, linux-kernel, linux-sh, linux-arch
Cc: Yury Norov, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
Arnd Bergmann, Dennis Zhou, Andrew Morton, Wolfram Sang,
David Sterba, Andy Shevchenko, Stefano Brivio, Ma, Jianpeng,
Wei Yang, Josh Poimboeuf, John Paul Adrian Glaubitz, Joe Perches
lib/find_bit.c declares five single-line wrappers around _find_next_bit().
We can turn those wrappers into inline functions. This eliminates
unneeded function calls and opens room for compile-time optimizations.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
include/asm-generic/bitops/find.h | 28 ++++++++++---
include/asm-generic/bitops/le.h | 17 ++++++--
lib/find_bit.c | 56 +------------------------
tools/include/asm-generic/bitops/find.h | 27 +++++++++---
tools/lib/find_bit.c | 52 ++++++++++-------------
5 files changed, 79 insertions(+), 101 deletions(-)
diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h
index 9fdf21302fdf..7ad70dab8e93 100644
--- a/include/asm-generic/bitops/find.h
+++ b/include/asm-generic/bitops/find.h
@@ -2,6 +2,10 @@
#ifndef _ASM_GENERIC_BITOPS_FIND_H_
#define _ASM_GENERIC_BITOPS_FIND_H_
+extern unsigned long _find_next_bit(const unsigned long *addr1,
+ const unsigned long *addr2, unsigned long nbits,
+ unsigned long start, unsigned long invert, unsigned long le);
+
#ifndef find_next_bit
/**
* find_next_bit - find the next set bit in a memory region
@@ -12,8 +16,12 @@
* Returns the bit number for the next set bit
* If no bits are set, returns @size.
*/
-extern unsigned long find_next_bit(const unsigned long *addr, unsigned long
- size, unsigned long offset);
+static inline
+unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
+ unsigned long offset)
+{
+ return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
+}
#endif
#ifndef find_next_and_bit
@@ -27,9 +35,13 @@ extern unsigned long find_next_bit(const unsigned long *addr, unsigned long
* Returns the bit number for the next set bit
* If no bits are set, returns @size.
*/
-extern unsigned long find_next_and_bit(const unsigned long *addr1,
+static inline
+unsigned long find_next_and_bit(const unsigned long *addr1,
const unsigned long *addr2, unsigned long size,
- unsigned long offset);
+ unsigned long offset)
+{
+ return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
+}
#endif
#ifndef find_next_zero_bit
@@ -42,8 +54,12 @@ extern unsigned long find_next_and_bit(const unsigned long *addr1,
* Returns the bit number of the next zero bit
* If no bits are zero, returns @size.
*/
-extern unsigned long find_next_zero_bit(const unsigned long *addr, unsigned
- long size, unsigned long offset);
+static inline
+unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
+ unsigned long offset)
+{
+ return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
+}
#endif
#ifdef CONFIG_GENERIC_FIND_FIRST_BIT
diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
index 188d3eba3ace..21305f6cea0b 100644
--- a/include/asm-generic/bitops/le.h
+++ b/include/asm-generic/bitops/le.h
@@ -2,6 +2,7 @@
#ifndef _ASM_GENERIC_BITOPS_LE_H_
#define _ASM_GENERIC_BITOPS_LE_H_
+#include <asm-generic/bitops/find.h>
#include <asm/types.h>
#include <asm/byteorder.h>
@@ -32,13 +33,21 @@ static inline unsigned long find_first_zero_bit_le(const void *addr,
#define BITOP_LE_SWIZZLE ((BITS_PER_LONG-1) & ~0x7)
#ifndef find_next_zero_bit_le
-extern unsigned long find_next_zero_bit_le(const void *addr,
- unsigned long size, unsigned long offset);
+static inline
+unsigned long find_next_zero_bit_le(const void *addr, unsigned
+ long size, unsigned long offset)
+{
+ return _find_next_bit(addr, NULL, size, offset, ~0UL, 1);
+}
#endif
#ifndef find_next_bit_le
-extern unsigned long find_next_bit_le(const void *addr,
- unsigned long size, unsigned long offset);
+static inline
+unsigned long find_next_bit_le(const void *addr, unsigned
+ long size, unsigned long offset)
+{
+ return _find_next_bit(addr, NULL, size, offset, 0UL, 1);
+}
#endif
#ifndef find_first_zero_bit_le
diff --git a/lib/find_bit.c b/lib/find_bit.c
index 8c2a71a18793..2470ae390f3c 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -29,7 +29,7 @@
* searching it for one bits.
* - The optional "addr2", which is anded with "addr1" if present.
*/
-static unsigned long _find_next_bit(const unsigned long *addr1,
+unsigned long _find_next_bit(const unsigned long *addr1,
const unsigned long *addr2, unsigned long nbits,
unsigned long start, unsigned long invert, unsigned long le)
{
@@ -68,37 +68,7 @@ static unsigned long _find_next_bit(const unsigned long *addr1,
return min(start + __ffs(tmp), nbits);
}
-#endif
-
-#ifndef find_next_bit
-/*
- * Find the next set bit in a memory region.
- */
-unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
- unsigned long offset)
-{
- return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
-}
-EXPORT_SYMBOL(find_next_bit);
-#endif
-
-#ifndef find_next_zero_bit
-unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
- unsigned long offset)
-{
- return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
-}
-EXPORT_SYMBOL(find_next_zero_bit);
-#endif
-
-#if !defined(find_next_and_bit)
-unsigned long find_next_and_bit(const unsigned long *addr1,
- const unsigned long *addr2, unsigned long size,
- unsigned long offset)
-{
- return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
-}
-EXPORT_SYMBOL(find_next_and_bit);
+EXPORT_SYMBOL(_find_next_bit);
#endif
#ifndef find_first_bit
@@ -157,28 +127,6 @@ unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
EXPORT_SYMBOL(find_last_bit);
#endif
-#ifdef __BIG_ENDIAN
-
-#ifndef find_next_zero_bit_le
-unsigned long find_next_zero_bit_le(const void *addr, unsigned
- long size, unsigned long offset)
-{
- return _find_next_bit(addr, NULL, size, offset, ~0UL, 1);
-}
-EXPORT_SYMBOL(find_next_zero_bit_le);
-#endif
-
-#ifndef find_next_bit_le
-unsigned long find_next_bit_le(const void *addr, unsigned
- long size, unsigned long offset)
-{
- return _find_next_bit(addr, NULL, size, offset, 0UL, 1);
-}
-EXPORT_SYMBOL(find_next_bit_le);
-#endif
-
-#endif /* __BIG_ENDIAN */
-
unsigned long find_next_clump8(unsigned long *clump, const unsigned long *addr,
unsigned long size, unsigned long offset)
{
diff --git a/tools/include/asm-generic/bitops/find.h b/tools/include/asm-generic/bitops/find.h
index 16ed1982cb34..9fe62d10b084 100644
--- a/tools/include/asm-generic/bitops/find.h
+++ b/tools/include/asm-generic/bitops/find.h
@@ -2,6 +2,10 @@
#ifndef _TOOLS_LINUX_ASM_GENERIC_BITOPS_FIND_H_
#define _TOOLS_LINUX_ASM_GENERIC_BITOPS_FIND_H_
+extern unsigned long _find_next_bit(const unsigned long *addr1,
+ const unsigned long *addr2, unsigned long nbits,
+ unsigned long start, unsigned long invert, unsigned long le);
+
#ifndef find_next_bit
/**
* find_next_bit - find the next set bit in a memory region
@@ -12,8 +16,12 @@
* Returns the bit number for the next set bit
* If no bits are set, returns @size.
*/
-extern unsigned long find_next_bit(const unsigned long *addr, unsigned long
- size, unsigned long offset);
+static inline
+unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
+ unsigned long offset)
+{
+ return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
+}
#endif
#ifndef find_next_and_bit
@@ -27,13 +35,16 @@ extern unsigned long find_next_bit(const unsigned long *addr, unsigned long
* Returns the bit number for the next set bit
* If no bits are set, returns @size.
*/
-extern unsigned long find_next_and_bit(const unsigned long *addr1,
+static inline
+unsigned long find_next_and_bit(const unsigned long *addr1,
const unsigned long *addr2, unsigned long size,
- unsigned long offset);
+ unsigned long offset)
+{
+ return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
+}
#endif
#ifndef find_next_zero_bit
-
/**
* find_next_zero_bit - find the next cleared bit in a memory region
* @addr: The address to base the search on
@@ -43,8 +54,12 @@ extern unsigned long find_next_and_bit(const unsigned long *addr1,
* Returns the bit number of the next zero bit
* If no bits are zero, returns @size.
*/
+static inline
unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
- unsigned long offset);
+ unsigned long offset)
+{
+ return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
+}
#endif
#ifndef find_first_bit
diff --git a/tools/lib/find_bit.c b/tools/lib/find_bit.c
index 49abb18549cc..c3378b291205 100644
--- a/tools/lib/find_bit.c
+++ b/tools/lib/find_bit.c
@@ -28,11 +28,12 @@
* searching it for one bits.
* - The optional "addr2", which is anded with "addr1" if present.
*/
-static inline unsigned long _find_next_bit(const unsigned long *addr1,
+unsigned long _find_next_bit(const unsigned long *addr1,
const unsigned long *addr2, unsigned long nbits,
- unsigned long start, unsigned long invert)
+ unsigned long start, unsigned long invert, unsigned long le)
{
- unsigned long tmp;
+ unsigned long tmp, mask;
+ (void) le;
if (unlikely(start >= nbits))
return nbits;
@@ -43,7 +44,19 @@ static inline unsigned long _find_next_bit(const unsigned long *addr1,
tmp ^= invert;
/* Handle 1st word. */
- tmp &= BITS_LAST_MASK(start);
+ mask = BITS_LAST_MASK(start);
+
+ /*
+ * Due to the lack of swab() in tools, and the fact that it doesn't
+ * need little-endian support, just comment it out
+ */
+#if (0)
+ if (le)
+ mask = swab(mask);
+#endif
+
+ tmp &= mask;
+
start = round_down(start, BITS_PER_LONG);
while (!tmp) {
@@ -57,18 +70,12 @@ static inline unsigned long _find_next_bit(const unsigned long *addr1,
tmp ^= invert;
}
- return min(start + __ffs(tmp), nbits);
-}
+#if (0)
+ if (le)
+ tmp = swab(tmp);
#endif
-#ifndef find_next_bit
-/*
- * Find the next set bit in a memory region.
- */
-unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
- unsigned long offset)
-{
- return _find_next_bit(addr, NULL, size, offset, 0UL);
+ return min(start + __ffs(tmp), nbits);
}
#endif
@@ -105,20 +112,3 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
return size;
}
#endif
-
-#ifndef find_next_zero_bit
-unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
- unsigned long offset)
-{
- return _find_next_bit(addr, NULL, size, offset, ~0UL);
-}
-#endif
-
-#ifndef find_next_and_bit
-unsigned long find_next_and_bit(const unsigned long *addr1,
- const unsigned long *addr2, unsigned long size,
- unsigned long offset)
-{
- return _find_next_bit(addr1, addr2, size, offset, 0UL);
-}
-#endif
--
2.25.1
* [PATCH 7/8] lib: add fast path for find_next_*_bit()
2021-01-30 19:17 [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps Yury Norov
` (5 preceding siblings ...)
2021-01-30 19:17 ` [PATCH 6/8] lib: inline _find_next_bit() wrappers Yury Norov
@ 2021-01-30 19:17 ` Yury Norov
2021-02-01 13:49 ` Andy Shevchenko
2021-01-30 19:17 ` [PATCH 8/8] lib: add fast path for find_first_*_bit() and find_last_bit() Yury Norov
2021-02-15 21:30 ` [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps Yury Norov
8 siblings, 1 reply; 23+ messages in thread
From: Yury Norov @ 2021-01-30 19:17 UTC (permalink / raw)
To: linux-m68k, linux-kernel, linux-sh, linux-arch
Cc: Yury Norov, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
Arnd Bergmann, Dennis Zhou, Andrew Morton, Wolfram Sang,
David Sterba, Andy Shevchenko, Stefano Brivio, Ma, Jianpeng,
Wei Yang, Josh Poimboeuf, John Paul Adrian Glaubitz, Joe Perches
Similarly to the bitmap functions, find_next_*_bit() users will benefit
if we handle the case of bitmaps that fit into a single word. In the
best case, the compiler can replace a function call with a
single ffs or ffz instruction.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
include/asm-generic/bitops/find.h | 30 +++++++++++++++++++++++++
include/asm-generic/bitops/le.h | 21 +++++++++++++++++
tools/include/asm-generic/bitops/find.h | 30 +++++++++++++++++++++++++
3 files changed, 81 insertions(+)
diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h
index 7ad70dab8e93..8bd7a33a889d 100644
--- a/include/asm-generic/bitops/find.h
+++ b/include/asm-generic/bitops/find.h
@@ -20,6 +20,16 @@ static inline
unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
unsigned long offset)
{
+ if (SMALL_CONST(size - 1)) {
+ unsigned long val;
+
+ if (unlikely(offset >= size))
+ return size;
+
+ val = *addr & GENMASK(size - 1, offset);
+ return val ? __ffs(val) : size;
+ }
+
return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
}
#endif
@@ -40,6 +50,16 @@ unsigned long find_next_and_bit(const unsigned long *addr1,
const unsigned long *addr2, unsigned long size,
unsigned long offset)
{
+ if (SMALL_CONST(size - 1)) {
+ unsigned long val;
+
+ if (unlikely(offset >= size))
+ return size;
+
+ val = *addr1 & *addr2 & GENMASK(size - 1, offset);
+ return val ? __ffs(val) : size;
+ }
+
return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
}
#endif
@@ -58,6 +78,16 @@ static inline
unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
unsigned long offset)
{
+ if (SMALL_CONST(size - 1)) {
+ unsigned long val;
+
+ if (unlikely(offset >= size))
+ return size;
+
+ val = *addr | ~GENMASK(size - 1, offset);
+ return val == ~0UL ? size : ffz(val);
+ }
+
return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
}
#endif
diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
index 21305f6cea0b..18ebcf639d7f 100644
--- a/include/asm-generic/bitops/le.h
+++ b/include/asm-generic/bitops/le.h
@@ -5,6 +5,7 @@
#include <asm-generic/bitops/find.h>
#include <asm/types.h>
#include <asm/byteorder.h>
+#include <linux/swab.h>
#if defined(__LITTLE_ENDIAN)
@@ -37,6 +38,16 @@ static inline
unsigned long find_next_zero_bit_le(const void *addr, unsigned
long size, unsigned long offset)
{
+ if (SMALL_CONST(size)) {
+ unsigned long val = *(const unsigned long *)addr;
+
+ if (unlikely(offset >= size))
+ return size;
+
+ val = swab(val) | ~GENMASK(size - 1, offset);
+ return val == ~0UL ? size : ffz(val);
+ }
+
return _find_next_bit(addr, NULL, size, offset, ~0UL, 1);
}
#endif
@@ -46,6 +57,16 @@ static inline
unsigned long find_next_bit_le(const void *addr, unsigned
long size, unsigned long offset)
{
+ if (SMALL_CONST(size)) {
+ unsigned long val = *(const unsigned long *)addr;
+
+ if (unlikely(offset >= size))
+ return size;
+
+ val = swab(val) & GENMASK(size - 1, offset);
+ return val ? __ffs(val) : size;
+ }
+
return _find_next_bit(addr, NULL, size, offset, 0UL, 1);
}
#endif
diff --git a/tools/include/asm-generic/bitops/find.h b/tools/include/asm-generic/bitops/find.h
index 9fe62d10b084..eff868bd22f8 100644
--- a/tools/include/asm-generic/bitops/find.h
+++ b/tools/include/asm-generic/bitops/find.h
@@ -20,6 +20,16 @@ static inline
unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
unsigned long offset)
{
+ if (SMALL_CONST(size - 1)) {
+ unsigned long val;
+
+ if (unlikely(offset >= size))
+ return size;
+
+ val = *addr & GENMASK(size - 1, offset);
+ return val ? __ffs(val) : size;
+ }
+
return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
}
#endif
@@ -40,6 +50,16 @@ unsigned long find_next_and_bit(const unsigned long *addr1,
const unsigned long *addr2, unsigned long size,
unsigned long offset)
{
+ if (SMALL_CONST(size - 1)) {
+ unsigned long val;
+
+ if (unlikely(offset >= size))
+ return size;
+
+ val = *addr1 & *addr2 & GENMASK(size - 1, offset);
+ return val ? __ffs(val) : size;
+ }
+
return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
}
#endif
@@ -58,6 +78,16 @@ static inline
unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
unsigned long offset)
{
+ if (SMALL_CONST(size - 1)) {
+ unsigned long val;
+
+ if (unlikely(offset >= size))
+ return size;
+
+ val = *addr | ~GENMASK(size - 1, offset);
+ return val == ~0UL ? size : ffz(val);
+ }
+
return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
}
#endif
--
2.25.1
* [PATCH 8/8] lib: add fast path for find_first_*_bit() and find_last_bit()
2021-01-30 19:17 [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps Yury Norov
` (6 preceding siblings ...)
2021-01-30 19:17 ` [PATCH 7/8] lib: add fast path for find_next_*_bit() Yury Norov
@ 2021-01-30 19:17 ` Yury Norov
2021-02-01 13:50 ` Andy Shevchenko
2021-02-15 21:30 ` [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps Yury Norov
8 siblings, 1 reply; 23+ messages in thread
From: Yury Norov @ 2021-01-30 19:17 UTC (permalink / raw)
To: linux-m68k, linux-kernel, linux-sh, linux-arch
Cc: Yury Norov, Geert Uytterhoeven, Yoshinori Sato, Rich Felker,
Arnd Bergmann, Dennis Zhou, Andrew Morton, Wolfram Sang,
David Sterba, Andy Shevchenko, Stefano Brivio, Ma, Jianpeng,
Wei Yang, Josh Poimboeuf, John Paul Adrian Glaubitz, Joe Perches
Similarly to the bitmap functions, users will benefit if we handle the
case of small bitmaps that fit into a single word.
While here, move the find_last_bit() declaration to bitops/find.h,
where the other find_*_bit() functions sit.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
include/asm-generic/bitops/find.h | 50 +++++++++++++++++++++++--
include/linux/bitops.h | 12 ------
lib/find_bit.c | 12 +++---
tools/include/asm-generic/bitops/find.h | 28 ++++++++++++--
tools/lib/find_bit.c | 4 +-
5 files changed, 79 insertions(+), 27 deletions(-)
diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h
index 8bd7a33a889d..3633a37cec38 100644
--- a/include/asm-generic/bitops/find.h
+++ b/include/asm-generic/bitops/find.h
@@ -5,6 +5,9 @@
extern unsigned long _find_next_bit(const unsigned long *addr1,
const unsigned long *addr2, unsigned long nbits,
unsigned long start, unsigned long invert, unsigned long le);
+extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
+extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
+extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
#ifndef find_next_bit
/**
@@ -102,8 +105,17 @@ unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
* Returns the bit number of the first set bit.
* If no bits are set, returns @size.
*/
-extern unsigned long find_first_bit(const unsigned long *addr,
- unsigned long size);
+static inline
+unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
+{
+ if (SMALL_CONST(size - 1)) {
+ unsigned long val = *addr & BITS_FIRST(size - 1);
+
+ return val ? __ffs(val) : size;
+ }
+
+ return _find_first_bit(addr, size);
+}
/**
* find_first_zero_bit - find the first cleared bit in a memory region
@@ -113,8 +125,17 @@ extern unsigned long find_first_bit(const unsigned long *addr,
* Returns the bit number of the first cleared bit.
* If no bits are zero, returns @size.
*/
-extern unsigned long find_first_zero_bit(const unsigned long *addr,
- unsigned long size);
+static inline
+unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
+{
+ if (SMALL_CONST(size - 1)) {
+ unsigned long val = *addr | ~BITS_FIRST(size - 1);
+
+ return val == ~0UL ? size : ffz(val);
+ }
+
+ return _find_first_zero_bit(addr, size);
+}
#else /* CONFIG_GENERIC_FIND_FIRST_BIT */
#ifndef find_first_bit
@@ -126,6 +147,27 @@ extern unsigned long find_first_zero_bit(const unsigned long *addr,
#endif /* CONFIG_GENERIC_FIND_FIRST_BIT */
+#ifndef find_last_bit
+/**
+ * find_last_bit - find the last set bit in a memory region
+ * @addr: The address to start the search at
+ * @size: The number of bits to search
+ *
+ * Returns the bit number of the last set bit, or size.
+ */
+static inline
+unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
+{
+ if (SMALL_CONST(size - 1)) {
+ unsigned long val = *addr & BITS_FIRST(size - 1);
+
+ return val ? __fls(val) : size;
+ }
+
+ return _find_last_bit(addr, size);
+}
+#endif
+
/**
* find_next_clump8 - find next 8-bit clump with set bits in a memory region
* @clump: location to store copy of found clump
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index a5a48303b0f1..26bf15e6cd35 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -286,17 +286,5 @@ static __always_inline void __assign_bit(long nr, volatile unsigned long *addr,
})
#endif
-#ifndef find_last_bit
-/**
- * find_last_bit - find the last set bit in a memory region
- * @addr: The address to start the search at
- * @size: The number of bits to search
- *
- * Returns the bit number of the last set bit, or size.
- */
-extern unsigned long find_last_bit(const unsigned long *addr,
- unsigned long size);
-#endif
-
#endif /* __KERNEL__ */
#endif
diff --git a/lib/find_bit.c b/lib/find_bit.c
index 2470ae390f3c..e2c301d28568 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -75,7 +75,7 @@ EXPORT_SYMBOL(_find_next_bit);
/*
* Find the first set bit in a memory region.
*/
-unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
+unsigned long _find_first_bit(const unsigned long *addr, unsigned long size)
{
unsigned long idx;
@@ -86,14 +86,14 @@ unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
return size;
}
-EXPORT_SYMBOL(find_first_bit);
+EXPORT_SYMBOL(_find_first_bit);
#endif
#ifndef find_first_zero_bit
/*
* Find the first cleared bit in a memory region.
*/
-unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
+unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size)
{
unsigned long idx;
@@ -104,11 +104,11 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
return size;
}
-EXPORT_SYMBOL(find_first_zero_bit);
+EXPORT_SYMBOL(_find_first_zero_bit);
#endif
#ifndef find_last_bit
-unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
+unsigned long _find_last_bit(const unsigned long *addr, unsigned long size)
{
if (size) {
unsigned long val = BITS_FIRST_MASK(size - 1);
@@ -124,7 +124,7 @@ unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
}
return size;
}
-EXPORT_SYMBOL(find_last_bit);
+EXPORT_SYMBOL(_find_last_bit);
#endif
unsigned long find_next_clump8(unsigned long *clump, const unsigned long *addr,
diff --git a/tools/include/asm-generic/bitops/find.h b/tools/include/asm-generic/bitops/find.h
index eff868bd22f8..382cd80c61aa 100644
--- a/tools/include/asm-generic/bitops/find.h
+++ b/tools/include/asm-generic/bitops/find.h
@@ -5,6 +5,9 @@
extern unsigned long _find_next_bit(const unsigned long *addr1,
const unsigned long *addr2, unsigned long nbits,
unsigned long start, unsigned long invert, unsigned long le);
+extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
+extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
+extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
#ifndef find_next_bit
/**
@@ -102,8 +105,17 @@ unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
* Returns the bit number of the first set bit.
* If no bits are set, returns @size.
*/
-extern unsigned long find_first_bit(const unsigned long *addr,
- unsigned long size);
+static inline
+unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
+{
+ if (SMALL_CONST(size - 1)) {
+ unsigned long val = *addr & BITS_FIRST(size - 1);
+
+ return val ? __ffs(val) : size;
+ }
+
+ return _find_first_bit(addr, size);
+}
#endif /* find_first_bit */
@@ -117,7 +129,17 @@ extern unsigned long find_first_bit(const unsigned long *addr,
* Returns the bit number of the first cleared bit.
* If no bits are zero, returns @size.
*/
-unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size);
+static inline
+unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
+{
+ if (SMALL_CONST(size - 1)) {
+ unsigned long val = *addr | ~BITS_FIRST(size - 1);
+
+ return val == ~0UL ? size : ffz(val);
+ }
+
+ return _find_first_zero_bit(addr, size);
+}
#endif
#endif /*_TOOLS_LINUX_ASM_GENERIC_BITOPS_FIND_H_ */
diff --git a/tools/lib/find_bit.c b/tools/lib/find_bit.c
index c3378b291205..a77884ca30ec 100644
--- a/tools/lib/find_bit.c
+++ b/tools/lib/find_bit.c
@@ -83,7 +83,7 @@ unsigned long _find_next_bit(const unsigned long *addr1,
/*
* Find the first set bit in a memory region.
*/
-unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
+unsigned long _find_first_bit(const unsigned long *addr, unsigned long size)
{
unsigned long idx;
@@ -100,7 +100,7 @@ unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
/*
* Find the first cleared bit in a memory region.
*/
-unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
+unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size)
{
unsigned long idx;
--
2.25.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 4/8] lib: introduce BITS_{FIRST,LAST} macro
2021-01-30 19:17 ` [PATCH 4/8] lib: introduce BITS_{FIRST,LAST} macro Yury Norov
@ 2021-02-01 13:42 ` Andy Shevchenko
0 siblings, 0 replies; 23+ messages in thread
From: Andy Shevchenko @ 2021-02-01 13:42 UTC (permalink / raw)
To: Yury Norov
Cc: linux-m68k, linux-kernel, linux-sh, linux-arch,
Geert Uytterhoeven, Yoshinori Sato, Rich Felker, Arnd Bergmann,
Dennis Zhou, Andrew Morton, Wolfram Sang, David Sterba,
Stefano Brivio, Ma, Jianpeng, Wei Yang, Josh Poimboeuf,
John Paul Adrian Glaubitz, Joe Perches
On Sat, Jan 30, 2021 at 11:17:15AM -0800, Yury Norov wrote:
> BITMAP_{LAST,FIRST}_WORD_MASK() in linux/bitmap.h duplicates the
> functionality of GENMASK(). The scope of these macros is wider than just
> bitmaps. This patch defines 4 new macros: BITS_FIRST(), BITS_LAST(),
> BITS_FIRST_MASK() and BITS_LAST_MASK() in linux/bits.h on top of GENMASK()
> and replaces BITMAP_{LAST,FIRST}_WORD_MASK() to avoid duplication and
> increase the scope of the macros.
...
> include/linux/bitmap.h | 27 ++++++++++++---------------
> include/linux/bits.h | 6 ++++++
> include/linux/cpumask.h | 8 ++++----
> include/linux/netdev_features.h | 2 +-
> include/linux/nodemask.h | 2 +-
> lib/bitmap.c | 26 +++++++++++++-------------
> lib/find_bit.c | 4 ++--
> lib/genalloc.c | 8 ++++----
Answering your reply: I meant to split the below into a separate patch.
> tools/include/linux/bitmap.h | 20 ++++++--------------
> tools/include/linux/bits.h | 6 ++++++
> tools/lib/bitmap.c | 6 +++---
> tools/lib/find_bit.c | 2 +-
> tools/testing/radix-tree/bitmap.c | 4 ++--
...
> +#define BITS_FIRST(nr) GENMASK((nr), 0)
> +#define BITS_LAST(nr) GENMASK(BITS_PER_LONG - 1, (nr))
> +
> +#define BITS_FIRST_MASK(nr) BITS_FIRST((nr) % BITS_PER_LONG)
> +#define BITS_LAST_MASK(nr) BITS_LAST((nr) % BITS_PER_LONG)
Any pointers to the difference in generated code for popular architectures
(x86. x86_64, arm, aarch64, ppc, ppc64)?
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH 5/8] bitsperlong.h: introduce SMALL_CONST() macro
2021-01-30 19:17 ` [PATCH 5/8] bitsperlong.h: introduce SMALL_CONST() macro Yury Norov
@ 2021-02-01 13:45 ` Andy Shevchenko
2021-02-02 7:10 ` Yury Norov
0 siblings, 1 reply; 23+ messages in thread
From: Andy Shevchenko @ 2021-02-01 13:45 UTC (permalink / raw)
To: Yury Norov
Cc: linux-m68k, linux-kernel, linux-sh, linux-arch,
Geert Uytterhoeven, Yoshinori Sato, Rich Felker, Arnd Bergmann,
Dennis Zhou, Andrew Morton, Wolfram Sang, David Sterba,
Stefano Brivio, Ma, Jianpeng, Wei Yang, Josh Poimboeuf,
John Paul Adrian Glaubitz, Joe Perches
On Sat, Jan 30, 2021 at 11:17:16AM -0800, Yury Norov wrote:
> Many algorithms become simpler if they are passed with relatively small
> input values. One example is bitmap operations when the whole bitmap fits
> into one word. To implement such simplifications, linux/bitmap.h declares
> small_const_nbits() macro.
>
> Other subsystems may also benefit from optimizations of this sort, like
> find_bit API in the following patches. So it looks helpful to generalize
> the macro and extend it's visibility.
Hmm... Are we really good to allow 0 as a parameter to it? I remember we had
a thread at some point where Rasmus explained why 0 is excluded.
...
> --- a/tools/include/asm-generic/bitsperlong.h
> +++ b/tools/include/asm-generic/bitsperlong.h
Tools part in a separate patch?
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH 6/8] lib: inline _find_next_bit() wrappers
2021-01-30 19:17 ` [PATCH 6/8] lib: inline _find_next_bit() wrappers Yury Norov
@ 2021-02-01 13:47 ` Andy Shevchenko
2021-02-02 7:13 ` Yury Norov
0 siblings, 1 reply; 23+ messages in thread
From: Andy Shevchenko @ 2021-02-01 13:47 UTC (permalink / raw)
To: Yury Norov
Cc: linux-m68k, linux-kernel, linux-sh, linux-arch,
Geert Uytterhoeven, Yoshinori Sato, Rich Felker, Arnd Bergmann,
Dennis Zhou, Andrew Morton, Wolfram Sang, David Sterba,
Stefano Brivio, Ma, Jianpeng, Wei Yang, Josh Poimboeuf,
John Paul Adrian Glaubitz, Joe Perches
On Sat, Jan 30, 2021 at 11:17:17AM -0800, Yury Norov wrote:
> lib/find_bit.c declares five single-line wrappers for _find_next_bit().
> We may turn those wrappers to inline functions. It eliminates unneeded
> function calls and opens room for compile-time optimizations.
> tools/include/asm-generic/bitops/find.h | 27 +++++++++---
> tools/lib/find_bit.c | 52 ++++++++++-------------
In a separate patch, please. I don't think we need to defer this series in
case the tools lag (which is the usual case in my practice).
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH 7/8] lib: add fast path for find_next_*_bit()
2021-01-30 19:17 ` [PATCH 7/8] lib: add fast path for find_next_*_bit() Yury Norov
@ 2021-02-01 13:49 ` Andy Shevchenko
2021-02-01 16:02 ` David Laight
0 siblings, 1 reply; 23+ messages in thread
From: Andy Shevchenko @ 2021-02-01 13:49 UTC (permalink / raw)
To: Yury Norov
Cc: linux-m68k, linux-kernel, linux-sh, linux-arch,
Geert Uytterhoeven, Yoshinori Sato, Rich Felker, Arnd Bergmann,
Dennis Zhou, Andrew Morton, Wolfram Sang, David Sterba,
Stefano Brivio, Ma, Jianpeng, Wei Yang, Josh Poimboeuf,
John Paul Adrian Glaubitz, Joe Perches
On Sat, Jan 30, 2021 at 11:17:18AM -0800, Yury Norov wrote:
> Similarly to bitmap functions, find_next_*_bit() users will benefit
> if we'll handle a case of bitmaps that fit into a single word. In the
> very best case, the compiler may replace a function call with a
> single ffs or ffz instruction.
Would be nice to have the examples how it reduces the actual code size (based
on the existing code in kernel, especially in widely used frameworks /
subsystems, like PCI).
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH 8/8] lib: add fast path for find_first_*_bit() and find_last_bit()
2021-01-30 19:17 ` [PATCH 8/8] lib: add fast path for find_first_*_bit() and find_last_bit() Yury Norov
@ 2021-02-01 13:50 ` Andy Shevchenko
0 siblings, 0 replies; 23+ messages in thread
From: Andy Shevchenko @ 2021-02-01 13:50 UTC (permalink / raw)
To: Yury Norov
Cc: linux-m68k, linux-kernel, linux-sh, linux-arch,
Geert Uytterhoeven, Yoshinori Sato, Rich Felker, Arnd Bergmann,
Dennis Zhou, Andrew Morton, Wolfram Sang, David Sterba,
Stefano Brivio, Ma, Jianpeng, Wei Yang, Josh Poimboeuf,
John Paul Adrian Glaubitz, Joe Perches
On Sat, Jan 30, 2021 at 11:17:19AM -0800, Yury Norov wrote:
> Similarly to bitmap functions, users will benefit if we'll handle
> a case of small-size bitmaps that fit into a single word.
>
> While here, move the find_last_bit() declaration to bitops/find.h
> where other find_*_bit() functions sit.
As per previous patches:
- decouple from tools
- show examples of the code generation difference
--
With Best Regards,
Andy Shevchenko
* RE: [PATCH 7/8] lib: add fast path for find_next_*_bit()
2021-02-01 13:49 ` Andy Shevchenko
@ 2021-02-01 16:02 ` David Laight
2021-02-01 16:22 ` 'Andy Shevchenko'
0 siblings, 1 reply; 23+ messages in thread
From: David Laight @ 2021-02-01 16:02 UTC (permalink / raw)
To: 'Andy Shevchenko', Yury Norov
Cc: linux-m68k, linux-kernel, linux-sh, linux-arch,
Geert Uytterhoeven, Yoshinori Sato, Rich Felker, Arnd Bergmann,
Dennis Zhou, Andrew Morton, Wolfram Sang, David Sterba,
Stefano Brivio, Ma, Jianpeng, Wei Yang, Josh Poimboeuf,
John Paul Adrian Glaubitz, Joe Perches
From: Andy Shevchenko
> Sent: 01 February 2021 13:49
>
> On Sat, Jan 30, 2021 at 11:17:18AM -0800, Yury Norov wrote:
> > Similarly to bitmap functions, find_next_*_bit() users will benefit
> > if we'll handle a case of bitmaps that fit into a single word. In the
> > very best case, the compiler may replace a function call with a
> > single ffs or ffz instruction.
>
> Would be nice to have the examples how it reduces the actual code size (based
> on the existing code in kernel, especially in widely used frameworks /
> subsystems, like PCI).
I bet it makes the kernel bigger but very slightly faster.
But the fact that the wrappers end up in the i-cache may
mean that inlining actually makes it slower for some calling
sequences.
If a bitmap fits in a single word (as a compile-time constant)
then you should (probably) be using different functions if
you care about performance.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
* Re: [PATCH 7/8] lib: add fast path for find_next_*_bit()
2021-02-01 16:02 ` David Laight
@ 2021-02-01 16:22 ` 'Andy Shevchenko'
2021-02-02 7:02 ` Yury Norov
0 siblings, 1 reply; 23+ messages in thread
From: 'Andy Shevchenko' @ 2021-02-01 16:22 UTC (permalink / raw)
To: David Laight
Cc: Yury Norov, linux-m68k, linux-kernel, linux-sh, linux-arch,
Geert Uytterhoeven, Yoshinori Sato, Rich Felker, Arnd Bergmann,
Dennis Zhou, Andrew Morton, Wolfram Sang, David Sterba,
Stefano Brivio, Ma, Jianpeng, Wei Yang, Josh Poimboeuf,
John Paul Adrian Glaubitz, Joe Perches
On Mon, Feb 01, 2021 at 04:02:30PM +0000, David Laight wrote:
> From: Andy Shevchenko
> > Sent: 01 February 2021 13:49
> > On Sat, Jan 30, 2021 at 11:17:18AM -0800, Yury Norov wrote:
> > > Similarly to bitmap functions, find_next_*_bit() users will benefit
> > > if we'll handle a case of bitmaps that fit into a single word. In the
> > > very best case, the compiler may replace a function call with a
> > > single ffs or ffz instruction.
> >
> > Would be nice to have the examples how it reduces the actual code size (based
> > on the existing code in kernel, especially in widely used frameworks /
> > subsystems, like PCI).
>
> I bet it makes the kernel bigger but very slightly faster.
> But the fact that the wrappers end up in the i-cache may
> mean that inlining actually makes it slower for some calling
> sequences.
> If a bitmap fits in a single word (as a compile-time constant)
> then you should (probably) be using different functions if
> you care about performance.
Isn't this patch series exactly about it?
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH 7/8] lib: add fast path for find_next_*_bit()
2021-02-01 16:22 ` 'Andy Shevchenko'
@ 2021-02-02 7:02 ` Yury Norov
0 siblings, 0 replies; 23+ messages in thread
From: Yury Norov @ 2021-02-02 7:02 UTC (permalink / raw)
To: Andy Shevchenko
Cc: David Laight, linux-m68k, linux-kernel, linux-sh, linux-arch,
Geert Uytterhoeven, Yoshinori Sato, Rich Felker, Arnd Bergmann,
Dennis Zhou, Andrew Morton, Wolfram Sang, David Sterba,
Stefano Brivio, Ma, Jianpeng, Wei Yang, Josh Poimboeuf,
John Paul Adrian Glaubitz, Joe Perches, Alexey Klimov,
Rasmus Villemoes
[ CC Alexey Klimov and Rasmus Villemoes, as they may also be interested ]
On Mon, Feb 1, 2021 at 8:22 AM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
> On Mon, Feb 01, 2021 at 04:02:30PM +0000, David Laight wrote:
> > From: Andy Shevchenko
> > > Sent: 01 February 2021 13:49
> > > On Sat, Jan 30, 2021 at 11:17:18AM -0800, Yury Norov wrote:
> > > > Similarly to bitmap functions, find_next_*_bit() users will benefit
> > > > if we'll handle a case of bitmaps that fit into a single word. In the
> > > > very best case, the compiler may replace a function call with a
> > > > single ffs or ffz instruction.
> > >
> > > Would be nice to have the examples how it reduces the actual code size (based
> > > on the existing code in kernel, especially in widely used frameworks /
> > > subsystems, like PCI).
>>
> > I bet it makes the kernel bigger but very slightly faster.
> > But the fact that the wrappers end up in the i-cache may
> > mean that inlining actually makes it slower for some calling
> > sequences.
>
> > If a bitmap fits in a single word (as a compile-time constant)
> > then you should (probably) be using different functions if
> > you care about performance.
>
> Isn't this patch series exactly about it?
Probably, David meant something like find_first_bit_fast_path().
Not sure that this approach would be better than pure inlining.
Addressing questions above.
My arm64 build for qemu:
Before:
Idx Name Size VMA LMA File off Algn
0 .head.text 00010000 ffff800010000000 ffff800010000000 00010000 2**16
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .text 011603a8 ffff800010010000 ffff800010010000 00020000 2**16
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .got.plt 00000018 ffff8000111703a8 ffff8000111703a8 011803a8 2**3
CONTENTS, ALLOC, LOAD, DATA
3 .rodata 007a29b2 ffff800011180000 ffff800011180000 01190000 2**12
CONTENTS, ALLOC, LOAD, DATA
4 .pci_fixup 000025f0 ffff8000119229c0 ffff8000119229c0 019329c0 2**4
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 __ksymtab 000100d4 ffff800011924fb0 ffff800011924fb0 01934fb0 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 __ksymtab_gpl 000147cc ffff800011935084 ffff800011935084 01945084 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
7 __ksymtab_strings 0003b0be ffff800011949850 ffff800011949850 01959850 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
8 __param 00003e58 ffff800011984910 ffff800011984910 01994910 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
9 __modver 00000678 ffff800011988768 ffff800011988768 01998768 2**3
CONTENTS, ALLOC, LOAD, DATA
10 __ex_table 00002270 ffff800011988de0 ffff800011988de0 01998de0 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
11 .notes 0000003c ffff80001198b050 ffff80001198b050 0199b050 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
12 .init.text 0007b09c ffff8000119a0000 ffff8000119a0000 019a0000 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
13 .exit.text 00004120 ffff800011a1b09c ffff800011a1b09c 01a1b09c 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
...
After:
Idx Name Size VMA LMA File off Algn
0 .head.text 00010000 ffff800010000000 ffff800010000000 00010000 2**16
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .text 011613a8 ffff800010010000 ffff800010010000 00020000 2**16
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .got.plt 00000018 ffff8000111713a8 ffff8000111713a8 011813a8 2**3
CONTENTS, ALLOC, LOAD, DATA
3 .rodata 007a26b2 ffff800011180000 ffff800011180000 01190000 2**12
CONTENTS, ALLOC, LOAD, DATA
4 .pci_fixup 000025f0 ffff8000119226c0 ffff8000119226c0 019326c0 2**4
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 __ksymtab 000100bc ffff800011924cb0 ffff800011924cb0 01934cb0 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 __ksymtab_gpl 000147cc ffff800011934d6c ffff800011934d6c 01944d6c 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
7 __ksymtab_strings 0003b09b ffff800011949538 ffff800011949538 01959538 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
8 __param 00003e58 ffff8000119845d8 ffff8000119845d8 019945d8 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
9 __modver 00000678 ffff800011988430 ffff800011988430 01998430 2**3
CONTENTS, ALLOC, LOAD, DATA
10 __ex_table 00002270 ffff800011988aa8 ffff800011988aa8 01998aa8 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
11 .notes 0000003c ffff80001198ad18 ffff80001198ad18 0199ad18 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
12 .init.text 0007b1a8 ffff8000119a0000 ffff8000119a0000 019a0000 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
13 .exit.text 00004120 ffff800011a1b1a8 ffff800011a1b1a8 01a1b1a8 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
The size of .text is increased by 0x1000 bytes, or 0.02%
The size of .rodata is decreased by 0x300 bytes, or 0.06%
The size of __ksymtab is decreased by 24 bytes, or 0.018%
So the cost of this fast path is ~3.3K.
Regarding code generation, this is a quite typical find_bit() user:
unsigned int cpumask_next(int n, const struct cpumask *srcp)
{
/* -1 is a legal arg here. */
if (n != -1)
cpumask_check(n);
return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
}
EXPORT_SYMBOL(cpumask_next);
Before:
0000000000000000 <cpumask_next>:
0: a9bf7bfd stp x29, x30, [sp, #-16]!
4: 11000402 add w2, w0, #0x1
8: aa0103e0 mov x0, x1
c: d2800401 mov x1, #0x40 // #64
10: 910003fd mov x29, sp
14: 93407c42 sxtw x2, w2
18: 94000000 bl 0 <find_next_bit>
1c: a8c17bfd ldp x29, x30, [sp], #16
20: d65f03c0 ret
24: d503201f nop
After:
0000000000000140 <cpumask_next>:
140: 11000400 add w0, w0, #0x1
144: 93407c00 sxtw x0, w0
148: f100fc1f cmp x0, #0x3f
14c: 54000168 b.hi 178 <cpumask_next+0x38> // b.pmore
150: f9400023 ldr x3, [x1]
154: 92800001 mov x1, #0xffffffffffffffff // #-1
158: 9ac02020 lsl x0, x1, x0
15c: 52800802 mov w2, #0x40 // #64
160: 8a030001 and x1, x0, x3
164: dac00020 rbit x0, x1
168: f100003f cmp x1, #0x0
16c: dac01000 clz x0, x0
170: 1a800040 csel w0, w2, w0, eq // eq = none
174: d65f03c0 ret
178: 52800800 mov w0, #0x40 // #64
17c: d65f03c0 ret
find_next_bit() call is removed with the cost of 6 instructions.
(And I suspect we can improve the GENMASK() for better code generation.)
find_next_bit() itself is 41 instructions, so I believe this fast path
would bring value for many users.
On the other hand, if someone is limited by I-cache and wants to avoid
generating fast paths, I think it's worth making SMALL_CONST() configurable,
which would also disable fast paths in lib/bitmap.c. We may rely on the -Os
flag, or enable the fast path in the config explicitly:
diff --git a/include/asm-generic/bitsperlong.h
b/include/asm-generic/bitsperlong.h
index 0eeb77544f1d..209e531074c1 100644
--- a/include/asm-generic/bitsperlong.h
+++ b/include/asm-generic/bitsperlong.h
@@ -23,6 +23,10 @@
#define BITS_PER_LONG_LONG 64
#endif
+#ifdef CONFIG_FAST_PATH
 #define SMALL_CONST(n) (__builtin_constant_p(n) && (unsigned long)(n) < BITS_PER_LONG)
+#else
+#define SMALL_CONST(n) (0)
+#endif
#endif /* __ASM_GENERIC_BITS_PER_LONG */
diff --git a/lib/Kconfig b/lib/Kconfig
index a38cc61256f1..c9629b3ebce8 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -39,6 +39,14 @@ config PACKING
When in doubt, say N.
+config FAST_PATH
+ bool "Enable fast path code generation"
+ default y
+ help
+ This option enables fast path optimization with the cost
+ of increasing the text section.
+
config BITREVERSE
tristate
> With Best Regards,
> Andy Shevchenko
* Re: [PATCH 5/8] bitsperlong.h: introduce SMALL_CONST() macro
2021-02-01 13:45 ` Andy Shevchenko
@ 2021-02-02 7:10 ` Yury Norov
0 siblings, 0 replies; 23+ messages in thread
From: Yury Norov @ 2021-02-02 7:10 UTC (permalink / raw)
To: Andy Shevchenko
Cc: linux-m68k, Linux Kernel Mailing List, Linux-SH, Linux-Arch,
Geert Uytterhoeven, Yoshinori Sato, Rich Felker, Arnd Bergmann,
Dennis Zhou, Andrew Morton, Wolfram Sang, David Sterba,
Stefano Brivio, Ma, Jianpeng, Wei Yang, Josh Poimboeuf,
John Paul Adrian Glaubitz, Joe Perches
On Mon, Feb 1, 2021 at 5:45 AM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Sat, Jan 30, 2021 at 11:17:16AM -0800, Yury Norov wrote:
> > Many algorithms become simpler if they are passed with relatively small
> > input values. One example is bitmap operations when the whole bitmap fits
> > into one word. To implement such simplifications, linux/bitmap.h declares
> > small_const_nbits() macro.
> >
> > Other subsystems may also benefit from optimizations of this sort, like
> > find_bit API in the following patches. So it looks helpful to generalize
> > the macro and extend it's visibility.
>
> Hmm... Are we really good to allow 0 as a parameter to it? I remember we had
> a thread at some point where Rasmus explained why 0 is excluded.
Now we pass (nbits - 1) instead of nbits, which wraps to ULONG_MAX in the
case of nbits == 0, so the fast path is rejected for a zero-size bitmap.
> > --- a/tools/include/asm-generic/bitsperlong.h
> > +++ b/tools/include/asm-generic/bitsperlong.h
>
> Tools part in a separate patch?
>
> --
> With Best Regards,
> Andy Shevchenko
>
>
* Re: [PATCH 6/8] lib: inline _find_next_bit() wrappers
2021-02-01 13:47 ` Andy Shevchenko
@ 2021-02-02 7:13 ` Yury Norov
0 siblings, 0 replies; 23+ messages in thread
From: Yury Norov @ 2021-02-02 7:13 UTC (permalink / raw)
To: Andy Shevchenko
Cc: linux-m68k, Linux Kernel Mailing List, Linux-SH, Linux-Arch,
Geert Uytterhoeven, Yoshinori Sato, Rich Felker, Arnd Bergmann,
Dennis Zhou, Andrew Morton, Wolfram Sang, David Sterba,
Stefano Brivio, Ma, Jianpeng, Wei Yang, Josh Poimboeuf,
John Paul Adrian Glaubitz, Joe Perches
On Mon, Feb 1, 2021 at 5:47 AM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Sat, Jan 30, 2021 at 11:17:17AM -0800, Yury Norov wrote:
> > lib/find_bit.c declares five single-line wrappers for _find_next_bit().
> > We may turn those wrappers to inline functions. It eliminates unneeded
> > function calls and opens room for compile-time optimizations.
>
> > tools/include/asm-generic/bitops/find.h | 27 +++++++++---
> > tools/lib/find_bit.c | 52 ++++++++++-------------
>
> In a separate patch, please. I don't think we need to defer this series in
> case the tools lag (which is the usual case in my practice).
Splitting it into kernel and tools parts means either a patch bomb for tools or
doubling the size of the series. Both options look worse than what we have now.
Can you explain the lagged-tools argument in more detail?
> --
> With Best Regards,
> Andy Shevchenko
>
>
* Re: [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps
2021-01-30 19:17 [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps Yury Norov
` (7 preceding siblings ...)
2021-01-30 19:17 ` [PATCH 8/8] lib: add fast path for find_first_*_bit() and find_last_bit() Yury Norov
@ 2021-02-15 21:30 ` Yury Norov
2021-02-16 9:14 ` Andy Shevchenko
8 siblings, 1 reply; 23+ messages in thread
From: Yury Norov @ 2021-02-15 21:30 UTC (permalink / raw)
To: linux-m68k, linux-kernel, linux-sh, linux-arch
Cc: Geert Uytterhoeven, Yoshinori Sato, Rich Felker, Arnd Bergmann,
Dennis Zhou, Andrew Morton, Wolfram Sang, David Sterba,
Andy Shevchenko, Stefano Brivio, Ma, Jianpeng, Wei Yang,
Josh Poimboeuf, John Paul Adrian Glaubitz, Joe Perches
[add David Laight <David.Laight@ACULAB.COM> ]
On Sat, Jan 30, 2021 at 11:17:11AM -0800, Yury Norov wrote:
> Bitmap operations are much simpler and faster in case of small bitmaps
> which fit into a single word. In linux/bitmap.h we have a machinery that
> allows compiler to replace actual function call with a few instructions
> if bitmaps passed into the function are small and their size is known at
> compile time.
>
> find_*_bit() API lacks this functionality; despite users will benefit from
> it a lot. One important example is cpumask subsystem when
> NR_CPUS <= BITS_PER_LONG. In the very best case, the compiler may replace
> a find_*_bit() call for such a bitmap with a single ffs or ffz instruction.
>
> Tools is synchronized with new implementation where needed.
>
> v1: https://www.spinics.net/lists/kernel/msg3804727.html
> v2: - employ GENMASK() for bitmaps;
> - unify find_bit inliners in;
> - address comments to v1;
Comments so far:
- increased image size (patch #8) - addressed by introducing
CONFIG_FAST_PATH;
- split tools and kernel parts - not clear why it's better.
Anything else?
* Re: [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps
2021-02-15 21:30 ` [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps Yury Norov
@ 2021-02-16 9:14 ` Andy Shevchenko
2021-02-16 18:00 ` Yury Norov
0 siblings, 1 reply; 23+ messages in thread
From: Andy Shevchenko @ 2021-02-16 9:14 UTC (permalink / raw)
To: Yury Norov
Cc: linux-m68k, linux-kernel, linux-sh, linux-arch,
Geert Uytterhoeven, Yoshinori Sato, Rich Felker, Arnd Bergmann,
Dennis Zhou, Andrew Morton, Wolfram Sang, David Sterba,
Stefano Brivio, Ma, Jianpeng, Wei Yang, Josh Poimboeuf,
John Paul Adrian Glaubitz, Joe Perches
On Mon, Feb 15, 2021 at 01:30:44PM -0800, Yury Norov wrote:
> [add David Laight <David.Laight@ACULAB.COM> ]
>
> On Sat, Jan 30, 2021 at 11:17:11AM -0800, Yury Norov wrote:
> > Bitmap operations are much simpler and faster in case of small bitmaps
> > which fit into a single word. In linux/bitmap.h we have a machinery that
> > allows compiler to replace actual function call with a few instructions
> > if bitmaps passed into the function are small and their size is known at
> > compile time.
> >
> > find_*_bit() API lacks this functionality; despite users will benefit from
> > it a lot. One important example is cpumask subsystem when
> > NR_CPUS <= BITS_PER_LONG. In the very best case, the compiler may replace
> > a find_*_bit() call for such a bitmap with a single ffs or ffz instruction.
> >
> > Tools is synchronized with new implementation where needed.
> >
> > v1: https://www.spinics.net/lists/kernel/msg3804727.html
> > v2: - employ GENMASK() for bitmaps;
> > - unify find_bit inliners in;
> > - address comments to v1;
>
> Comments so far:
> - increased image size (patch #8) - addressed by introducing
> CONFIG_FAST_PATH;
> - split tools and kernel parts - not clear why it's better.
Because tools are user space programs and sometimes may not follow kernel
specifics, so they are different logically and changes should be separated.
> Anything else?
--
With Best Regards,
Andy Shevchenko
* Re: [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps
2021-02-16 9:14 ` Andy Shevchenko
@ 2021-02-16 18:00 ` Yury Norov
2021-02-17 10:33 ` Andy Shevchenko
0 siblings, 1 reply; 23+ messages in thread
From: Yury Norov @ 2021-02-16 18:00 UTC (permalink / raw)
To: Andy Shevchenko
Cc: linux-m68k, linux-kernel, linux-sh, linux-arch,
Geert Uytterhoeven, Yoshinori Sato, Rich Felker, Arnd Bergmann,
Dennis Zhou, Andrew Morton, Wolfram Sang, David Sterba,
Stefano Brivio, Ma, Jianpeng, Wei Yang, Josh Poimboeuf,
John Paul Adrian Glaubitz, Joe Perches
On Tue, Feb 16, 2021 at 11:14:23AM +0200, Andy Shevchenko wrote:
> On Mon, Feb 15, 2021 at 01:30:44PM -0800, Yury Norov wrote:
> > [add David Laight <David.Laight@ACULAB.COM> ]
> >
> > On Sat, Jan 30, 2021 at 11:17:11AM -0800, Yury Norov wrote:
> > > Bitmap operations are much simpler and faster in case of small bitmaps
> > > which fit into a single word. In linux/bitmap.h we have a machinery that
> > > allows compiler to replace actual function call with a few instructions
> > > if bitmaps passed into the function are small and their size is known at
> > > compile time.
> > >
> > > find_*_bit() API lacks this functionality; despite users will benefit from
> > > it a lot. One important example is cpumask subsystem when
> > > NR_CPUS <= BITS_PER_LONG. In the very best case, the compiler may replace
> > > a find_*_bit() call for such a bitmap with a single ffs or ffz instruction.
> > >
> > > Tools is synchronized with new implementation where needed.
> > >
> > > v1: https://www.spinics.net/lists/kernel/msg3804727.html
> > > v2: - employ GENMASK() for bitmaps;
> > > - unify find_bit inliners in;
> > > - address comments to v1;
> >
> > Comments so far:
> > - increased image size (patch #8) - addressed by introducing
> > CONFIG_FAST_PATH;
>
> > - split tools and kernel parts - not clear why it's better.
>
> Because tools are user space programs and sometimes may not follow kernel
> specifics, so they are different logically and changes should be separated.
In this specific case tools follow kernel well.
Nevertheless, if you think it's a blocker for the series, I can split. What
option for tools is better for you - doubling the number of patches or
squashing everything in a patch bomb?
* Re: [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps
2021-02-16 18:00 ` Yury Norov
@ 2021-02-17 10:33 ` Andy Shevchenko
0 siblings, 0 replies; 23+ messages in thread
From: Andy Shevchenko @ 2021-02-17 10:33 UTC (permalink / raw)
To: Yury Norov
Cc: linux-m68k, linux-kernel, linux-sh, linux-arch,
Geert Uytterhoeven, Yoshinori Sato, Rich Felker, Arnd Bergmann,
Dennis Zhou, Andrew Morton, Wolfram Sang, David Sterba,
Stefano Brivio, Ma, Jianpeng, Wei Yang, Josh Poimboeuf,
John Paul Adrian Glaubitz, Joe Perches
On Tue, Feb 16, 2021 at 10:00:42AM -0800, Yury Norov wrote:
> On Tue, Feb 16, 2021 at 11:14:23AM +0200, Andy Shevchenko wrote:
> > On Mon, Feb 15, 2021 at 01:30:44PM -0800, Yury Norov wrote:
> > > [add David Laight <David.Laight@ACULAB.COM> ]
> > >
> > > On Sat, Jan 30, 2021 at 11:17:11AM -0800, Yury Norov wrote:
> > > > Bitmap operations are much simpler and faster in the case of small bitmaps
> > > > that fit into a single word. In linux/bitmap.h we have machinery that
> > > > allows the compiler to replace the actual function call with a few
> > > > instructions if the bitmaps passed into the function are small and their
> > > > size is known at compile time.
> > > >
> > > > The find_*_bit() API lacks this functionality, even though users would
> > > > benefit from it a lot. One important example is the cpumask subsystem when
> > > > NR_CPUS <= BITS_PER_LONG. In the very best case, the compiler may replace
> > > > a find_*_bit() call for such a bitmap with a single ffs or ffz instruction.
> > > >
> > > > Tools are synchronized with the new implementation where needed.
> > > >
> > > > v1: https://www.spinics.net/lists/kernel/msg3804727.html
> > > > v2: - employ GENMASK() for bitmaps;
> > > > - unify find_bit inliners in;
> > > > - address comments to v1;
> > >
> > > Comments so far:
> > > - increased image size (patch #8) - addressed by introducing
> > > CONFIG_FAST_PATH;
> >
> > > - split tools and kernel parts - not clear why it's better.
> >
> > Because tools are user-space programs and sometimes may not follow kernel
> > specifics, they are logically different and the changes should be separated.
>
> In this specific case the tools follow the kernel well.
>
> Nevertheless, if you think it's a blocker for the series, I can split.
It's not a blocker from my side. But you make it harder to push like this,
because you will need a tag from the tools side, which in my practice is quite
hard to get -> a blocker. My point is: don't make obstacles where we can avoid
them. So, if tools won't take this, it won't block us.
> Which
> option for tools is better for you: doubling the number of patches, or
> squashing everything into a patch bomb?
Not a tools guy, but common sense tells me that the best approach is to mirror
the kind of changes in the kernel (similar granularity).
--
With Best Regards,
Andy Shevchenko
Thread overview: 23+ messages
2021-01-30 19:17 [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps Yury Norov
2021-01-30 19:17 ` [PATCH 1/8] tools: disable -Wno-type-limits Yury Norov
2021-01-30 19:17 ` [PATCH 2/8] tools: bitmap: sync function declarations with linux kernel Yury Norov
2021-01-30 19:17 ` [PATCH 3/8] arch: rearrange headers inclusion order in asm/bitops for m68k and sh Yury Norov
2021-01-30 19:17 ` [PATCH 4/8] lib: introduce BITS_{FIRST,LAST} macro Yury Norov
2021-02-01 13:42 ` Andy Shevchenko
2021-01-30 19:17 ` [PATCH 5/8] bitsperlong.h: introduce SMALL_CONST() macro Yury Norov
2021-02-01 13:45 ` Andy Shevchenko
2021-02-02 7:10 ` Yury Norov
2021-01-30 19:17 ` [PATCH 6/8] lib: inline _find_next_bit() wrappers Yury Norov
2021-02-01 13:47 ` Andy Shevchenko
2021-02-02 7:13 ` Yury Norov
2021-01-30 19:17 ` [PATCH 7/8] lib: add fast path for find_next_*_bit() Yury Norov
2021-02-01 13:49 ` Andy Shevchenko
2021-02-01 16:02 ` David Laight
2021-02-01 16:22 ` 'Andy Shevchenko'
2021-02-02 7:02 ` Yury Norov
2021-01-30 19:17 ` [PATCH 8/8] lib: add fast path for find_first_*_bit() and find_last_bit() Yury Norov
2021-02-01 13:50 ` Andy Shevchenko
2021-02-15 21:30 ` [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps Yury Norov
2021-02-16 9:14 ` Andy Shevchenko
2021-02-16 18:00 ` Yury Norov
2021-02-17 10:33 ` Andy Shevchenko