linux-mm.kvack.org archive mirror
* [PATCH RESEND 00/17] Resend bitmap patches
@ 2021-08-14 21:16 Yury Norov
  2021-08-14 21:16 ` [PATCH 01/17] bitops: protect find_first_{,zero}_bit properly Yury Norov
                   ` (16 more replies)
  0 siblings, 17 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:16 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov

This is a resend of previously submitted series:
https://lore.kernel.org/patchwork/cover/1462071/
https://lore.kernel.org/patchwork/patch/1458703/
https://lore.kernel.org/lkml/YPG8SdsbQ+sxjk0w@yury-ThinkPad/T/
https://lore.kernel.org/lkml/YMVSHCY9yEocmfVD@yury-ThinkPad/T/

Most of the patches have received testing and review. If I missed
adding someone's review tag while putting this together, my apologies;
please resend it here.

I believe I addressed all comments except Joe's. In a comment on patch 3,
Joe Perches suggested renaming include/linux/find.h, but didn't propose a
new name, so I left it as is. Since this header is not meant for direct
inclusion, I'm OK with any reasonable name, and we can change it later.

Andrew, can you please take this series into linux-next?

Andy Shevchenko (1):
  tools: Rename bitmap_alloc() to bitmap_zalloc()

Yury Norov (16):
  bitops: protect find_first_{,zero}_bit properly
  bitops: move find_bit_*_le functions from le.h to find.h
  include: move find.h from asm_generic to linux
  arch: remove GENERIC_FIND_FIRST_BIT entirely
  lib: add find_first_and_bit()
  cpumask: use find_first_and_bit()
  all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where
    appropriate
  tools: sync tools/bitmap with mother linux
  cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
  include/linux: move for_each_bit() macros from bitops.h to find.h
  find: micro-optimize for_each_{set,clear}_bit()
  Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
  mm/percpu: micro-optimize pcpu_is_populated()
  bitmap: unify find_bit operations
  lib: bitmap: add performance test for bitmap_print_to_pagebuf
  vsprintf: rework bitmap_list_string

 MAINTAINERS                                   |   4 +-
 arch/alpha/include/asm/bitops.h               |   2 -
 arch/arc/Kconfig                              |   1 -
 arch/arc/include/asm/bitops.h                 |   1 -
 arch/arm/include/asm/bitops.h                 |   1 -
 arch/arm64/Kconfig                            |   1 -
 arch/arm64/include/asm/bitops.h               |   1 -
 arch/csky/include/asm/bitops.h                |   1 -
 arch/h8300/include/asm/bitops.h               |   1 -
 arch/hexagon/include/asm/bitops.h             |   1 -
 arch/ia64/include/asm/bitops.h                |   2 -
 arch/m68k/include/asm/bitops.h                |   2 -
 arch/mips/Kconfig                             |   1 -
 arch/mips/include/asm/bitops.h                |   1 -
 arch/openrisc/include/asm/bitops.h            |   1 -
 arch/parisc/include/asm/bitops.h              |   2 -
 arch/powerpc/include/asm/bitops.h             |   2 -
 arch/powerpc/include/asm/cputhreads.h         |   2 +-
 arch/powerpc/platforms/pasemi/dma_lib.c       |   4 +-
 arch/riscv/include/asm/bitops.h               |   1 -
 arch/s390/Kconfig                             |   1 -
 arch/s390/include/asm/bitops.h                |   1 -
 arch/s390/kvm/kvm-s390.c                      |   2 +-
 arch/sh/include/asm/bitops.h                  |   1 -
 arch/sparc/include/asm/bitops_32.h            |   1 -
 arch/sparc/include/asm/bitops_64.h            |   2 -
 arch/x86/Kconfig                              |   1 -
 arch/x86/include/asm/bitops.h                 |   2 -
 arch/x86/kernel/apic/vector.c                 |   4 +-
 arch/x86/um/Kconfig                           |   1 -
 arch/xtensa/include/asm/bitops.h              |   1 -
 block/blk-mq.c                                |   2 +-
 drivers/block/rnbd/rnbd-clt.c                 |   2 +-
 drivers/dma/ti/edma.c                         |   2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c         |   4 +-
 drivers/hwmon/ltc2992.c                       |   3 +-
 drivers/iio/adc/ad7124.c                      |   2 +-
 drivers/infiniband/hw/irdma/hw.c              |  16 +-
 drivers/media/cec/core/cec-core.c             |   2 +-
 drivers/media/mc/mc-devnode.c                 |   2 +-
 drivers/mmc/host/renesas_sdhi_core.c          |   2 +-
 drivers/net/virtio_net.c                      |   2 +-
 drivers/pci/controller/dwc/pci-dra7xx.c       |   2 +-
 drivers/scsi/lpfc/lpfc_sli.c                  |  10 +-
 drivers/soc/fsl/qbman/bman_portal.c           |   2 +-
 drivers/soc/fsl/qbman/qman_portal.c           |   2 +-
 drivers/soc/ti/k3-ringacc.c                   |   4 +-
 drivers/tty/n_tty.c                           |   2 +-
 drivers/virt/acrn/ioreq.c                     |   3 +-
 fs/f2fs/segment.c                             |   8 +-
 fs/ocfs2/cluster/heartbeat.c                  |   2 +-
 fs/ocfs2/dlm/dlmdomain.c                      |   4 +-
 fs/ocfs2/dlm/dlmmaster.c                      |  18 +-
 fs/ocfs2/dlm/dlmrecovery.c                    |   2 +-
 fs/ocfs2/dlm/dlmthread.c                      |   2 +-
 include/asm-generic/bitops.h                  |   1 -
 include/asm-generic/bitops/le.h               |  64 ---
 include/linux/bitmap.h                        |  34 +-
 include/linux/bitops.h                        |  34 --
 include/linux/cpumask.h                       |  46 ++-
 include/linux/find.h                          | 372 ++++++++++++++++++
 kernel/time/clocksource.c                     |   4 +-
 lib/Kconfig                                   |   3 -
 lib/find_bit.c                                |  21 +
 lib/find_bit_benchmark.c                      |  21 +
 lib/genalloc.c                                |   2 +-
 lib/test_bitmap.c                             |  37 ++
 lib/vsprintf.c                                |  24 +-
 mm/percpu.c                                   |  35 +-
 net/ncsi/ncsi-manage.c                        |   4 +-
 tools/include/asm-generic/bitops.h            |   1 -
 tools/include/asm-generic/bitops/find.h       | 145 -------
 tools/include/linux/bitmap.h                  |  11 +-
 .../bitops => tools/include/linux}/find.h     |  54 ++-
 tools/lib/find_bit.c                          |  20 +
 tools/perf/bench/find-bit-bench.c             |   2 +-
 tools/perf/builtin-c2c.c                      |   6 +-
 tools/perf/builtin-record.c                   |   2 +-
 tools/perf/tests/bitmap.c                     |   2 +-
 tools/perf/tests/mem2node.c                   |   2 +-
 tools/perf/util/affinity.c                    |   4 +-
 tools/perf/util/header.c                      |   4 +-
 tools/perf/util/metricgroup.c                 |   2 +-
 tools/perf/util/mmap.c                        |   4 +-
 .../selftests/kvm/dirty_log_perf_test.c       |   2 +-
 tools/testing/selftests/kvm/dirty_log_test.c  |   4 +-
 .../selftests/kvm/x86_64/vmx_dirty_log_test.c |   2 +-
 87 files changed, 657 insertions(+), 461 deletions(-)
 create mode 100644 include/linux/find.h
 delete mode 100644 tools/include/asm-generic/bitops/find.h
 rename {include/asm-generic/bitops => tools/include/linux}/find.h (83%)

-- 
2.30.2




* [PATCH 01/17] bitops: protect find_first_{,zero}_bit properly
  2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
@ 2021-08-14 21:16 ` Yury Norov
  2021-08-14 21:16 ` [PATCH 02/17] bitops: move find_bit_*_le functions from le.h to find.h Yury Norov
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:16 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov
  Cc: kernel test robot

find_first_bit() and find_first_zero_bit() are not protected with
ifdefs the way the other functions in find.h are. This causes build
errors on some platforms if CONFIG_GENERIC_FIND_FIRST_BIT is enabled.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Fixes: 2cc7b6a44ac2 ("lib: add fast path for find_first_*_bit() and find_last_bit()")
Reported-by: kernel test robot <lkp@intel.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 include/asm-generic/bitops/find.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h
index 0d132ee2a291..835f959a25f2 100644
--- a/include/asm-generic/bitops/find.h
+++ b/include/asm-generic/bitops/find.h
@@ -97,6 +97,7 @@ unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
 
 #ifdef CONFIG_GENERIC_FIND_FIRST_BIT
 
+#ifndef find_first_bit
 /**
  * find_first_bit - find the first set bit in a memory region
  * @addr: The address to start the search at
@@ -116,7 +117,9 @@ unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
 
 	return _find_first_bit(addr, size);
 }
+#endif
 
+#ifndef find_first_zero_bit
 /**
  * find_first_zero_bit - find the first cleared bit in a memory region
  * @addr: The address to start the search at
@@ -136,6 +139,8 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 
 	return _find_first_zero_bit(addr, size);
 }
+#endif
+
 #else /* CONFIG_GENERIC_FIND_FIRST_BIT */
 
 #ifndef find_first_bit
-- 
2.30.2




* [PATCH 02/17] bitops: move find_bit_*_le functions from le.h to find.h
  2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
  2021-08-14 21:16 ` [PATCH 01/17] bitops: protect find_first_{,zero}_bit properly Yury Norov
@ 2021-08-14 21:16 ` Yury Norov
  2021-08-14 21:16 ` [PATCH 03/17] include: move find.h from asm_generic to linux Yury Norov
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:16 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov

It's convenient to have all find_bit declarations in one place.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 include/asm-generic/bitops/find.h | 69 +++++++++++++++++++++++++++++++
 include/asm-generic/bitops/le.h   | 64 ----------------------------
 2 files changed, 69 insertions(+), 64 deletions(-)

diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h
index 835f959a25f2..91b1b23f2b0c 100644
--- a/include/asm-generic/bitops/find.h
+++ b/include/asm-generic/bitops/find.h
@@ -190,4 +190,73 @@ extern unsigned long find_next_clump8(unsigned long *clump,
 #define find_first_clump8(clump, bits, size) \
 	find_next_clump8((clump), (bits), (size), 0)
 
+#if defined(__LITTLE_ENDIAN)
+
+static inline unsigned long find_next_zero_bit_le(const void *addr,
+		unsigned long size, unsigned long offset)
+{
+	return find_next_zero_bit(addr, size, offset);
+}
+
+static inline unsigned long find_next_bit_le(const void *addr,
+		unsigned long size, unsigned long offset)
+{
+	return find_next_bit(addr, size, offset);
+}
+
+static inline unsigned long find_first_zero_bit_le(const void *addr,
+		unsigned long size)
+{
+	return find_first_zero_bit(addr, size);
+}
+
+#elif defined(__BIG_ENDIAN)
+
+#ifndef find_next_zero_bit_le
+static inline
+unsigned long find_next_zero_bit_le(const void *addr, unsigned
+		long size, unsigned long offset)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val = *(const unsigned long *)addr;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = swab(val) | ~GENMASK(size - 1, offset);
+		return val == ~0UL ? size : ffz(val);
+	}
+
+	return _find_next_bit(addr, NULL, size, offset, ~0UL, 1);
+}
+#endif
+
+#ifndef find_next_bit_le
+static inline
+unsigned long find_next_bit_le(const void *addr, unsigned
+		long size, unsigned long offset)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val = *(const unsigned long *)addr;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = swab(val) & GENMASK(size - 1, offset);
+		return val ? __ffs(val) : size;
+	}
+
+	return _find_next_bit(addr, NULL, size, offset, 0UL, 1);
+}
+#endif
+
+#ifndef find_first_zero_bit_le
+#define find_first_zero_bit_le(addr, size) \
+	find_next_zero_bit_le((addr), (size), 0)
+#endif
+
+#else
+#error "Please fix <asm/byteorder.h>"
+#endif
+
 #endif /*_ASM_GENERIC_BITOPS_FIND_H_ */
diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
index 5a28629cbf4d..d51beff60375 100644
--- a/include/asm-generic/bitops/le.h
+++ b/include/asm-generic/bitops/le.h
@@ -2,83 +2,19 @@
 #ifndef _ASM_GENERIC_BITOPS_LE_H_
 #define _ASM_GENERIC_BITOPS_LE_H_
 
-#include <asm-generic/bitops/find.h>
 #include <asm/types.h>
 #include <asm/byteorder.h>
-#include <linux/swab.h>
 
 #if defined(__LITTLE_ENDIAN)
 
 #define BITOP_LE_SWIZZLE	0
 
-static inline unsigned long find_next_zero_bit_le(const void *addr,
-		unsigned long size, unsigned long offset)
-{
-	return find_next_zero_bit(addr, size, offset);
-}
-
-static inline unsigned long find_next_bit_le(const void *addr,
-		unsigned long size, unsigned long offset)
-{
-	return find_next_bit(addr, size, offset);
-}
-
-static inline unsigned long find_first_zero_bit_le(const void *addr,
-		unsigned long size)
-{
-	return find_first_zero_bit(addr, size);
-}
-
 #elif defined(__BIG_ENDIAN)
 
 #define BITOP_LE_SWIZZLE	((BITS_PER_LONG-1) & ~0x7)
 
-#ifndef find_next_zero_bit_le
-static inline
-unsigned long find_next_zero_bit_le(const void *addr, unsigned
-		long size, unsigned long offset)
-{
-	if (small_const_nbits(size)) {
-		unsigned long val = *(const unsigned long *)addr;
-
-		if (unlikely(offset >= size))
-			return size;
-
-		val = swab(val) | ~GENMASK(size - 1, offset);
-		return val == ~0UL ? size : ffz(val);
-	}
-
-	return _find_next_bit(addr, NULL, size, offset, ~0UL, 1);
-}
-#endif
-
-#ifndef find_next_bit_le
-static inline
-unsigned long find_next_bit_le(const void *addr, unsigned
-		long size, unsigned long offset)
-{
-	if (small_const_nbits(size)) {
-		unsigned long val = *(const unsigned long *)addr;
-
-		if (unlikely(offset >= size))
-			return size;
-
-		val = swab(val) & GENMASK(size - 1, offset);
-		return val ? __ffs(val) : size;
-	}
-
-	return _find_next_bit(addr, NULL, size, offset, 0UL, 1);
-}
 #endif
 
-#ifndef find_first_zero_bit_le
-#define find_first_zero_bit_le(addr, size) \
-	find_next_zero_bit_le((addr), (size), 0)
-#endif
-
-#else
-#error "Please fix <asm/byteorder.h>"
-#endif
 
 static inline int test_bit_le(int nr, const void *addr)
 {
-- 
2.30.2




* [PATCH 03/17] include: move find.h from asm_generic to linux
  2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
  2021-08-14 21:16 ` [PATCH 01/17] bitops: protect find_first_{,zero}_bit properly Yury Norov
  2021-08-14 21:16 ` [PATCH 02/17] bitops: move find_bit_*_le functions from le.h to find.h Yury Norov
@ 2021-08-14 21:16 ` Yury Norov
  2021-08-14 21:17 ` [PATCH 04/17] arch: remove GENERIC_FIND_FIRST_BIT entirely Yury Norov
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:16 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov

The find_bit API and the bitmap API are closely related, but their
inclusion paths differ: include/asm-generic and include/linux,
respectively. In the past this caused a lot of trouble due to circular
dependencies and/or undefined symbols. Fix this by moving find.h under
include/linux.
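
For illustration only (not part of this patch), a minimal sketch with a
hypothetical helper: callers now reach the find_*_bit() declarations
through <linux/bitmap.h>, since <linux/find.h> gains a guard that rejects
direct inclusion.

	#include <linux/bitmap.h>	/* pulls in <linux/find.h> */

	/* Hypothetical example, not from the patch itself */
	static bool any_bit_set(const unsigned long *map, unsigned long nbits)
	{
		return find_first_bit(map, nbits) < nbits;
	}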

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 MAINTAINERS                                  |  2 +-
 arch/alpha/include/asm/bitops.h              |  2 --
 arch/arc/include/asm/bitops.h                |  1 -
 arch/arm/include/asm/bitops.h                |  1 -
 arch/arm64/include/asm/bitops.h              |  1 -
 arch/csky/include/asm/bitops.h               |  1 -
 arch/h8300/include/asm/bitops.h              |  1 -
 arch/hexagon/include/asm/bitops.h            |  1 -
 arch/ia64/include/asm/bitops.h               |  2 --
 arch/m68k/include/asm/bitops.h               |  2 --
 arch/mips/include/asm/bitops.h               |  1 -
 arch/openrisc/include/asm/bitops.h           |  1 -
 arch/parisc/include/asm/bitops.h             |  2 --
 arch/powerpc/include/asm/bitops.h            |  2 --
 arch/riscv/include/asm/bitops.h              |  1 -
 arch/s390/include/asm/bitops.h               |  1 -
 arch/sh/include/asm/bitops.h                 |  1 -
 arch/sparc/include/asm/bitops_32.h           |  1 -
 arch/sparc/include/asm/bitops_64.h           |  2 --
 arch/x86/include/asm/bitops.h                |  2 --
 arch/xtensa/include/asm/bitops.h             |  1 -
 include/asm-generic/bitops.h                 |  1 -
 include/linux/bitmap.h                       |  1 +
 include/{asm-generic/bitops => linux}/find.h | 12 +++++++++---
 24 files changed, 11 insertions(+), 32 deletions(-)
 rename include/{asm-generic/bitops => linux}/find.h (97%)

diff --git a/MAINTAINERS b/MAINTAINERS
index b63403793c81..9b62293f7b72 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3271,8 +3271,8 @@ M:	Yury Norov <yury.norov@gmail.com>
 R:	Andy Shevchenko <andriy.shevchenko@linux.intel.com>
 R:	Rasmus Villemoes <linux@rasmusvillemoes.dk>
 S:	Maintained
-F:	include/asm-generic/bitops/find.h
 F:	include/linux/bitmap.h
+F:	include/linux/find.h
 F:	lib/bitmap.c
 F:	lib/find_bit.c
 F:	lib/find_bit_benchmark.c
diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index 5adca78830b5..e1d8483a45f2 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -430,8 +430,6 @@ static inline unsigned int __arch_hweight8(unsigned int w)
 
 #endif /* __KERNEL__ */
 
-#include <asm-generic/bitops/find.h>
-
 #ifdef __KERNEL__
 
 /*
diff --git a/arch/arc/include/asm/bitops.h b/arch/arc/include/asm/bitops.h
index a7daaf64ae34..bdb7e190a294 100644
--- a/arch/arc/include/asm/bitops.h
+++ b/arch/arc/include/asm/bitops.h
@@ -189,7 +189,6 @@ static inline __attribute__ ((const)) unsigned long __ffs(unsigned long x)
 #include <asm-generic/bitops/atomic.h>
 #include <asm-generic/bitops/non-atomic.h>
 
-#include <asm-generic/bitops/find.h>
 #include <asm-generic/bitops/le.h>
 #include <asm-generic/bitops/ext2-atomic-setbit.h>
 
diff --git a/arch/arm/include/asm/bitops.h b/arch/arm/include/asm/bitops.h
index c92e42a5c8f7..8e94fe7ab5eb 100644
--- a/arch/arm/include/asm/bitops.h
+++ b/arch/arm/include/asm/bitops.h
@@ -264,7 +264,6 @@ static inline int find_next_bit_le(const void *p, int size, int offset)
 
 #endif
 
-#include <asm-generic/bitops/find.h>
 #include <asm-generic/bitops/le.h>
 
 /*
diff --git a/arch/arm64/include/asm/bitops.h b/arch/arm64/include/asm/bitops.h
index 81a3e519b07d..9b3c787132d2 100644
--- a/arch/arm64/include/asm/bitops.h
+++ b/arch/arm64/include/asm/bitops.h
@@ -18,7 +18,6 @@
 
 #include <asm-generic/bitops/ffz.h>
 #include <asm-generic/bitops/fls64.h>
-#include <asm-generic/bitops/find.h>
 
 #include <asm-generic/bitops/sched.h>
 #include <asm-generic/bitops/hweight.h>
diff --git a/arch/csky/include/asm/bitops.h b/arch/csky/include/asm/bitops.h
index 91818787d860..9604f47bd850 100644
--- a/arch/csky/include/asm/bitops.h
+++ b/arch/csky/include/asm/bitops.h
@@ -59,7 +59,6 @@ static __always_inline unsigned long __fls(unsigned long x)
 
 #include <asm-generic/bitops/ffz.h>
 #include <asm-generic/bitops/fls64.h>
-#include <asm-generic/bitops/find.h>
 
 #ifndef _LINUX_BITOPS_H
 #error only <linux/bitops.h> can be included directly
diff --git a/arch/h8300/include/asm/bitops.h b/arch/h8300/include/asm/bitops.h
index c867a80cab5b..4489e3d6edd3 100644
--- a/arch/h8300/include/asm/bitops.h
+++ b/arch/h8300/include/asm/bitops.h
@@ -168,7 +168,6 @@ static inline unsigned long __ffs(unsigned long word)
 	return result;
 }
 
-#include <asm-generic/bitops/find.h>
 #include <asm-generic/bitops/sched.h>
 #include <asm-generic/bitops/hweight.h>
 #include <asm-generic/bitops/lock.h>
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index 71429f756af0..75d6ba3643b8 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -271,7 +271,6 @@ static inline unsigned long __fls(unsigned long word)
 }
 
 #include <asm-generic/bitops/lock.h>
-#include <asm-generic/bitops/find.h>
 
 #include <asm-generic/bitops/fls64.h>
 #include <asm-generic/bitops/sched.h>
diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
index 2f24ee6459d2..577be93c0818 100644
--- a/arch/ia64/include/asm/bitops.h
+++ b/arch/ia64/include/asm/bitops.h
@@ -441,8 +441,6 @@ static __inline__ unsigned long __arch_hweight64(unsigned long x)
 
 #endif /* __KERNEL__ */
 
-#include <asm-generic/bitops/find.h>
-
 #ifdef __KERNEL__
 
 #include <asm-generic/bitops/le.h>
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 7b414099e5fc..f551a2160294 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -529,6 +529,4 @@ static inline int __fls(int x)
 #include <asm-generic/bitops/le.h>
 #endif /* __KERNEL__ */
 
-#include <asm-generic/bitops/find.h>
-
 #endif /* _M68K_BITOPS_H */
diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index dc2a6234dd3c..c09d57f907f7 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -446,7 +446,6 @@ static inline int ffs(int word)
 }
 
 #include <asm-generic/bitops/ffz.h>
-#include <asm-generic/bitops/find.h>
 
 #ifdef __KERNEL__
 
diff --git a/arch/openrisc/include/asm/bitops.h b/arch/openrisc/include/asm/bitops.h
index 7f1ca35213d8..d773ed938acb 100644
--- a/arch/openrisc/include/asm/bitops.h
+++ b/arch/openrisc/include/asm/bitops.h
@@ -30,7 +30,6 @@
 #include <asm/bitops/fls.h>
 #include <asm/bitops/__fls.h>
 #include <asm-generic/bitops/fls64.h>
-#include <asm-generic/bitops/find.h>
 
 #ifndef _LINUX_BITOPS_H
 #error only <linux/bitops.h> can be included directly
diff --git a/arch/parisc/include/asm/bitops.h b/arch/parisc/include/asm/bitops.h
index aa4e883431c1..c7a9997ac9cb 100644
--- a/arch/parisc/include/asm/bitops.h
+++ b/arch/parisc/include/asm/bitops.h
@@ -208,8 +208,6 @@ static __inline__ int fls(unsigned int x)
 
 #endif /* __KERNEL__ */
 
-#include <asm-generic/bitops/find.h>
-
 #ifdef __KERNEL__
 
 #include <asm-generic/bitops/le.h>
diff --git a/arch/powerpc/include/asm/bitops.h b/arch/powerpc/include/asm/bitops.h
index 299ab33505a6..ce2c1fa1a45d 100644
--- a/arch/powerpc/include/asm/bitops.h
+++ b/arch/powerpc/include/asm/bitops.h
@@ -255,8 +255,6 @@ unsigned long __arch_hweight64(__u64 w);
 #include <asm-generic/bitops/hweight.h>
 #endif
 
-#include <asm-generic/bitops/find.h>
-
 /* wrappers that deal with KASAN instrumentation */
 #include <asm-generic/bitops/instrumented-atomic.h>
 #include <asm-generic/bitops/instrumented-lock.h>
diff --git a/arch/riscv/include/asm/bitops.h b/arch/riscv/include/asm/bitops.h
index 396a3303c537..3540b690944b 100644
--- a/arch/riscv/include/asm/bitops.h
+++ b/arch/riscv/include/asm/bitops.h
@@ -20,7 +20,6 @@
 #include <asm-generic/bitops/fls.h>
 #include <asm-generic/bitops/__fls.h>
 #include <asm-generic/bitops/fls64.h>
-#include <asm-generic/bitops/find.h>
 #include <asm-generic/bitops/sched.h>
 #include <asm-generic/bitops/ffs.h>
 
diff --git a/arch/s390/include/asm/bitops.h b/arch/s390/include/asm/bitops.h
index fd149480b6e2..f7cefdde7c24 100644
--- a/arch/s390/include/asm/bitops.h
+++ b/arch/s390/include/asm/bitops.h
@@ -387,7 +387,6 @@ static inline int fls(unsigned int word)
 #endif /* CONFIG_HAVE_MARCH_Z9_109_FEATURES */
 
 #include <asm-generic/bitops/ffz.h>
-#include <asm-generic/bitops/find.h>
 #include <asm-generic/bitops/hweight.h>
 #include <asm-generic/bitops/sched.h>
 #include <asm-generic/bitops/le.h>
diff --git a/arch/sh/include/asm/bitops.h b/arch/sh/include/asm/bitops.h
index 3b6c7b5b7ec9..10ceb0d6b5a9 100644
--- a/arch/sh/include/asm/bitops.h
+++ b/arch/sh/include/asm/bitops.h
@@ -68,6 +68,5 @@ static inline unsigned long __ffs(unsigned long word)
 #include <asm-generic/bitops/fls64.h>
 
 #include <asm-generic/bitops/le.h>
-#include <asm-generic/bitops/find.h>
 
 #endif /* __ASM_SH_BITOPS_H */
diff --git a/arch/sparc/include/asm/bitops_32.h b/arch/sparc/include/asm/bitops_32.h
index 0ceff3b915a8..889afa9f990f 100644
--- a/arch/sparc/include/asm/bitops_32.h
+++ b/arch/sparc/include/asm/bitops_32.h
@@ -100,7 +100,6 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 #include <asm-generic/bitops/fls64.h>
 #include <asm-generic/bitops/hweight.h>
 #include <asm-generic/bitops/lock.h>
-#include <asm-generic/bitops/find.h>
 #include <asm-generic/bitops/le.h>
 #include <asm-generic/bitops/ext2-atomic.h>
 
diff --git a/arch/sparc/include/asm/bitops_64.h b/arch/sparc/include/asm/bitops_64.h
index ca7ea5913494..005a8ae858f1 100644
--- a/arch/sparc/include/asm/bitops_64.h
+++ b/arch/sparc/include/asm/bitops_64.h
@@ -52,8 +52,6 @@ unsigned int __arch_hweight8(unsigned int w);
 #include <asm-generic/bitops/lock.h>
 #endif /* __KERNEL__ */
 
-#include <asm-generic/bitops/find.h>
-
 #ifdef __KERNEL__
 
 #include <asm-generic/bitops/le.h>
diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 0367efdc5b7a..a288ecd230ab 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -380,8 +380,6 @@ static __always_inline int fls64(__u64 x)
 #include <asm-generic/bitops/fls64.h>
 #endif
 
-#include <asm-generic/bitops/find.h>
-
 #include <asm-generic/bitops/sched.h>
 
 #include <asm/arch_hweight.h>
diff --git a/arch/xtensa/include/asm/bitops.h b/arch/xtensa/include/asm/bitops.h
index 3f71d364ba90..cd225896c40f 100644
--- a/arch/xtensa/include/asm/bitops.h
+++ b/arch/xtensa/include/asm/bitops.h
@@ -205,7 +205,6 @@ BIT_OPS(change, "xor", )
 #undef BIT_OP
 #undef TEST_AND_BIT_OP
 
-#include <asm-generic/bitops/find.h>
 #include <asm-generic/bitops/le.h>
 
 #include <asm-generic/bitops/ext2-atomic-setbit.h>
diff --git a/include/asm-generic/bitops.h b/include/asm-generic/bitops.h
index df9b5bc3d282..a47b8a71d6fe 100644
--- a/include/asm-generic/bitops.h
+++ b/include/asm-generic/bitops.h
@@ -20,7 +20,6 @@
 #include <asm-generic/bitops/fls.h>
 #include <asm-generic/bitops/__fls.h>
 #include <asm-generic/bitops/fls64.h>
-#include <asm-generic/bitops/find.h>
 
 #ifndef _LINUX_BITOPS_H
 #error only <linux/bitops.h> can be included directly
diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index a36cfcec4e77..3f7c6731b203 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -6,6 +6,7 @@
 
 #include <linux/align.h>
 #include <linux/bitops.h>
+#include <linux/find.h>
 #include <linux/limits.h>
 #include <linux/string.h>
 #include <linux/types.h>
diff --git a/include/asm-generic/bitops/find.h b/include/linux/find.h
similarity index 97%
rename from include/asm-generic/bitops/find.h
rename to include/linux/find.h
index 91b1b23f2b0c..c5410c243e04 100644
--- a/include/asm-generic/bitops/find.h
+++ b/include/linux/find.h
@@ -1,6 +1,12 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_GENERIC_BITOPS_FIND_H_
-#define _ASM_GENERIC_BITOPS_FIND_H_
+#ifndef __LINUX_FIND_H_
+#define __LINUX_FIND_H_
+
+#ifndef __LINUX_BITMAP_H
+#error only <linux/bitmap.h> can be included directly
+#endif
+
+#include <linux/bitops.h>
 
 extern unsigned long _find_next_bit(const unsigned long *addr1,
 		const unsigned long *addr2, unsigned long nbits,
@@ -259,4 +265,4 @@ unsigned long find_next_bit_le(const void *addr, unsigned
 #error "Please fix <asm/byteorder.h>"
 #endif
 
-#endif /*_ASM_GENERIC_BITOPS_FIND_H_ */
+#endif /*__LINUX_FIND_H_ */
-- 
2.30.2




* [PATCH 04/17] arch: remove GENERIC_FIND_FIRST_BIT entirely
  2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
                   ` (2 preceding siblings ...)
  2021-08-14 21:16 ` [PATCH 03/17] include: move find.h from asm_generic to linux Yury Norov
@ 2021-08-14 21:17 ` Yury Norov
  2021-08-14 21:17 ` [PATCH 05/17] lib: add find_first_and_bit() Yury Norov
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:17 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov

In the 5.12 cycle we enabled the GENERIC_FIND_FIRST_BIT config option for
ARM64 and MIPS. It increased performance and shrank .text size, and so far
I haven't received any negative feedback on the change.

https://lore.kernel.org/linux-arch/20210225135700.1381396-1-yury.norov@gmail.com/

Now I think it's a good time to switch all architectures to use
find_{first,last}_bit() unconditionally, and to remove the corresponding
config option.

The patch doesn't introduce functional changes for arc, arm, arm64, mips,
m68k, s390 and x86; for the other architectures I expect improvements in
both performance and .text size.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Alexander Lobakin <alobakin@pm.me> (mips)
Reviewed-by: Alexander Lobakin <alobakin@pm.me> (mips)
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Will Deacon <will@kernel.org>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 arch/arc/Kconfig     |  1 -
 arch/arm64/Kconfig   |  1 -
 arch/mips/Kconfig    |  1 -
 arch/s390/Kconfig    |  1 -
 arch/x86/Kconfig     |  1 -
 arch/x86/um/Kconfig  |  1 -
 include/linux/find.h | 13 -------------
 lib/Kconfig          |  3 ---
 8 files changed, 22 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 53d143fc42fe..f35994edc165 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -20,7 +20,6 @@ config ARC
 	select COMMON_CLK
 	select DMA_DIRECT_REMAP
 	select GENERIC_ATOMIC64 if !ISA_ARCV2 || !(ARC_HAS_LL64 && ARC_HAS_LLSC)
-	select GENERIC_FIND_FIRST_BIT
 	# for now, we don't need GENERIC_IRQ_PROBE, CONFIG_GENERIC_IRQ_CHIP
 	select GENERIC_IRQ_SHOW
 	select GENERIC_PCI_IOMAP
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fdcd54d39c1e..7b2269ad3e12 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -119,7 +119,6 @@ config ARM64
 	select GENERIC_CPU_AUTOPROBE
 	select GENERIC_CPU_VULNERABILITIES
 	select GENERIC_EARLY_IOREMAP
-	select GENERIC_FIND_FIRST_BIT
 	select GENERIC_IDLE_POLL_SETUP
 	select GENERIC_IRQ_IPI
 	select GENERIC_IRQ_PROBE
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index da874375fa4b..d7758c8f7e37 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -30,7 +30,6 @@ config MIPS
 	select GENERIC_ATOMIC64 if !64BIT
 	select GENERIC_CMOS_UPDATE
 	select GENERIC_CPU_AUTOPROBE
-	select GENERIC_FIND_FIRST_BIT
 	select GENERIC_GETTIMEOFDAY
 	select GENERIC_IOMAP
 	select GENERIC_IRQ_PROBE
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 92c0a1b4c528..7a3bbaf19640 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -126,7 +126,6 @@ config S390
 	select GENERIC_CPU_AUTOPROBE
 	select GENERIC_CPU_VULNERABILITIES
 	select GENERIC_ENTRY
-	select GENERIC_FIND_FIRST_BIT
 	select GENERIC_GETTIMEOFDAY
 	select GENERIC_PTDUMP
 	select GENERIC_SMP_IDLE_THREAD
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 45962aaf2b2c..7edce30dc214 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -134,7 +134,6 @@ config X86
 	select GENERIC_CPU_VULNERABILITIES
 	select GENERIC_EARLY_IOREMAP
 	select GENERIC_ENTRY
-	select GENERIC_FIND_FIRST_BIT
 	select GENERIC_IOMAP
 	select GENERIC_IRQ_EFFECTIVE_AFF_MASK	if SMP
 	select GENERIC_IRQ_MATRIX_ALLOCATOR	if X86_LOCAL_APIC
diff --git a/arch/x86/um/Kconfig b/arch/x86/um/Kconfig
index 95d26a69088b..40d6a06e41c8 100644
--- a/arch/x86/um/Kconfig
+++ b/arch/x86/um/Kconfig
@@ -8,7 +8,6 @@ endmenu
 
 config UML_X86
 	def_bool y
-	select GENERIC_FIND_FIRST_BIT
 
 config 64BIT
 	bool "64-bit kernel" if "$(SUBARCH)" = "x86"
diff --git a/include/linux/find.h b/include/linux/find.h
index c5410c243e04..ea57f7f38c49 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -101,8 +101,6 @@ unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
 }
 #endif
 
-#ifdef CONFIG_GENERIC_FIND_FIRST_BIT
-
 #ifndef find_first_bit
 /**
  * find_first_bit - find the first set bit in a memory region
@@ -147,17 +145,6 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 }
 #endif
 
-#else /* CONFIG_GENERIC_FIND_FIRST_BIT */
-
-#ifndef find_first_bit
-#define find_first_bit(addr, size) find_next_bit((addr), (size), 0)
-#endif
-#ifndef find_first_zero_bit
-#define find_first_zero_bit(addr, size) find_next_zero_bit((addr), (size), 0)
-#endif
-
-#endif /* CONFIG_GENERIC_FIND_FIRST_BIT */
-
 #ifndef find_last_bit
 /**
  * find_last_bit - find the last set bit in a memory region
diff --git a/lib/Kconfig b/lib/Kconfig
index 5c9c0687f76d..f8874240835b 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -59,9 +59,6 @@ config GENERIC_STRNLEN_USER
 config GENERIC_NET_UTILS
 	bool
 
-config GENERIC_FIND_FIRST_BIT
-	bool
-
 source "lib/math/Kconfig"
 
 config NO_GENERIC_PCI_IOPORT_MAP
-- 
2.30.2




* [PATCH 05/17] lib: add find_first_and_bit()
  2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
                   ` (3 preceding siblings ...)
  2021-08-14 21:17 ` [PATCH 04/17] arch: remove GENERIC_FIND_FIRST_BIT entirely Yury Norov
@ 2021-08-14 21:17 ` Yury Norov
  2021-08-14 21:17 ` [PATCH 06/17] cpumask: use find_first_and_bit() Yury Norov
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:17 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov

Currently find_first_and_bit() is an alias for find_next_and_bit(). However,
it is widely used in cpumask, so it is worth optimizing. This patch adds
a dedicated implementation of find_first_and_bit().

On x86_64 find_bit_benchmark says:

Before (#define find_first_and_bit(...) find_next_and_bit(..., 0)):
Start testing find_bit() with random-filled bitmap
[  140.291468] find_first_and_bit:           46890919 ns,  32671 iterations
Start testing find_bit() with sparse bitmap
[  140.295028] find_first_and_bit:               7103 ns,      1 iterations

After:
Start testing find_bit() with random-filled bitmap
[  162.574907] find_first_and_bit:           25045813 ns,  32846 iterations
Start testing find_bit() with sparse bitmap
[  162.578458] find_first_and_bit:               4900 ns,      1 iterations

(Thanks to Alexey Klimov for thorough testing.)
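
For illustration only (not part of the patch), a minimal sketch with a
hypothetical helper showing how the new function replaces the old
"'next' with start == 0" spelling:

	#include <linux/bitmap.h>

	/*
	 * Hypothetical example: first bit set in both bitmaps,
	 * or nbits if they have no set bit in common.
	 */
	static unsigned long first_common_bit(const unsigned long *a,
					      const unsigned long *b,
					      unsigned long nbits)
	{
		/* was: find_next_and_bit(a, b, nbits, 0) */
		return find_first_and_bit(a, b, nbits);
	}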

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Alexey Klimov <aklimov@redhat.com>
---
 include/linux/find.h     | 27 +++++++++++++++++++++++++++
 lib/find_bit.c           | 21 +++++++++++++++++++++
 lib/find_bit_benchmark.c | 21 +++++++++++++++++++++
 3 files changed, 69 insertions(+)

diff --git a/include/linux/find.h b/include/linux/find.h
index ea57f7f38c49..6048f8c97418 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -12,6 +12,8 @@ extern unsigned long _find_next_bit(const unsigned long *addr1,
 		const unsigned long *addr2, unsigned long nbits,
 		unsigned long start, unsigned long invert, unsigned long le);
 extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
+extern unsigned long _find_first_and_bit(const unsigned long *addr1,
+					 const unsigned long *addr2, unsigned long size);
 extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
 extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
 
@@ -123,6 +125,31 @@ unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
 }
 #endif
 
+#ifndef find_first_and_bit
+/**
+ * find_first_and_bit - find the first set bit in both memory regions
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @size: The bitmap size in bits
+ *
+ * Returns the bit number for the next set bit
+ * If no bits are set, returns @size.
+ */
+static inline
+unsigned long find_first_and_bit(const unsigned long *addr1,
+				 const unsigned long *addr2,
+				 unsigned long size)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val = *addr1 & *addr2 & GENMASK(size - 1, 0);
+
+		return val ? __ffs(val) : size;
+	}
+
+	return _find_first_and_bit(addr1, addr2, size);
+}
+#endif
+
 #ifndef find_first_zero_bit
 /**
  * find_first_zero_bit - find the first cleared bit in a memory region
diff --git a/lib/find_bit.c b/lib/find_bit.c
index 0f8e2e369b1d..1b8e4b2a9cba 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -89,6 +89,27 @@ unsigned long _find_first_bit(const unsigned long *addr, unsigned long size)
 EXPORT_SYMBOL(_find_first_bit);
 #endif
 
+#ifndef find_first_and_bit
+/*
+ * Find the first set bit in two memory regions.
+ */
+unsigned long _find_first_and_bit(const unsigned long *addr1,
+				  const unsigned long *addr2,
+				  unsigned long size)
+{
+	unsigned long idx, val;
+
+	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+		val = addr1[idx] & addr2[idx];
+		if (val)
+			return min(idx * BITS_PER_LONG + __ffs(val), size);
+	}
+
+	return size;
+}
+EXPORT_SYMBOL(_find_first_and_bit);
+#endif
+
 #ifndef find_first_zero_bit
 /*
  * Find the first cleared bit in a memory region.
diff --git a/lib/find_bit_benchmark.c b/lib/find_bit_benchmark.c
index 5637c5711db9..db904b57d4b8 100644
--- a/lib/find_bit_benchmark.c
+++ b/lib/find_bit_benchmark.c
@@ -49,6 +49,25 @@ static int __init test_find_first_bit(void *bitmap, unsigned long len)
 	return 0;
 }
 
+static int __init test_find_first_and_bit(void *bitmap, const void *bitmap2, unsigned long len)
+{
+	static DECLARE_BITMAP(cp, BITMAP_LEN) __initdata;
+	unsigned long i, cnt;
+	ktime_t time;
+
+	bitmap_copy(cp, bitmap, BITMAP_LEN);
+
+	time = ktime_get();
+	for (cnt = i = 0; i < len; cnt++) {
+		i = find_first_and_bit(cp, bitmap2, len);
+		__clear_bit(i, cp);
+	}
+	time = ktime_get() - time;
+	pr_err("find_first_and_bit: %18llu ns, %6ld iterations\n", time, cnt);
+
+	return 0;
+}
+
 static int __init test_find_next_bit(const void *bitmap, unsigned long len)
 {
 	unsigned long i, cnt;
@@ -129,6 +148,7 @@ static int __init find_bit_test(void)
 	 * traverse only part of bitmap to avoid soft lockup.
 	 */
 	test_find_first_bit(bitmap, BITMAP_LEN / 10);
+	test_find_first_and_bit(bitmap, bitmap2, BITMAP_LEN / 2);
 	test_find_next_and_bit(bitmap, bitmap2, BITMAP_LEN);
 
 	pr_err("\nStart testing find_bit() with sparse bitmap\n");
@@ -145,6 +165,7 @@ static int __init find_bit_test(void)
 	test_find_next_zero_bit(bitmap, BITMAP_LEN);
 	test_find_last_bit(bitmap, BITMAP_LEN);
 	test_find_first_bit(bitmap, BITMAP_LEN);
+	test_find_first_and_bit(bitmap, bitmap2, BITMAP_LEN);
 	test_find_next_and_bit(bitmap, bitmap2, BITMAP_LEN);
 
 	/*
-- 
2.30.2




* [PATCH 06/17] cpumask: use find_first_and_bit()
  2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
                   ` (4 preceding siblings ...)
  2021-08-14 21:17 ` [PATCH 05/17] lib: add find_first_and_bit() Yury Norov
@ 2021-08-14 21:17 ` Yury Norov
  2021-08-14 21:17 ` [PATCH 07/17] all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate Yury Norov
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:17 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov

Now that we have an efficient implementation of find_first_and_bit(),
switch cpumask to use it where appropriate.
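
For illustration only (not part of the patch), a minimal sketch with a
hypothetical caller; cpumask_first_and() now goes through the optimized
first-bit path:

	#include <linux/cpumask.h>
	#include <linux/errno.h>

	/* Hypothetical example: first CPU that is both allowed and online */
	static int first_allowed_online_cpu(const struct cpumask *allowed)
	{
		unsigned int cpu = cpumask_first_and(allowed, cpu_online_mask);

		return cpu < nr_cpu_ids ? (int)cpu : -ENODEV;
	}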

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 include/linux/cpumask.h | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index f3689a52bfd0..4a03f6636e6c 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -123,6 +123,12 @@ static inline unsigned int cpumask_first(const struct cpumask *srcp)
 	return 0;
 }
 
+static inline unsigned int cpumask_first_and(const struct cpumask *srcp1,
+					     const struct cpumask *srcp2)
+{
+	return 0;
+}
+
 static inline unsigned int cpumask_last(const struct cpumask *srcp)
 {
 	return 0;
@@ -167,7 +173,7 @@ static inline unsigned int cpumask_local_spread(unsigned int i, int node)
 
 static inline int cpumask_any_and_distribute(const struct cpumask *src1p,
 					     const struct cpumask *src2p) {
-	return cpumask_next_and(-1, src1p, src2p);
+	return cpumask_first_and(src1p, src2p);
 }
 
 static inline int cpumask_any_distribute(const struct cpumask *srcp)
@@ -195,6 +201,19 @@ static inline unsigned int cpumask_first(const struct cpumask *srcp)
 	return find_first_bit(cpumask_bits(srcp), nr_cpumask_bits);
 }
 
+/**
+ * cpumask_first_and - return the first cpu from *srcp1 & *srcp2
+ * @src1p: the first input
+ * @src2p: the second input
+ *
+ * Returns >= nr_cpu_ids if no cpus set in both.  See also cpumask_next_and().
+ */
+static inline
+unsigned int cpumask_first_and(const struct cpumask *srcp1, const struct cpumask *srcp2)
+{
+	return find_first_and_bit(cpumask_bits(srcp1), cpumask_bits(srcp2), nr_cpumask_bits);
+}
+
 /**
  * cpumask_last - get the last CPU in a cpumask
  * @srcp:	- the cpumask pointer
@@ -585,15 +604,6 @@ static inline void cpumask_copy(struct cpumask *dstp,
  */
 #define cpumask_any(srcp) cpumask_first(srcp)
 
-/**
- * cpumask_first_and - return the first cpu from *srcp1 & *srcp2
- * @src1p: the first input
- * @src2p: the second input
- *
- * Returns >= nr_cpu_ids if no cpus set in both.  See also cpumask_next_and().
- */
-#define cpumask_first_and(src1p, src2p) cpumask_next_and(-1, (src1p), (src2p))
-
 /**
  * cpumask_any_and - pick a "random" cpu from *mask1 & *mask2
  * @mask1: the first input cpumask
-- 
2.30.2




* [PATCH 07/17] all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
  2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
                   ` (5 preceding siblings ...)
  2021-08-14 21:17 ` [PATCH 06/17] cpumask: use find_first_and_bit() Yury Norov
@ 2021-08-14 21:17 ` Yury Norov
  2021-08-14 21:17 ` [PATCH 08/17] tools: sync tools/bitmap with mother linux Yury Norov
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:17 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov

find_first{,_zero}_bit() is a more efficient analogue of the 'next'
version when start == 0. This patch replaces 'next' with 'first' where
the conversion is trivial.
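
For illustration only (not part of the patch), the conversion pattern with
a hypothetical helper; searching from bit 0 with the 'next' helper is
equivalent to the 'first' helper, which is cheaper:

	#include <linux/bitmap.h>

	/* Hypothetical example */
	static unsigned long first_free_slot(const unsigned long *map,
					     unsigned long nbits)
	{
		/* was: find_next_zero_bit(map, nbits, 0) */
		return find_first_zero_bit(map, nbits);
	}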

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 arch/powerpc/platforms/pasemi/dma_lib.c |  4 ++--
 arch/s390/kvm/kvm-s390.c                |  2 +-
 drivers/block/rnbd/rnbd-clt.c           |  2 +-
 drivers/dma/ti/edma.c                   |  2 +-
 drivers/iio/adc/ad7124.c                |  2 +-
 drivers/infiniband/hw/irdma/hw.c        | 16 ++++++++--------
 drivers/media/cec/core/cec-core.c       |  2 +-
 drivers/media/mc/mc-devnode.c           |  2 +-
 drivers/pci/controller/dwc/pci-dra7xx.c |  2 +-
 drivers/scsi/lpfc/lpfc_sli.c            | 10 +++++-----
 drivers/soc/ti/k3-ringacc.c             |  4 ++--
 drivers/tty/n_tty.c                     |  2 +-
 drivers/virt/acrn/ioreq.c               |  3 +--
 fs/f2fs/segment.c                       |  8 ++++----
 fs/ocfs2/cluster/heartbeat.c            |  2 +-
 fs/ocfs2/dlm/dlmdomain.c                |  4 ++--
 fs/ocfs2/dlm/dlmmaster.c                | 18 +++++++++---------
 fs/ocfs2/dlm/dlmrecovery.c              |  2 +-
 fs/ocfs2/dlm/dlmthread.c                |  2 +-
 lib/genalloc.c                          |  2 +-
 net/ncsi/ncsi-manage.c                  |  4 ++--
 21 files changed, 47 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/platforms/pasemi/dma_lib.c b/arch/powerpc/platforms/pasemi/dma_lib.c
index 270fa3c0d372..26427311fc72 100644
--- a/arch/powerpc/platforms/pasemi/dma_lib.c
+++ b/arch/powerpc/platforms/pasemi/dma_lib.c
@@ -375,7 +375,7 @@ int pasemi_dma_alloc_flag(void)
 	int bit;
 
 retry:
-	bit = find_next_bit(flags_free, MAX_FLAGS, 0);
+	bit = find_first_bit(flags_free, MAX_FLAGS);
 	if (bit >= MAX_FLAGS)
 		return -ENOSPC;
 	if (!test_and_clear_bit(bit, flags_free))
@@ -440,7 +440,7 @@ int pasemi_dma_alloc_fun(void)
 	int bit;
 
 retry:
-	bit = find_next_bit(fun_free, MAX_FLAGS, 0);
+	bit = find_first_bit(fun_free, MAX_FLAGS);
 	if (bit >= MAX_FLAGS)
 		return -ENOSPC;
 	if (!test_and_clear_bit(bit, fun_free))
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 02574d7b3612..a5443a0accb3 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2023,7 +2023,7 @@ static unsigned long kvm_s390_next_dirty_cmma(struct kvm_memslots *slots,
 	while ((slotidx > 0) && (ofs >= ms->npages)) {
 		slotidx--;
 		ms = slots->memslots + slotidx;
-		ofs = find_next_bit(kvm_second_dirty_bitmap(ms), ms->npages, 0);
+		ofs = find_first_bit(kvm_second_dirty_bitmap(ms), ms->npages);
 	}
 	return ms->base_gfn + ofs;
 }
diff --git a/drivers/block/rnbd/rnbd-clt.c b/drivers/block/rnbd/rnbd-clt.c
index bd4a41afbbfc..5864c9b46cb9 100644
--- a/drivers/block/rnbd/rnbd-clt.c
+++ b/drivers/block/rnbd/rnbd-clt.c
@@ -196,7 +196,7 @@ rnbd_get_cpu_qlist(struct rnbd_clt_session *sess, int cpu)
 		return per_cpu_ptr(sess->cpu_queues, bit);
 	} else if (cpu != 0) {
 		/* Search from 0 to cpu */
-		bit = find_next_bit(sess->cpu_queues_bm, cpu, 0);
+		bit = find_first_bit(sess->cpu_queues_bm, cpu);
 		if (bit < cpu)
 			return per_cpu_ptr(sess->cpu_queues, bit);
 	}
diff --git a/drivers/dma/ti/edma.c b/drivers/dma/ti/edma.c
index 35d81bd857f1..caa4050ecc02 100644
--- a/drivers/dma/ti/edma.c
+++ b/drivers/dma/ti/edma.c
@@ -1681,7 +1681,7 @@ static irqreturn_t dma_ccerr_handler(int irq, void *data)
 
 			dev_dbg(ecc->dev, "EMR%d 0x%08x\n", j, val);
 			emr = val;
-			for (i = find_next_bit(&emr, 32, 0); i < 32;
+			for (i = find_first_bit(&emr, 32); i < 32;
 			     i = find_next_bit(&emr, 32, i + 1)) {
 				int k = (j << 5) + i;
 
diff --git a/drivers/iio/adc/ad7124.c b/drivers/iio/adc/ad7124.c
index e45c600fccc0..bc2cfa5f9592 100644
--- a/drivers/iio/adc/ad7124.c
+++ b/drivers/iio/adc/ad7124.c
@@ -347,7 +347,7 @@ static int ad7124_find_free_config_slot(struct ad7124_state *st)
 {
 	unsigned int free_cfg_slot;
 
-	free_cfg_slot = find_next_zero_bit(&st->cfg_slots_status, AD7124_MAX_CONFIGS, 0);
+	free_cfg_slot = find_first_zero_bit(&st->cfg_slots_status, AD7124_MAX_CONFIGS);
 	if (free_cfg_slot == AD7124_MAX_CONFIGS)
 		return -1;
 
diff --git a/drivers/infiniband/hw/irdma/hw.c b/drivers/infiniband/hw/irdma/hw.c
index 00de5ee9a260..a003893a431e 100644
--- a/drivers/infiniband/hw/irdma/hw.c
+++ b/drivers/infiniband/hw/irdma/hw.c
@@ -1696,14 +1696,14 @@ static enum irdma_status_code irdma_setup_init_state(struct irdma_pci_f *rf)
  */
 static void irdma_get_used_rsrc(struct irdma_device *iwdev)
 {
-	iwdev->rf->used_pds = find_next_zero_bit(iwdev->rf->allocated_pds,
-						 iwdev->rf->max_pd, 0);
-	iwdev->rf->used_qps = find_next_zero_bit(iwdev->rf->allocated_qps,
-						 iwdev->rf->max_qp, 0);
-	iwdev->rf->used_cqs = find_next_zero_bit(iwdev->rf->allocated_cqs,
-						 iwdev->rf->max_cq, 0);
-	iwdev->rf->used_mrs = find_next_zero_bit(iwdev->rf->allocated_mrs,
-						 iwdev->rf->max_mr, 0);
+	iwdev->rf->used_pds = find_first_zero_bit(iwdev->rf->allocated_pds,
+						 iwdev->rf->max_pd);
+	iwdev->rf->used_qps = find_first_zero_bit(iwdev->rf->allocated_qps,
+						 iwdev->rf->max_qp);
+	iwdev->rf->used_cqs = find_first_zero_bit(iwdev->rf->allocated_cqs,
+						 iwdev->rf->max_cq);
+	iwdev->rf->used_mrs = find_first_zero_bit(iwdev->rf->allocated_mrs,
+						 iwdev->rf->max_mr);
 }
 
 void irdma_ctrl_deinit_hw(struct irdma_pci_f *rf)
diff --git a/drivers/media/cec/core/cec-core.c b/drivers/media/cec/core/cec-core.c
index 551689d371a7..7322e7cd9753 100644
--- a/drivers/media/cec/core/cec-core.c
+++ b/drivers/media/cec/core/cec-core.c
@@ -106,7 +106,7 @@ static int __must_check cec_devnode_register(struct cec_devnode *devnode,
 
 	/* Part 1: Find a free minor number */
 	mutex_lock(&cec_devnode_lock);
-	minor = find_next_zero_bit(cec_devnode_nums, CEC_NUM_DEVICES, 0);
+	minor = find_first_zero_bit(cec_devnode_nums, CEC_NUM_DEVICES);
 	if (minor == CEC_NUM_DEVICES) {
 		mutex_unlock(&cec_devnode_lock);
 		pr_err("could not get a free minor\n");
diff --git a/drivers/media/mc/mc-devnode.c b/drivers/media/mc/mc-devnode.c
index f11382afe23b..680fbb3a9340 100644
--- a/drivers/media/mc/mc-devnode.c
+++ b/drivers/media/mc/mc-devnode.c
@@ -217,7 +217,7 @@ int __must_check media_devnode_register(struct media_device *mdev,
 
 	/* Part 1: Find a free minor number */
 	mutex_lock(&media_devnode_lock);
-	minor = find_next_zero_bit(media_devnode_nums, MEDIA_NUM_DEVICES, 0);
+	minor = find_first_zero_bit(media_devnode_nums, MEDIA_NUM_DEVICES);
 	if (minor == MEDIA_NUM_DEVICES) {
 		mutex_unlock(&media_devnode_lock);
 		pr_err("could not get a free minor\n");
diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c b/drivers/pci/controller/dwc/pci-dra7xx.c
index fbbb78f6885e..ab4b133e1ccd 100644
--- a/drivers/pci/controller/dwc/pci-dra7xx.c
+++ b/drivers/pci/controller/dwc/pci-dra7xx.c
@@ -211,7 +211,7 @@ static int dra7xx_pcie_handle_msi(struct pcie_port *pp, int index)
 	if (!val)
 		return 0;
 
-	pos = find_next_bit(&val, MAX_MSI_IRQS_PER_CTRL, 0);
+	pos = find_first_bit(&val, MAX_MSI_IRQS_PER_CTRL);
 	while (pos != MAX_MSI_IRQS_PER_CTRL) {
 		generic_handle_domain_irq(pp->irq_domain,
 					  (index * MAX_MSI_IRQS_PER_CTRL) + pos);
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 47dd13719901..5274536f9ea1 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -17286,8 +17286,8 @@ lpfc_sli4_alloc_xri(struct lpfc_hba *phba)
 	 * the driver starts at 0 each time.
 	 */
 	spin_lock_irq(&phba->hbalock);
-	xri = find_next_zero_bit(phba->sli4_hba.xri_bmask,
-				 phba->sli4_hba.max_cfg_param.max_xri, 0);
+	xri = find_first_zero_bit(phba->sli4_hba.xri_bmask,
+				 phba->sli4_hba.max_cfg_param.max_xri);
 	if (xri >= phba->sli4_hba.max_cfg_param.max_xri) {
 		spin_unlock_irq(&phba->hbalock);
 		return NO_XRI;
@@ -18964,7 +18964,7 @@ lpfc_sli4_alloc_rpi(struct lpfc_hba *phba)
 	max_rpi = phba->sli4_hba.max_cfg_param.max_rpi;
 	rpi_limit = phba->sli4_hba.next_rpi;
 
-	rpi = find_next_zero_bit(phba->sli4_hba.rpi_bmask, rpi_limit, 0);
+	rpi = find_first_zero_bit(phba->sli4_hba.rpi_bmask, rpi_limit);
 	if (rpi >= rpi_limit)
 		rpi = LPFC_RPI_ALLOC_ERROR;
 	else {
@@ -19607,8 +19607,8 @@ lpfc_sli4_fcf_rr_next_index_get(struct lpfc_hba *phba)
 		 * have been tested so that we can detect when we should
 		 * change the priority level.
 		 */
-		next_fcf_index = find_next_bit(phba->fcf.fcf_rr_bmask,
-					       LPFC_SLI4_FCF_TBL_INDX_MAX, 0);
+		next_fcf_index = find_first_bit(phba->fcf.fcf_rr_bmask,
+					       LPFC_SLI4_FCF_TBL_INDX_MAX);
 	}
 
 
diff --git a/drivers/soc/ti/k3-ringacc.c b/drivers/soc/ti/k3-ringacc.c
index 312ba0f98ad7..573be88f8191 100644
--- a/drivers/soc/ti/k3-ringacc.c
+++ b/drivers/soc/ti/k3-ringacc.c
@@ -358,8 +358,8 @@ struct k3_ring *k3_ringacc_request_ring(struct k3_ringacc *ringacc,
 		goto out;
 
 	if (flags & K3_RINGACC_RING_USE_PROXY) {
-		proxy_id = find_next_zero_bit(ringacc->proxy_inuse,
-					      ringacc->num_proxies, 0);
+		proxy_id = find_first_zero_bit(ringacc->proxy_inuse,
+					      ringacc->num_proxies);
 		if (proxy_id == ringacc->num_proxies)
 			goto error;
 	}
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 0ec93f1a61f5..0965793dfe4f 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1975,7 +1975,7 @@ static bool canon_copy_from_read_buf(struct tty_struct *tty,
 	more = n - (size - tail);
 	if (eol == N_TTY_BUF_SIZE && more) {
 		/* scan wrapped without finding set bit */
-		eol = find_next_bit(ldata->read_flags, more, 0);
+		eol = find_first_bit(ldata->read_flags, more);
 		found = eol != more;
 	} else
 		found = eol != size;
diff --git a/drivers/virt/acrn/ioreq.c b/drivers/virt/acrn/ioreq.c
index 80b2e3f0e276..5ff1c53740c0 100644
--- a/drivers/virt/acrn/ioreq.c
+++ b/drivers/virt/acrn/ioreq.c
@@ -246,8 +246,7 @@ void acrn_ioreq_request_clear(struct acrn_vm *vm)
 	spin_lock_bh(&vm->ioreq_clients_lock);
 	client = vm->default_client;
 	if (client) {
-		vcpu = find_next_bit(client->ioreqs_map,
-				     ACRN_IO_REQUEST_MAX, 0);
+		vcpu = find_first_bit(client->ioreqs_map, ACRN_IO_REQUEST_MAX);
 		while (vcpu < ACRN_IO_REQUEST_MAX) {
 			acrn_ioreq_complete_request(client, vcpu, NULL);
 			vcpu = find_next_bit(client->ioreqs_map,
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index b4dd22134a73..2a4a101b3fe3 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2524,8 +2524,8 @@ static void get_new_segment(struct f2fs_sb_info *sbi,
 	secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint);
 	if (secno >= MAIN_SECS(sbi)) {
 		if (dir == ALLOC_RIGHT) {
-			secno = find_next_zero_bit(free_i->free_secmap,
-							MAIN_SECS(sbi), 0);
+			secno = find_first_zero_bit(free_i->free_secmap,
+							MAIN_SECS(sbi));
 			f2fs_bug_on(sbi, secno >= MAIN_SECS(sbi));
 		} else {
 			go_left = 1;
@@ -2540,8 +2540,8 @@ static void get_new_segment(struct f2fs_sb_info *sbi,
 			left_start--;
 			continue;
 		}
-		left_start = find_next_zero_bit(free_i->free_secmap,
-							MAIN_SECS(sbi), 0);
+		left_start = find_first_zero_bit(free_i->free_secmap,
+							MAIN_SECS(sbi));
 		f2fs_bug_on(sbi, left_start >= MAIN_SECS(sbi));
 		break;
 	}
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index f89ffcbd585f..a17be1618bf7 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -379,7 +379,7 @@ static void o2hb_nego_timeout(struct work_struct *work)
 
 	o2hb_fill_node_map(live_node_bitmap, sizeof(live_node_bitmap));
 	/* lowest node as master node to make negotiate decision. */
-	master_node = find_next_bit(live_node_bitmap, O2NM_MAX_NODES, 0);
+	master_node = find_first_bit(live_node_bitmap, O2NM_MAX_NODES);
 
 	if (master_node == o2nm_this_node()) {
 		if (!test_bit(master_node, reg->hr_nego_node_bitmap)) {
diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index 9f90fc9551e1..c4eccd499db8 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -1045,7 +1045,7 @@ static int dlm_send_regions(struct dlm_ctxt *dlm, unsigned long *node_map)
 	int status, ret = 0, i;
 	char *p;
 
-	if (find_next_bit(node_map, O2NM_MAX_NODES, 0) >= O2NM_MAX_NODES)
+	if (find_first_bit(node_map, O2NM_MAX_NODES) >= O2NM_MAX_NODES)
 		goto bail;
 
 	qr = kzalloc(sizeof(struct dlm_query_region), GFP_KERNEL);
@@ -1217,7 +1217,7 @@ static int dlm_send_nodeinfo(struct dlm_ctxt *dlm, unsigned long *node_map)
 	struct o2nm_node *node;
 	int ret = 0, status, count, i;
 
-	if (find_next_bit(node_map, O2NM_MAX_NODES, 0) >= O2NM_MAX_NODES)
+	if (find_first_bit(node_map, O2NM_MAX_NODES) >= O2NM_MAX_NODES)
 		goto bail;
 
 	qn = kzalloc(sizeof(struct dlm_query_nodeinfo), GFP_KERNEL);
diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 9b88219febb5..227da5b1b6ab 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -861,7 +861,7 @@ struct dlm_lock_resource * dlm_get_lock_resource(struct dlm_ctxt *dlm,
 		 * to see if there are any nodes that still need to be
 		 * considered.  these will not appear in the mle nodemap
 		 * but they might own this lockres.  wait on them. */
-		bit = find_next_bit(dlm->recovery_map, O2NM_MAX_NODES, 0);
+		bit = find_first_bit(dlm->recovery_map, O2NM_MAX_NODES);
 		if (bit < O2NM_MAX_NODES) {
 			mlog(0, "%s: res %.*s, At least one node (%d) "
 			     "to recover before lock mastery can begin\n",
@@ -912,7 +912,7 @@ struct dlm_lock_resource * dlm_get_lock_resource(struct dlm_ctxt *dlm,
 		dlm_wait_for_recovery(dlm);
 
 		spin_lock(&dlm->spinlock);
-		bit = find_next_bit(dlm->recovery_map, O2NM_MAX_NODES, 0);
+		bit = find_first_bit(dlm->recovery_map, O2NM_MAX_NODES);
 		if (bit < O2NM_MAX_NODES) {
 			mlog(0, "%s: res %.*s, At least one node (%d) "
 			     "to recover before lock mastery can begin\n",
@@ -1079,7 +1079,7 @@ static int dlm_wait_for_lock_mastery(struct dlm_ctxt *dlm,
 		sleep = 1;
 		/* have all nodes responded? */
 		if (voting_done && !*blocked) {
-			bit = find_next_bit(mle->maybe_map, O2NM_MAX_NODES, 0);
+			bit = find_first_bit(mle->maybe_map, O2NM_MAX_NODES);
 			if (dlm->node_num <= bit) {
 				/* my node number is lowest.
 			 	 * now tell other nodes that I am
@@ -1234,8 +1234,8 @@ static int dlm_restart_lock_mastery(struct dlm_ctxt *dlm,
 		} else {
 			mlog(ML_ERROR, "node down! %d\n", node);
 			if (blocked) {
-				int lowest = find_next_bit(mle->maybe_map,
-						       O2NM_MAX_NODES, 0);
+				int lowest = find_first_bit(mle->maybe_map,
+						       O2NM_MAX_NODES);
 
 				/* act like it was never there */
 				clear_bit(node, mle->maybe_map);
@@ -1795,7 +1795,7 @@ int dlm_assert_master_handler(struct o2net_msg *msg, u32 len, void *data,
 		     "MLE for it! (%.*s)\n", assert->node_idx,
 		     namelen, name);
 	} else {
-		int bit = find_next_bit (mle->maybe_map, O2NM_MAX_NODES, 0);
+		int bit = find_first_bit(mle->maybe_map, O2NM_MAX_NODES);
 		if (bit >= O2NM_MAX_NODES) {
 			/* not necessarily an error, though less likely.
 			 * could be master just re-asserting. */
@@ -2521,7 +2521,7 @@ static int dlm_is_lockres_migratable(struct dlm_ctxt *dlm,
 	}
 
 	if (!nonlocal) {
-		node_ref = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
+		node_ref = find_first_bit(res->refmap, O2NM_MAX_NODES);
 		if (node_ref >= O2NM_MAX_NODES)
 			return 0;
 	}
@@ -3303,7 +3303,7 @@ static void dlm_clean_block_mle(struct dlm_ctxt *dlm,
 	BUG_ON(mle->type != DLM_MLE_BLOCK);
 
 	spin_lock(&mle->spinlock);
-	bit = find_next_bit(mle->maybe_map, O2NM_MAX_NODES, 0);
+	bit = find_first_bit(mle->maybe_map, O2NM_MAX_NODES);
 	if (bit != dead_node) {
 		mlog(0, "mle found, but dead node %u would not have been "
 		     "master\n", dead_node);
@@ -3542,7 +3542,7 @@ void dlm_force_free_mles(struct dlm_ctxt *dlm)
 	spin_lock(&dlm->master_lock);
 
 	BUG_ON(dlm->dlm_state != DLM_CTXT_LEAVING);
-	BUG_ON((find_next_bit(dlm->domain_map, O2NM_MAX_NODES, 0) < O2NM_MAX_NODES));
+	BUG_ON((find_first_bit(dlm->domain_map, O2NM_MAX_NODES) < O2NM_MAX_NODES));
 
 	for (i = 0; i < DLM_HASH_BUCKETS; i++) {
 		bucket = dlm_master_hash(dlm, i);
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 0e7aad1b11cc..e24e327524f8 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -451,7 +451,7 @@ static int dlm_do_recovery(struct dlm_ctxt *dlm)
 	if (dlm->reco.dead_node == O2NM_INVALID_NODE_NUM) {
 		int bit;
 
-		bit = find_next_bit (dlm->recovery_map, O2NM_MAX_NODES, 0);
+		bit = find_first_bit(dlm->recovery_map, O2NM_MAX_NODES);
 		if (bit >= O2NM_MAX_NODES || bit < 0)
 			dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM);
 		else
diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index c350bd4df770..eedf07ca23ca 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -92,7 +92,7 @@ int __dlm_lockres_unused(struct dlm_lock_resource *res)
 		return 0;
 
 	/* Another node has this resource with this node as the master */
-	bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
+	bit = find_first_bit(res->refmap, O2NM_MAX_NODES);
 	if (bit < O2NM_MAX_NODES)
 		return 0;
 
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 9a57257988c7..00fc50d0a640 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -251,7 +251,7 @@ void gen_pool_destroy(struct gen_pool *pool)
 		list_del(&chunk->next_chunk);
 
 		end_bit = chunk_size(chunk) >> order;
-		bit = find_next_bit(chunk->bits, end_bit, 0);
+		bit = find_first_bit(chunk->bits, end_bit);
 		BUG_ON(bit < end_bit);
 
 		vfree(chunk);
diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
index 89c7742cd72e..a554837d86dd 100644
--- a/net/ncsi/ncsi-manage.c
+++ b/net/ncsi/ncsi-manage.c
@@ -608,7 +608,7 @@ static int clear_one_vid(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc,
 	bitmap = &ncf->bitmap;
 
 	spin_lock_irqsave(&nc->lock, flags);
-	index = find_next_bit(bitmap, ncf->n_vids, 0);
+	index = find_first_bit(bitmap, ncf->n_vids);
 	if (index >= ncf->n_vids) {
 		spin_unlock_irqrestore(&nc->lock, flags);
 		return -1;
@@ -667,7 +667,7 @@ static int set_one_vid(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc,
 		return -1;
 	}
 
-	index = find_next_zero_bit(bitmap, ncf->n_vids, 0);
+	index = find_first_zero_bit(bitmap, ncf->n_vids);
 	if (index < 0 || index >= ncf->n_vids) {
 		netdev_err(ndp->ndev.dev,
 			   "Channel %u already has all VLAN filters set\n",
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 08/17] tools: sync tools/bitmap with mother linux
  2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
                   ` (6 preceding siblings ...)
  2021-08-14 21:17 ` [PATCH 07/17] all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate Yury Norov
@ 2021-08-14 21:17 ` Yury Norov
  2021-08-14 21:17 ` [PATCH 09/17] cpumask: replace cpumask_next_* with cpumask_first_* where appropriate Yury Norov
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:17 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov

Remove tools/include/asm-generic/bitops/find.h and copy
include/linux/find.h to tools. The find_*_le() functions are not copied
because they are not needed in tools.
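
For illustration only (not part of this patch), a tools-side user built
against tools/include can now call the synced helpers directly. A minimal
sketch; first_common_cpu(), online, allowed and nbits are made-up names,
not code from the tree:

    #include <linux/bitmap.h>   /* in tools, this pulls in <linux/find.h> */

    /* Return the first bit set in both masks, or nbits if none. */
    static unsigned long first_common_cpu(const unsigned long *online,
                                          const unsigned long *allowed,
                                          unsigned long nbits)
    {
            return find_first_and_bit(online, allowed, nbits);
    }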

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 MAINTAINERS                                   |  2 +-
 tools/include/asm-generic/bitops.h            |  1 -
 tools/include/linux/bitmap.h                  |  7 +-
 .../{asm-generic/bitops => linux}/find.h      | 81 +++++++++++++++++--
 tools/lib/find_bit.c                          | 20 +++++
 5 files changed, 100 insertions(+), 11 deletions(-)
 rename tools/include/{asm-generic/bitops => linux}/find.h (63%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9b62293f7b72..b033083dbb42 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3277,8 +3277,8 @@ F:	lib/bitmap.c
 F:	lib/find_bit.c
 F:	lib/find_bit_benchmark.c
 F:	lib/test_bitmap.c
-F:	tools/include/asm-generic/bitops/find.h
 F:	tools/include/linux/bitmap.h
+F:	tools/include/linux/find.h
 F:	tools/lib/bitmap.c
 F:	tools/lib/find_bit.c
 
diff --git a/tools/include/asm-generic/bitops.h b/tools/include/asm-generic/bitops.h
index 5d2ab38965cc..9ab313e93555 100644
--- a/tools/include/asm-generic/bitops.h
+++ b/tools/include/asm-generic/bitops.h
@@ -18,7 +18,6 @@
 #include <asm-generic/bitops/fls.h>
 #include <asm-generic/bitops/__fls.h>
 #include <asm-generic/bitops/fls64.h>
-#include <asm-generic/bitops/find.h>
 
 #ifndef _TOOLS_LINUX_BITOPS_H_
 #error only <linux/bitops.h> can be included directly
diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
index 9d959bc24859..13d90b574970 100644
--- a/tools/include/linux/bitmap.h
+++ b/tools/include/linux/bitmap.h
@@ -1,9 +1,10 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _PERF_BITOPS_H
-#define _PERF_BITOPS_H
+#ifndef _TOOLS_LINUX_BITMAP_H
+#define _TOOLS_LINUX_BITMAP_H
 
 #include <string.h>
 #include <linux/bitops.h>
+#include <linux/find.h>
 #include <stdlib.h>
 #include <linux/kernel.h>
 
@@ -181,4 +182,4 @@ static inline int bitmap_intersects(const unsigned long *src1,
 		return __bitmap_intersects(src1, src2, nbits);
 }
 
-#endif /* _PERF_BITOPS_H */
+#endif /* _TOOLS_LINUX_BITMAP_H */
diff --git a/tools/include/asm-generic/bitops/find.h b/tools/include/linux/find.h
similarity index 63%
rename from tools/include/asm-generic/bitops/find.h
rename to tools/include/linux/find.h
index 6481fd11012a..47e2bd6c5174 100644
--- a/tools/include/asm-generic/bitops/find.h
+++ b/tools/include/linux/find.h
@@ -1,11 +1,19 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _TOOLS_LINUX_ASM_GENERIC_BITOPS_FIND_H_
-#define _TOOLS_LINUX_ASM_GENERIC_BITOPS_FIND_H_
+#ifndef _TOOLS_LINUX_FIND_H_
+#define _TOOLS_LINUX_FIND_H_
+
+#ifndef _TOOLS_LINUX_BITMAP_H
+#error tools: only <linux/bitmap.h> can be included directly
+#endif
+
+#include <linux/bitops.h>
 
 extern unsigned long _find_next_bit(const unsigned long *addr1,
 		const unsigned long *addr2, unsigned long nbits,
 		unsigned long start, unsigned long invert, unsigned long le);
 extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
+extern unsigned long _find_first_and_bit(const unsigned long *addr1,
+					 const unsigned long *addr2, unsigned long size);
 extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
 extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
 
@@ -96,7 +104,6 @@ unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
 #endif
 
 #ifndef find_first_bit
-
 /**
  * find_first_bit - find the first set bit in a memory region
  * @addr: The address to start the search at
@@ -116,11 +123,34 @@ unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
 
 	return _find_first_bit(addr, size);
 }
+#endif
+
+#ifndef find_first_and_bit
+/**
+ * find_first_and_bit - find the first set bit in both memory regions
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @size: The bitmap size in bits
+ *
+ * Returns the bit number for the next set bit
+ * If no bits are set, returns @size.
+ */
+static inline
+unsigned long find_first_and_bit(const unsigned long *addr1,
+				 const unsigned long *addr2,
+				 unsigned long size)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val = *addr1 & *addr2 & GENMASK(size - 1, 0);
 
-#endif /* find_first_bit */
+		return val ? __ffs(val) : size;
+	}
 
-#ifndef find_first_zero_bit
+	return _find_first_and_bit(addr1, addr2, size);
+}
+#endif
 
+#ifndef find_first_zero_bit
 /**
  * find_first_zero_bit - find the first cleared bit in a memory region
  * @addr: The address to start the search at
@@ -142,4 +172,43 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 }
 #endif
 
-#endif /*_TOOLS_LINUX_ASM_GENERIC_BITOPS_FIND_H_ */
+#ifndef find_last_bit
+/**
+ * find_last_bit - find the last set bit in a memory region
+ * @addr: The address to start the search at
+ * @size: The number of bits to search
+ *
+ * Returns the bit number of the last set bit, or size.
+ */
+static inline
+unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val = *addr & GENMASK(size - 1, 0);
+
+		return val ? __fls(val) : size;
+	}
+
+	return _find_last_bit(addr, size);
+}
+#endif
+
+/**
+ * find_next_clump8 - find next 8-bit clump with set bits in a memory region
+ * @clump: location to store copy of found clump
+ * @addr: address to base the search on
+ * @size: bitmap size in number of bits
+ * @offset: bit offset at which to start searching
+ *
+ * Returns the bit offset for the next set clump; the found clump value is
+ * copied to the location pointed by @clump. If no bits are set, returns @size.
+ */
+extern unsigned long find_next_clump8(unsigned long *clump,
+				      const unsigned long *addr,
+				      unsigned long size, unsigned long offset);
+
+#define find_first_clump8(clump, bits, size) \
+	find_next_clump8((clump), (bits), (size), 0)
+
+
+#endif /*__LINUX_FIND_H_ */
diff --git a/tools/lib/find_bit.c b/tools/lib/find_bit.c
index 109aa7ffcf97..ba4b8d94e004 100644
--- a/tools/lib/find_bit.c
+++ b/tools/lib/find_bit.c
@@ -96,6 +96,26 @@ unsigned long _find_first_bit(const unsigned long *addr, unsigned long size)
 }
 #endif
 
+#ifndef find_first_and_bit
+/*
+ * Find the first set bit in two memory regions.
+ */
+unsigned long _find_first_and_bit(const unsigned long *addr1,
+				  const unsigned long *addr2,
+				  unsigned long size)
+{
+	unsigned long idx, val;
+
+	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+		val = addr1[idx] & addr2[idx];
+		if (val)
+			return min(idx * BITS_PER_LONG + __ffs(val), size);
+	}
+
+	return size;
+}
+#endif
+
 #ifndef find_first_zero_bit
 /*
  * Find the first cleared bit in a memory region.
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 09/17] cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
  2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
                   ` (7 preceding siblings ...)
  2021-08-14 21:17 ` [PATCH 08/17] tools: sync tools/bitmap with mother linux Yury Norov
@ 2021-08-14 21:17 ` Yury Norov
  2021-08-14 21:17 ` [PATCH 10/17] include/linux: move for_each_bit() macros from bitops.h to find.h Yury Norov
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:17 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov

cpumask_first() is a more efficient analogue of the 'next' version when
n == -1 (i.e. the search starts from bit 0). This patch replaces 'next'
with 'first' where the conversion is trivial.

There's no cpumask_first_zero() function, so create it.
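
For illustration only (not part of the patch), the equivalence the
conversion relies on looks like this; 'mask' is an assumed
struct cpumask pointer:

    /* Old and new spellings return the same first set CPU. */
    unsigned int a = cpumask_next(-1, mask);
    unsigned int b = cpumask_first(mask);

    /* New helper, mirroring cpumask_first() for clear bits. */
    unsigned int c = cpumask_first_zero(mask);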

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 arch/powerpc/include/asm/cputhreads.h |  2 +-
 block/blk-mq.c                        |  2 +-
 drivers/net/virtio_net.c              |  2 +-
 drivers/soc/fsl/qbman/bman_portal.c   |  2 +-
 drivers/soc/fsl/qbman/qman_portal.c   |  2 +-
 include/linux/cpumask.h               | 16 ++++++++++++++++
 kernel/time/clocksource.c             |  4 ++--
 7 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h
index b167186aaee4..44286df21d2a 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -52,7 +52,7 @@ static inline cpumask_t cpu_thread_mask_to_cores(const struct cpumask *threads)
 	for (i = 0; i < NR_CPUS; i += threads_per_core) {
 		cpumask_shift_left(&tmp, &threads_core_mask, i);
 		if (cpumask_intersects(threads, &tmp)) {
-			cpu = cpumask_next_and(-1, &tmp, cpu_online_mask);
+			cpu = cpumask_first_and(&tmp, cpu_online_mask);
 			if (cpu < nr_cpu_ids)
 				cpumask_set_cpu(cpu, &res);
 		}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3ab4540320ca..39c38a8a996d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2544,7 +2544,7 @@ static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
 static inline bool blk_mq_last_cpu_in_hctx(unsigned int cpu,
 		struct blk_mq_hw_ctx *hctx)
 {
-	if (cpumask_next_and(-1, hctx->cpumask, cpu_online_mask) != cpu)
+	if (cpumask_first_and(hctx->cpumask, cpu_online_mask) != cpu)
 		return false;
 	if (cpumask_next_and(cpu, hctx->cpumask, cpu_online_mask) < nr_cpu_ids)
 		return false;
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2e42210a6503..7fb87f8a7973 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2080,7 +2080,7 @@ static void virtnet_set_affinity(struct virtnet_info *vi)
 	stragglers = num_cpu >= vi->curr_queue_pairs ?
 			num_cpu % vi->curr_queue_pairs :
 			0;
-	cpu = cpumask_next(-1, cpu_online_mask);
+	cpu = cpumask_first(cpu_online_mask);
 
 	for (i = 0; i < vi->curr_queue_pairs; i++) {
 		group_size = stride + (i < stragglers ? 1 : 0);
diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c
index acda8a5637c5..4d7b9caee1c4 100644
--- a/drivers/soc/fsl/qbman/bman_portal.c
+++ b/drivers/soc/fsl/qbman/bman_portal.c
@@ -155,7 +155,7 @@ static int bman_portal_probe(struct platform_device *pdev)
 	}
 
 	spin_lock(&bman_lock);
-	cpu = cpumask_next_zero(-1, &portal_cpus);
+	cpu = cpumask_first_zero(&portal_cpus);
 	if (cpu >= nr_cpu_ids) {
 		__bman_portals_probed = 1;
 		/* unassigned portal, skip init */
diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
index 96f74a1dc603..e23b60618c1a 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -248,7 +248,7 @@ static int qman_portal_probe(struct platform_device *pdev)
 	pcfg->pools = qm_get_pools_sdqcr();
 
 	spin_lock(&qman_lock);
-	cpu = cpumask_next_zero(-1, &portal_cpus);
+	cpu = cpumask_first_zero(&portal_cpus);
 	if (cpu >= nr_cpu_ids) {
 		__qman_portals_probed = 1;
 		/* unassigned portal, skip init */
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 4a03f6636e6c..f5883a8f28ca 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -123,6 +123,11 @@ static inline unsigned int cpumask_first(const struct cpumask *srcp)
 	return 0;
 }
 
+static inline unsigned int cpumask_first_zero(const struct cpumask *srcp)
+{
+	return 0;
+}
+
 static inline unsigned int cpumask_first_and(const struct cpumask *srcp1,
 					     const struct cpumask *srcp2)
 {
@@ -201,6 +206,17 @@ static inline unsigned int cpumask_first(const struct cpumask *srcp)
 	return find_first_bit(cpumask_bits(srcp), nr_cpumask_bits);
 }
 
+/**
+ * cpumask_first_zero - get the first unset cpu in a cpumask
+ * @srcp: the cpumask pointer
+ *
+ * Returns >= nr_cpu_ids if all cpus are set.
+ */
+static inline unsigned int cpumask_first_zero(const struct cpumask *srcp)
+{
+	return find_first_zero_bit(cpumask_bits(srcp), nr_cpumask_bits);
+}
+
 /**
  * cpumask_first_and - return the first cpu from *srcp1 & *srcp2
  * @src1p: the first input
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index b038b81f8d32..2f170383b00a 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -262,7 +262,7 @@ static void clocksource_verify_choose_cpus(void)
 		return;
 
 	/* Make sure to select at least one CPU other than the current CPU. */
-	cpu = cpumask_next(-1, cpu_online_mask);
+	cpu = cpumask_first(cpu_online_mask);
 	if (cpu == smp_processor_id())
 		cpu = cpumask_next(cpu, cpu_online_mask);
 	if (WARN_ON_ONCE(cpu >= nr_cpu_ids))
@@ -284,7 +284,7 @@ static void clocksource_verify_choose_cpus(void)
 		cpu = prandom_u32() % nr_cpu_ids;
 		cpu = cpumask_next(cpu - 1, cpu_online_mask);
 		if (cpu >= nr_cpu_ids)
-			cpu = cpumask_next(-1, cpu_online_mask);
+			cpu = cpumask_first(cpu_online_mask);
 		if (!WARN_ON_ONCE(cpu >= nr_cpu_ids))
 			cpumask_set_cpu(cpu, &cpus_chosen);
 	}
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 10/17] include/linux: move for_each_bit() macros from bitops.h to find.h
  2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
                   ` (8 preceding siblings ...)
  2021-08-14 21:17 ` [PATCH 09/17] cpumask: replace cpumask_next_* with cpumask_first_* where appropriate Yury Norov
@ 2021-08-14 21:17 ` Yury Norov
  2021-08-14 21:17 ` [PATCH 11/17] find: micro-optimize for_each_{set,clear}_bit() Yury Norov
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:17 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov

The for_each_*_bit() macros depend on the find_bit() machinery, so the
proper place for them is the find.h header.
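
As a reminder of that dependency (illustrative sketch only; 'map' and the
printout are made up, not code from the tree), a typical user expands into
find_first_bit()/find_next_bit() calls:

    DECLARE_BITMAP(map, 64);
    unsigned int bit;

    bitmap_zero(map, 64);
    __set_bit(3, map);
    __set_bit(17, map);

    /* Expands to find_first_bit()/find_next_bit() under the hood. */
    for_each_set_bit(bit, map, 64)
            pr_info("bit %u is set\n", bit);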

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 include/linux/bitops.h | 34 ----------------------------------
 include/linux/find.h   | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 5e62e2383b7f..7aaed501f768 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -32,40 +32,6 @@ extern unsigned long __sw_hweight64(__u64 w);
  */
 #include <asm/bitops.h>
 
-#define for_each_set_bit(bit, addr, size) \
-	for ((bit) = find_first_bit((addr), (size));		\
-	     (bit) < (size);					\
-	     (bit) = find_next_bit((addr), (size), (bit) + 1))
-
-/* same as for_each_set_bit() but use bit as value to start with */
-#define for_each_set_bit_from(bit, addr, size) \
-	for ((bit) = find_next_bit((addr), (size), (bit));	\
-	     (bit) < (size);					\
-	     (bit) = find_next_bit((addr), (size), (bit) + 1))
-
-#define for_each_clear_bit(bit, addr, size) \
-	for ((bit) = find_first_zero_bit((addr), (size));	\
-	     (bit) < (size);					\
-	     (bit) = find_next_zero_bit((addr), (size), (bit) + 1))
-
-/* same as for_each_clear_bit() but use bit as value to start with */
-#define for_each_clear_bit_from(bit, addr, size) \
-	for ((bit) = find_next_zero_bit((addr), (size), (bit));	\
-	     (bit) < (size);					\
-	     (bit) = find_next_zero_bit((addr), (size), (bit) + 1))
-
-/**
- * for_each_set_clump8 - iterate over bitmap for each 8-bit clump with set bits
- * @start: bit offset to start search and to store the current iteration offset
- * @clump: location to store copy of current 8-bit clump
- * @bits: bitmap address to base the search on
- * @size: bitmap size in number of bits
- */
-#define for_each_set_clump8(start, clump, bits, size) \
-	for ((start) = find_first_clump8(&(clump), (bits), (size)); \
-	     (start) < (size); \
-	     (start) = find_next_clump8(&(clump), (bits), (size), (start) + 8))
-
 static inline int get_bitmask_order(unsigned int count)
 {
 	int order;
diff --git a/include/linux/find.h b/include/linux/find.h
index 6048f8c97418..4500e8ab93e2 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -279,4 +279,38 @@ unsigned long find_next_bit_le(const void *addr, unsigned
 #error "Please fix <asm/byteorder.h>"
 #endif
 
+#define for_each_set_bit(bit, addr, size) \
+	for ((bit) = find_first_bit((addr), (size));		\
+	     (bit) < (size);					\
+	     (bit) = find_next_bit((addr), (size), (bit) + 1))
+
+/* same as for_each_set_bit() but use bit as value to start with */
+#define for_each_set_bit_from(bit, addr, size) \
+	for ((bit) = find_next_bit((addr), (size), (bit));	\
+	     (bit) < (size);					\
+	     (bit) = find_next_bit((addr), (size), (bit) + 1))
+
+#define for_each_clear_bit(bit, addr, size) \
+	for ((bit) = find_first_zero_bit((addr), (size));	\
+	     (bit) < (size);					\
+	     (bit) = find_next_zero_bit((addr), (size), (bit) + 1))
+
+/* same as for_each_clear_bit() but use bit as value to start with */
+#define for_each_clear_bit_from(bit, addr, size) \
+	for ((bit) = find_next_zero_bit((addr), (size), (bit));	\
+	     (bit) < (size);					\
+	     (bit) = find_next_zero_bit((addr), (size), (bit) + 1))
+
+/**
+ * for_each_set_clump8 - iterate over bitmap for each 8-bit clump with set bits
+ * @start: bit offset to start search and to store the current iteration offset
+ * @clump: location to store copy of current 8-bit clump
+ * @bits: bitmap address to base the search on
+ * @size: bitmap size in number of bits
+ */
+#define for_each_set_clump8(start, clump, bits, size) \
+	for ((start) = find_first_clump8(&(clump), (bits), (size)); \
+	     (start) < (size); \
+	     (start) = find_next_clump8(&(clump), (bits), (size), (start) + 8))
+
 #endif /*__LINUX_FIND_H_ */
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 11/17] find: micro-optimize for_each_{set,clear}_bit()
  2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
                   ` (9 preceding siblings ...)
  2021-08-14 21:17 ` [PATCH 10/17] include/linux: move for_each_bit() macros from bitops.h to find.h Yury Norov
@ 2021-08-14 21:17 ` Yury Norov
  2021-08-26 13:57   ` Petr Mladek
  2021-08-14 21:17 ` [PATCH 12/17] Replace for_each_*_bit_from() with for_each_*_bit() where appropriate Yury Norov
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:17 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov

The macros iterate through all set/clear bits in a bitmap. They find the
first bit using find_first_bit(), and the remaining bits using
find_next_bit().

Since find_next_bit() is called shortly after find_first_bit(), we can
save a few I-cache lines by not using find_first_bit() at all.
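
For reference (illustrative expansion only; addr, size and handle_bit()
are placeholders), the loop the macro generates is functionally the same
before and after:

    unsigned long bit;

    /* Before: */
    for (bit = find_first_bit(addr, size); bit < size;
         bit = find_next_bit(addr, size, bit + 1))
            handle_bit(bit);

    /* After: the initial lookup also goes through find_next_bit(). */
    for (bit = find_next_bit(addr, size, 0); bit < size;
         bit = find_next_bit(addr, size, bit + 1))
            handle_bit(bit);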

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 include/linux/find.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/find.h b/include/linux/find.h
index 4500e8ab93e2..ae9ed52b52b8 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -280,7 +280,7 @@ unsigned long find_next_bit_le(const void *addr, unsigned
 #endif
 
 #define for_each_set_bit(bit, addr, size) \
-	for ((bit) = find_first_bit((addr), (size));		\
+	for ((bit) = find_next_bit((addr), (size), 0);		\
 	     (bit) < (size);					\
 	     (bit) = find_next_bit((addr), (size), (bit) + 1))
 
@@ -291,7 +291,7 @@ unsigned long find_next_bit_le(const void *addr, unsigned
 	     (bit) = find_next_bit((addr), (size), (bit) + 1))
 
 #define for_each_clear_bit(bit, addr, size) \
-	for ((bit) = find_first_zero_bit((addr), (size));	\
+	for ((bit) = find_next_zero_bit((addr), (size), 0);	\
 	     (bit) < (size);					\
 	     (bit) = find_next_zero_bit((addr), (size), (bit) + 1))
 
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 12/17] Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
  2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
                   ` (10 preceding siblings ...)
  2021-08-14 21:17 ` [PATCH 11/17] find: micro-optimize for_each_{set,clear}_bit() Yury Norov
@ 2021-08-14 21:17 ` Yury Norov
  2021-08-14 21:17 ` [PATCH 13/17] tools: Rename bitmap_alloc() to bitmap_zalloc() Yury Norov
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:17 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov

A couple of kernel functions call for_each_*_bit_from() with the start
bit equal to 0. Replace them with for_each_*_bit().

No functional changes, but this might improve readability.
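
A minimal sketch of the conversion (not from the tree; bitmap, NBITS and
handle() are placeholders):

    unsigned int i = 0, j;

    /* Old: correctness depends on i being pre-initialised to 0. */
    for_each_set_bit_from(i, bitmap, NBITS)
            handle(i);

    /* New: no pre-initialisation needed, iterates the same bits. */
    for_each_set_bit(j, bitmap, NBITS)
            handle(j);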

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 arch/x86/kernel/apic/vector.c         | 4 ++--
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 4 ++--
 drivers/hwmon/ltc2992.c               | 3 +--
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index c132daabe615..3e6f6b448f6a 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -760,9 +760,9 @@ void __init lapic_update_legacy_vectors(void)
 
 void __init lapic_assign_system_vectors(void)
 {
-	unsigned int i, vector = 0;
+	unsigned int i, vector;
 
-	for_each_set_bit_from(vector, system_vectors, NR_VECTORS)
+	for_each_set_bit(vector, system_vectors, NR_VECTORS)
 		irq_matrix_assign_system(vector_matrix, vector, false);
 
 	if (nr_legacy_irqs() > 1)
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index c297fffe06eb..cb9a2e493e8a 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -1038,7 +1038,7 @@ int etnaviv_gpu_debugfs(struct etnaviv_gpu *gpu, struct seq_file *m)
 
 void etnaviv_gpu_recover_hang(struct etnaviv_gpu *gpu)
 {
-	unsigned int i = 0;
+	unsigned int i;
 
 	dev_err(gpu->dev, "recover hung GPU!\n");
 
@@ -1051,7 +1051,7 @@ void etnaviv_gpu_recover_hang(struct etnaviv_gpu *gpu)
 
 	/* complete all events, the GPU won't do it after the reset */
 	spin_lock(&gpu->event_spinlock);
-	for_each_set_bit_from(i, gpu->event_bitmap, ETNA_NR_EVENTS)
+	for_each_set_bit(i, gpu->event_bitmap, ETNA_NR_EVENTS)
 		complete(&gpu->event_free);
 	bitmap_zero(gpu->event_bitmap, ETNA_NR_EVENTS);
 	spin_unlock(&gpu->event_spinlock);
diff --git a/drivers/hwmon/ltc2992.c b/drivers/hwmon/ltc2992.c
index 2a4bed0ab226..7352d2b3c756 100644
--- a/drivers/hwmon/ltc2992.c
+++ b/drivers/hwmon/ltc2992.c
@@ -248,8 +248,7 @@ static int ltc2992_gpio_get_multiple(struct gpio_chip *chip, unsigned long *mask
 
 	gpio_status = reg;
 
-	gpio_nr = 0;
-	for_each_set_bit_from(gpio_nr, mask, LTC2992_GPIO_NR) {
+	for_each_set_bit(gpio_nr, mask, LTC2992_GPIO_NR) {
 		if (test_bit(LTC2992_GPIO_BIT(gpio_nr), &gpio_status))
 			set_bit(gpio_nr, bits);
 	}
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 13/17] tools: Rename bitmap_alloc() to bitmap_zalloc()
  2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
                   ` (11 preceding siblings ...)
  2021-08-14 21:17 ` [PATCH 12/17] Replace for_each_*_bit_from() with for_each_*_bit() where appropriate Yury Norov
@ 2021-08-14 21:17 ` Yury Norov
  2021-08-14 21:17 ` [PATCH 14/17] mm/percpu: micro-optimize pcpu_is_populated() Yury Norov
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:17 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov

From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

Rename bitmap_alloc() to bitmap_zalloc() in tools to follow the bitmap API
in the kernel.

No functional changes intended.
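
For context (not part of the patch): the tools helper already returns
zeroed memory via calloc(), so the kernel's 'zalloc' name describes it
accurately. A sketch of a call site after the rename; nbits is a
placeholder:

    unsigned long *bm = bitmap_zalloc(nbits);   /* zero-initialised */

    if (!bm)
            return -ENOMEM;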

Suggested-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/include/linux/bitmap.h                            | 4 ++--
 tools/perf/bench/find-bit-bench.c                       | 2 +-
 tools/perf/builtin-c2c.c                                | 6 +++---
 tools/perf/builtin-record.c                             | 2 +-
 tools/perf/tests/bitmap.c                               | 2 +-
 tools/perf/tests/mem2node.c                             | 2 +-
 tools/perf/util/affinity.c                              | 4 ++--
 tools/perf/util/header.c                                | 4 ++--
 tools/perf/util/metricgroup.c                           | 2 +-
 tools/perf/util/mmap.c                                  | 4 ++--
 tools/testing/selftests/kvm/dirty_log_perf_test.c       | 2 +-
 tools/testing/selftests/kvm/dirty_log_test.c            | 4 ++--
 tools/testing/selftests/kvm/x86_64/vmx_dirty_log_test.c | 2 +-
 13 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
index 13d90b574970..ea97804d04d4 100644
--- a/tools/include/linux/bitmap.h
+++ b/tools/include/linux/bitmap.h
@@ -112,10 +112,10 @@ static inline int test_and_clear_bit(int nr, unsigned long *addr)
 }
 
 /**
- * bitmap_alloc - Allocate bitmap
+ * bitmap_zalloc - Allocate bitmap
  * @nbits: Number of bits
  */
-static inline unsigned long *bitmap_alloc(int nbits)
+static inline unsigned long *bitmap_zalloc(int nbits)
 {
 	return calloc(1, BITS_TO_LONGS(nbits) * sizeof(unsigned long));
 }
diff --git a/tools/perf/bench/find-bit-bench.c b/tools/perf/bench/find-bit-bench.c
index 73b5bcc5946a..22b5cfe97023 100644
--- a/tools/perf/bench/find-bit-bench.c
+++ b/tools/perf/bench/find-bit-bench.c
@@ -54,7 +54,7 @@ static bool asm_test_bit(long nr, const unsigned long *addr)
 
 static int do_for_each_set_bit(unsigned int num_bits)
 {
-	unsigned long *to_test = bitmap_alloc(num_bits);
+	unsigned long *to_test = bitmap_zalloc(num_bits);
 	struct timeval start, end, diff;
 	u64 runtime_us;
 	struct stats fb_time_stats, tb_time_stats;
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 6dea37f141b2..c34d77bee4ef 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -139,11 +139,11 @@ static void *c2c_he_zalloc(size_t size)
 	if (!c2c_he)
 		return NULL;
 
-	c2c_he->cpuset = bitmap_alloc(c2c.cpus_cnt);
+	c2c_he->cpuset = bitmap_zalloc(c2c.cpus_cnt);
 	if (!c2c_he->cpuset)
 		return NULL;
 
-	c2c_he->nodeset = bitmap_alloc(c2c.nodes_cnt);
+	c2c_he->nodeset = bitmap_zalloc(c2c.nodes_cnt);
 	if (!c2c_he->nodeset)
 		return NULL;
 
@@ -2047,7 +2047,7 @@ static int setup_nodes(struct perf_session *session)
 		struct perf_cpu_map *map = n[node].map;
 		unsigned long *set;
 
-		set = bitmap_alloc(c2c.cpus_cnt);
+		set = bitmap_zalloc(c2c.cpus_cnt);
 		if (!set)
 			return -ENOMEM;
 
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 671a21c9ee4d..f1b30ac094cb 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -2786,7 +2786,7 @@ int cmd_record(int argc, const char **argv)
 
 	if (rec->opts.affinity != PERF_AFFINITY_SYS) {
 		rec->affinity_mask.nbits = cpu__max_cpu();
-		rec->affinity_mask.bits = bitmap_alloc(rec->affinity_mask.nbits);
+		rec->affinity_mask.bits = bitmap_zalloc(rec->affinity_mask.nbits);
 		if (!rec->affinity_mask.bits) {
 			pr_err("Failed to allocate thread mask for %zd cpus\n", rec->affinity_mask.nbits);
 			err = -ENOMEM;
diff --git a/tools/perf/tests/bitmap.c b/tools/perf/tests/bitmap.c
index 96c137360918..12b805efdca0 100644
--- a/tools/perf/tests/bitmap.c
+++ b/tools/perf/tests/bitmap.c
@@ -14,7 +14,7 @@ static unsigned long *get_bitmap(const char *str, int nbits)
 	unsigned long *bm = NULL;
 	int i;
 
-	bm = bitmap_alloc(nbits);
+	bm = bitmap_zalloc(nbits);
 
 	if (map && bm) {
 		for (i = 0; i < map->nr; i++)
diff --git a/tools/perf/tests/mem2node.c b/tools/perf/tests/mem2node.c
index a258bd51f1a4..e4d0d58b97f8 100644
--- a/tools/perf/tests/mem2node.c
+++ b/tools/perf/tests/mem2node.c
@@ -27,7 +27,7 @@ static unsigned long *get_bitmap(const char *str, int nbits)
 	unsigned long *bm = NULL;
 	int i;
 
-	bm = bitmap_alloc(nbits);
+	bm = bitmap_zalloc(nbits);
 
 	if (map && bm) {
 		for (i = 0; i < map->nr; i++) {
diff --git a/tools/perf/util/affinity.c b/tools/perf/util/affinity.c
index a5e31f826828..7b12bd7a3080 100644
--- a/tools/perf/util/affinity.c
+++ b/tools/perf/util/affinity.c
@@ -25,11 +25,11 @@ int affinity__setup(struct affinity *a)
 {
 	int cpu_set_size = get_cpu_set_size();
 
-	a->orig_cpus = bitmap_alloc(cpu_set_size * 8);
+	a->orig_cpus = bitmap_zalloc(cpu_set_size * 8);
 	if (!a->orig_cpus)
 		return -1;
 	sched_getaffinity(0, cpu_set_size, (cpu_set_t *)a->orig_cpus);
-	a->sched_cpus = bitmap_alloc(cpu_set_size * 8);
+	a->sched_cpus = bitmap_zalloc(cpu_set_size * 8);
 	if (!a->sched_cpus) {
 		zfree(&a->orig_cpus);
 		return -1;
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 44249027507a..563dec72adeb 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -278,7 +278,7 @@ static int do_read_bitmap(struct feat_fd *ff, unsigned long **pset, u64 *psize)
 	if (ret)
 		return ret;
 
-	set = bitmap_alloc(size);
+	set = bitmap_zalloc(size);
 	if (!set)
 		return -ENOMEM;
 
@@ -1294,7 +1294,7 @@ static int memory_node__read(struct memory_node *n, unsigned long idx)
 
 	size++;
 
-	n->set = bitmap_alloc(size);
+	n->set = bitmap_zalloc(size);
 	if (!n->set) {
 		closedir(dir);
 		return -ENOMEM;
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 99d047c5ead0..29b747ac31c1 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -313,7 +313,7 @@ static int metricgroup__setup_events(struct list_head *groups,
 	struct evsel *evsel, *tmp;
 	unsigned long *evlist_used;
 
-	evlist_used = bitmap_alloc(perf_evlist->core.nr_entries);
+	evlist_used = bitmap_zalloc(perf_evlist->core.nr_entries);
 	if (!evlist_used)
 		return -ENOMEM;
 
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index ab7108d22428..512dc8b9c168 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -106,7 +106,7 @@ static int perf_mmap__aio_bind(struct mmap *map, int idx, int cpu, int affinity)
 		data = map->aio.data[idx];
 		mmap_len = mmap__mmap_len(map);
 		node_index = cpu__get_node(cpu);
-		node_mask = bitmap_alloc(node_index + 1);
+		node_mask = bitmap_zalloc(node_index + 1);
 		if (!node_mask) {
 			pr_err("Failed to allocate node mask for mbind: error %m\n");
 			return -1;
@@ -258,7 +258,7 @@ static void build_node_mask(int node, struct mmap_cpu_mask *mask)
 static int perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
 {
 	map->affinity_mask.nbits = cpu__max_cpu();
-	map->affinity_mask.bits = bitmap_alloc(map->affinity_mask.nbits);
+	map->affinity_mask.bits = bitmap_zalloc(map->affinity_mask.nbits);
 	if (!map->affinity_mask.bits)
 		return -1;
 
diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
index 3c30d0045d8d..479868570d59 100644
--- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
@@ -171,7 +171,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	guest_num_pages = (nr_vcpus * guest_percpu_mem_size) >> vm_get_page_shift(vm);
 	guest_num_pages = vm_adjust_num_guest_pages(mode, guest_num_pages);
 	host_num_pages = vm_num_host_pages(mode, guest_num_pages);
-	bmap = bitmap_alloc(host_num_pages);
+	bmap = bitmap_zalloc(host_num_pages);
 
 	if (dirty_log_manual_caps) {
 		cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 5fe0140e407e..792c60e1b17d 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -749,8 +749,8 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 
 	pr_info("guest physical test memory offset: 0x%lx\n", guest_test_phys_mem);
 
-	bmap = bitmap_alloc(host_num_pages);
-	host_bmap_track = bitmap_alloc(host_num_pages);
+	bmap = bitmap_zalloc(host_num_pages);
+	host_bmap_track = bitmap_zalloc(host_num_pages);
 
 	/* Add an extra memory slot for testing dirty logging */
 	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_dirty_log_test.c b/tools/testing/selftests/kvm/x86_64/vmx_dirty_log_test.c
index 06a64980a5d2..68f26a8b4f42 100644
--- a/tools/testing/selftests/kvm/x86_64/vmx_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86_64/vmx_dirty_log_test.c
@@ -111,7 +111,7 @@ int main(int argc, char *argv[])
 	nested_map(vmx, vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, 4096);
 	nested_map(vmx, vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, 4096);
 
-	bmap = bitmap_alloc(TEST_MEM_PAGES);
+	bmap = bitmap_zalloc(TEST_MEM_PAGES);
 	host_test_mem = addr_gpa2hva(vm, GUEST_TEST_MEM);
 
 	while (!done) {
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 14/17] mm/percpu: micro-optimize pcpu_is_populated()
  2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
                   ` (12 preceding siblings ...)
  2021-08-14 21:17 ` [PATCH 13/17] tools: Rename bitmap_alloc() to bitmap_zalloc() Yury Norov
@ 2021-08-14 21:17 ` Yury Norov
  2021-08-14 21:17 ` [PATCH 15/17] bitmap: unify find_bit operations Yury Norov
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:17 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov

bitmap_next_clear_region() calls find_next_zero_bit() and find_next_bit()
sequentially to find a range of clear bits. In the case of
pcpu_is_populated(), there is a chance to return earlier if the bitmap has
all bits set.
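
For comparison (this is just the old helper expanded for illustration, not
new code), the dropped pattern always performed both scans before the
check:

    rs = page_start;
    /* bitmap_next_clear_region() expands to: */
    rs = find_next_zero_bit(chunk->populated, page_end, rs);
    re = find_next_bit(chunk->populated, page_end, rs + 1);
    if (rs >= page_end)
            return true;    /* ...checked only after both scans */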

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Dennis Zhou <dennis@kernel.org>
---
 mm/percpu.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index a43039629aa4..8bf8b88446d7 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1070,17 +1070,18 @@ static void pcpu_block_update_hint_free(struct pcpu_chunk *chunk, int bit_off,
 static bool pcpu_is_populated(struct pcpu_chunk *chunk, int bit_off, int bits,
 			      int *next_off)
 {
-	unsigned int page_start, page_end, rs, re;
+	unsigned int start, end;
 
-	page_start = PFN_DOWN(bit_off * PCPU_MIN_ALLOC_SIZE);
-	page_end = PFN_UP((bit_off + bits) * PCPU_MIN_ALLOC_SIZE);
+	start = PFN_DOWN(bit_off * PCPU_MIN_ALLOC_SIZE);
+	end = PFN_UP((bit_off + bits) * PCPU_MIN_ALLOC_SIZE);
 
-	rs = page_start;
-	bitmap_next_clear_region(chunk->populated, &rs, &re, page_end);
-	if (rs >= page_end)
+	start = find_next_zero_bit(chunk->populated, end, start);
+	if (start >= end)
 		return true;
 
-	*next_off = re * PAGE_SIZE / PCPU_MIN_ALLOC_SIZE;
+	end = find_next_bit(chunk->populated, end, start + 1);
+
+	*next_off = end * PAGE_SIZE / PCPU_MIN_ALLOC_SIZE;
 	return false;
 }
 
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 15/17] bitmap: unify find_bit operations
  2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
                   ` (13 preceding siblings ...)
  2021-08-14 21:17 ` [PATCH 14/17] mm/percpu: micro-optimize pcpu_is_populated() Yury Norov
@ 2021-08-14 21:17 ` Yury Norov
  2021-08-14 21:17 ` [PATCH 16/17] lib: bitmap: add performance test for bitmap_print_to_pagebuf Yury Norov
  2021-08-14 21:17 ` [PATCH 17/17] vsprintf: rework bitmap_list_string Yury Norov
  16 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:17 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov

bitmap_for_each_{set,clear}_region() are similar to the for_each_*_bit()
macros in include/linux/find.h, but their interface and implementation
differ.

This patch adds the for_each_bitrange() macros and drops the now-unused
bitmap_*_region() API for the sake of unification.
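
An illustrative use of the new iterator (map and nbits are placeholders,
not code from the tree), mirroring the renesas_sdhi conversion below:

    unsigned int rs, re;

    /* Walk every contiguous run of set bits in 'map'. */
    for_each_set_bitrange(rs, re, map, nbits)
            pr_debug("set range [%u, %u)\n", rs, re);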

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC
---
 drivers/mmc/host/renesas_sdhi_core.c |  2 +-
 include/linux/bitmap.h               | 33 ----------------
 include/linux/find.h                 | 56 ++++++++++++++++++++++++++++
 mm/percpu.c                          | 20 ++++------
 4 files changed, 65 insertions(+), 46 deletions(-)

diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c
index e49ca0f7fe9a..efd33b1fc467 100644
--- a/drivers/mmc/host/renesas_sdhi_core.c
+++ b/drivers/mmc/host/renesas_sdhi_core.c
@@ -647,7 +647,7 @@ static int renesas_sdhi_select_tuning(struct tmio_mmc_host *host)
 	 * is at least SH_MOBILE_SDHI_MIN_TAP_ROW probes long then use the
 	 * center index as the tap, otherwise bail out.
 	 */
-	bitmap_for_each_set_region(bitmap, rs, re, 0, taps_size) {
+	for_each_set_bitrange(rs, re, bitmap, taps_size) {
 		if (re - rs > tap_cnt) {
 			tap_end = re;
 			tap_start = rs;
diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 3f7c6731b203..6938edd457c2 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -55,12 +55,6 @@ struct device;
  *  bitmap_clear(dst, pos, nbits)               Clear specified bit area
  *  bitmap_find_next_zero_area(buf, len, pos, n, mask)  Find bit free area
  *  bitmap_find_next_zero_area_off(buf, len, pos, n, mask, mask_off)  as above
- *  bitmap_next_clear_region(map, &start, &end, nbits)  Find next clear region
- *  bitmap_next_set_region(map, &start, &end, nbits)  Find next set region
- *  bitmap_for_each_clear_region(map, rs, re, start, end)
- *  						Iterate over all clear regions
- *  bitmap_for_each_set_region(map, rs, re, start, end)
- *  						Iterate over all set regions
  *  bitmap_shift_right(dst, src, n, nbits)      *dst = *src >> n
  *  bitmap_shift_left(dst, src, n, nbits)       *dst = *src << n
  *  bitmap_cut(dst, src, first, n, nbits)       Cut n bits from first, copy rest
@@ -459,14 +453,6 @@ static inline void bitmap_replace(unsigned long *dst,
 		__bitmap_replace(dst, old, new, mask, nbits);
 }
 
-static inline void bitmap_next_clear_region(unsigned long *bitmap,
-					    unsigned int *rs, unsigned int *re,
-					    unsigned int end)
-{
-	*rs = find_next_zero_bit(bitmap, end, *rs);
-	*re = find_next_bit(bitmap, end, *rs + 1);
-}
-
 static inline void bitmap_next_set_region(unsigned long *bitmap,
 					  unsigned int *rs, unsigned int *re,
 					  unsigned int end)
@@ -475,25 +461,6 @@ static inline void bitmap_next_set_region(unsigned long *bitmap,
 	*re = find_next_zero_bit(bitmap, end, *rs + 1);
 }
 
-/*
- * Bitmap region iterators.  Iterates over the bitmap between [@start, @end).
- * @rs and @re should be integer variables and will be set to start and end
- * index of the current clear or set region.
- */
-#define bitmap_for_each_clear_region(bitmap, rs, re, start, end)	     \
-	for ((rs) = (start),						     \
-	     bitmap_next_clear_region((bitmap), &(rs), &(re), (end));	     \
-	     (rs) < (re);						     \
-	     (rs) = (re) + 1,						     \
-	     bitmap_next_clear_region((bitmap), &(rs), &(re), (end)))
-
-#define bitmap_for_each_set_region(bitmap, rs, re, start, end)		     \
-	for ((rs) = (start),						     \
-	     bitmap_next_set_region((bitmap), &(rs), &(re), (end));	     \
-	     (rs) < (re);						     \
-	     (rs) = (re) + 1,						     \
-	     bitmap_next_set_region((bitmap), &(rs), &(re), (end)))
-
 /**
  * BITMAP_FROM_U64() - Represent u64 value in the format suitable for bitmap.
  * @n: u64 value
diff --git a/include/linux/find.h b/include/linux/find.h
index ae9ed52b52b8..5bb6db213bcb 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -301,6 +301,62 @@ unsigned long find_next_bit_le(const void *addr, unsigned
 	     (bit) < (size);					\
 	     (bit) = find_next_zero_bit((addr), (size), (bit) + 1))
 
+/**
+ * for_each_set_bitrange - iterate over all set bit ranges [b; e)
+ * @b: bit offset of start of current bitrange (first set bit)
+ * @e: bit offset of end of current bitrange (first unset bit)
+ * @addr: bitmap address to base the search on
+ * @size: bitmap size in number of bits
+ */
+#define for_each_set_bitrange(b, e, addr, size)			\
+	for ((b) = find_next_bit((addr), (size), 0),		\
+	     (e) = find_next_zero_bit((addr), (size), (b) + 1);	\
+	     (b) < (size);					\
+	     (b) = find_next_bit((addr), (size), (e) + 1),	\
+	     (e) = find_next_zero_bit((addr), (size), (b) + 1))
+
+/**
+ * for_each_set_bitrange_from - iterate over all set bit ranges [b; e)
+ * @b: bit offset of start of current bitrange (first set bit); must be initialized
+ * @e: bit offset of end of current bitrange (first unset bit)
+ * @addr: bitmap address to base the search on
+ * @size: bitmap size in number of bits
+ */
+#define for_each_set_bitrange_from(b, e, addr, size)		\
+	for ((b) = find_next_bit((addr), (size), (b)),		\
+	     (e) = find_next_zero_bit((addr), (size), (b) + 1);	\
+	     (b) < (size);					\
+	     (b) = find_next_bit((addr), (size), (e) + 1),	\
+	     (e) = find_next_zero_bit((addr), (size), (b) + 1))
+
+/**
+ * for_each_clear_bitrange - iterate over all unset bit ranges [b; e)
+ * @b: bit offset of start of current bitrange (first unset bit)
+ * @e: bit offset of end of current bitrange (first set bit)
+ * @addr: bitmap address to base the search on
+ * @size: bitmap size in number of bits
+ */
+#define for_each_clear_bitrange(b, e, addr, size)		\
+	for ((b) = find_next_zero_bit((addr), (size), 0),	\
+	     (e) = find_next_bit((addr), (size), (b) + 1);	\
+	     (b) < (size);					\
+	     (b) = find_next_zero_bit((addr), (size), (e) + 1),	\
+	     (e) = find_next_bit((addr), (size), (b) + 1))
+
+/**
+ * for_each_clear_bitrange_from - iterate over all unset bit ranges [b; e)
+ * @b: bit offset of start of current bitrange (first unset bit); must be initialized
+ * @e: bit offset of end of current bitrange (first set bit)
+ * @addr: bitmap address to base the search on
+ * @size: bitmap size in number of bits
+ */
+#define for_each_clear_bitrange_from(b, e, addr, size)		\
+	for ((b) = find_next_zero_bit((addr), (size), (b)),	\
+	     (e) = find_next_bit((addr), (size), (b) + 1);	\
+	     (b) < (size);					\
+	     (b) = find_next_zero_bit((addr), (size), (e) + 1),	\
+	     (e) = find_next_bit((addr), (size), (b) + 1))
+
 /**
  * for_each_set_clump8 - iterate over bitmap for each 8-bit clump with set bits
  * @start: bit offset to start search and to store the current iteration offset
diff --git a/mm/percpu.c b/mm/percpu.c
index 8bf8b88446d7..8c16c8e04eee 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -779,7 +779,7 @@ static void pcpu_block_refresh_hint(struct pcpu_chunk *chunk, int index)
 {
 	struct pcpu_block_md *block = chunk->md_blocks + index;
 	unsigned long *alloc_map = pcpu_index_alloc_map(chunk, index);
-	unsigned int rs, re, start;	/* region start, region end */
+	unsigned int start, end;	/* region start, region end */
 
 	/* promote scan_hint to contig_hint */
 	if (block->scan_hint) {
@@ -795,9 +795,8 @@ static void pcpu_block_refresh_hint(struct pcpu_chunk *chunk, int index)
 	block->right_free = 0;
 
 	/* iterate over free areas and update the contig hints */
-	bitmap_for_each_clear_region(alloc_map, rs, re, start,
-				     PCPU_BITMAP_BLOCK_BITS)
-		pcpu_block_update(block, rs, re);
+	for_each_clear_bitrange_from(start, end, alloc_map, PCPU_BITMAP_BLOCK_BITS)
+		pcpu_block_update(block, start, end);
 }
 
 /**
@@ -1855,13 +1854,12 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 
 	/* populate if not all pages are already there */
 	if (!is_atomic) {
-		unsigned int page_start, page_end, rs, re;
+		unsigned int page_end, rs, re;
 
-		page_start = PFN_DOWN(off);
+		rs = PFN_DOWN(off);
 		page_end = PFN_UP(off + size);
 
-		bitmap_for_each_clear_region(chunk->populated, rs, re,
-					     page_start, page_end) {
+		for_each_clear_bitrange_from(rs, re, chunk->populated, page_end) {
 			WARN_ON(chunk->immutable);
 
 			ret = pcpu_populate_chunk(chunk, rs, re, pcpu_gfp);
@@ -2017,8 +2015,7 @@ static void pcpu_balance_free(bool empty_only)
 	list_for_each_entry_safe(chunk, next, &to_free, list) {
 		unsigned int rs, re;
 
-		bitmap_for_each_set_region(chunk->populated, rs, re, 0,
-					   chunk->nr_pages) {
+		for_each_set_bitrange(rs, re, chunk->populated, chunk->nr_pages) {
 			pcpu_depopulate_chunk(chunk, rs, re);
 			spin_lock_irq(&pcpu_lock);
 			pcpu_chunk_depopulated(chunk, rs, re);
@@ -2088,8 +2085,7 @@ static void pcpu_balance_populated(void)
 			continue;
 
 		/* @chunk can't go away while pcpu_alloc_mutex is held */
-		bitmap_for_each_clear_region(chunk->populated, rs, re, 0,
-					     chunk->nr_pages) {
+		for_each_clear_bitrange(rs, re, chunk->populated, chunk->nr_pages) {
 			int nr = min_t(int, re - rs, nr_to_pop);
 
 			spin_unlock_irq(&pcpu_lock);
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 16/17] lib: bitmap: add performance test for bitmap_print_to_pagebuf
  2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
                   ` (14 preceding siblings ...)
  2021-08-14 21:17 ` [PATCH 15/17] bitmap: unify find_bit operations Yury Norov
@ 2021-08-14 21:17 ` Yury Norov
  2021-08-14 21:17 ` [PATCH 17/17] vsprintf: rework bitmap_list_string Yury Norov
  16 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:17 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov

Functional tests for bitmap_print_to_pagebuf() are provided
in lib/test_printf.c. This patch adds a performance test for
the case of a fully set bitmap.
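
As a back-of-the-envelope check of what the test expects (illustration
only, assuming PAGE_SIZE == 4096 and that bitmap_print_to_pagebuf()
appends a trailing newline):

	nbits = PAGE_SIZE * 8 = 32768, so the fully set bitmap prints as
	the single range "0-32767"; slen = strlen("0-32767") = 7, and the
	expected return value is slen + 1 = 8.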

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 lib/test_bitmap.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 4ea73f5aed41..452d525007da 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -430,6 +430,42 @@ static void __init test_bitmap_parselist(void)
 	}
 }
 
+static void __init test_bitmap_printlist(void)
+{
+	unsigned long *bmap = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	char expected[256];
+	int ret, slen;
+	ktime_t time;
+
+	if (!buf || !bmap)
+		goto out;
+
+	memset(bmap, -1, PAGE_SIZE);
+	slen = snprintf(expected, 256, "0-%ld", PAGE_SIZE * 8 - 1);
+	if (slen < 0)
+		goto out;
+
+	time = ktime_get();
+	ret = bitmap_print_to_pagebuf(true, buf, bmap, PAGE_SIZE * 8);
+	time = ktime_get() - time;
+
+	if (ret != slen + 1) {
+		pr_err("bitmap_print_to_pagebuf: result is %d, expected %d\n", ret, slen);
+		goto out;
+	}
+
+	if (strncmp(buf, expected, slen)) {
+		pr_err("bitmap_print_to_pagebuf: result is %s, expected %s\n", buf, expected);
+		goto out;
+	}
+
+	pr_err("bitmap_print_to_pagebuf: input is '%s', Time: %llu\n", buf, time);
+out:
+	kfree(buf);
+	kfree(bmap);
+}
+
 static const unsigned long parse_test[] __initconst = {
 	BITMAP_FROM_U64(0),
 	BITMAP_FROM_U64(1),
@@ -669,6 +705,7 @@ static void __init selftest(void)
 	test_bitmap_arr32();
 	test_bitmap_parse();
 	test_bitmap_parselist();
+	test_bitmap_printlist();
 	test_mem_optimisations();
 	test_for_each_set_clump8();
 	test_bitmap_cut();
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 17/17] vsprintf: rework bitmap_list_string
  2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
                   ` (15 preceding siblings ...)
  2021-08-14 21:17 ` [PATCH 16/17] lib: bitmap: add performance test for bitmap_print_to_pagebuf Yury Norov
@ 2021-08-14 21:17 ` Yury Norov
  2021-08-15 11:09   ` Andy Shevchenko
  2021-08-26 14:15   ` Petr Mladek
  16 siblings, 2 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-14 21:17 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato, Yury Norov

bitmap_list_string() is very inefficient when printing bitmaps with long
ranges of set bits because it calls find_next_bit() for each bit in the
bitmap.  We can do better by detecting ranges of set bits.

In my environment, before/after is 943008/31008 ns.
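
For reference, a sketch of the output format this function backs (not
part of the patch; the bitmap values are made up, and the rework changes
how ranges are found, not what is printed):

	DECLARE_BITMAP(map, 16);

	bitmap_zero(map, 16);
	bitmap_set(map, 0, 4);		/* bits 0..3 */
	set_bit(8, map);		/* bit 8 */
	pr_info("%*pbl\n", 16, map);	/* prints "0-3,8" */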

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 lib/vsprintf.c | 24 +++++++-----------------
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index dd006adfe853..29a384eee286 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -1241,20 +1241,13 @@ char *bitmap_list_string(char *buf, char *end, unsigned long *bitmap,
 			 struct printf_spec spec, const char *fmt)
 {
 	int nr_bits = max_t(int, spec.field_width, 0);
-	/* current bit is 'cur', most recently seen range is [rbot, rtop] */
-	int cur, rbot, rtop;
 	bool first = true;
+	int rbot, rtop;
 
 	if (check_pointer(&buf, end, bitmap, spec))
 		return buf;
 
-	rbot = cur = find_first_bit(bitmap, nr_bits);
-	while (cur < nr_bits) {
-		rtop = cur;
-		cur = find_next_bit(bitmap, nr_bits, cur + 1);
-		if (cur < nr_bits && cur <= rtop + 1)
-			continue;
-
+	for_each_set_bitrange(rbot, rtop, bitmap, nr_bits) {
 		if (!first) {
 			if (buf < end)
 				*buf = ',';
@@ -1263,15 +1256,12 @@ char *bitmap_list_string(char *buf, char *end, unsigned long *bitmap,
 		first = false;
 
 		buf = number(buf, end, rbot, default_dec_spec);
-		if (rbot < rtop) {
-			if (buf < end)
-				*buf = '-';
-			buf++;
-
-			buf = number(buf, end, rtop, default_dec_spec);
-		}
+		if (rtop == rbot + 1)
+			continue;
 
-		rbot = cur;
+		if (buf < end)
+			*buf = '-';
+		buf = number(++buf, end, rtop - 1, default_dec_spec);
 	}
 	return buf;
 }
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 17/17] vsprintf: rework bitmap_list_string
  2021-08-14 21:17 ` [PATCH 17/17] vsprintf: rework bitmap_list_string Yury Norov
@ 2021-08-15 11:09   ` Andy Shevchenko
  2021-08-17 16:35     ` Yury Norov
  2021-08-26 14:15   ` Petr Mladek
  1 sibling, 1 reply; 26+ messages in thread
From: Andy Shevchenko @ 2021-08-15 11:09 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, Linux-Arch,
	open list:KERNEL SELFTEST FRAMEWORK, linux-mmc, linux-perf-users,
	open list:VFIO DRIVER, James E.J. Bottomley, Alexander Lobakin,
	Alexander Shishkin, Alexey Klimov, Andrea Merello,
	Andy Shevchenko, Arnaldo Carvalho de Melo, Arnd Bergmann,
	Ben Gardon, Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato

On Sun, Aug 15, 2021 at 12:21 AM Yury Norov <yury.norov@gmail.com> wrote:
>
> bitmap_list_string() is very ineffective when printing bitmaps with long
> ranges of set bits because it calls find_next_bit for each bit in the
> bitmap.  We can do better by detecting ranges of set bits.
>
> In my environment, before/after is 943008/31008 ns.

I would add a couple of words, maybe in parentheses, to describe what
your environment is.

...

> +               buf = number(++buf, end, rtop - 1, default_dec_spec);

++buf is a bit confusing here. Since you will rewrite the buf value
anyway, I would write the parameter as buf + 1.
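
For comparison, the two spellings under discussion (illustration of the
suggestion, not a new patch; the final value of buf is the same either
way, the second form just avoids the extra side effect on buf):

	buf = number(++buf, end, rtop - 1, default_dec_spec);
	buf = number(buf + 1, end, rtop - 1, default_dec_spec);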

-- 
With Best Regards,
Andy Shevchenko


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 17/17] vsprintf: rework bitmap_list_string
  2021-08-15 11:09   ` Andy Shevchenko
@ 2021-08-17 16:35     ` Yury Norov
  0 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-17 16:35 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, Linux-Arch,
	open list:KERNEL SELFTEST FRAMEWORK, linux-mmc, linux-perf-users,
	open list:VFIO DRIVER, James E.J. Bottomley, Alexander Lobakin,
	Alexander Shishkin, Alexey Klimov, Andrea Merello,
	Andy Shevchenko, Arnaldo Carvalho de Melo, Arnd Bergmann,
	Ben Gardon, Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Petr Mladek,
	Rasmus Villemoes, Rich Felker, Samuel Mendoza-Jonas,
	Sean Christopherson, Sergey Senozhatsky, Shuah Khan,
	Stefan Kristiansson, Steven Rostedt, Tejun Heo,
	Thomas Bogendoerfer, Ulf Hansson, Will Deacon, Wolfram Sang,
	Yoshinori Sato

On Sun, Aug 15, 2021 at 02:09:45PM +0300, Andy Shevchenko wrote:
> On Sun, Aug 15, 2021 at 12:21 AM Yury Norov <yury.norov@gmail.com> wrote:
> >
> > bitmap_list_string() is very ineffective when printing bitmaps with long
> > ranges of set bits because it calls find_next_bit for each bit in the
> > bitmap.  We can do better by detecting ranges of set bits.
> >
> > In my environment, before/after is 943008/31008 ns.
> 
> I would add a couple of words, maybe in parentheses, to describe what
> your environment is.
> 
> ...
> 
> > +               buf = number(++buf, end, rtop - 1, default_dec_spec);
> 
> ++buf is a bit confusing here. Since you will rewrite the buf value
> anyway, I would write the parameter as buf + 1.

Agreed, it's sloppy. I'll send the patch by tomorrow.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 11/17] find: micro-optimize for_each_{set,clear}_bit()
  2021-08-14 21:17 ` [PATCH 11/17] find: micro-optimize for_each_{set,clear}_bit() Yury Norov
@ 2021-08-26 13:57   ` Petr Mladek
  2021-08-26 21:09     ` Yury Norov
  0 siblings, 1 reply; 26+ messages in thread
From: Petr Mladek @ 2021-08-26 13:57 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Rasmus Villemoes,
	Rich Felker, Samuel Mendoza-Jonas, Sean Christopherson,
	Sergey Senozhatsky, Shuah Khan, Stefan Kristiansson,
	Steven Rostedt, Tejun Heo, Thomas Bogendoerfer, Ulf Hansson,
	Will Deacon, Wolfram Sang, Yoshinori Sato

On Sat 2021-08-14 14:17:07, Yury Norov wrote:
> The macros iterate thru all set/clear bits in a bitmap. They search a
> first bit using find_first_bit(), and the rest bits using find_next_bit().
> 
> Since find_next_bit() is called shortly after find_first_bit(), we can
> save few lines of I-cache by not using find_first_bit().

Is this only a speculation or does it fix a real performance problem?

The macro is used like:

	for_each_set_bit(bit, addr, size) {
		fn(bit);
	}

IMHO, the micro-optimization does not help when fn() is non-trivial.


> --- a/include/linux/find.h
> +++ b/include/linux/find.h
> @@ -280,7 +280,7 @@ unsigned long find_next_bit_le(const void *addr, unsigned
>  #endif
>  
>  #define for_each_set_bit(bit, addr, size) \
> -	for ((bit) = find_first_bit((addr), (size));		\
> +	for ((bit) = find_next_bit((addr), (size), 0);		\
>  	     (bit) < (size);					\
>  	     (bit) = find_next_bit((addr), (size), (bit) + 1))
>  

It is not a big deal. I just think that the original code is slightly
more self-explaining.

Best Regards,
Petr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 17/17] vsprintf: rework bitmap_list_string
  2021-08-14 21:17 ` [PATCH 17/17] vsprintf: rework bitmap_list_string Yury Norov
  2021-08-15 11:09   ` Andy Shevchenko
@ 2021-08-26 14:15   ` Petr Mladek
  2021-08-26 20:59     ` Yury Norov
  1 sibling, 1 reply; 26+ messages in thread
From: Petr Mladek @ 2021-08-26 14:15 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Rasmus Villemoes,
	Rich Felker, Samuel Mendoza-Jonas, Sean Christopherson,
	Sergey Senozhatsky, Shuah Khan, Stefan Kristiansson,
	Steven Rostedt, Tejun Heo, Thomas Bogendoerfer, Ulf Hansson,
	Will Deacon, Wolfram Sang, Yoshinori Sato

On Sat 2021-08-14 14:17:13, Yury Norov wrote:
> bitmap_list_string() is very ineffective when printing bitmaps with long
> ranges of set bits because it calls find_next_bit for each bit in the
> bitmap.  We can do better by detecting ranges of set bits.
> 
> In my environment, before/after is 943008/31008 ns.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

I like the patch. The new code is much easier to follow.
Feel free to use:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 17/17] vsprintf: rework bitmap_list_string
  2021-08-26 14:15   ` Petr Mladek
@ 2021-08-26 20:59     ` Yury Norov
  0 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-26 20:59 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Rasmus Villemoes,
	Rich Felker, Samuel Mendoza-Jonas, Sean Christopherson,
	Sergey Senozhatsky, Shuah Khan, Stefan Kristiansson,
	Steven Rostedt, Tejun Heo, Thomas Bogendoerfer, Ulf Hansson,
	Will Deacon, Wolfram Sang, Yoshinori Sato

On Thu, Aug 26, 2021 at 04:15:09PM +0200, Petr Mladek wrote:
> On Sat 2021-08-14 14:17:13, Yury Norov wrote:
> > bitmap_list_string() is very ineffective when printing bitmaps with long
> > ranges of set bits because it calls find_next_bit for each bit in the
> > bitmap.  We can do better by detecting ranges of set bits.
> > 
> > In my environment, before/after is 943008/31008 ns.
> > 
> > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
> 
> I like the patch. The new code is much easier to follow.
> Feel free to use:
> 
> Reviewed-by: Petr Mladek <pmladek@suse.com>
> 
> Best Regards,
> Petr

Thanks Petr! The patch is already in the linux-next.

Andrew, Stephen, can you please append Petr's reviewed-by?

Thanks,
Yury


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 11/17] find: micro-optimize for_each_{set,clear}_bit()
  2021-08-26 13:57   ` Petr Mladek
@ 2021-08-26 21:09     ` Yury Norov
  2021-08-30 12:12       ` Petr Mladek
  0 siblings, 1 reply; 26+ messages in thread
From: Yury Norov @ 2021-08-26 21:09 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Rasmus Villemoes,
	Rich Felker, Samuel Mendoza-Jonas, Sean Christopherson,
	Sergey Senozhatsky, Shuah Khan, Stefan Kristiansson,
	Steven Rostedt, Tejun Heo, Thomas Bogendoerfer, Ulf Hansson,
	Will Deacon, Wolfram Sang, Yoshinori Sato

On Thu, Aug 26, 2021 at 03:57:13PM +0200, Petr Mladek wrote:
> On Sat 2021-08-14 14:17:07, Yury Norov wrote:
> > The macros iterate thru all set/clear bits in a bitmap. They search a
> > first bit using find_first_bit(), and the rest bits using find_next_bit().
> > 
> > Since find_next_bit() is called shortly after find_first_bit(), we can
> > save few lines of I-cache by not using find_first_bit().
> 
> Is this only a speculation or does it fix a real performance problem?
> 
> The macro is used like:
> 
> 	for_each_set_bit(bit, addr, size) {
> 		fn(bit);
> 	}
> 
> IMHO, the micro-optimization does not help when fn() is non-trivial.
 
The effect is measurable:

Start testing for_each_bit()
for_each_set_bit:                15296 ns,   1000 iterations
for_each_set_bit_from:           15225 ns,   1000 iterations

Start testing for_each_bit() with cache flushing
for_each_set_bit:               547626 ns,   1000 iterations
for_each_set_bit_from:          497899 ns,   1000 iterations
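
(Roughly, each number comes from a loop of this shape; this is only a
sketch, not the exact benchmark code, and BITMAP_LEN, bit, cnt and the
pass count are placeholders:)

	time = ktime_get();
	for (i = 0; i < 1000; i++)
		for_each_set_bit(bit, bitmap, BITMAP_LEN)
			cnt++;
	time = ktime_get() - time;
	pr_err("for_each_set_bit: %18llu ns, %6d iterations\n", time, i);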

Refer this:

https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg356151.html

Thanks,
Yury
 
> > --- a/include/linux/find.h
> > +++ b/include/linux/find.h
> > @@ -280,7 +280,7 @@ unsigned long find_next_bit_le(const void *addr, unsigned
> >  #endif
> >  
> >  #define for_each_set_bit(bit, addr, size) \
> > -	for ((bit) = find_first_bit((addr), (size));		\
> > +	for ((bit) = find_next_bit((addr), (size), 0);		\
> >  	     (bit) < (size);					\
> >  	     (bit) = find_next_bit((addr), (size), (bit) + 1))
> >  
> 
> It is not a big deal. I just think that the original code is slightly
> more self-explaining.
> 
> Best Regards,
> Petr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 11/17] find: micro-optimize for_each_{set,clear}_bit()
  2021-08-26 21:09     ` Yury Norov
@ 2021-08-30 12:12       ` Petr Mladek
  2021-08-30 16:15         ` Yury Norov
  0 siblings, 1 reply; 26+ messages in thread
From: Petr Mladek @ 2021-08-30 12:12 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Rasmus Villemoes,
	Rich Felker, Samuel Mendoza-Jonas, Sean Christopherson,
	Sergey Senozhatsky, Shuah Khan, Stefan Kristiansson,
	Steven Rostedt, Tejun Heo, Thomas Bogendoerfer, Ulf Hansson,
	Will Deacon, Wolfram Sang, Yoshinori Sato

On Thu 2021-08-26 14:09:55, Yury Norov wrote:
> On Thu, Aug 26, 2021 at 03:57:13PM +0200, Petr Mladek wrote:
> > On Sat 2021-08-14 14:17:07, Yury Norov wrote:
> > > The macros iterate thru all set/clear bits in a bitmap. They search a
> > > first bit using find_first_bit(), and the rest bits using find_next_bit().
> > > 
> > > Since find_next_bit() is called shortly after find_first_bit(), we can
> > > save few lines of I-cache by not using find_first_bit().
> > 
> > Is this only a speculation or does it fix a real performance problem?
> > 
> > The macro is used like:
> > 
> > 	for_each_set_bit(bit, addr, size) {
> > 		fn(bit);
> > 	}
> > 
> > IMHO, the micro-optimization does not help when fn() is non-trivial.
>  
> The effect is measurable:
> 
> Start testing for_each_bit()
> for_each_set_bit:                15296 ns,   1000 iterations
> for_each_set_bit_from:           15225 ns,   1000 iterations
> 
> Start testing for_each_bit() with cache flushing
> for_each_set_bit:               547626 ns,   1000 iterations
> for_each_set_bit_from:          497899 ns,   1000 iterations
> 
> Refer this:
> 
> https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg356151.html

I see. The results look convincing on the first look.

But I am still not sure. This patch is basically contradicting many
other patches from this patchset:

  + 5th patch optimizes find_first_and_bit() and proves that it is
    much faster:

    Before (#define find_first_and_bit(...) find_next_and_bit(..., 0):
    Start testing find_bit() with random-filled bitmap
    [  140.291468] find_first_and_bit:           46890919 ns,  32671 iterations
    Start testing find_bit() with sparse bitmap
    [  140.295028] find_first_and_bit:               7103 ns,      1 iterations

    After:
    Start testing find_bit() with random-filled bitmap
    [  162.574907] find_first_and_bit:           25045813 ns,  32846 iterations
    Start testing find_bit() with sparse bitmap
    [  162.578458] find_first_and_bit:               4900 ns,      1 iterations

       => saves 46% in random bitmap
	  saves 31% in sparse bitmap


  + 6th, 7th, and 9th patch makes the code use find_first_bit()
    because it is faster than find_next_bit(mask, size, 0);

  + Now, 11th (this) patch replaces find_first_bit() with
    find_next_bit(mask, size, 0) because find_first_bit()
    makes things slower. It is suspicious at minimum.


In other words, the I-cache could save 10% in one case,
but find_first_bit() might save 46% in the random case.

Does I-cache cost more than the faster code?

Or was for_each_set_bit() tested only with a bitmap
where find_first_bit() optimization did not help much?

How would for_each_set_bit() work with random bitmap?
How does it work with larger bitmaps?

Best Regards,
Petr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 11/17] find: micro-optimize for_each_{set,clear}_bit()
  2021-08-30 12:12       ` Petr Mladek
@ 2021-08-30 16:15         ` Yury Norov
  0 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2021-08-30 16:15 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	linux-kselftest, linux-mmc, linux-perf-users, kvm,
	James E.J. Bottomley, Alexander Lobakin, Alexander Shishkin,
	Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Rasmus Villemoes,
	Rich Felker, Samuel Mendoza-Jonas, Sean Christopherson,
	Sergey Senozhatsky, Shuah Khan, Stefan Kristiansson,
	Steven Rostedt, Tejun Heo, Thomas Bogendoerfer, Ulf Hansson,
	Will Deacon, Wolfram Sang, Yoshinori Sato

On Mon, Aug 30, 2021 at 02:12:49PM +0200, Petr Mladek wrote:
> On Thu 2021-08-26 14:09:55, Yury Norov wrote:
> > On Thu, Aug 26, 2021 at 03:57:13PM +0200, Petr Mladek wrote:
> > > On Sat 2021-08-14 14:17:07, Yury Norov wrote:
> > > > The macros iterate thru all set/clear bits in a bitmap. They search a
> > > > first bit using find_first_bit(), and the rest bits using find_next_bit().
> > > > 
> > > > Since find_next_bit() is called shortly after find_first_bit(), we can
> > > > save few lines of I-cache by not using find_first_bit().
> > > 
> > > Is this only a speculation or does it fix a real performance problem?
> > > 
> > > The macro is used like:
> > > 
> > > 	for_each_set_bit(bit, addr, size) {
> > > 		fn(bit);
> > > 	}
> > > 
> > > IMHO, the micro-optimization does not help when fn() is non-trivial.
> >  
> > The effect is measurable:
> > 
> > Start testing for_each_bit()
> > for_each_set_bit:                15296 ns,   1000 iterations
> > for_each_set_bit_from:           15225 ns,   1000 iterations
> > 
> > Start testing for_each_bit() with cache flushing
> > for_each_set_bit:               547626 ns,   1000 iterations
> > for_each_set_bit_from:          497899 ns,   1000 iterations
> > 
> > Refer this:
> > 
> > https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg356151.html
> 
> I see. The results look convincing on the first look.
> 
> But I am still not sure. This patch is basically contradicting many
> other patches from this patchset:
> 
>   + 5th patch optimizes find_first_and_bit() and proves that it is
>     much faster:
> 
>     Before (#define find_first_and_bit(...) find_next_and_bit(..., 0):
>     Start testing find_bit() with random-filled bitmap
>     [  140.291468] find_first_and_bit:           46890919 ns,  32671 iterations
>     Start testing find_bit() with sparse bitmap
>     [  140.295028] find_first_and_bit:               7103 ns,      1 iterations
> 
>     After:
>     Start testing find_bit() with random-filled bitmap
>     [  162.574907] find_first_and_bit:           25045813 ns,  32846 iterations
>     Start testing find_bit() with sparse bitmap
>     [  162.578458] find_first_and_bit:               4900 ns,      1 iterations
> 
>        => saves 46% in random bitmap
> 	  saves 31% in sparse bitmap
> 
> 
>   + 6th, 7th, and 9th patch makes the code use find_first_bit()
>     because it is faster than find_next_bit(mask, size, 0);
> 
>   + Now, 11th (this) patch replaces find_first_bit() with
>     find_next_bit(mask, size, 0) because find_first_bit()
>     makes things slower. It is suspicious at minimum.
> 
> 
> In other words, the I-cache could save 10% in one case,
> but find_first_bit() might save 46% in the random case.

Those are different cases. find_first_bit() is approximately twice as
fast as find_next_bit(), and much smaller. The conclusion is simple:
use the 'first' version whenever possible if there are no other
considerations.

In the case of the for_each_bit() macros, however, we have such a
consideration. In contrast to the regular pattern, where the user calls
either the 'first' or the 'next' version N times, here we call
find_first_bit() once, and then find_next_bit() N-1 times.

Because we know for sure that we'll call find_next_bit() shortly, we
can benefit from locality under heavy pressure on the I-cache if we
replace 'first' with 'next'. Consider it a prefetch mechanism for the
following calls to find_next_bit().
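
To put numbers on that (sketch only; assume a bitmap with k >= 1 set
bits below 'size'):

	old for_each_set_bit(): 1 call to find_first_bit() + k calls to find_next_bit()
	new for_each_set_bit(): k + 1 calls to find_next_bit()

so in the new form only find_next_bit() has to stay resident in the
I-cache for the whole loop.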

> Does I-cache cost more than the faster code?
 
In this case cache miss is more expensive.

> Or was for_each_set_bit() tested only with a bitmap
> where find_first_bit() optimization did not help much?

I tried to ensure that the I-cache effect is real and in this case
more important than the raw speed of the code, so in the test I called
'first' once and 'next' twice.

> How would for_each_set_bit() work with random bitmap?

It would work for all bitmaps.

> How does it work with larger bitmaps?

The percentage gain (but not the absolute one) will decrease in
proportion to the number of find_next_bit() calls for large N.

Thanks,
Yury

> Best Regards,
> Petr


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-08-30 16:15 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
2021-08-14 21:16 ` [PATCH 01/17] bitops: protect find_first_{,zero}_bit properly Yury Norov
2021-08-14 21:16 ` [PATCH 02/17] bitops: move find_bit_*_le functions from le.h to find.h Yury Norov
2021-08-14 21:16 ` [PATCH 03/17] include: move find.h from asm_generic to linux Yury Norov
2021-08-14 21:17 ` [PATCH 04/17] arch: remove GENERIC_FIND_FIRST_BIT entirely Yury Norov
2021-08-14 21:17 ` [PATCH 05/17] lib: add find_first_and_bit() Yury Norov
2021-08-14 21:17 ` [PATCH 06/17] cpumask: use find_first_and_bit() Yury Norov
2021-08-14 21:17 ` [PATCH 07/17] all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate Yury Norov
2021-08-14 21:17 ` [PATCH 08/17] tools: sync tools/bitmap with mother linux Yury Norov
2021-08-14 21:17 ` [PATCH 09/17] cpumask: replace cpumask_next_* with cpumask_first_* where appropriate Yury Norov
2021-08-14 21:17 ` [PATCH 10/17] include/linux: move for_each_bit() macros from bitops.h to find.h Yury Norov
2021-08-14 21:17 ` [PATCH 11/17] find: micro-optimize for_each_{set,clear}_bit() Yury Norov
2021-08-26 13:57   ` Petr Mladek
2021-08-26 21:09     ` Yury Norov
2021-08-30 12:12       ` Petr Mladek
2021-08-30 16:15         ` Yury Norov
2021-08-14 21:17 ` [PATCH 12/17] Replace for_each_*_bit_from() with for_each_*_bit() where appropriate Yury Norov
2021-08-14 21:17 ` [PATCH 13/17] tools: Rename bitmap_alloc() to bitmap_zalloc() Yury Norov
2021-08-14 21:17 ` [PATCH 14/17] mm/percpu: micro-optimize pcpu_is_populated() Yury Norov
2021-08-14 21:17 ` [PATCH 15/17] bitmap: unify find_bit operations Yury Norov
2021-08-14 21:17 ` [PATCH 16/17] lib: bitmap: add performance test for bitmap_print_to_pagebuf Yury Norov
2021-08-14 21:17 ` [PATCH 17/17] vsprintf: rework bitmap_list_string Yury Norov
2021-08-15 11:09   ` Andy Shevchenko
2021-08-17 16:35     ` Yury Norov
2021-08-26 14:15   ` Petr Mladek
2021-08-26 20:59     ` Yury Norov
