linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/34] bitops: add atomic find_bit() operations
@ 2023-11-18 15:50 Yury Norov
From: Yury Norov @ 2023-11-18 15:50 UTC
  To: linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sathya Prakash Veerichetty,
	Sean Christopherson, Shuai Xue, Stanislaw Gruszka,
	Steven Rostedt, Thomas Bogendoerfer, Thomas Gleixner,
	Valentin Schneider, Vitaly Kuznetsov, Wenjia Zhang, Will Deacon,
	Yoshinori Sato, GR-QLogic-Storage-Upstream, alsa-devel, ath10k,
	dmaengine, iommu, kvm, linux-arm-kernel, linux-arm-msm,
	linux-block, linux-bluetooth, linux-hyperv, linux-m68k,
	linux-media, linux-mips, linux-net-drivers, linux-pci,
	linux-rdma, linux-s390, linux-scsi, linux-serial, linux-sh,
	linux-sound, linux-usb, linux-wireless, linuxppc-dev,
	mpi3mr-linuxdrv.pdl, netdev, sparclinux, x86
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

Add helpers around test_and_{set,clear}_bit() that allow searching for
clear or set bits and flipping them atomically.

The target patterns may look like this:

	for (idx = 0; idx < nbits; idx++)
		if (test_and_clear_bit(idx, bitmap))
			do_something(idx);

Or like this:

	do {
		bit = find_first_bit(bitmap, nbits);
		if (bit >= nbits)
			return nbits;
	} while (!test_and_clear_bit(bit, bitmap));
	return bit;

In both cases, the opencoded loop may be converted to a single function
or iterator call. Correspondingly:

	for_each_test_and_clear_bit(idx, bitmap, nbits)
		do_something(idx);

Or:

	return find_and_clear_bit(bitmap, nbits);
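For illustration, the retry loop behind find_and_clear_bit() can be modeled
in userspace for a bitmap that fits in a single word. This is a sketch under
stated assumptions, not the kernel implementation: C11 atomic_fetch_and()
stands in for test_and_clear_bit(), the GCC/Clang builtin __builtin_ctzl()
stands in for __ffs(), and model_find_and_clear_bit() is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Userspace model of find_and_clear_bit() for nbits <= BITS_PER_WORD.
 * Returns the index of a bit this caller cleared, or nbits if no set bit
 * was observed in [0, nbits).
 */
static unsigned long model_find_and_clear_bit(_Atomic unsigned long *word,
					      unsigned long nbits)
{
	unsigned long mask, val, bit, old;

	assert(nbits > 0 && nbits <= BITS_PER_WORD);
	mask = nbits == BITS_PER_WORD ? ~0UL : (1UL << nbits) - 1;

	do {
		/* Fetch the word exactly once per attempt. */
		val = atomic_load(word) & mask;
		if (val == 0)
			return nbits;
		bit = (unsigned long)__builtin_ctzl(val);	/* val != 0 here */
		/* test_and_clear_bit() equivalent: atomic RMW, returns old word. */
		old = atomic_fetch_and(word, ~(1UL << bit));
	} while (!(old & (1UL << bit)));	/* bit was already cleared by someone else: retry */

	return bit;
}
```

The caller owns the returned bit only when the atomic RMW observed it set; a
lost race simply re-reads the word and tries again.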

Obviously, the less routine code people have to write themselves, the
lower the probability of making a mistake. Patch #31 of this series
fixes one such error in the perf/m1 code.

These are not only handy helpers; they also resolve a non-trivial
issue of using non-atomic find_bit() together with atomic
test_and_{set,clear}_bit().

The problem is that find_bit() assumes the bitmap is a regular,
non-volatile piece of memory, so the compiler is allowed to apply
optimizations such as re-fetching memory instead of caching it.

For example, find_first_bit() is implemented like this:

      for (idx = 0; idx * BITS_PER_LONG < sz; idx++) {
              val = addr[idx];
              if (val) {
                      sz = min(idx * BITS_PER_LONG + __ffs(val), sz);
                      break;
              }
      }

On register-memory architectures like x86, the compiler may decide to
access memory twice: first to compare the word against 0, and then
again to fetch its value and pass it to __ffs().

When find_first_bit() runs on memory that is modified concurrently, the
word may change in between the two accesses. For instance, this may
lead to passing 0 to __ffs(), whose result is undefined for 0. This is
a potentially dangerous call.

find_and_clear_bit(), as a wrapper around test_and_clear_bit(),
naturally treats the underlying bitmap as volatile memory and prevents
the compiler from such optimizations.

KCSAN now catches exactly this type of situation and warns about
unannotated concurrent memory modifications. We can use it to reveal
improper usage of find_bit() and convert such callers to the atomic
find_and_*_bit() as appropriate.

The 1st patch of the series adds the following atomic primitives:

	find_and_set_bit(addr, nbits);
	find_and_set_next_bit(addr, nbits, start);
	...

Here, the find_and_{set,clear} part refers to the corresponding
test_and_{set,clear}_bit() function, and suffixes like _wrap or _lock
derive their semantics from the corresponding find() or test() functions.

For brevity, the naming omits the fact that find_and_set functions
search for a zero bit and, correspondingly, find_and_clear functions
search for a set bit.
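The naming rules can be illustrated with a single-word userspace model of
find_and_set_next_bit() and find_and_set_bit_wrap(). This is a sketch, not
the code from patch 1: C11 atomic_fetch_or() plays the role of
test_and_set_bit(), __builtin_ctzl() on the inverted word plays the role of
ffz(), and the model_*() and range_mask() helpers are hypothetical. The wrap
logic follows find_and_set_bit_wrap(): search [offset, nbits) first, then
fall back to [0, offset).

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Mask covering bits [lo, hi) of one word; empty if lo >= hi. */
static unsigned long range_mask(unsigned long lo, unsigned long hi)
{
	unsigned long upper;

	if (lo >= hi)
		return 0;
	upper = hi == BITS_PER_WORD ? ~0UL : (1UL << hi) - 1;
	return upper & ~((1UL << lo) - 1);	/* lo < hi, so the shift is defined */
}

/* Model of find_and_set_next_bit(): claim a clear bit in [offset, nbits). */
static unsigned long model_find_and_set_next_bit(_Atomic unsigned long *word,
						 unsigned long nbits,
						 unsigned long offset)
{
	unsigned long clear, bit, old;

	assert(nbits <= BITS_PER_WORD);
	do {
		/* One load per attempt; look for a zero bit in range. */
		clear = ~atomic_load(word) & range_mask(offset, nbits);
		if (clear == 0)
			return nbits;
		bit = (unsigned long)__builtin_ctzl(clear);
		/* test_and_set_bit() equivalent: returns the old word. */
		old = atomic_fetch_or(word, 1UL << bit);
	} while (old & (1UL << bit));	/* someone else won the bit: retry */

	return bit;
}

/* Model of find_and_set_bit_wrap(): [offset, nbits) first, then [0, offset). */
static unsigned long model_find_and_set_bit_wrap(_Atomic unsigned long *word,
						 unsigned long nbits,
						 unsigned long offset)
{
	unsigned long bit;

	assert(offset <= nbits);
	bit = model_find_and_set_next_bit(word, nbits, offset);
	if (bit < nbits || offset == 0)
		return bit;

	bit = model_find_and_set_next_bit(word, offset, 0);
	return bit < offset ? bit : nbits;
}
```

The models use sequentially consistent atomics for simplicity. A _lock
variant would roughly correspond to atomic_fetch_or_explicit() with
memory_order_acquire, in line with the acquire semantics of
test_and_set_bit_lock().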

The patch also adds iterators with atomic semantics, like
for_each_test_and_set_bit(). Here, the naming rule is to simply prefix
the corresponding atomic operation with 'for_each'.
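A userspace sketch of how a for_each_test_and_clear_bit()-style iterator can
be built on a find_and_clear_next_bit()-style helper, again for a single word
and with C11 atomics in place of kernel bitops. The model_* names are
hypothetical; the macro mirrors the shape of the one added in patch 1.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Claim (atomically clear) a set bit in [start, nbits) of a single word. */
static unsigned long model_find_and_clear_next_bit(_Atomic unsigned long *word,
						   unsigned long nbits,
						   unsigned long start)
{
	unsigned long upper, set, bit, old;

	assert(nbits <= BITS_PER_WORD);
	if (start >= nbits)
		return nbits;
	upper = nbits == BITS_PER_WORD ? ~0UL : (1UL << nbits) - 1;

	do {
		/* start < nbits <= BITS_PER_WORD, so 1UL << start is defined. */
		set = atomic_load(word) & upper & ~((1UL << start) - 1);
		if (set == 0)
			return nbits;
		bit = (unsigned long)__builtin_ctzl(set);
		/* test_and_clear_bit() equivalent. */
		old = atomic_fetch_and(word, ~(1UL << bit));
	} while (!(old & (1UL << bit)));	/* lost the race: retry */

	return bit;
}

/* Visits and atomically clears each set bit, resuming at bit + 1. */
#define model_for_each_test_and_clear_bit(bit, word, size)			\
	for ((bit) = 0;								\
	     (bit) = model_find_and_clear_next_bit((word), (size), (bit)),	\
	     (bit) < (size);							\
	     (bit)++)
```

Because each iteration resumes the search at bit + 1, a bit that another
thread sets again behind the cursor is not revisited in the same pass; it is
left for the next walk.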

This series is the result of the discussion in [1]. All find_bit()
functions assume exclusive access to the bitmaps. However, KCSAN
reports quite a number of warnings related to the find_bit() API. Some
of them do not point to real bugs, because in many situations people
intentionally allow concurrent bitmap operations.

If so, find_bit() can be annotated such that KCSAN will ignore it:

	bit = data_race(find_first_bit(bitmap, nbits));

This series addresses the other important case, where people really
need atomic find ops. As the following patches show, the resulting
code looks safer and more concise compared to open-coded loops
followed by atomic bit flips.

In [1], Mirsad reported a 2% slowdown in a single-threaded search test
when find_bit() was switched to treat bitmaps as volatile arrays. On
the other hand, the kernel test robot in the same thread reported a
+3.7% improvement in the will-it-scale.per_thread_ops test.

Assuming that our compilers are sane and generate better code for
properly annotated data, the above discrepancy isn't surprising: when
running on non-volatile bitmaps, plain find_bit() outperforms atomic
find_and_bit(), and vice versa.

So all users of the find_bit() API that expect heavy concurrency are
encouraged to switch to the atomic find_and_bit() as appropriate.

The 1st patch of this series adds the atomic find_and_bit() API, and
all the following patches spread it over the kernel. They can be
applied separately on a per-subsystem basis, or I can pull them into
the bitmap tree, as appropriate.

[1] https://lore.kernel.org/lkml/634f5fdf-e236-42cf-be8d-48a581c21660@alu.unizg.hr/T/#m3e7341eb3571753f3acf8fe166f3fb5b2c12e615 

Yury Norov (34):
  lib/find: add atomic find_bit() primitives
  lib/sbitmap: make __sbitmap_get_word() use find_and_set_bit()
  watch_queue: use atomic find_bit() in post_one_notification()
  sched: add cpumask_find_and_set() and use it in __mm_cid_get()
  mips: sgi-ip30: rework heart_alloc_int()
  sparc: fix opencoded find_and_set_bit() in alloc_msi()
  perf/arm: optimize opencoded atomic find_bit() API
  drivers/perf: optimize ali_drw_get_counter_idx() by using find_bit()
  dmaengine: idxd: optimize perfmon_assign_event()
  ath10k: optimize ath10k_snoc_napi_poll() by using find_bit()
  wifi: rtw88: optimize rtw_pci_tx_kick_off() by using find_bit()
  wifi: intel: use atomic find_bit() API where appropriate
  KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers()
  PCI: hv: switch hv_get_dom_num() to use atomic find_bit()
  scsi: use atomic find_bit() API where appropriate
  powerpc: use atomic find_bit() API where appropriate
  iommu: use atomic find_bit() API where appropriate
  media: radio-shark: use atomic find_bit() API where appropriate
  sfc: switch to using atomic find_bit() API where appropriate
  tty: nozomi: optimize interrupt_handler()
  usb: cdc-acm: optimize acm_softint()
  block: null_blk: fix opencoded find_and_set_bit() in get_tag()
  RDMA/rtrs: fix opencoded find_and_set_bit_lock() in
    __rtrs_get_permit()
  mISDN: optimize get_free_devid()
  media: em28xx: cx231xx: fix opencoded find_and_set_bit()
  ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get()
  serial: sc16is7xx: optimize sc16is7xx_alloc_line()
  bluetooth: optimize cmtp_alloc_block_id()
  net: smc: fix opencoded find_and_set_bit() in
    smc_wr_tx_get_free_slot_index()
  ALSA: use atomic find_bit() functions where applicable
  drivers/perf: optimize m1_pmu_get_event_idx() by using find_bit() API
  m68k: rework get_mmu_context()
  microblaze: rework get_mmu_context()
  sh: rework ilsel_enable()

 arch/m68k/include/asm/mmu_context.h           |  11 +-
 arch/microblaze/include/asm/mmu_context_mm.h  |  11 +-
 arch/mips/sgi-ip30/ip30-irq.c                 |  12 +-
 arch/powerpc/mm/book3s32/mmu_context.c        |  10 +-
 arch/powerpc/platforms/pasemi/dma_lib.c       |  45 +--
 arch/powerpc/platforms/powernv/pci-sriov.c    |  12 +-
 arch/sh/boards/mach-x3proto/ilsel.c           |   4 +-
 arch/sparc/kernel/pci_msi.c                   |   9 +-
 arch/x86/kvm/hyperv.c                         |  39 ++-
 drivers/block/null_blk/main.c                 |  41 +--
 drivers/dma/idxd/perfmon.c                    |   8 +-
 drivers/infiniband/ulp/rtrs/rtrs-clt.c        |  15 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.h         |  10 +-
 drivers/iommu/msm_iommu.c                     |  18 +-
 drivers/isdn/mISDN/core.c                     |   9 +-
 drivers/media/radio/radio-shark.c             |   5 +-
 drivers/media/radio/radio-shark2.c            |   5 +-
 drivers/media/usb/cx231xx/cx231xx-cards.c     |  16 +-
 drivers/media/usb/em28xx/em28xx-cards.c       |  37 +--
 drivers/net/ethernet/rocker/rocker_ofdpa.c    |  11 +-
 drivers/net/ethernet/sfc/rx_common.c          |   4 +-
 drivers/net/ethernet/sfc/siena/rx_common.c    |   4 +-
 drivers/net/ethernet/sfc/siena/siena_sriov.c  |  14 +-
 drivers/net/wireless/ath/ath10k/snoc.c        |   9 +-
 .../net/wireless/intel/iwlegacy/4965-mac.c    |   7 +-
 drivers/net/wireless/intel/iwlegacy/common.c  |   8 +-
 drivers/net/wireless/intel/iwlwifi/dvm/sta.c  |   8 +-
 drivers/net/wireless/intel/iwlwifi/dvm/tx.c   |  19 +-
 drivers/net/wireless/realtek/rtw88/pci.c      |   5 +-
 drivers/net/wireless/realtek/rtw89/pci.c      |   5 +-
 drivers/pci/controller/pci-hyperv.c           |   7 +-
 drivers/perf/alibaba_uncore_drw_pmu.c         |  10 +-
 drivers/perf/apple_m1_cpu_pmu.c               |   8 +-
 drivers/perf/arm-cci.c                        |  23 +-
 drivers/perf/arm-ccn.c                        |  10 +-
 drivers/perf/arm_dmc620_pmu.c                 |   9 +-
 drivers/perf/arm_pmuv3.c                      |   8 +-
 drivers/scsi/mpi3mr/mpi3mr_os.c               |  21 +-
 drivers/scsi/qedi/qedi_main.c                 |   9 +-
 drivers/scsi/scsi_lib.c                       |   5 +-
 drivers/tty/nozomi.c                          |   5 +-
 drivers/tty/serial/sc16is7xx.c                |   8 +-
 drivers/usb/class/cdc-acm.c                   |   5 +-
 include/linux/cpumask.h                       |  12 +
 include/linux/find.h                          | 289 ++++++++++++++++++
 kernel/sched/sched.h                          |  52 +---
 kernel/watch_queue.c                          |   6 +-
 lib/find_bit.c                                |  85 ++++++
 lib/sbitmap.c                                 |  46 +--
 net/bluetooth/cmtp/core.c                     |  10 +-
 net/smc/smc_wr.c                              |  10 +-
 sound/pci/hda/hda_codec.c                     |   7 +-
 sound/usb/caiaq/audio.c                       |  13 +-
 53 files changed, 588 insertions(+), 481 deletions(-)

-- 
2.39.2



* [PATCH 01/34] lib/find: add atomic find_bit() primitives
@ 2023-11-18 15:50 Yury Norov
From: Yury Norov @ 2023-11-18 15:50 UTC

Add helpers around test_and_{set,clear}_bit() that allow searching for
clear or set bits and flipping them atomically.

The target patterns may look like this:

	for (idx = 0; idx < nbits; idx++)
		if (test_and_clear_bit(idx, bitmap))
			do_something(idx);

Or like this:

	do {
		bit = find_first_bit(bitmap, nbits);
		if (bit >= nbits)
			return nbits;
	} while (!test_and_clear_bit(bit, bitmap));
	return bit;

In both cases, the opencoded loop may be converted to a single function
or iterator call. Correspondingly:

	for_each_test_and_clear_bit(idx, bitmap, nbits)
		do_something(idx);

Or:

	return find_and_clear_bit(bitmap, nbits);

Obviously, the less routine code people have to write themselves, the
lower the probability of making a mistake.

These are not only handy helpers; they also resolve a non-trivial
issue of using non-atomic find_bit() together with atomic
test_and_{set,clear}_bit().

The problem is that find_bit() assumes the bitmap is a regular,
non-volatile piece of memory, so the compiler is allowed to apply
optimizations such as re-fetching memory instead of caching it.

For example, find_first_bit() is implemented like this:

      for (idx = 0; idx * BITS_PER_LONG < sz; idx++) {
              val = addr[idx];
              if (val) {
                      sz = min(idx * BITS_PER_LONG + __ffs(val), sz);
                      break;
              }
      }

On register-memory architectures like x86, the compiler may decide to
access memory twice: first to compare the word against 0, and then
again to fetch its value and pass it to __ffs().

When find_first_bit() runs on memory that is modified concurrently, the
word may change in between the two accesses. For instance, this may
lead to passing 0 to __ffs(), whose result is undefined for 0. This is
a potentially dangerous call.

find_and_clear_bit(), as a wrapper around test_and_clear_bit(),
naturally treats the underlying bitmap as volatile memory and prevents
the compiler from such optimizations.

KCSAN now catches exactly this type of situation and warns about
unannotated concurrent memory modifications. We can use it to reveal
improper usage of find_bit() and convert such callers to the atomic
find_and_*_bit() as appropriate.

The 1st patch of the series adds the following atomic primitives:

	find_and_set_bit(addr, nbits);
	find_and_set_next_bit(addr, nbits, start);
	...

Here, the find_and_{set,clear} part refers to the corresponding
test_and_{set,clear}_bit() function, and suffixes like _wrap or _lock
derive their semantics from the corresponding find() or test() functions.

For brevity, the naming omits the fact that find_and_set functions
search for a zero bit and, correspondingly, find_and_clear functions
search for a set bit.

The patch also adds iterators with atomic semantics, like
for_each_test_and_set_bit(). Here, the naming rule is to simply prefix
the corresponding atomic operation with 'for_each'.

All users of the find_bit() API that expect heavy concurrency are
encouraged to switch to the atomic find_and_bit() as appropriate.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/find.h | 289 +++++++++++++++++++++++++++++++++++++++++++
 lib/find_bit.c       |  85 +++++++++++++
 2 files changed, 374 insertions(+)

diff --git a/include/linux/find.h b/include/linux/find.h
index 5e4f39ef2e72..e8567f336f42 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -32,6 +32,16 @@ extern unsigned long _find_first_and_bit(const unsigned long *addr1,
 extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
 extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
 
+unsigned long _find_and_set_bit(volatile unsigned long *addr, unsigned long nbits);
+unsigned long _find_and_set_next_bit(volatile unsigned long *addr, unsigned long nbits,
+				unsigned long start);
+unsigned long _find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits);
+unsigned long _find_and_set_next_bit_lock(volatile unsigned long *addr, unsigned long nbits,
+					  unsigned long start);
+unsigned long _find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits);
+unsigned long _find_and_clear_next_bit(volatile unsigned long *addr, unsigned long nbits,
+				unsigned long start);
+
 #ifdef __BIG_ENDIAN
 unsigned long _find_first_zero_bit_le(const unsigned long *addr, unsigned long size);
 unsigned long _find_next_zero_bit_le(const  unsigned long *addr, unsigned
@@ -460,6 +470,267 @@ unsigned long __for_each_wrap(const unsigned long *bitmap, unsigned long size,
 	return bit < start ? bit : size;
 }
 
+/**
+ * find_and_set_bit - Find a zero bit and set it atomically
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ *
+ * This function is designed to operate in a concurrent access environment.
+ *
+ * Because of concurrency and the volatile nature of the underlying bitmap, it's
+ * not guaranteed that the bit found is the 1st clear bit in the bitmap. It's
+ * also not guaranteed that if @nbits is returned, the bitmap is full.
+ *
+ * The function does guarantee that if the returned value is in range [0 .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or @nbits if no clear bit is found
+ */
+static inline
+unsigned long find_and_set_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr | ~GENMASK(nbits - 1, 0);
+			if (val == ~0UL)
+				return nbits;
+			ret = ffz(val);
+		} while (test_and_set_bit(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_set_bit(addr, nbits);
+}
+
+
+/**
+ * find_and_set_next_bit - Find a zero bit and set it, starting from @offset
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * This function is designed to operate in a concurrent access environment.
+ *
+ * Because of concurrency and the volatile nature of the underlying bitmap, it's
+ * not guaranteed that the bit found is the 1st clear bit starting from @offset.
+ * It's also not guaranteed that if @nbits is returned, no clear bits remain.
+ *
+ * The function does guarantee that if the returned value is in range [@offset .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or @nbits if no clear bit is found
+ */
+static inline
+unsigned long find_and_set_next_bit(volatile unsigned long *addr,
+				    unsigned long nbits, unsigned long offset)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr | ~GENMASK(nbits - 1, offset);
+			if (val == ~0UL)
+				return nbits;
+			ret = ffz(val);
+		} while (test_and_set_bit(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_set_next_bit(addr, nbits, offset);
+}
+
+/**
+ * find_and_set_bit_wrap - find and set bit starting at @offset, wrapping around zero
+ * @addr: The first address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * Returns: the bit number for the next clear bit, or first clear bit up to @offset,
+ * while atomically setting it. If no bits are found, returns @nbits.
+ */
+static inline
+unsigned long find_and_set_bit_wrap(volatile unsigned long *addr,
+					unsigned long nbits, unsigned long offset)
+{
+	unsigned long bit = find_and_set_next_bit(addr, nbits, offset);
+
+	if (bit < nbits || offset == 0)
+		return bit;
+
+	bit = find_and_set_bit(addr, offset);
+	return bit < offset ? bit : nbits;
+}
+
+/**
+ * find_and_set_bit_lock - find a zero bit, then set it atomically with lock
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ *
+ * This function is designed to operate in a concurrent access environment.
+ *
+ * Because of concurrency and the volatile nature of the underlying bitmap, it's
+ * not guaranteed that the bit found is the 1st clear bit in the bitmap. It's
+ * also not guaranteed that if @nbits is returned, the bitmap is full.
+ *
+ * The function does guarantee that if the returned value is in range [0 .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or @nbits if no clear bit is found
+ */
+static inline
+unsigned long find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr | ~GENMASK(nbits - 1, 0);
+			if (val == ~0UL)
+				return nbits;
+			ret = ffz(val);
+		} while (test_and_set_bit_lock(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_set_bit_lock(addr, nbits);
+}
+
+/**
+ * find_and_set_next_bit_lock - find a zero bit and set it atomically with lock
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * This function is designed to operate in a concurrent access environment.
+ *
+ * Because of concurrency and the volatile nature of the underlying bitmap, it's
+ * not guaranteed that the bit found is the 1st clear bit in the range. It's also
+ * not guaranteed that if @nbits is returned, there are no clear bits after @offset.
+ *
+ * The function does guarantee that if the returned value is in range [@offset .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or @nbits if no clear bit is found
+ */
+static inline
+unsigned long find_and_set_next_bit_lock(volatile unsigned long *addr,
+					 unsigned long nbits, unsigned long offset)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr | ~GENMASK(nbits - 1, offset);
+			if (val == ~0UL)
+				return nbits;
+			ret = ffz(val);
+		} while (test_and_set_bit_lock(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_set_next_bit_lock(addr, nbits, offset);
+}
+
+/**
+ * find_and_set_bit_wrap_lock - find zero bit starting at @offset and set it
+ *				with lock, and wrap around zero if nothing found
+ * @addr: The first address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * Returns: the bit number for the next clear bit, or first clear bit up to @offset,
+ * while atomically setting it with lock. If no clear bits are found, returns @nbits.
+ */
+static inline
+unsigned long find_and_set_bit_wrap_lock(volatile unsigned long *addr,
+					unsigned long nbits, unsigned long offset)
+{
+	unsigned long bit = find_and_set_next_bit_lock(addr, nbits, offset);
+
+	if (bit < nbits || offset == 0)
+		return bit;
+
+	bit = find_and_set_bit_lock(addr, offset);
+	return bit < offset ? bit : nbits;
+}
+
+/**
+ * find_and_clear_bit - Find a set bit and clear it atomically
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ *
+ * This function is designed to operate in a concurrent access environment.
+ *
+ * Because of concurrency and the volatile nature of the underlying bitmap, it's
+ * not guaranteed that the found bit is the 1st set bit in the bitmap. It's also
+ * not guaranteed that if @nbits is returned, the bitmap is empty.
+ *
+ * The function does guarantee that if the returned value is in range [0 .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and cleared bit, or @nbits if no set bit is found
+ */
+static inline unsigned long find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr & GENMASK(nbits - 1, 0);
+			if (val == 0)
+				return nbits;
+			ret = __ffs(val);
+		} while (!test_and_clear_bit(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_clear_bit(addr, nbits);
+}
+
+/**
+ * find_and_clear_next_bit - Find a set bit next after @offset, and clear it atomically
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: bit offset at which to start searching
+ *
+ * This function is designed to operate in a concurrent access environment.
+ *
+ * Because of concurrency and the volatile nature of the underlying bitmap, it's
+ * not guaranteed that the bit found is the 1st set bit in the range. It's also
+ * not guaranteed that if @nbits is returned, there are no set bits after @offset.
+ *
+ * The function does guarantee that if the returned value is in range [@offset .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and cleared bit, or @nbits if no set bit is found
+ */
+static inline
+unsigned long find_and_clear_next_bit(volatile unsigned long *addr,
+					unsigned long nbits, unsigned long offset)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr & GENMASK(nbits - 1, offset);
+			if (val == 0)
+				return nbits;
+			ret = __ffs(val);
+		} while (!test_and_clear_bit(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_clear_next_bit(addr, nbits, offset);
+}
+
 /**
  * find_next_clump8 - find next 8-bit clump with set bits in a memory region
  * @clump: location to store copy of found clump
@@ -577,6 +848,24 @@ unsigned long find_next_bit_le(const void *addr, unsigned
 #define for_each_set_bit_from(bit, addr, size) \
 	for (; (bit) = find_next_bit((addr), (size), (bit)), (bit) < (size); (bit)++)
 
+/* same as for_each_set_bit() but atomically clears each found bit */
+#define for_each_test_and_clear_bit(bit, addr, size) \
+	for ((bit) = 0; \
+	     (bit) = find_and_clear_next_bit((addr), (size), (bit)), (bit) < (size); \
+	     (bit)++)
+
+/* same as for_each_clear_bit() but atomically sets each found bit */
+#define for_each_test_and_set_bit(bit, addr, size) \
+	for ((bit) = 0; \
+	     (bit) = find_and_set_next_bit((addr), (size), (bit)), (bit) < (size); \
+	     (bit)++)
+
+/* same as for_each_clear_bit_from() but atomically sets each found bit */
+#define for_each_test_and_set_bit_from(bit, addr, size) \
+	for (; \
+	     (bit) = find_and_set_next_bit((addr), (size), (bit)), (bit) < (size); \
+	     (bit)++)
+
 #define for_each_clear_bit(bit, addr, size) \
 	for ((bit) = 0;									\
 	     (bit) = find_next_zero_bit((addr), (size), (bit)), (bit) < (size);		\
diff --git a/lib/find_bit.c b/lib/find_bit.c
index 32f99e9a670e..c9b6b9f96610 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -116,6 +116,91 @@ unsigned long _find_first_and_bit(const unsigned long *addr1,
 EXPORT_SYMBOL(_find_first_and_bit);
 #endif
 
+unsigned long _find_and_set_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_FIRST_BIT(~addr[idx], /* nop */, nbits);
+		if (bit >= nbits)
+			return nbits;
+	} while (test_and_set_bit(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_set_bit);
+
+unsigned long _find_and_set_next_bit(volatile unsigned long *addr,
+				     unsigned long nbits, unsigned long start)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_NEXT_BIT(~addr[idx], /* nop */, nbits, start);
+		if (bit >= nbits)
+			return nbits;
+	} while (test_and_set_bit(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_set_next_bit);
+
+unsigned long _find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_FIRST_BIT(~addr[idx], /* nop */, nbits);
+		if (bit >= nbits)
+			return nbits;
+	} while (test_and_set_bit_lock(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_set_bit_lock);
+
+unsigned long _find_and_set_next_bit_lock(volatile unsigned long *addr,
+					  unsigned long nbits, unsigned long start)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_NEXT_BIT(~addr[idx], /* nop */, nbits, start);
+		if (bit >= nbits)
+			return nbits;
+	} while (test_and_set_bit_lock(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_set_next_bit_lock);
+
+unsigned long _find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_FIRST_BIT(addr[idx], /* nop */, nbits);
+		if (bit >= nbits)
+			return nbits;
+	} while (!test_and_clear_bit(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_clear_bit);
+
+unsigned long _find_and_clear_next_bit(volatile unsigned long *addr,
+					unsigned long nbits, unsigned long start)
+{
+	do {
+		start =  FIND_NEXT_BIT(addr[idx], /* nop */, nbits, start);
+		if (start >= nbits)
+			return nbits;
+	} while (!test_and_clear_bit(start, addr));
+
+	return start;
+}
+EXPORT_SYMBOL(_find_and_clear_next_bit);
+
 #ifndef find_first_zero_bit
 /*
  * Find the first cleared bit in a memory region.
-- 
2.39.2



* [PATCH 02/34] lib/sbitmap: make __sbitmap_get_word() use find_and_set_bit()
@ 2023-11-18 15:50 Yury Norov
From: Yury Norov @ 2023-11-18 15:50 UTC
  To: linux-kernel, Jens Axboe, linux-block
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

__sbitmap_get_word() open-codes either find_and_set_bit_wrap_lock() or
find_and_set_next_bit_lock(), depending on the hint and wrap parameters.

Switch it to use the atomic find_bit() API. While here, simplify
sbitmap_find_bit_in_word(), which calls it.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/sbitmap.c | 46 ++++++++--------------------------------------
 1 file changed, 8 insertions(+), 38 deletions(-)

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index d0a5081dfd12..b21aebd07fd6 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -133,38 +133,11 @@ void sbitmap_resize(struct sbitmap *sb, unsigned int depth)
 }
 EXPORT_SYMBOL_GPL(sbitmap_resize);
 
-static int __sbitmap_get_word(unsigned long *word, unsigned long depth,
+static inline int __sbitmap_get_word(unsigned long *word, unsigned long depth,
 			      unsigned int hint, bool wrap)
 {
-	int nr;
-
-	/* don't wrap if starting from 0 */
-	wrap = wrap && hint;
-
-	while (1) {
-		nr = find_next_zero_bit(word, depth, hint);
-		if (unlikely(nr >= depth)) {
-			/*
-			 * We started with an offset, and we didn't reset the
-			 * offset to 0 in a failure case, so start from 0 to
-			 * exhaust the map.
-			 */
-			if (hint && wrap) {
-				hint = 0;
-				continue;
-			}
-			return -1;
-		}
-
-		if (!test_and_set_bit_lock(nr, word))
-			break;
-
-		hint = nr + 1;
-		if (hint >= depth - 1)
-			hint = 0;
-	}
-
-	return nr;
+	return wrap ? find_and_set_bit_wrap_lock(word, depth, hint) :
+			find_and_set_next_bit_lock(word, depth, hint);
 }
 
 static int sbitmap_find_bit_in_word(struct sbitmap_word *map,
@@ -175,15 +148,12 @@ static int sbitmap_find_bit_in_word(struct sbitmap_word *map,
 	int nr;
 
 	do {
-		nr = __sbitmap_get_word(&map->word, depth,
-					alloc_hint, wrap);
-		if (nr != -1)
-			break;
-		if (!sbitmap_deferred_clear(map))
-			break;
-	} while (1);
+		nr = __sbitmap_get_word(&map->word, depth, alloc_hint, wrap);
+		if (nr < depth)
+			return nr;
+	} while (sbitmap_deferred_clear(map));
 
-	return nr;
+	return -1;
 }
 
 static int sbitmap_find_bit(struct sbitmap *sb,
-- 
2.39.2



* [PATCH 03/34] watch_queue: use atomic find_bit() in post_one_notification()
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
  2023-11-18 15:50 ` [PATCH 01/34] lib/find: add atomic find_bit() primitives Yury Norov
  2023-11-18 15:50 ` [PATCH 02/34] lib/sbitmap; make __sbitmap_get_word() using find_and_set_bit() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 15:50 ` [PATCH 04/34] sched: add cpumask_find_and_set() and use it in __mm_cid_get() Yury Norov
                   ` (31 subsequent siblings)
  34 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Christian Brauner, David Howells, Siddh Raman Pant,
	Yury Norov, Dave Airlie, David Disseldorp, Philipp Stanner,
	Nick Alcock
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

post_one_notification() searches for a set bit in wqueue->notes_bitmap
and, after some housekeeping work, clears it, firing a BUG() if someone
else cleared the bit in the meantime.

We can find and clear the bit in one atomic step with find_and_clear_bit(),
which removes the possibility of hitting the BUG() entirely.
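
A minimal userspace sketch of the find-and-clear semantics, assuming a
single-word bitmap and C11 atomics (sketch_find_and_clear_bit() is a
hypothetical name, not the kernel helper). The index is claimed by the
same atomic read-modify-write that clears it, so a caller that gets an
index back is the only one that cleared that bit.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Find a set bit in [0, nbits) and clear it atomically.
 * Returns nbits if no bit is set.
 */
static size_t sketch_find_and_clear_bit(_Atomic unsigned long *word,
					size_t nbits)
{
	assert(nbits <= SKETCH_WORD_BITS);
	for (;;) {
		unsigned long snap = atomic_load(word);
		unsigned long mask;
		size_t nr = 0;

		while (nr < nbits && !(snap & (1UL << nr)))
			nr++;
		if (nr >= nbits)
			return nbits;

		mask = 1UL << nr;
		/*
		 * test_and_clear: succeed only if the bit was still set, so
		 * no later check for a concurrent clear is needed.
		 */
		if (atomic_fetch_and(word, ~mask) & mask)
			return nr;
		/* Someone cleared it first: rescan. */
	}
}
```

Calling it repeatedly on a map with bits 2 and 4 set returns 2, then 4,
then nbits once the map is empty.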

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 kernel/watch_queue.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c
index 778b4056700f..07edd4a2b463 100644
--- a/kernel/watch_queue.c
+++ b/kernel/watch_queue.c
@@ -112,7 +112,7 @@ static bool post_one_notification(struct watch_queue *wqueue,
 	if (pipe_full(head, tail, pipe->ring_size))
 		goto lost;
 
-	note = find_first_bit(wqueue->notes_bitmap, wqueue->nr_notes);
+	note = find_and_clear_bit(wqueue->notes_bitmap, wqueue->nr_notes);
 	if (note >= wqueue->nr_notes)
 		goto lost;
 
@@ -133,10 +133,6 @@ static bool post_one_notification(struct watch_queue *wqueue,
 	buf->flags = PIPE_BUF_FLAG_WHOLE;
 	smp_store_release(&pipe->head, head + 1); /* vs pipe_read() */
 
-	if (!test_and_clear_bit(note, wqueue->notes_bitmap)) {
-		spin_unlock_irq(&pipe->rd_wait.lock);
-		BUG();
-	}
 	wake_up_interruptible_sync_poll_locked(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
 	done = true;
 
-- 
2.39.2



* [PATCH 04/34] sched: add cpumask_find_and_set() and use it in __mm_cid_get()
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (2 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 03/34] watch_queue: use atomic find_bit() in post_one_notification() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-20 11:31   ` Peter Zijlstra
  2023-11-18 15:50 ` [PATCH 05/34] mips: sgi-ip30: rework heart_alloc_int() Yury Norov
                   ` (30 subsequent siblings)
  34 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Yury Norov, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Maxim Kuvyrkov,
	Alexey Klimov

__mm_cid_get() uses the __mm_cid_try_get() helper to atomically acquire a
bit in the mm cid mask. Now that we have an atomic find_and_set_bit(), we
can easily extend it to cpumasks and use it in the scheduler code.

__mm_cid_try_get() has an infinite loop, which may delay forward progress
of __mm_cid_get() when the mask is dense. cpumask_find_and_set() doesn't
poll the mask indefinitely: it returns as soon as a scan finds no clear
bit, which lets the caller take the lock and set use_cid_lock sooner, if
needed.

cpumask_find_and_set() treats the cid mask as a volatile region of memory,
which it actually is in this case. So if the mask changes while the search
is in progress, KCSAN won't fire a warning on it.
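
A rough userspace model of the find_and_set_bit() loop that
cpumask_find_and_set() wraps, assuming a single-word bitmap, C11 atomics
and a hypothetical sketch_* name. It rescans only when another thread
wins the race for the bit it found; when a scan sees no clear bit it
returns nbits (the cpumask wrapper documents this as a return value
>= nr_cpu_ids), so the caller can move on to cid_lock instead of spinning.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Find a clear bit in [0, nbits) and set it atomically.
 * Rescans only after losing a race for a bit; if a scan finds no
 * clear bit, returns nbits instead of polling again.
 */
static size_t sketch_find_and_set_bit(_Atomic unsigned long *word,
				      size_t nbits)
{
	assert(nbits <= SKETCH_WORD_BITS);
	for (;;) {
		unsigned long snap = atomic_load(word);
		unsigned long mask;
		size_t nr = 0;

		while (nr < nbits && (snap & (1UL << nr)))
			nr++;
		if (nr >= nbits)
			return nbits;	/* full: let the caller fall back */

		mask = 1UL << nr;
		if (!(atomic_fetch_or(word, mask) & mask))
			return nr;
	}
}
```

On a 4-bit map with bits 0 and 2 set, successive calls return 1, 3 and
then 4, the last meaning the map is full.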

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/cpumask.h | 12 ++++++++++
 kernel/sched/sched.h    | 52 ++++++++++++-----------------------------
 2 files changed, 27 insertions(+), 37 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index cfb545841a2c..c2acced8be4e 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -271,6 +271,18 @@ unsigned int cpumask_next_and(int n, const struct cpumask *src1p,
 		small_cpumask_bits, n + 1);
 }
 
+/**
+ * cpumask_find_and_set - find the first unset cpu in a cpumask and
+ *			  set it atomically
+ * @srcp: the cpumask pointer
+ *
+ * Return: >= nr_cpu_ids if nothing is found.
+ */
+static inline unsigned int cpumask_find_and_set(volatile struct cpumask *srcp)
+{
+	return find_and_set_bit(cpumask_bits(srcp), small_cpumask_bits);
+}
+
 /**
  * for_each_cpu - iterate over every cpu in a mask
  * @cpu: the (optionally unsigned) integer iterator
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2e5a95486a42..b2f095a9fc40 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3345,28 +3345,6 @@ static inline void mm_cid_put(struct mm_struct *mm)
 	__mm_cid_put(mm, mm_cid_clear_lazy_put(cid));
 }
 
-static inline int __mm_cid_try_get(struct mm_struct *mm)
-{
-	struct cpumask *cpumask;
-	int cid;
-
-	cpumask = mm_cidmask(mm);
-	/*
-	 * Retry finding first zero bit if the mask is temporarily
-	 * filled. This only happens during concurrent remote-clear
-	 * which owns a cid without holding a rq lock.
-	 */
-	for (;;) {
-		cid = cpumask_first_zero(cpumask);
-		if (cid < nr_cpu_ids)
-			break;
-		cpu_relax();
-	}
-	if (cpumask_test_and_set_cpu(cid, cpumask))
-		return -1;
-	return cid;
-}
-
 /*
  * Save a snapshot of the current runqueue time of this cpu
  * with the per-cpu cid value, allowing to estimate how recently it was used.
@@ -3381,25 +3359,25 @@ static inline void mm_cid_snapshot_time(struct rq *rq, struct mm_struct *mm)
 
 static inline int __mm_cid_get(struct rq *rq, struct mm_struct *mm)
 {
+	struct cpumask *cpumask = mm_cidmask(mm);
 	int cid;
 
-	/*
-	 * All allocations (even those using the cid_lock) are lock-free. If
-	 * use_cid_lock is set, hold the cid_lock to perform cid allocation to
-	 * guarantee forward progress.
-	 */
+	/* All allocations (even those using the cid_lock) are lock-free. */
 	if (!READ_ONCE(use_cid_lock)) {
-		cid = __mm_cid_try_get(mm);
-		if (cid >= 0)
+		cid = cpumask_find_and_set(cpumask);
+		if (cid < nr_cpu_ids)
 			goto end;
-		raw_spin_lock(&cid_lock);
-	} else {
-		raw_spin_lock(&cid_lock);
-		cid = __mm_cid_try_get(mm);
-		if (cid >= 0)
-			goto unlock;
 	}
 
+	/*
+	 * If use_cid_lock is set, hold the cid_lock to perform cid
+	 * allocation to guarantee forward progress.
+	 */
+	raw_spin_lock(&cid_lock);
+	cid = cpumask_find_and_set(cpumask);
+	if (cid < nr_cpu_ids)
+		goto unlock;
+
 	/*
 	 * cid concurrently allocated. Retry while forcing following
 	 * allocations to use the cid_lock to ensure forward progress.
@@ -3415,9 +3393,9 @@ static inline int __mm_cid_get(struct rq *rq, struct mm_struct *mm)
 	 * all newcoming allocations observe the use_cid_lock flag set.
 	 */
 	do {
-		cid = __mm_cid_try_get(mm);
+		cid = cpumask_find_and_set(cpumask);
 		cpu_relax();
-	} while (cid < 0);
+	} while (cid >= nr_cpu_ids);
 	/*
 	 * Allocate before clearing use_cid_lock. Only care about
 	 * program order because this is for forward progress.
-- 
2.39.2



* [PATCH 05/34] mips: sgi-ip30: rework heart_alloc_int()
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (3 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 04/34] sched: add cpumask_find_and_set() and use it in __mm_cid_get() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 15:50 ` [PATCH 06/34] sparc: fix opencoded find_and_set_bit() in alloc_msi() Yury Norov
                   ` (29 subsequent siblings)
  34 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Thomas Bogendoerfer, Yury Norov, linux-mips
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

heart_alloc_int() open-codes find_and_set_bit(). Switch it to the
dedicated function and make it a nice one-liner.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/mips/sgi-ip30/ip30-irq.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/mips/sgi-ip30/ip30-irq.c b/arch/mips/sgi-ip30/ip30-irq.c
index 423c32cb66ed..3c4d4e947817 100644
--- a/arch/mips/sgi-ip30/ip30-irq.c
+++ b/arch/mips/sgi-ip30/ip30-irq.c
@@ -28,17 +28,9 @@ static DEFINE_PER_CPU(unsigned long, irq_enable_mask);
 
 static inline int heart_alloc_int(void)
 {
-	int bit;
+	int bit = find_and_set_bit(heart_irq_map, HEART_NUM_IRQS);
 
-again:
-	bit = find_first_zero_bit(heart_irq_map, HEART_NUM_IRQS);
-	if (bit >= HEART_NUM_IRQS)
-		return -ENOSPC;
-
-	if (test_and_set_bit(bit, heart_irq_map))
-		goto again;
-
-	return bit;
+	return bit < HEART_NUM_IRQS ? bit : -ENOSPC;
 }
 
 static void ip30_error_irq(struct irq_desc *desc)
-- 
2.39.2



* [PATCH 06/34] sparc: fix opencoded find_and_set_bit() in alloc_msi()
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (4 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 05/34] mips: sgi-ip30: rework heart_alloc_int() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 15:50 ` [PATCH 07/34] perf/arm: optimize opencoded atomic find_bit() API Yury Norov
                   ` (28 subsequent siblings)
  34 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, Rob Herring, Sam Ravnborg,
	Yury Norov, sparclinux
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

alloc_msi() open-codes find_and_set_bit(). Switch it to the dedicated
function and make it a nice one-liner.
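
The one-liner combines two conventions: the find_and_set_bit() family
returns a value >= nbits when nothing is free, and alloc_msi() returns
msi_first + index or -ENOENT. A small userspace sketch of that
translation, with a matching free path; it assumes a single-word bitmap
and C11 atomics, and struct sketch_msi_pool and the sketch_* functions
are hypothetical, not the sparc driver's types.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical pool: bitmap slot i corresponds to MSI number msi_first + i. */
struct sketch_msi_pool {
	_Atomic unsigned long bitmap;
	size_t msi_num;		/* usable slots, at most SKETCH_WORD_BITS */
	int msi_first;		/* MSI number of slot 0 */
};

/* Find a clear bit in [0, nbits) and set it atomically; nbits if full. */
static size_t sketch_find_and_set_bit(_Atomic unsigned long *word,
				      size_t nbits)
{
	assert(nbits <= SKETCH_WORD_BITS);
	for (;;) {
		unsigned long snap = atomic_load(word);
		unsigned long mask;
		size_t nr = 0;

		while (nr < nbits && (snap & (1UL << nr)))
			nr++;
		if (nr >= nbits)
			return nbits;

		mask = 1UL << nr;
		if (!(atomic_fetch_or(word, mask) & mask))
			return nr;
	}
}

/* Translate "index >= nbits means none" into an MSI number or -ENOENT. */
static int sketch_alloc_msi(struct sketch_msi_pool *pool)
{
	size_t i = sketch_find_and_set_bit(&pool->bitmap, pool->msi_num);

	return i < pool->msi_num ? (int)i + pool->msi_first : -ENOENT;
}

/* Release an MSI number previously returned by sketch_alloc_msi(). */
static void sketch_free_msi(struct sketch_msi_pool *pool, int msi)
{
	unsigned long mask = 1UL << (msi - pool->msi_first);

	atomic_fetch_and(&pool->bitmap, ~mask);
}
```

With msi_first = 32 and two slots, two allocations return 32 and 33, a
third returns -ENOENT, and freeing 32 makes it available again.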

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/sparc/kernel/pci_msi.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/sparc/kernel/pci_msi.c b/arch/sparc/kernel/pci_msi.c
index fc7402948b7b..91105c788d1d 100644
--- a/arch/sparc/kernel/pci_msi.c
+++ b/arch/sparc/kernel/pci_msi.c
@@ -96,14 +96,9 @@ static u32 pick_msiq(struct pci_pbm_info *pbm)
 
 static int alloc_msi(struct pci_pbm_info *pbm)
 {
-	int i;
-
-	for (i = 0; i < pbm->msi_num; i++) {
-		if (!test_and_set_bit(i, pbm->msi_bitmap))
-			return i + pbm->msi_first;
-	}
+	int i = find_and_set_bit(pbm->msi_bitmap, pbm->msi_num);
 
-	return -ENOENT;
+	return i < pbm->msi_num ? i + pbm->msi_first : -ENOENT;
 }
 
 static void free_msi(struct pci_pbm_info *pbm, int msi_num)
-- 
2.39.2



* [PATCH 07/34] perf/arm: optimize opencoded atomic find_bit() API
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (5 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 06/34] sparc: fix opencoded find_and_set_bit() in alloc_msi() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-21 15:53   ` Will Deacon
  2023-11-18 15:50 ` [PATCH 08/34] drivers/perf: optimize ali_drw_get_counter_idx() by using find_bit() Yury Norov
                   ` (27 subsequent siblings)
  34 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Will Deacon, Mark Rutland, linux-arm-kernel
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

Switch the subsystem to use the atomic find_bit() API or the atomic
iterators, as appropriate.
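
find_and_set_next_bit() takes the first index to consider as its third
argument, which is how the dmc620 and PMUv3 hunks keep their start_idx
and ARMV8_IDX_COUNTER0 lower bounds; a loop that originally started above
bit 0 needs this variant rather than find_and_set_bit(), which always
starts from bit 0. The sketch below models that behaviour in userspace,
assuming a single-word bitmap and C11 atomics; the sketch_* name is
hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Find a clear bit in [offset, nbits) and set it atomically.
 * Bits below offset are never examined or modified.
 * Returns nbits if no clear bit is found.
 */
static size_t sketch_find_and_set_next_bit(_Atomic unsigned long *word,
					   size_t nbits, size_t offset)
{
	assert(nbits <= SKETCH_WORD_BITS);
	for (;;) {
		unsigned long snap = atomic_load(word);
		unsigned long mask;
		size_t nr = offset;

		while (nr < nbits && (snap & (1UL << nr)))
			nr++;
		if (nr >= nbits)
			return nbits;

		mask = 1UL << nr;
		if (!(atomic_fetch_or(word, mask) & mask))
			return nr;
	}
}
```

With offset 1 and four counters, successive calls hand out 1, 2 and 3,
then return 4, and bit 0 is never touched.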

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/perf/arm-cci.c        | 23 +++++------------------
 drivers/perf/arm-ccn.c        | 10 ++--------
 drivers/perf/arm_dmc620_pmu.c |  9 ++-------
 drivers/perf/arm_pmuv3.c      |  8 ++------
 4 files changed, 11 insertions(+), 39 deletions(-)

diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 61de861eaf91..70fbf9d09d37 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -320,12 +320,8 @@ static int cci400_get_event_idx(struct cci_pmu *cci_pmu,
 		return CCI400_PMU_CYCLE_CNTR_IDX;
 	}
 
-	for (idx = CCI400_PMU_CNTR0_IDX; idx <= CCI_PMU_CNTR_LAST(cci_pmu); ++idx)
-		if (!test_and_set_bit(idx, hw->used_mask))
-			return idx;
-
-	/* No counters available */
-	return -EAGAIN;
+	idx = find_and_set_next_bit(hw->used_mask, CCI_PMU_CNTR_LAST(cci_pmu) + 1, CCI400_PMU_CNTR0_IDX);
+	return idx < CCI_PMU_CNTR_LAST(cci_pmu) + 1 ? idx : -EAGAIN;
 }
 
 static int cci400_validate_hw_event(struct cci_pmu *cci_pmu, unsigned long hw_event)
@@ -802,13 +798,8 @@ static int pmu_get_event_idx(struct cci_pmu_hw_events *hw, struct perf_event *ev
 	if (cci_pmu->model->get_event_idx)
 		return cci_pmu->model->get_event_idx(cci_pmu, hw, cci_event);
 
-	/* Generic code to find an unused idx from the mask */
-	for (idx = 0; idx <= CCI_PMU_CNTR_LAST(cci_pmu); idx++)
-		if (!test_and_set_bit(idx, hw->used_mask))
-			return idx;
-
-	/* No counters available */
-	return -EAGAIN;
+	idx = find_and_set_bit(hw->used_mask, CCI_PMU_CNTR_LAST(cci_pmu) + 1);
+	return idx < CCI_PMU_CNTR_LAST(cci_pmu) + 1 ? idx : -EAGAIN;
 }
 
 static int pmu_map_event(struct perf_event *event)
@@ -861,12 +852,8 @@ static void pmu_free_irq(struct cci_pmu *cci_pmu)
 {
 	int i;
 
-	for (i = 0; i < cci_pmu->nr_irqs; i++) {
-		if (!test_and_clear_bit(i, &cci_pmu->active_irqs))
-			continue;
-
+	for_each_test_and_clear_bit(i, &cci_pmu->active_irqs, cci_pmu->nr_irqs)
 		free_irq(cci_pmu->irqs[i], cci_pmu);
-	}
 }
 
 static u32 pmu_read_counter(struct perf_event *event)
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 728d13d8e98a..d657701b1f23 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -589,15 +589,9 @@ static const struct attribute_group *arm_ccn_pmu_attr_groups[] = {
 
 static int arm_ccn_pmu_alloc_bit(unsigned long *bitmap, unsigned long size)
 {
-	int bit;
-
-	do {
-		bit = find_first_zero_bit(bitmap, size);
-		if (bit >= size)
-			return -EAGAIN;
-	} while (test_and_set_bit(bit, bitmap));
+	int bit = find_and_set_bit(bitmap, size);
 
-	return bit;
+	return bit < size ? bit : -EAGAIN;
 }
 
 /* All RN-I and RN-D nodes have identical PMUs */
diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
index 30cea6859574..e41c84dabc3e 100644
--- a/drivers/perf/arm_dmc620_pmu.c
+++ b/drivers/perf/arm_dmc620_pmu.c
@@ -303,13 +303,8 @@ static int dmc620_get_event_idx(struct perf_event *event)
 		end_idx = DMC620_PMU_MAX_COUNTERS;
 	}
 
-	for (idx = start_idx; idx < end_idx; ++idx) {
-		if (!test_and_set_bit(idx, dmc620_pmu->used_mask))
-			return idx;
-	}
-
-	/* The counters are all in use. */
-	return -EAGAIN;
+	idx = find_and_set_next_bit(dmc620_pmu->used_mask, end_idx, start_idx);
+	return idx < end_idx ? idx : -EAGAIN;
 }
 
 static inline
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 18b91b56af1d..784b0383e9f8 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -825,13 +825,9 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
 				    struct arm_pmu *cpu_pmu)
 {
-	int idx;
+	int idx = find_and_set_next_bit(cpuc->used_mask, cpu_pmu->num_events, ARMV8_IDX_COUNTER0);
 
-	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; idx++) {
-		if (!test_and_set_bit(idx, cpuc->used_mask))
-			return idx;
-	}
-	return -EAGAIN;
+	return idx < cpu_pmu->num_events ? idx : -EAGAIN;
 }
 
 static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
-- 
2.39.2



* [PATCH 08/34] drivers/perf: optimize ali_drw_get_counter_idx() by using find_bit()
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (6 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 07/34] perf/arm: optimize opencoded atomic find_bit() API Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-21 15:54   ` Will Deacon
  2023-11-18 15:50 ` [PATCH 09/34] dmaengine: idxd: optimize perfmon_assign_event() Yury Norov
                   ` (26 subsequent siblings)
  34 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Shuai Xue, Will Deacon, Mark Rutland, linux-arm-kernel
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

The function scans used_mask bit by bit in a for-loop, looking for a clear
bit to set. We can do it faster by using the atomic find_and_set_bit().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/perf/alibaba_uncore_drw_pmu.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/perf/alibaba_uncore_drw_pmu.c b/drivers/perf/alibaba_uncore_drw_pmu.c
index 19d459a36be5..2a3b7701d568 100644
--- a/drivers/perf/alibaba_uncore_drw_pmu.c
+++ b/drivers/perf/alibaba_uncore_drw_pmu.c
@@ -274,15 +274,9 @@ static const struct attribute_group *ali_drw_pmu_attr_groups[] = {
 static int ali_drw_get_counter_idx(struct perf_event *event)
 {
 	struct ali_drw_pmu *drw_pmu = to_ali_drw_pmu(event->pmu);
-	int idx;
+	int idx = find_and_set_bit(drw_pmu->used_mask, ALI_DRW_PMU_COMMON_MAX_COUNTERS);
 
-	for (idx = 0; idx < ALI_DRW_PMU_COMMON_MAX_COUNTERS; ++idx) {
-		if (!test_and_set_bit(idx, drw_pmu->used_mask))
-			return idx;
-	}
-
-	/* The counters are all in use. */
-	return -EBUSY;
+	return idx < ALI_DRW_PMU_COMMON_MAX_COUNTERS ? idx : -EBUSY;
 }
 
 static u64 ali_drw_pmu_read_counter(struct perf_event *event)
-- 
2.39.2



* [PATCH 09/34] dmaengine: idxd: optimize perfmon_assign_event()
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (7 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 08/34] drivers/perf: optimize ali_drw_get_counter_idx() by using find_bit() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-20 15:34   ` Dave Jiang
  2023-11-24 12:15   ` Vinod Koul
  2023-11-18 15:50 ` [PATCH 10/34] ath10k: optimize ath10k_snoc_napi_poll() by using find_bit() Yury Norov
                   ` (25 subsequent siblings)
  34 siblings, 2 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Fenghua Yu, Dave Jiang, Vinod Koul, dmaengine
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

The function scans used_mask bit by bit in a for-loop, looking for a clear
bit to set. We can do it faster by using the atomic find_and_set_bit().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/dma/idxd/perfmon.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
index fdda6d604262..4dd9c0d979c3 100644
--- a/drivers/dma/idxd/perfmon.c
+++ b/drivers/dma/idxd/perfmon.c
@@ -134,13 +134,9 @@ static void perfmon_assign_hw_event(struct idxd_pmu *idxd_pmu,
 static int perfmon_assign_event(struct idxd_pmu *idxd_pmu,
 				struct perf_event *event)
 {
-	int i;
-
-	for (i = 0; i < IDXD_PMU_EVENT_MAX; i++)
-		if (!test_and_set_bit(i, idxd_pmu->used_mask))
-			return i;
+	int i = find_and_set_bit(idxd_pmu->used_mask, IDXD_PMU_EVENT_MAX);
 
-	return -EINVAL;
+	return i < IDXD_PMU_EVENT_MAX ? i : -EINVAL;
 }
 
 /*
-- 
2.39.2



* [PATCH 10/34] ath10k: optimize ath10k_snoc_napi_poll() by using find_bit()
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (8 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 09/34] dmaengine: idxd: optimize perfmon_assign_event() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 15:50 ` [PATCH 11/34] wifi: rtw88: optimize rtw_pci_tx_kick_off() " Yury Norov
                   ` (24 subsequent siblings)
  34 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Kalle Valo, Jeff Johnson, ath10k, linux-wireless
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

ath10k_snoc_napi_poll() traverses the pending_ce_irqs bitmap bit by bit.
We can do it faster by using the for_each_test_and_clear_bit() iterator.
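
One way such an iterator can be layered on a find-and-clear-next
primitive is sketched below in userspace, assuming a single-word bitmap
and C11 atomics. The sketch_* names and the macro are hypothetical and not
claimed to match the kernel's for_each_test_and_clear_bit() definition.
The saving over the original loop is that an atomic read-modify-write is
only issued for bits observed set, instead of one test_and_clear_bit() per
index.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Find a set bit in [offset, nbits), clear it atomically; nbits if none. */
static size_t sketch_find_and_clear_next_bit(_Atomic unsigned long *word,
					     size_t nbits, size_t offset)
{
	assert(nbits <= SKETCH_WORD_BITS);
	for (;;) {
		unsigned long snap = atomic_load(word);
		unsigned long mask;
		size_t nr = offset;

		while (nr < nbits && !(snap & (1UL << nr)))
			nr++;
		if (nr >= nbits)
			return nbits;

		mask = 1UL << nr;
		if (atomic_fetch_and(word, ~mask) & mask)
			return nr;
	}
}

/* Visit each set bit once, clearing it atomically before the body runs. */
#define sketch_for_each_test_and_clear_bit(bit, word, nbits)		\
	for ((bit) = 0;							\
	     ((bit) = sketch_find_and_clear_next_bit((word), (nbits), (bit))) < (nbits); \
	     (bit)++)

/* Example consumer: record the serviced indices and return their count. */
static size_t sketch_service_pending(_Atomic unsigned long *pending,
				     size_t nbits, size_t *out)
{
	size_t bit, n = 0;

	sketch_for_each_test_and_clear_bit(bit, pending, nbits)
		out[n++] = bit;
	return n;
}
```

For a pending mask of 0x29, the body runs for bits 0, 3 and 5, and the
mask is left empty.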

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/net/wireless/ath/ath10k/snoc.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index 2c39bad7ebfb..a1db5a973780 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -1237,11 +1237,10 @@ static int ath10k_snoc_napi_poll(struct napi_struct *ctx, int budget)
 		return done;
 	}
 
-	for (ce_id = 0; ce_id < CE_COUNT; ce_id++)
-		if (test_and_clear_bit(ce_id, ar_snoc->pending_ce_irqs)) {
-			ath10k_ce_per_engine_service(ar, ce_id);
-			ath10k_ce_enable_interrupt(ar, ce_id);
-		}
+	for_each_test_and_clear_bit(ce_id, ar_snoc->pending_ce_irqs, CE_COUNT) {
+		ath10k_ce_per_engine_service(ar, ce_id);
+		ath10k_ce_enable_interrupt(ar, ce_id);
+	}
 
 	done = ath10k_htt_txrx_compl_task(ar, budget);
 
-- 
2.39.2



* [PATCH 11/34] wifi: rtw88: optimize rtw_pci_tx_kick_off() by using find_bit()
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (9 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 10/34] ath10k: optimize ath10k_snoc_napi_poll() by using find_bit() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 15:50 ` [PATCH 12/34] wifi: intel: use atomic find_bit() API where appropriate Yury Norov
                   ` (23 subsequent siblings)
  34 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Ping-Ke Shih, Kalle Valo, linux-wireless
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

rtw_pci_tx_kick_off() and rtw89_pci_tx_kick_off_pending() traverse the
tx_queued and kick_map bitmaps bit by bit. We can do it faster by using
the atomic for_each_test_and_clear_bit() iterator.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/net/wireless/realtek/rtw88/pci.c | 5 ++---
 drivers/net/wireless/realtek/rtw89/pci.c | 5 +----
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index 2bfc0e822b8d..a0d69c75a381 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -789,9 +789,8 @@ static void rtw_pci_tx_kick_off(struct rtw_dev *rtwdev)
 	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
 	enum rtw_tx_queue_type queue;
 
-	for (queue = 0; queue < RTK_MAX_TX_QUEUE_NUM; queue++)
-		if (test_and_clear_bit(queue, rtwpci->tx_queued))
-			rtw_pci_tx_kick_off_queue(rtwdev, queue);
+	for_each_test_and_clear_bit(queue, rtwpci->tx_queued, RTK_MAX_TX_QUEUE_NUM)
+		rtw_pci_tx_kick_off_queue(rtwdev, queue);
 }
 
 static int rtw_pci_tx_write_data(struct rtw_dev *rtwdev,
diff --git a/drivers/net/wireless/realtek/rtw89/pci.c b/drivers/net/wireless/realtek/rtw89/pci.c
index 14ddb0d39e63..184d41b774d7 100644
--- a/drivers/net/wireless/realtek/rtw89/pci.c
+++ b/drivers/net/wireless/realtek/rtw89/pci.c
@@ -1077,10 +1077,7 @@ static void rtw89_pci_tx_kick_off_pending(struct rtw89_dev *rtwdev)
 	struct rtw89_pci_tx_ring *tx_ring;
 	int txch;
 
-	for (txch = 0; txch < RTW89_TXCH_NUM; txch++) {
-		if (!test_and_clear_bit(txch, rtwpci->kick_map))
-			continue;
-
+	for_each_test_and_clear_bit(txch, rtwpci->kick_map, RTW89_TXCH_NUM) {
 		tx_ring = &rtwpci->tx_rings[txch];
 		__rtw89_pci_tx_kick_off(rtwdev, tx_ring);
 	}
-- 
2.39.2



* [PATCH 12/34] wifi: intel: use atomic find_bit() API where appropriate
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (10 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 11/34] wifi: rtw88: optimize rtw_pci_tx_kick_off() " Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-19 19:58   ` Johannes Berg
  2023-11-18 15:50 ` [PATCH 13/34] KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers() Yury Norov
                   ` (22 subsequent siblings)
  34 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Stanislaw Gruszka, Kalle Valo, Gregory Greenman,
	Hans de Goede, Johannes Berg, Kees Cook, Yury Norov,
	Miri Korenblit, linux-wireless
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

iwlegacy and iwlwifi code open-codes atomic bit allocation with for-loops.
Switch it to use the dedicated functions.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 .../net/wireless/intel/iwlegacy/4965-mac.c    |  7 ++-----
 drivers/net/wireless/intel/iwlegacy/common.c  |  8 ++------
 drivers/net/wireless/intel/iwlwifi/dvm/sta.c  |  8 ++------
 drivers/net/wireless/intel/iwlwifi/dvm/tx.c   | 19 ++++++++-----------
 4 files changed, 14 insertions(+), 28 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlegacy/4965-mac.c b/drivers/net/wireless/intel/iwlegacy/4965-mac.c
index 69276266ce6f..8fb738c95cb4 100644
--- a/drivers/net/wireless/intel/iwlegacy/4965-mac.c
+++ b/drivers/net/wireless/intel/iwlegacy/4965-mac.c
@@ -2089,12 +2089,9 @@ il4965_txq_ctx_stop(struct il_priv *il)
 static int
 il4965_txq_ctx_activate_free(struct il_priv *il)
 {
-	int txq_id;
+	int txq_id = find_and_set_bit(&il->txq_ctx_active_msk, il->hw_params.max_txq_num);
 
-	for (txq_id = 0; txq_id < il->hw_params.max_txq_num; txq_id++)
-		if (!test_and_set_bit(txq_id, &il->txq_ctx_active_msk))
-			return txq_id;
-	return -1;
+	return txq_id < il->hw_params.max_txq_num ? txq_id : -1;
 }
 
 /*
diff --git a/drivers/net/wireless/intel/iwlegacy/common.c b/drivers/net/wireless/intel/iwlegacy/common.c
index 054fef680aba..c6353e17be50 100644
--- a/drivers/net/wireless/intel/iwlegacy/common.c
+++ b/drivers/net/wireless/intel/iwlegacy/common.c
@@ -2303,13 +2303,9 @@ EXPORT_SYMBOL(il_restore_stations);
 int
 il_get_free_ucode_key_idx(struct il_priv *il)
 {
-	int i;
-
-	for (i = 0; i < il->sta_key_max_num; i++)
-		if (!test_and_set_bit(i, &il->ucode_key_table))
-			return i;
+	int i = find_and_set_bit(&il->ucode_key_table, il->sta_key_max_num);
 
-	return WEP_INVALID_OFFSET;
+	return i < il->sta_key_max_num ? i : WEP_INVALID_OFFSET;
 }
 EXPORT_SYMBOL(il_get_free_ucode_key_idx);
 
diff --git a/drivers/net/wireless/intel/iwlwifi/dvm/sta.c b/drivers/net/wireless/intel/iwlwifi/dvm/sta.c
index 8b01ab986cb1..21e663d2bc44 100644
--- a/drivers/net/wireless/intel/iwlwifi/dvm/sta.c
+++ b/drivers/net/wireless/intel/iwlwifi/dvm/sta.c
@@ -719,13 +719,9 @@ void iwl_restore_stations(struct iwl_priv *priv, struct iwl_rxon_context *ctx)
 
 int iwl_get_free_ucode_key_offset(struct iwl_priv *priv)
 {
-	int i;
-
-	for (i = 0; i < priv->sta_key_max_num; i++)
-		if (!test_and_set_bit(i, &priv->ucode_key_table))
-			return i;
+	int i = find_and_set_bit(&priv->ucode_key_table, priv->sta_key_max_num);
 
-	return WEP_INVALID_OFFSET;
+	return i < priv->sta_key_max_num ? i : WEP_INVALID_OFFSET;
 }
 
 void iwl_dealloc_bcast_stations(struct iwl_priv *priv)
diff --git a/drivers/net/wireless/intel/iwlwifi/dvm/tx.c b/drivers/net/wireless/intel/iwlwifi/dvm/tx.c
index 111ed1873006..1b3dc99b968c 100644
--- a/drivers/net/wireless/intel/iwlwifi/dvm/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/dvm/tx.c
@@ -460,17 +460,14 @@ int iwlagn_tx_skb(struct iwl_priv *priv,
 
 static int iwlagn_alloc_agg_txq(struct iwl_priv *priv, int mq)
 {
-	int q;
-
-	for (q = IWLAGN_FIRST_AMPDU_QUEUE;
-	     q < priv->trans->trans_cfg->base_params->num_of_queues; q++) {
-		if (!test_and_set_bit(q, priv->agg_q_alloc)) {
-			priv->queue_to_mac80211[q] = mq;
-			return q;
-		}
-	}
-
-	return -ENOSPC;
+	int q = find_and_set_next_bit(priv->agg_q_alloc,
+				      priv->trans->trans_cfg->base_params->num_of_queues,
+				      IWLAGN_FIRST_AMPDU_QUEUE);
+	if (q >= priv->trans->trans_cfg->base_params->num_of_queues)
+		return -ENOSPC;
+
+	priv->queue_to_mac80211[q] = mq;
+	return q;
 }
 
 static void iwlagn_dealloc_agg_txq(struct iwl_priv *priv, int q)
-- 
2.39.2



* [PATCH 13/34] KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers()
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (11 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 12/34] wifi: intel: use atomic find_bit() API where appropriate Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-20 14:26   ` Vitaly Kuznetsov
  2023-11-18 15:50 ` [PATCH 14/34] PCI: hv: switch hv_get_dom_num() to use atomic find_bit() Yury Norov
                   ` (21 subsequent siblings)
  34 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Vitaly Kuznetsov, Sean Christopherson,
	Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, kvm
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

The function traverses stimer_pending_bitmap bit by bit in a for-loop.
We can do it faster by using the atomic for_each_test_and_clear_bit()
iterator.

While here, refactor the logic to reduce the indentation level, and drop
the second check for stimer->config.enable.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/x86/kvm/hyperv.c | 39 +++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 238afd7335e4..460e300b558b 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -870,27 +870,26 @@ void kvm_hv_process_stimers(struct kvm_vcpu *vcpu)
 	if (!hv_vcpu)
 		return;
 
-	for (i = 0; i < ARRAY_SIZE(hv_vcpu->stimer); i++)
-		if (test_and_clear_bit(i, hv_vcpu->stimer_pending_bitmap)) {
-			stimer = &hv_vcpu->stimer[i];
-			if (stimer->config.enable) {
-				exp_time = stimer->exp_time;
-
-				if (exp_time) {
-					time_now =
-						get_time_ref_counter(vcpu->kvm);
-					if (time_now >= exp_time)
-						stimer_expiration(stimer);
-				}
-
-				if ((stimer->config.enable) &&
-				    stimer->count) {
-					if (!stimer->msg_pending)
-						stimer_start(stimer);
-				} else
-					stimer_cleanup(stimer);
-			}
+	for_each_test_and_clear_bit(i, hv_vcpu->stimer_pending_bitmap,
+					ARRAY_SIZE(hv_vcpu->stimer)) {
+		stimer = &hv_vcpu->stimer[i];
+		if (!stimer->config.enable)
+			continue;
+
+		exp_time = stimer->exp_time;
+
+		if (exp_time) {
+			time_now = get_time_ref_counter(vcpu->kvm);
+			if (time_now >= exp_time)
+				stimer_expiration(stimer);
 		}
+
+		if (stimer->count) {
+			if (!stimer->msg_pending)
+				stimer_start(stimer);
+		} else
+			stimer_cleanup(stimer);
+	}
 }
 
 void kvm_hv_vcpu_uninit(struct kvm_vcpu *vcpu)
-- 
2.39.2
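The for_each_test_and_clear_bit() iterator used above can be modelled in user space to show why it beats the bit-by-bit loop: it skips all-clear words and only performs an atomic read-modify-write on bits it actually finds set. The sketch below is a hypothetical model built on C11 atomics and the GCC/Clang builtin __builtin_ctzl(); the model_* names and drain_pending() are illustrative, and this is not the kernel's implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical model of find_and_clear_next_bit(): return the first set
 * bit in [start, size) after atomically clearing it, or 'size' if none.
 * All-clear words are skipped instead of being tested bit by bit.
 */
static size_t model_find_and_clear_next_bit(_Atomic unsigned long *map,
					    size_t size, size_t start)
{
	size_t bit = start;

	while (bit < size) {
		size_t idx = bit / BITS_PER_LONG;
		/* Load the word and ignore positions below 'bit'. */
		unsigned long word = atomic_load(&map[idx]) &
				     (~0UL << (bit % BITS_PER_LONG));

		if (word == 0) {
			bit = (idx + 1) * BITS_PER_LONG; /* skip a clear word */
			continue;
		}

		size_t found = idx * BITS_PER_LONG + (size_t)__builtin_ctzl(word);
		if (found >= size)
			break;

		unsigned long mask = 1UL << (found % BITS_PER_LONG);
		if (atomic_fetch_and(&map[idx], ~mask) & mask)
			return found;	/* this caller cleared the bit */
		bit = found;		/* lost a race: rescan from here */
	}
	return size;
}

/* Hypothetical iterator: visit and clear every set bit below 'size'. */
#define model_for_each_test_and_clear_bit(bit, map, size) \
	for ((bit) = 0; \
	     ((bit) = model_find_and_clear_next_bit((map), (size), (bit))) < (size); \
	     (bit)++)

/* Usage: record the visited bits in out[]; returns how many were visited. */
static size_t drain_pending(_Atomic unsigned long *map, size_t size, size_t *out)
{
	size_t bit, n = 0;

	model_for_each_test_and_clear_bit(bit, map, size)
		out[n++] = bit;
	return n;
}
```

Because each bit is claimed with an atomic fetch-and, a bit set concurrently by another context is either visited in this pass or left set for the next one; it is never lost.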



* [PATCH 14/34] PCI: hv: switch hv_get_dom_num() to use atomic find_bit()
  2023-11-18 15:50 [PATCH 00/34] bitops: add atomic find_bit() operations Yury Norov
                   ` (12 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 13/34] KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 17:59   ` Michael Kelley
  2023-11-18 15:50 ` [PATCH 15/34] scsi: use atomic find_bit() API where appropriate Yury Norov
                   ` (20 subsequent siblings)
  34 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Bjorn Helgaas, linux-hyperv, linux-pci
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

The function traverses the bitmap with for_each_clear_bit() just to
allocate a bit atomically. We can do better with the dedicated
find_and_set_bit().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/pci/controller/pci-hyperv.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 30c7dfeccb16..033b1fb7f4eb 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3605,12 +3605,9 @@ static u16 hv_get_dom_num(u16 dom)
 	if (test_and_set_bit(dom, hvpci_dom_map) == 0)
 		return dom;
 
-	for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
-		if (test_and_set_bit(i, hvpci_dom_map) == 0)
-			return i;
-	}
+	i = find_and_set_bit(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
 
-	return HVPCI_DOM_INVALID;
+	return i < HVPCI_DOM_MAP_SIZE ? i : HVPCI_DOM_INVALID;
 }
 
 /**
-- 
2.39.2
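The allocation pattern in hv_get_dom_num(), "try the requested slot, otherwise take the first free one", can be sketched in user space as follows. This is a hypothetical model: find_and_set_bit() is emulated with C11 atomics and __builtin_ctzl() (a GCC/Clang builtin), and DOM_MAP_BITS, DOM_INVALID and get_dom_num() are illustrative stand-ins for the driver's names.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Illustrative stand-ins, not HVPCI_DOM_MAP_SIZE / HVPCI_DOM_INVALID. */
#define DOM_MAP_BITS	256
#define DOM_INVALID	0xFFFFu

/*
 * Hypothetical model of find_and_set_bit(): find the first clear bit
 * below 'size', set it atomically and return it; 'size' if the map is full.
 */
static size_t model_find_and_set_bit(_Atomic unsigned long *map, size_t size)
{
	size_t bit = 0;

	while (bit < size) {
		size_t idx = bit / BITS_PER_LONG;
		/* Invert the word so that free bits become 1s. */
		unsigned long zeros = ~atomic_load(&map[idx]) &
				      (~0UL << (bit % BITS_PER_LONG));

		if (zeros == 0) {
			bit = (idx + 1) * BITS_PER_LONG; /* word is full */
			continue;
		}

		size_t found = idx * BITS_PER_LONG + (size_t)__builtin_ctzl(zeros);
		if (found >= size)
			break;

		unsigned long mask = 1UL << (found % BITS_PER_LONG);
		if (!(atomic_fetch_or(&map[idx], mask) & mask))
			return found;	/* the bit was clear and we set it */
		bit = found;		/* someone else won: rescan */
	}
	return size;
}

/* Sketch of the hv_get_dom_num() flow: try 'want' first, else any free slot. */
static unsigned int get_dom_num(_Atomic unsigned long *map, unsigned int want)
{
	unsigned long mask = 1UL << (want % BITS_PER_LONG);
	size_t i;

	assert(want < DOM_MAP_BITS);
	if (!(atomic_fetch_or(&map[want / BITS_PER_LONG], mask) & mask))
		return want;

	i = model_find_and_set_bit(map, DOM_MAP_BITS);
	return i < DOM_MAP_BITS ? (unsigned int)i : DOM_INVALID;
}
```

The "not found" result is the map size itself, which is why the patched caller compares the return value against HVPCI_DOM_MAP_SIZE instead of looking for a negative error code.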



* [PATCH 15/34] scsi: use atomic find_bit() API where appropriate
  2023-11-18 15:50 [PATCH 00/34] bitops: add atomic find_bit() operations Yury Norov
                   ` (13 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 14/34] PCI: hv: switch hv_get_dom_num() to use atomic find_bit() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 16:30   ` Bart Van Assche
  2023-11-18 15:50 ` [PATCH 16/34] powerpc: " Yury Norov
                   ` (19 subsequent siblings)
  34 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Sathya Prakash Veerichetty, Kashyap Desai,
	Sumit Saxena, Sreekanth Reddy, James E.J. Bottomley,
	Martin K. Petersen, Nilesh Javali, Manish Rangankar,
	GR-QLogic-Storage-Upstream, mpi3mr-linuxdrv.pdl, linux-scsi
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

SCSI code open-codes atomic bit allocation and traversal with generic
routines. Switch it to the dedicated functions.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/scsi/mpi3mr/mpi3mr_os.c | 21 ++++++---------------
 drivers/scsi/qedi/qedi_main.c   |  9 +--------
 drivers/scsi/scsi_lib.c         |  5 ++---
 3 files changed, 9 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/mpi3mr/mpi3mr_os.c b/drivers/scsi/mpi3mr/mpi3mr_os.c
index 040031eb0c12..11139a2008fd 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_os.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_os.c
@@ -2276,13 +2276,9 @@ static void mpi3mr_dev_rmhs_send_tm(struct mpi3mr_ioc *mrioc, u16 handle,
 	if (drv_cmd)
 		goto issue_cmd;
 	do {
-		cmd_idx = find_first_zero_bit(mrioc->devrem_bitmap,
-		    MPI3MR_NUM_DEVRMCMD);
-		if (cmd_idx < MPI3MR_NUM_DEVRMCMD) {
-			if (!test_and_set_bit(cmd_idx, mrioc->devrem_bitmap))
-				break;
-			cmd_idx = MPI3MR_NUM_DEVRMCMD;
-		}
+		cmd_idx = find_and_set_bit(mrioc->devrem_bitmap, MPI3MR_NUM_DEVRMCMD);
+		if (cmd_idx < MPI3MR_NUM_DEVRMCMD)
+			break;
 	} while (retrycount--);
 
 	if (cmd_idx >= MPI3MR_NUM_DEVRMCMD) {
@@ -2417,14 +2413,9 @@ static void mpi3mr_send_event_ack(struct mpi3mr_ioc *mrioc, u8 event,
 	    "sending event ack in the top half for event(0x%02x), event_ctx(0x%08x)\n",
 	    event, event_ctx);
 	do {
-		cmd_idx = find_first_zero_bit(mrioc->evtack_cmds_bitmap,
-		    MPI3MR_NUM_EVTACKCMD);
-		if (cmd_idx < MPI3MR_NUM_EVTACKCMD) {
-			if (!test_and_set_bit(cmd_idx,
-			    mrioc->evtack_cmds_bitmap))
-				break;
-			cmd_idx = MPI3MR_NUM_EVTACKCMD;
-		}
+		cmd_idx = find_and_set_bit(mrioc->evtack_cmds_bitmap, MPI3MR_NUM_EVTACKCMD);
+		if (cmd_idx < MPI3MR_NUM_EVTACKCMD)
+			break;
 	} while (retrycount--);
 
 	if (cmd_idx >= MPI3MR_NUM_EVTACKCMD) {
diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index cd0180b1f5b9..2f940c6898ef 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -1824,20 +1824,13 @@ int qedi_get_task_idx(struct qedi_ctx *qedi)
 {
 	s16 tmp_idx;
 
-again:
-	tmp_idx = find_first_zero_bit(qedi->task_idx_map,
-				      MAX_ISCSI_TASK_ENTRIES);
+	tmp_idx = find_and_set_bit(qedi->task_idx_map, MAX_ISCSI_TASK_ENTRIES);
 
 	if (tmp_idx >= MAX_ISCSI_TASK_ENTRIES) {
 		QEDI_ERR(&qedi->dbg_ctx, "FW task context pool is full.\n");
 		tmp_idx = -1;
-		goto err_idx;
 	}
 
-	if (test_and_set_bit(tmp_idx, qedi->task_idx_map))
-		goto again;
-
-err_idx:
 	return tmp_idx;
 }
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index cf3864f72093..4460a37f4864 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2499,9 +2499,8 @@ void scsi_evt_thread(struct work_struct *work)
 
 	sdev = container_of(work, struct scsi_device, event_work);
 
-	for (evt_type = SDEV_EVT_FIRST; evt_type <= SDEV_EVT_LAST; evt_type++)
-		if (test_and_clear_bit(evt_type, sdev->pending_events))
-			sdev_evt_send_simple(sdev, evt_type, GFP_KERNEL);
+	for_each_test_and_clear_bit(evt_type, sdev->pending_events, SDEV_EVT_LAST + 1)
+		sdev_evt_send_simple(sdev, evt_type, GFP_KERNEL);
 
 	while (1) {
 		struct scsi_event *evt;
-- 
2.39.2
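The size argument of the find_bit() iterators is an exclusive bound (bits in [0, size) are examined), so replacing a loop that runs `evt_type <= SDEV_EVT_LAST` requires a size of `SDEV_EVT_LAST + 1`. The plain, non-atomic sketch below shows the off-by-one on a hypothetical enum that follows the same FIRST/LAST convention; the demo_* names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event numbering using the same FIRST/LAST convention. */
enum demo_evt {
	DEMO_EVT_FIRST = 1,
	DEMO_EVT_A = DEMO_EVT_FIRST,
	DEMO_EVT_B,
	DEMO_EVT_C,
	DEMO_EVT_LAST = DEMO_EVT_C,
};

/*
 * Count the set bits a size-bounded iterator would visit: like the
 * find_bit() family, it only looks at bits in [0, size).
 */
static int visited(unsigned long pending, size_t size)
{
	int n = 0;

	for (size_t bit = 0; bit < size; bit++)
		if (pending & (1UL << bit))
			n++;
	return n;
}
```

With both DEMO_EVT_A and DEMO_EVT_C pending, a bound of DEMO_EVT_LAST visits only one event, while DEMO_EVT_LAST + 1 visits both.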



* [PATCH 16/34] powerpc: use atomic find_bit() API where appropriate
  2023-11-18 15:50 [PATCH 00/34] bitops: add atomic find_bit() operations Yury Norov
                   ` (14 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 15/34] scsi: use atomic find_bit() API where appropriate Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 15:50 ` [PATCH 17/34] iommu: " Yury Norov
                   ` (18 subsequent siblings)
  34 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Yury Norov, Colin Ian King, linuxppc-dev
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

Fix opencoded find_and_{set,clear}_bit() by using dedicated functions.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/powerpc/mm/book3s32/mmu_context.c     | 10 ++---
 arch/powerpc/platforms/pasemi/dma_lib.c    | 45 +++++-----------------
 arch/powerpc/platforms/powernv/pci-sriov.c | 12 ++----
 3 files changed, 17 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/mm/book3s32/mmu_context.c b/arch/powerpc/mm/book3s32/mmu_context.c
index 1922f9a6b058..7db19f173c2e 100644
--- a/arch/powerpc/mm/book3s32/mmu_context.c
+++ b/arch/powerpc/mm/book3s32/mmu_context.c
@@ -50,13 +50,11 @@ static unsigned long context_map[LAST_CONTEXT / BITS_PER_LONG + 1];
 
 unsigned long __init_new_context(void)
 {
-	unsigned long ctx = next_mmu_context;
+	unsigned long ctx;
 
-	while (test_and_set_bit(ctx, context_map)) {
-		ctx = find_next_zero_bit(context_map, LAST_CONTEXT+1, ctx);
-		if (ctx > LAST_CONTEXT)
-			ctx = 0;
-	}
+	ctx = find_and_set_next_bit(context_map, LAST_CONTEXT + 1, next_mmu_context);
+	if (ctx > LAST_CONTEXT)
+		ctx = find_and_set_bit(context_map, LAST_CONTEXT + 1);
 	next_mmu_context = (ctx + 1) & LAST_CONTEXT;
 
 	return ctx;
diff --git a/arch/powerpc/platforms/pasemi/dma_lib.c b/arch/powerpc/platforms/pasemi/dma_lib.c
index 1be1f18f6f09..906dabee0132 100644
--- a/arch/powerpc/platforms/pasemi/dma_lib.c
+++ b/arch/powerpc/platforms/pasemi/dma_lib.c
@@ -118,14 +118,9 @@ static int pasemi_alloc_tx_chan(enum pasemi_dmachan_type type)
 		limit = MAX_TXCH;
 		break;
 	}
-retry:
-	bit = find_next_bit(txch_free, MAX_TXCH, start);
-	if (bit >= limit)
-		return -ENOSPC;
-	if (!test_and_clear_bit(bit, txch_free))
-		goto retry;
-
-	return bit;
+
+	bit = find_and_clear_next_bit(txch_free, limit, start);
+	return bit < limit ? bit : -ENOSPC;
 }
 
 static void pasemi_free_tx_chan(int chan)
@@ -136,15 +131,9 @@ static void pasemi_free_tx_chan(int chan)
 
 static int pasemi_alloc_rx_chan(void)
 {
-	int bit;
-retry:
-	bit = find_first_bit(rxch_free, MAX_RXCH);
-	if (bit >= MAX_TXCH)
-		return -ENOSPC;
-	if (!test_and_clear_bit(bit, rxch_free))
-		goto retry;
-
-	return bit;
+	int bit = find_and_clear_bit(rxch_free, MAX_RXCH);
+
+	return bit < MAX_TXCH ? bit : -ENOSPC;
 }
 
 static void pasemi_free_rx_chan(int chan)
@@ -374,16 +363,9 @@ EXPORT_SYMBOL(pasemi_dma_free_buf);
  */
 int pasemi_dma_alloc_flag(void)
 {
-	int bit;
+	int bit = find_and_clear_bit(flags_free, MAX_FLAGS);
 
-retry:
-	bit = find_first_bit(flags_free, MAX_FLAGS);
-	if (bit >= MAX_FLAGS)
-		return -ENOSPC;
-	if (!test_and_clear_bit(bit, flags_free))
-		goto retry;
-
-	return bit;
+	return bit < MAX_FLAGS ? bit : -ENOSPC;
 }
 EXPORT_SYMBOL(pasemi_dma_alloc_flag);
 
@@ -439,16 +421,9 @@ EXPORT_SYMBOL(pasemi_dma_clear_flag);
  */
 int pasemi_dma_alloc_fun(void)
 {
-	int bit;
-
-retry:
-	bit = find_first_bit(fun_free, MAX_FLAGS);
-	if (bit >= MAX_FLAGS)
-		return -ENOSPC;
-	if (!test_and_clear_bit(bit, fun_free))
-		goto retry;
+	int bit = find_and_clear_bit(fun_free, MAX_FLAGS);
 
-	return bit;
+	return bit < MAX_FLAGS ? bit : -ENOSPC;
 }
 EXPORT_SYMBOL(pasemi_dma_alloc_fun);
 
diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c
index 59882da3e742..640e387e6d83 100644
--- a/arch/powerpc/platforms/powernv/pci-sriov.c
+++ b/arch/powerpc/platforms/powernv/pci-sriov.c
@@ -397,18 +397,12 @@ static int64_t pnv_ioda_map_m64_single(struct pnv_phb *phb,
 
 static int pnv_pci_alloc_m64_bar(struct pnv_phb *phb, struct pnv_iov_data *iov)
 {
-	int win;
+	int win = find_and_set_bit(&phb->ioda.m64_bar_alloc, phb->ioda.m64_bar_idx + 1);
 
-	do {
-		win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
-				phb->ioda.m64_bar_idx + 1, 0);
-
-		if (win >= phb->ioda.m64_bar_idx + 1)
-			return -1;
-	} while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
+	if (win >= phb->ioda.m64_bar_idx + 1)
+		return -1;
 
 	set_bit(win, iov->used_m64_bar_mask);
-
 	return win;
 }
 
-- 
2.39.2
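__init_new_context() is a round-robin allocator: the search starts at a hint and must wrap around to the beginning of the map once before giving up. A hypothetical user-space sketch of that pattern follows. find_and_set_next_bit() is modelled with C11 atomics and __builtin_ctzl() (a GCC/Clang builtin), alloc_round_robin() is an illustrative name, and the hint advances with a modulo, whereas the kernel code masks with LAST_CONTEXT.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical model of find_and_set_next_bit(): first clear bit in
 * [start, size), set atomically; returns 'size' if there is none. */
static size_t model_find_and_set_next_bit(_Atomic unsigned long *map,
					  size_t size, size_t start)
{
	size_t bit = start;

	while (bit < size) {
		size_t idx = bit / BITS_PER_LONG;
		unsigned long zeros = ~atomic_load(&map[idx]) &
				      (~0UL << (bit % BITS_PER_LONG));

		if (zeros == 0) {
			bit = (idx + 1) * BITS_PER_LONG; /* word is full */
			continue;
		}

		size_t found = idx * BITS_PER_LONG + (size_t)__builtin_ctzl(zeros);
		if (found >= size)
			break;

		unsigned long mask = 1UL << (found % BITS_PER_LONG);
		if (!(atomic_fetch_or(&map[idx], mask) & mask))
			return found;	/* we set it first */
		bit = found;		/* lost the race: rescan */
	}
	return size;
}

/*
 * Round-robin allocation: search from the hint, wrap to bit 0 once, and
 * move the hint past the result. Returns 'size' when every bit is taken.
 */
static size_t alloc_round_robin(_Atomic unsigned long *map, size_t size,
				size_t *hint)
{
	size_t bit = model_find_and_set_next_bit(map, size, *hint);

	if (bit >= size)
		bit = model_find_and_set_next_bit(map, size, 0); /* wrap */
	if (bit < size)
		*hint = (bit + 1) % size;
	return bit;
}
```

With bit 0 reserved and the hint at 2, a 4-bit map hands out 2, 3 and then 1 after wrapping, and finally reports the map as full.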



* [PATCH 17/34] iommu: use atomic find_bit() API where appropriate
  2023-11-18 15:50 [PATCH 00/34] bitops: add atomic find_bit() operations Yury Norov
                   ` (15 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 16/34] powerpc: " Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 15:50 ` [PATCH 18/34] media: radio-shark: " Yury Norov
                   ` (17 subsequent siblings)
  34 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Will Deacon, Robin Murphy, Joerg Roedel,
	Andy Gross, Bjorn Andersson, Konrad Dybcio, Yury Norov,
	linux-arm-kernel, iommu, linux-arm-msm
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

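Fix the open-coded find_and_set_next_bit() in __arm_smmu_alloc_bitmap()
and msm_iommu_alloc_ctx(), and turn them into one-liner wrappers.

While here, refactor msm_iommu_attach_dev() and msm_iommu_alloc_ctx()
so that the caller checks the result against iommu->ncb, which is what
the new msm_iommu_alloc_ctx() returns on failure, instead of expecting
a negative error code.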

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu.h | 10 ++--------
 drivers/iommu/msm_iommu.c             | 18 ++++--------------
 2 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 703fd5817ec1..004a4704ebf1 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -453,15 +453,9 @@ struct arm_smmu_impl {
 
 static inline int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
 {
-	int idx;
+	int idx = find_and_set_next_bit(map, end, start);
 
-	do {
-		idx = find_next_zero_bit(map, end, start);
-		if (idx == end)
-			return -ENOSPC;
-	} while (test_and_set_bit(idx, map));
-
-	return idx;
+	return idx < end ? idx : -ENOSPC;
 }
 
 static inline void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n)
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index f86af9815d6f..67124f4228b1 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -185,17 +185,9 @@ static const struct iommu_flush_ops msm_iommu_flush_ops = {
 	.tlb_add_page = __flush_iotlb_page,
 };
 
-static int msm_iommu_alloc_ctx(unsigned long *map, int start, int end)
+static int msm_iommu_alloc_ctx(struct msm_iommu_dev *iommu)
 {
-	int idx;
-
-	do {
-		idx = find_next_zero_bit(map, end, start);
-		if (idx == end)
-			return -ENOSPC;
-	} while (test_and_set_bit(idx, map));
-
-	return idx;
+	return find_and_set_bit(iommu->context_map, iommu->ncb);
 }
 
 static void msm_iommu_free_ctx(unsigned long *map, int idx)
@@ -418,10 +410,8 @@ static int msm_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
 					ret = -EEXIST;
 					goto fail;
 				}
-				master->num =
-					msm_iommu_alloc_ctx(iommu->context_map,
-							    0, iommu->ncb);
-				if (IS_ERR_VALUE(master->num)) {
+				master->num = msm_iommu_alloc_ctx(iommu);
+				if (master->num >= iommu->ncb) {
 					ret = -ENODEV;
 					goto fail;
 				}
-- 
2.39.2
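__arm_smmu_alloc_bitmap() allocates one index from the half-open range [start, end) and maps "nothing found" (a return value of end) to -ENOSPC. The following is a hypothetical user-space sketch of that wrapper. find_and_set_next_bit() is modelled with C11 atomics and __builtin_ctzl() (a GCC/Clang builtin), and alloc_in_range() is an illustrative name.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical model of find_and_set_next_bit(): first clear bit in
 * [start, size), set atomically; returns 'size' if there is none. */
static size_t model_find_and_set_next_bit(_Atomic unsigned long *map,
					  size_t size, size_t start)
{
	size_t bit = start;

	while (bit < size) {
		size_t idx = bit / BITS_PER_LONG;
		unsigned long zeros = ~atomic_load(&map[idx]) &
				      (~0UL << (bit % BITS_PER_LONG));

		if (zeros == 0) {
			bit = (idx + 1) * BITS_PER_LONG; /* word is full */
			continue;
		}

		size_t found = idx * BITS_PER_LONG + (size_t)__builtin_ctzl(zeros);
		if (found >= size)
			break;

		unsigned long mask = 1UL << (found % BITS_PER_LONG);
		if (!(atomic_fetch_or(&map[idx], mask) & mask))
			return found;	/* we set it first */
		bit = found;		/* lost the race: rescan */
	}
	return size;
}

/* Sketch of the __arm_smmu_alloc_bitmap() wrapper: one index in [start, end). */
static int alloc_in_range(_Atomic unsigned long *map, int start, int end)
{
	size_t idx = model_find_and_set_next_bit(map, (size_t)end, (size_t)start);

	return idx < (size_t)end ? (int)idx : -ENOSPC;
}
```

Bits below start are never touched, so an exhausted range leaves the rest of the map intact.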



* [PATCH 18/34] media: radio-shark: use atomic find_bit() API where appropriate
  2023-11-18 15:50 [PATCH 00/34] bitops: add atomic find_bit() operations Yury Norov
                   ` (16 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 17/34] iommu: " Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 15:50 ` [PATCH 19/34] sfc: switch to using " Yury Norov
                   ` (16 subsequent siblings)
  34 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Hans Verkuil, Mauro Carvalho Chehab, linux-media
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

Even though the maps are only 2 or 3 bits wide, convert the for-loop
over test_and_clear_bit() to for_each_test_and_clear_bit(), as it makes
the code cleaner.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/media/radio/radio-shark.c  | 5 +----
 drivers/media/radio/radio-shark2.c | 5 +----
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/media/radio/radio-shark.c b/drivers/media/radio/radio-shark.c
index 127a3be0e0f0..0c50b3a9623e 100644
--- a/drivers/media/radio/radio-shark.c
+++ b/drivers/media/radio/radio-shark.c
@@ -158,10 +158,7 @@ static void shark_led_work(struct work_struct *work)
 		container_of(work, struct shark_device, led_work);
 	int i, res, brightness, actual_len;
 
-	for (i = 0; i < 3; i++) {
-		if (!test_and_clear_bit(i, &shark->brightness_new))
-			continue;
-
+	for_each_test_and_clear_bit(i, &shark->brightness_new, 3) {
 		brightness = atomic_read(&shark->brightness[i]);
 		memset(shark->transfer_buffer, 0, TB_LEN);
 		if (i != RED_LED) {
diff --git a/drivers/media/radio/radio-shark2.c b/drivers/media/radio/radio-shark2.c
index f1c5c0a6a335..d9ef241e1778 100644
--- a/drivers/media/radio/radio-shark2.c
+++ b/drivers/media/radio/radio-shark2.c
@@ -145,10 +145,7 @@ static void shark_led_work(struct work_struct *work)
 		container_of(work, struct shark_device, led_work);
 	int i, res, brightness, actual_len;
 
-	for (i = 0; i < 2; i++) {
-		if (!test_and_clear_bit(i, &shark->brightness_new))
-			continue;
-
+	for_each_test_and_clear_bit(i, &shark->brightness_new, 2) {
 		brightness = atomic_read(&shark->brightness[i]);
 		memset(shark->transfer_buffer, 0, TB_LEN);
 		shark->transfer_buffer[0] = 0x83 + i;
-- 
2.39.2



* [PATCH 19/34] sfc: switch to using atomic find_bit() API where appropriate
  2023-11-18 15:50 [PATCH 00/34] bitops: add atomic find_bit() operations Yury Norov
                   ` (17 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 18/34] media: radio-shark: " Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-21 19:46   ` Edward Cree
  2023-11-18 15:50 ` [PATCH 20/34] tty: nozomi: optimize interrupt_handler() Yury Norov
                   ` (15 subsequent siblings)
  34 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Edward Cree, Martin Habets, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Yury Norov, netdev,
	linux-net-drivers
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

SFC code traverses rps_slot_map and rxq_retry_mask bit by bit. We can do
better by using the dedicated atomic find_bit() functions, because they
skip bits that are already in the target state.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/net/ethernet/sfc/rx_common.c         |  4 +---
 drivers/net/ethernet/sfc/siena/rx_common.c   |  4 +---
 drivers/net/ethernet/sfc/siena/siena_sriov.c | 14 ++++++--------
 3 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
index d2f35ee15eff..0112968b3fe7 100644
--- a/drivers/net/ethernet/sfc/rx_common.c
+++ b/drivers/net/ethernet/sfc/rx_common.c
@@ -950,9 +950,7 @@ int efx_filter_rfs(struct net_device *net_dev, const struct sk_buff *skb,
 	int rc;
 
 	/* find a free slot */
-	for (slot_idx = 0; slot_idx < EFX_RPS_MAX_IN_FLIGHT; slot_idx++)
-		if (!test_and_set_bit(slot_idx, &efx->rps_slot_map))
-			break;
+	slot_idx = find_and_set_bit(&efx->rps_slot_map, EFX_RPS_MAX_IN_FLIGHT);
 	if (slot_idx >= EFX_RPS_MAX_IN_FLIGHT)
 		return -EBUSY;
 
diff --git a/drivers/net/ethernet/sfc/siena/rx_common.c b/drivers/net/ethernet/sfc/siena/rx_common.c
index 4579f43484c3..160b16aa7486 100644
--- a/drivers/net/ethernet/sfc/siena/rx_common.c
+++ b/drivers/net/ethernet/sfc/siena/rx_common.c
@@ -958,9 +958,7 @@ int efx_siena_filter_rfs(struct net_device *net_dev, const struct sk_buff *skb,
 	int rc;
 
 	/* find a free slot */
-	for (slot_idx = 0; slot_idx < EFX_RPS_MAX_IN_FLIGHT; slot_idx++)
-		if (!test_and_set_bit(slot_idx, &efx->rps_slot_map))
-			break;
+	slot_idx = find_and_set_bit(&efx->rps_slot_map, EFX_RPS_MAX_IN_FLIGHT);
 	if (slot_idx >= EFX_RPS_MAX_IN_FLIGHT)
 		return -EBUSY;
 
diff --git a/drivers/net/ethernet/sfc/siena/siena_sriov.c b/drivers/net/ethernet/sfc/siena/siena_sriov.c
index 8353c15dc233..554b799288b8 100644
--- a/drivers/net/ethernet/sfc/siena/siena_sriov.c
+++ b/drivers/net/ethernet/sfc/siena/siena_sriov.c
@@ -722,14 +722,12 @@ static int efx_vfdi_fini_all_queues(struct siena_vf *vf)
 					     efx_vfdi_flush_wake(vf),
 					     timeout);
 		rxqs_count = 0;
-		for (index = 0; index < count; ++index) {
-			if (test_and_clear_bit(index, vf->rxq_retry_mask)) {
-				atomic_dec(&vf->rxq_retry_count);
-				MCDI_SET_ARRAY_DWORD(
-					inbuf, FLUSH_RX_QUEUES_IN_QID_OFST,
-					rxqs_count, vf_offset + index);
-				rxqs_count++;
-			}
+		for_each_test_and_clear_bit(index, vf->rxq_retry_mask, count) {
+			atomic_dec(&vf->rxq_retry_count);
+			MCDI_SET_ARRAY_DWORD(
+				inbuf, FLUSH_RX_QUEUES_IN_QID_OFST,
+				rxqs_count, vf_offset + index);
+			rxqs_count++;
 		}
 	}
 
-- 
2.39.2



* [PATCH 20/34] tty: nozomi: optimize interrupt_handler()
  2023-11-18 15:50 [PATCH 00/34] bitops: add atomic find_bit() operations Yury Norov
                   ` (18 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 19/34] sfc: switch to using " Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 15:50 ` [PATCH 21/34] usb: cdc-acm: optimize acm_softint() Yury Norov
                   ` (14 subsequent siblings)
  34 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Jiri Slaby, linux-serial
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

In the exit path of interrupt_handler(), the dc->flip map is traversed
bit by bit to find and clear set bits and call tty_flip_buffer_push()
for the corresponding ports.

We can do it better by using for_each_test_and_clear_bit(), because it
skips already clear bits.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/tty/nozomi.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/nozomi.c b/drivers/tty/nozomi.c
index 02cd40147b3a..de0503247391 100644
--- a/drivers/tty/nozomi.c
+++ b/drivers/tty/nozomi.c
@@ -1220,9 +1220,8 @@ static irqreturn_t interrupt_handler(int irq, void *dev_id)
 exit_handler:
 	spin_unlock(&dc->spin_mutex);
 
-	for (a = 0; a < NOZOMI_MAX_PORTS; a++)
-		if (test_and_clear_bit(a, &dc->flip))
-			tty_flip_buffer_push(&dc->port[a].port);
+	for_each_test_and_clear_bit(a, &dc->flip, NOZOMI_MAX_PORTS)
+		tty_flip_buffer_push(&dc->port[a].port);
 
 	return IRQ_HANDLED;
 none:
-- 
2.39.2



* [PATCH 21/34] usb: cdc-acm: optimize acm_softint()
  2023-11-18 15:50 [PATCH 00/34] bitops: add atomic find_bit() operations Yury Norov
                   ` (19 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 20/34] tty: nozomi: optimize interrupt_handler() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-20 11:39   ` Oliver Neukum
  2023-11-18 15:50 ` [PATCH 22/34] block: null_blk: fix opencoded find_and_set_bit() in get_tag() Yury Norov
                   ` (13 subsequent siblings)
  34 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Oliver Neukum, Greg Kroah-Hartman, linux-usb
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

acm_softint() uses a for-loop to traverse the urbs_in_error_delay bitmap
bit by bit to find and clear set bits.

We can do it better by using for_each_test_and_clear_bit(), because it
doesn't test already clear bits.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/usb/class/cdc-acm.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index a1f4e1ead97f..8664b63050b0 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -613,9 +613,8 @@ static void acm_softint(struct work_struct *work)
 	}
 
 	if (test_and_clear_bit(ACM_ERROR_DELAY, &acm->flags)) {
-		for (i = 0; i < acm->rx_buflimit; i++)
-			if (test_and_clear_bit(i, &acm->urbs_in_error_delay))
-				acm_submit_read_urb(acm, i, GFP_KERNEL);
+		for_each_test_and_clear_bit(i, &acm->urbs_in_error_delay, acm->rx_buflimit)
+			acm_submit_read_urb(acm, i, GFP_KERNEL);
 	}
 
 	if (test_and_clear_bit(EVENT_TTY_WAKEUP, &acm->flags))
-- 
2.39.2



* [PATCH 22/34] block: null_blk: fix opencoded find_and_set_bit() in get_tag()
  2023-11-18 15:50 [PATCH 00/34] bitops: add atomic find_bit() operations Yury Norov
                   ` (20 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 21/34] usb: cdc-acm: optimize acm_softint() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 15:50 ` [PATCH 23/34] RDMA/rtrs: fix opencoded find_and_set_bit_lock() in __rtrs_get_permit() Yury Norov
                   ` (12 subsequent siblings)
  34 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Jens Axboe, Damien Le Moal, Chaitanya Kulkarni,
	Ming Lei, Johannes Thumshirn, Chengming Zhou, Nitesh Shetty,
	Akinobu Mita, Shin'ichiro Kawasaki, Yury Norov, linux-block
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

get_tag() open-codes find_and_set_bit_lock(). Switch the code to use the
dedicated function, and get rid of get_tag() entirely.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/block/null_blk/main.c | 41 +++++++++++------------------------
 1 file changed, 13 insertions(+), 28 deletions(-)

diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 22a3cf7f32e2..a41d146663e1 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -760,19 +760,6 @@ static void put_tag(struct nullb_queue *nq, unsigned int tag)
 		wake_up(&nq->wait);
 }
 
-static unsigned int get_tag(struct nullb_queue *nq)
-{
-	unsigned int tag;
-
-	do {
-		tag = find_first_zero_bit(nq->tag_map, nq->queue_depth);
-		if (tag >= nq->queue_depth)
-			return -1U;
-	} while (test_and_set_bit_lock(tag, nq->tag_map));
-
-	return tag;
-}
-
 static void free_cmd(struct nullb_cmd *cmd)
 {
 	put_tag(cmd->nq, cmd->tag);
@@ -782,24 +769,22 @@ static enum hrtimer_restart null_cmd_timer_expired(struct hrtimer *timer);
 
 static struct nullb_cmd *__alloc_cmd(struct nullb_queue *nq)
 {
+	unsigned int tag = find_and_set_bit_lock(nq->tag_map, nq->queue_depth);
 	struct nullb_cmd *cmd;
-	unsigned int tag;
-
-	tag = get_tag(nq);
-	if (tag != -1U) {
-		cmd = &nq->cmds[tag];
-		cmd->tag = tag;
-		cmd->error = BLK_STS_OK;
-		cmd->nq = nq;
-		if (nq->dev->irqmode == NULL_IRQ_TIMER) {
-			hrtimer_init(&cmd->timer, CLOCK_MONOTONIC,
-				     HRTIMER_MODE_REL);
-			cmd->timer.function = null_cmd_timer_expired;
-		}
-		return cmd;
+
+	if (tag >= nq->queue_depth)
+		return NULL;
+
+	cmd = &nq->cmds[tag];
+	cmd->tag = tag;
+	cmd->error = BLK_STS_OK;
+	cmd->nq = nq;
+	if (nq->dev->irqmode == NULL_IRQ_TIMER) {
+		hrtimer_init(&cmd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+		cmd->timer.function = null_cmd_timer_expired;
 	}
 
-	return NULL;
+	return cmd;
 }
 
 static struct nullb_cmd *alloc_cmd(struct nullb_queue *nq, struct bio *bio)
-- 
2.39.2
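get_tag() relied on test_and_set_bit_lock() rather than plain test_and_set_bit(), so the replacement is the _lock variant: claiming a tag has acquire semantics, and it pairs with a release-ordered clear such as the kernel's clear_bit_unlock() when the tag is returned. The sketch below is a hypothetical user-space model of that pairing with C11 explicit memory orders and __builtin_ctzl() (a GCC/Clang builtin). The model_* names are illustrative and this is not the kernel's code.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Model of find_and_set_bit_lock(): like find_and_set_bit(), but the
 * successful set uses acquire ordering, so accesses to the tag's data
 * cannot be reordered before the tag is owned.
 */
static size_t model_find_and_set_bit_lock(_Atomic unsigned long *map, size_t size)
{
	size_t bit = 0;

	while (bit < size) {
		size_t idx = bit / BITS_PER_LONG;
		unsigned long zeros =
			~atomic_load_explicit(&map[idx], memory_order_relaxed) &
			(~0UL << (bit % BITS_PER_LONG));

		if (zeros == 0) {
			bit = (idx + 1) * BITS_PER_LONG;
			continue;
		}

		size_t found = idx * BITS_PER_LONG + (size_t)__builtin_ctzl(zeros);
		if (found >= size)
			break;

		unsigned long mask = 1UL << (found % BITS_PER_LONG);
		if (!(atomic_fetch_or_explicit(&map[idx], mask,
					       memory_order_acquire) & mask))
			return found;	/* tag acquired */
		bit = found;		/* lost the race: rescan */
	}
	return size;
}

/* Model of clear_bit_unlock(): release ordering publishes writes made
 * while the tag was held before the tag becomes free again. */
static void model_clear_bit_unlock(size_t bit, _Atomic unsigned long *map)
{
	unsigned long mask = 1UL << (bit % BITS_PER_LONG);

	atomic_fetch_and_explicit(&map[bit / BITS_PER_LONG], ~mask,
				  memory_order_release);
}
```

With a queue depth of 3, tags 0, 1 and 2 are handed out, the fourth request sees the depth itself (the "full" value that __alloc_cmd() now checks), and a released tag is reused by the next request.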



* [PATCH 23/34] RDMA/rtrs: fix opencoded find_and_set_bit_lock() in __rtrs_get_permit()
  2023-11-18 15:50 [PATCH 00/34] bitops: add atomic find_bit() operations Yury Norov
                   ` (21 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 22/34] block: null_blk: fix opencoded find_and_set_bit() in get_tag() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 15:50 ` [PATCH 24/34] mISDN: optimize get_free_devid() Yury Norov
                   ` (11 subsequent siblings)
  34 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Md. Haris Iqbal, Jack Wang, Jason Gunthorpe,
	Leon Romanovsky, linux-rdma
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

The function opencodes find_and_set_bit_lock() with a while-loop polling
on test_and_set_bit_lock(). Use a dedicated find_and_set_bit_lock()
instead.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 07261523c554..2f3b0ad42e8a 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -72,18 +72,9 @@ __rtrs_get_permit(struct rtrs_clt_sess *clt, enum rtrs_clt_con_type con_type)
 	struct rtrs_permit *permit;
 	int bit;
 
-	/*
-	 * Adapted from null_blk get_tag(). Callers from different cpus may
-	 * grab the same bit, since find_first_zero_bit is not atomic.
-	 * But then the test_and_set_bit_lock will fail for all the
-	 * callers but one, so that they will loop again.
-	 * This way an explicit spinlock is not required.
-	 */
-	do {
-		bit = find_first_zero_bit(clt->permits_map, max_depth);
-		if (bit >= max_depth)
-			return NULL;
-	} while (test_and_set_bit_lock(bit, clt->permits_map));
+	bit = find_and_set_bit_lock(clt->permits_map, max_depth);
+	if (bit >= max_depth)
+		return NULL;
 
 	permit = get_permit(clt, bit);
 	WARN_ON(permit->mem_id != bit);
-- 
2.39.2



* [PATCH 24/34] mISDN: optimize get_free_devid()
  2023-11-18 15:50 [PATCH 00/34] bitops: add atomic find_bit() operations Yury Norov
                   ` (22 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 23/34] RDMA/rtrs: fix opencoded find_and_set_bit_lock() in __rtrs_get_permit() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 15:50 ` [PATCH 25/34] media: em28xx: cx231xx: fix opencoded find_and_set_bit() Yury Norov
                   ` (10 subsequent siblings)
  34 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Karsten Keil, netdev
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

get_free_devid() traverses each bit in device_ids in an open-coded loop.
We can do it faster by using the dedicated find_and_set_bit().

It makes the whole function a nice one-liner, and because MAX_DEVICE_ID
is a small compile-time constant (63), on 64-bit platforms the
find_and_set_bit() call will be optimized to:

	ffz();
	test_and_set_bit().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/isdn/mISDN/core.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/isdn/mISDN/core.c b/drivers/isdn/mISDN/core.c
index ab8513a7acd5..3f97db006cf3 100644
--- a/drivers/isdn/mISDN/core.c
+++ b/drivers/isdn/mISDN/core.c
@@ -197,14 +197,9 @@ get_mdevice_count(void)
 static int
 get_free_devid(void)
 {
-	u_int	i;
+	u_int i = find_and_set_bit((u_long *)&device_ids, MAX_DEVICE_ID + 1);
 
-	for (i = 0; i <= MAX_DEVICE_ID; i++)
-		if (!test_and_set_bit(i, (u_long *)&device_ids))
-			break;
-	if (i > MAX_DEVICE_ID)
-		return -EBUSY;
-	return i;
+	return i <= MAX_DEVICE_ID ? i : -EBUSY;
 }
 
 int
-- 
2.39.2
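When the bitmap fits in a single word, as device_ids does with MAX_DEVICE_ID + 1 = 64 bits on 64-bit platforms, the search reduces to one load, an inversion, a mask and a find-first-zero, followed by an atomic test-and-set that is retried only if another caller wins the race. The hypothetical user-space sketch below shows that fast path with C11 atomics and __builtin_ctzl() (a GCC/Clang builtin). get_free_id() is an illustrative name, and this is not the kernel's small_const_nbits() code.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Single-word find-and-set: pick the lowest clear bit among the low
 * 'nbits' bits (an ffz()), then claim it with an atomic fetch-or.
 */
static int get_free_id(_Atomic unsigned long *ids, unsigned int nbits)
{
	assert(nbits >= 1 && nbits <= BITS_PER_LONG);

	/* Equivalent of GENMASK(nbits - 1, 0) without the kernel macro. */
	unsigned long valid = ~0UL >> (BITS_PER_LONG - nbits);

	for (;;) {
		unsigned long zeros = ~atomic_load(ids) & valid;
		unsigned long mask;

		if (zeros == 0)
			return -EBUSY;		/* every id is taken */
		mask = zeros & -zeros;		/* lowest clear bit */
		if (!(atomic_fetch_or(ids, mask) & mask))
			return __builtin_ctzl(mask);
		/* Another caller set it first: reload and try again. */
	}
}
```

Masking with `valid` keeps bits at or above nbits out of consideration, so a partially used word never yields an out-of-range id.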



* [PATCH 25/34] media: em28xx: cx231xx: fix opencoded find_and_set_bit()
  2023-11-18 15:50 [PATCH 00/34] bitops: add atomic find_bit() operations Yury Norov
                   ` (23 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 24/34] mISDN: optimize get_free_devid() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 15:50 ` [PATCH 26/34] ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get() Yury Norov
                   ` (9 subsequent siblings)
  34 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Mauro Carvalho Chehab, Yury Norov, linux-media
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

Functions in the media/usb drivers open-code find_and_set_bit() by
retrying find_first_zero_bit() in a do-while loop until
test_and_set_bit() succeeds in claiming the found bit.
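
All of the loops removed below follow the same race-tolerant pattern: find a zero bit, try to claim it atomically, and search again if another CPU claimed it first. A userspace sketch of that pattern, assuming C11 atomics are an acceptable stand-in for the kernel bitops; the function names mirror the kernel helpers, but these are simplified local versions (find_first_zero_bit() here scans bit by bit instead of word by word).

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Userspace stand-in: atomically set bit @nr, return its old value. */
static bool test_and_set_bit(unsigned long nr, _Atomic unsigned long *map)
{
	unsigned long mask = 1UL << (nr % BITS_PER_LONG);

	return (atomic_fetch_or(&map[nr / BITS_PER_LONG], mask) & mask) != 0;
}

static bool test_bit(unsigned long nr, _Atomic unsigned long *map)
{
	return (atomic_load(&map[nr / BITS_PER_LONG]) >> (nr % BITS_PER_LONG)) & 1;
}

/* Simplified find_first_zero_bit(): returns nbits if no bit is clear. */
static unsigned long find_first_zero_bit(_Atomic unsigned long *map,
					 unsigned long nbits)
{
	for (unsigned long i = 0; i < nbits; i++)
		if (!test_bit(i, map))
			return i;
	return nbits;
}

/* The open-coded do-while pattern, wrapped as a helper: if the bit found
 * free is claimed by someone else before test_and_set_bit() runs, search
 * again. Returns nbits when the map is full. */
static unsigned long find_and_set_bit(_Atomic unsigned long *map,
				      unsigned long nbits)
{
	unsigned long bit;

	do {
		bit = find_first_zero_bit(map, nbits);
		if (bit >= nbits)
			return nbits;
	} while (test_and_set_bit(bit, map));

	return bit;
}
```

Because failure is reported as nbits, the callers can keep their existing `nr >= CX231XX_MAXBOARDS` and `nr >= EM28XX_MAXBOARDS` checks unchanged.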

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/media/usb/cx231xx/cx231xx-cards.c | 16 ++++------
 drivers/media/usb/em28xx/em28xx-cards.c   | 37 +++++++++--------------
 2 files changed, 21 insertions(+), 32 deletions(-)

diff --git a/drivers/media/usb/cx231xx/cx231xx-cards.c b/drivers/media/usb/cx231xx/cx231xx-cards.c
index 92efe6c1f47b..b314603932d7 100644
--- a/drivers/media/usb/cx231xx/cx231xx-cards.c
+++ b/drivers/media/usb/cx231xx/cx231xx-cards.c
@@ -1708,16 +1708,12 @@ static int cx231xx_usb_probe(struct usb_interface *interface,
 		return -ENODEV;
 
 	/* Check to see next free device and mark as used */
-	do {
-		nr = find_first_zero_bit(&cx231xx_devused, CX231XX_MAXBOARDS);
-		if (nr >= CX231XX_MAXBOARDS) {
-			/* No free device slots */
-			dev_err(d,
-				"Supports only %i devices.\n",
-				CX231XX_MAXBOARDS);
-			return -ENOMEM;
-		}
-	} while (test_and_set_bit(nr, &cx231xx_devused));
+	nr = find_and_set_bit(&cx231xx_devused, CX231XX_MAXBOARDS);
+	if (nr >= CX231XX_MAXBOARDS) {
+		/* No free device slots */
+		dev_err(d, "Supports only %i devices.\n", CX231XX_MAXBOARDS);
+		return -ENOMEM;
+	}
 
 	udev = usb_get_dev(interface_to_usbdev(interface));
 
diff --git a/drivers/media/usb/em28xx/em28xx-cards.c b/drivers/media/usb/em28xx/em28xx-cards.c
index 4d037c92af7c..af4809fe74a8 100644
--- a/drivers/media/usb/em28xx/em28xx-cards.c
+++ b/drivers/media/usb/em28xx/em28xx-cards.c
@@ -3684,17 +3684,14 @@ static int em28xx_duplicate_dev(struct em28xx *dev)
 		return -ENOMEM;
 	}
 	/* Check to see next free device and mark as used */
-	do {
-		nr = find_first_zero_bit(em28xx_devused, EM28XX_MAXBOARDS);
-		if (nr >= EM28XX_MAXBOARDS) {
-			/* No free device slots */
-			dev_warn(&dev->intf->dev, ": Supports only %i em28xx boards.\n",
-				 EM28XX_MAXBOARDS);
-			kfree(sec_dev);
-			dev->dev_next = NULL;
-			return -ENOMEM;
-		}
-	} while (test_and_set_bit(nr, em28xx_devused));
+	nr = find_and_set_bit(em28xx_devused, EM28XX_MAXBOARDS);
+	if (nr >= EM28XX_MAXBOARDS) {
+		/* No free device slots */
+		dev_warn(&dev->intf->dev, ": Supports only %i em28xx boards.\n", EM28XX_MAXBOARDS);
+		kfree(sec_dev);
+		dev->dev_next = NULL;
+		return -ENOMEM;
+	}
 	sec_dev->devno = nr;
 	snprintf(sec_dev->name, 28, "em28xx #%d", nr);
 	sec_dev->dev_next = NULL;
@@ -3827,17 +3824,13 @@ static int em28xx_usb_probe(struct usb_interface *intf,
 	udev = usb_get_dev(interface_to_usbdev(intf));
 
 	/* Check to see next free device and mark as used */
-	do {
-		nr = find_first_zero_bit(em28xx_devused, EM28XX_MAXBOARDS);
-		if (nr >= EM28XX_MAXBOARDS) {
-			/* No free device slots */
-			dev_err(&intf->dev,
-				"Driver supports up to %i em28xx boards.\n",
-			       EM28XX_MAXBOARDS);
-			retval = -ENOMEM;
-			goto err_no_slot;
-		}
-	} while (test_and_set_bit(nr, em28xx_devused));
+	nr = find_and_set_bit(em28xx_devused, EM28XX_MAXBOARDS);
+	if (nr >= EM28XX_MAXBOARDS) {
+		/* No free device slots */
+		dev_err(&intf->dev, "Driver supports up to %i em28xx boards.\n", EM28XX_MAXBOARDS);
+		retval = -ENOMEM;
+		goto err_no_slot;
+	}
 
 	/* Don't register audio interfaces */
 	if (intf->altsetting[0].desc.bInterfaceClass == USB_CLASS_AUDIO) {
-- 
2.39.2


* [PATCH 26/34] ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get()
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (24 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 25/34] media: em28xx: cx231xx: fix opencoded find_and_set_bit() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 15:50 ` [PATCH 27/34] serial: sc12is7xx: optimize sc16is7xx_alloc_line() Yury Norov
                   ` (8 subsequent siblings)
  34 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Jiri Pirko, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

Optimize ofdpa_port_internal_vlan_id_get() by using find_and_set_bit()
instead of polling every bit of the bitmap in a for-loop.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/net/ethernet/rocker/rocker_ofdpa.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker_ofdpa.c b/drivers/net/ethernet/rocker/rocker_ofdpa.c
index 826990459fa4..449be8af7ffc 100644
--- a/drivers/net/ethernet/rocker/rocker_ofdpa.c
+++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c
@@ -2249,14 +2249,11 @@ static __be16 ofdpa_port_internal_vlan_id_get(struct ofdpa_port *ofdpa_port,
 	found = entry;
 	hash_add(ofdpa->internal_vlan_tbl, &found->entry, found->ifindex);
 
-	for (i = 0; i < OFDPA_N_INTERNAL_VLANS; i++) {
-		if (test_and_set_bit(i, ofdpa->internal_vlan_bitmap))
-			continue;
+	i = find_and_set_bit(ofdpa->internal_vlan_bitmap, OFDPA_N_INTERNAL_VLANS);
+	if (i < OFDPA_N_INTERNAL_VLANS)
 		found->vlan_id = htons(OFDPA_INTERNAL_VLAN_ID_BASE + i);
-		goto found;
-	}
-
-	netdev_err(ofdpa_port->dev, "Out of internal VLAN IDs\n");
+	else
+		netdev_err(ofdpa_port->dev, "Out of internal VLAN IDs\n");
 
 found:
 	found->ref_count++;
-- 
2.39.2


* [PATCH 27/34] serial: sc12is7xx: optimize sc16is7xx_alloc_line()
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (25 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 26/34] ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 15:50 ` [PATCH 28/34] bluetooth: optimize cmtp_alloc_block_id() Yury Norov
                   ` (7 subsequent siblings)
  34 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Jiri Slaby, Hugo Villeneuve,
	Lech Perczak, Ilpo Järvinen, Andy Shevchenko,
	Uwe Kleine-König, Thomas Gleixner, Hui Wang, Isaac True,
	Yury Norov, linux-serial
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

Instead of polling every bit in sc16is7xx_lines, switch to the
dedicated find_and_set_bit() and make the function a simple one-liner.
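
The one-liner keeps the old return convention: when no line is free, both the removed loop and find_and_set_bit() return SC16IS7XX_MAX_DEVS. A single-threaded userspace check of that, assuming a hypothetical 8-line map (MAX_DEVS) and C11 atomics in place of the kernel bitops, compares the two strategies on every possible starting state.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_DEVS 8	/* hypothetical stand-in for SC16IS7XX_MAX_DEVS */

/* Userspace stand-in for test_and_set_bit() on a one-word bitmap. */
static bool test_and_set_bit(unsigned int nr, _Atomic unsigned long *word)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* The removed loop: try every bit in turn; yields MAX_DEVS if all are set. */
static int alloc_line_loop(_Atomic unsigned long *lines)
{
	int i;

	for (i = 0; i < MAX_DEVS; i++)
		if (!test_and_set_bit(i, lines))
			break;
	return i;
}

/* find_and_set_bit()-style: locate the first zero bit, claim it, and
 * retry if the claim lost a race; yields MAX_DEVS if all are set. */
static int alloc_line_helper(_Atomic unsigned long *lines)
{
	for (;;) {
		unsigned long val = atomic_load(lines);
		int bit = 0;

		while (bit < MAX_DEVS && ((val >> bit) & 1))
			bit++;
		if (bit == MAX_DEVS)
			return MAX_DEVS;
		if (!test_and_set_bit(bit, lines))
			return bit;
	}
}

/* Compare return values and resulting maps for all 2^MAX_DEVS states. */
static bool same_results(void)
{
	for (unsigned long s = 0; s < (1UL << MAX_DEVS); s++) {
		_Atomic unsigned long a = s, b = s;

		if (alloc_line_loop(&a) != alloc_line_helper(&b) ||
		    atomic_load(&a) != atomic_load(&b))
			return false;
	}
	return true;
}
```

same_results() returning true means the two versions agree on both the returned line and the resulting bitmap for all 256 starting states.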

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/tty/serial/sc16is7xx.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index db2bb1c0d36c..6a463988d5e0 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -427,15 +427,9 @@ static void sc16is7xx_port_update(struct uart_port *port, u8 reg,
 
 static int sc16is7xx_alloc_line(void)
 {
-	int i;
-
 	BUILD_BUG_ON(SC16IS7XX_MAX_DEVS > BITS_PER_LONG);
 
-	for (i = 0; i < SC16IS7XX_MAX_DEVS; i++)
-		if (!test_and_set_bit(i, &sc16is7xx_lines))
-			break;
-
-	return i;
+	return find_and_set_bit(&sc16is7xx_lines, SC16IS7XX_MAX_DEVS);
 }
 
 static void sc16is7xx_power(struct uart_port *port, int on)
-- 
2.39.2


* [PATCH 28/34] bluetooth: optimize cmtp_alloc_block_id()
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (26 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 27/34] serial: sc12is7xx: optimize sc16is7xx_alloc_line() Yury Norov
@ 2023-11-18 15:50 ` Yury Norov
  2023-11-18 15:51 ` [PATCH 29/34] net: smc: fix opencoded find_and_set_bit() in smc_wr_tx_get_free_slot_index() Yury Norov
                   ` (6 subsequent siblings)
  34 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:50 UTC (permalink / raw)
  To: linux-kernel, Karsten Keil, Marcel Holtmann, Johan Hedberg,
	Luiz Augusto von Dentz, Yury Norov, netdev, linux-bluetooth
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

Instead of polling every bit in blockids, switch to the dedicated
find_and_set_bit() and make the function nearly a one-liner.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 net/bluetooth/cmtp/core.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/net/bluetooth/cmtp/core.c b/net/bluetooth/cmtp/core.c
index 90d130588a3e..b1330acbbff3 100644
--- a/net/bluetooth/cmtp/core.c
+++ b/net/bluetooth/cmtp/core.c
@@ -88,15 +88,9 @@ static void __cmtp_copy_session(struct cmtp_session *session, struct cmtp_connin
 
 static inline int cmtp_alloc_block_id(struct cmtp_session *session)
 {
-	int i, id = -1;
+	int id = find_and_set_bit(&session->blockids, 16);
 
-	for (i = 0; i < 16; i++)
-		if (!test_and_set_bit(i, &session->blockids)) {
-			id = i;
-			break;
-		}
-
-	return id;
+	return id < 16 ? id : -1;
 }
 
 static inline void cmtp_free_block_id(struct cmtp_session *session, int id)
-- 
2.39.2


* [PATCH 29/34] net: smc: fix opencoded find_and_set_bit() in smc_wr_tx_get_free_slot_index()
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (27 preceding siblings ...)
  2023-11-18 15:50 ` [PATCH 28/34] bluetooth: optimize cmtp_alloc_block_id() Yury Norov
@ 2023-11-18 15:51 ` Yury Norov
  2023-11-20  8:43   ` Alexandra Winter
  2023-11-20  9:56   ` Tony Lu
  2023-11-18 15:51 ` [PATCH 30/34] ALSA: use atomic find_bit() functions where applicable Yury Norov
                   ` (5 subsequent siblings)
  34 siblings, 2 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:51 UTC (permalink / raw)
  To: linux-kernel, Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe,
	Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-s390, netdev
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

The function open-codes find_and_set_bit() with a for_each_clear_bit()
loop. Fix it, and make the whole function almost a one-liner.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 net/smc/smc_wr.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 0021065a600a..b6f0cfc52788 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -170,15 +170,11 @@ void smc_wr_tx_cq_handler(struct ib_cq *ib_cq, void *cq_context)
 
 static inline int smc_wr_tx_get_free_slot_index(struct smc_link *link, u32 *idx)
 {
-	*idx = link->wr_tx_cnt;
 	if (!smc_link_sendable(link))
 		return -ENOLINK;
-	for_each_clear_bit(*idx, link->wr_tx_mask, link->wr_tx_cnt) {
-		if (!test_and_set_bit(*idx, link->wr_tx_mask))
-			return 0;
-	}
-	*idx = link->wr_tx_cnt;
-	return -EBUSY;
+
+	*idx = find_and_set_bit(link->wr_tx_mask, link->wr_tx_cnt);
+	return *idx < link->wr_tx_cnt ? 0 : -EBUSY;
 }
 
 /**
-- 
2.39.2


* [PATCH 30/34] ALSA: use atomic find_bit() functions where applicable
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (28 preceding siblings ...)
  2023-11-18 15:51 ` [PATCH 29/34] net: smc: fix opencoded find_and_set_bit() in smc_wr_tx_get_free_slot_index() Yury Norov
@ 2023-11-18 15:51 ` Yury Norov
  2023-11-20 15:57   ` Takashi Iwai
  2023-11-18 15:51 ` [PATCH 31/34] drivers/perf: optimize m1_pmu_get_event_idx() by using find_bit() API Yury Norov
                   ` (4 subsequent siblings)
  34 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:51 UTC (permalink / raw)
  To: linux-kernel, Jaroslav Kysela, Takashi Iwai, Daniel Mack,
	Cezary Rojewski, Kai Vehmanen, Yury Norov, Kees Cook,
	linux-sound, alsa-devel
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

ALSA code tests each bit of its bitmaps in a for() loop. Switch it to
the dedicated atomic find_bit() API.
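
find_and_set_next_bit() differs from find_and_set_bit() only in where the search starts. A userspace sketch of that behaviour for the HDA case (slots below 10 are not used for dynamic minors, dynamic slots are 10..31), assuming a one-word bitmap and C11 atomics; find_and_set_next_bit_word() is a hypothetical local name, not the kernel function.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for test_and_set_bit() on a one-word bitmap. */
static bool test_and_set_bit(unsigned long nr, _Atomic unsigned long *word)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Claim the first clear bit in [start, nbits); return nbits if there is
 * none. Bits below @start are never considered. Assumes nbits fits in
 * one unsigned long. */
static unsigned long find_and_set_next_bit_word(_Atomic unsigned long *word,
						unsigned long nbits,
						unsigned long start)
{
	for (;;) {
		unsigned long val = atomic_load(word);
		unsigned long bit = start;

		while (bit < nbits && ((val >> bit) & 1))
			bit++;
		if (bit >= nbits)
			return nbits;
		if (!test_and_set_bit(bit, word))
			return bit;
		/* Lost the race for @bit: rescan from @start. */
	}
}
```

In get_empty_pcm_device(), a result of 32 falls through to the "Too many %s devices" warning, as the old loop did when slots 10..31 were all taken.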

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 sound/pci/hda/hda_codec.c |  7 +++----
 sound/usb/caiaq/audio.c   | 13 +++++--------
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/sound/pci/hda/hda_codec.c b/sound/pci/hda/hda_codec.c
index 01718b1fc9a7..29254005f394 100644
--- a/sound/pci/hda/hda_codec.c
+++ b/sound/pci/hda/hda_codec.c
@@ -3275,10 +3275,9 @@ static int get_empty_pcm_device(struct hda_bus *bus, unsigned int type)
 
 #ifdef CONFIG_SND_DYNAMIC_MINORS
 	/* non-fixed slots starting from 10 */
-	for (i = 10; i < 32; i++) {
-		if (!test_and_set_bit(i, bus->pcm_dev_bits))
-			return i;
-	}
+	i = find_and_set_next_bit(bus->pcm_dev_bits, 32, 10);
+	if (i < 32)
+		return i;
 #endif
 
 	dev_warn(bus->card->dev, "Too many %s devices\n",
diff --git a/sound/usb/caiaq/audio.c b/sound/usb/caiaq/audio.c
index 4981753652a7..74dfcf32b439 100644
--- a/sound/usb/caiaq/audio.c
+++ b/sound/usb/caiaq/audio.c
@@ -610,7 +610,7 @@ static void read_completed(struct urb *urb)
 	struct snd_usb_caiaq_cb_info *info = urb->context;
 	struct snd_usb_caiaqdev *cdev;
 	struct device *dev;
-	struct urb *out = NULL;
+	struct urb *out;
 	int i, frame, len, send_it = 0, outframe = 0;
 	unsigned long flags;
 	size_t offset = 0;
@@ -625,17 +625,14 @@ static void read_completed(struct urb *urb)
 		return;
 
 	/* find an unused output urb that is unused */
-	for (i = 0; i < N_URBS; i++)
-		if (test_and_set_bit(i, &cdev->outurb_active_mask) == 0) {
-			out = cdev->data_urbs_out[i];
-			break;
-		}
-
-	if (!out) {
+	i = find_and_set_bit(&cdev->outurb_active_mask, N_URBS);
+	if (i >= N_URBS) {
 		dev_err(dev, "Unable to find an output urb to use\n");
 		goto requeue;
 	}
 
+	out = cdev->data_urbs_out[i];
+
 	/* read the recently received packet and send back one which has
 	 * the same layout */
 	for (frame = 0; frame < FRAMES_PER_URB; frame++) {
-- 
2.39.2


* [PATCH 31/34] drivers/perf: optimize m1_pmu_get_event_idx() by using find_bit() API
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (29 preceding siblings ...)
  2023-11-18 15:51 ` [PATCH 30/34] ALSA: use atomic find_bit() functions where applicable Yury Norov
@ 2023-11-18 15:51 ` Yury Norov
  2023-11-18 18:40   ` Marc Zyngier
  2023-11-18 15:51 ` [PATCH 32/34] m68k: rework get_mmu_context() Yury Norov
                   ` (3 subsequent siblings)
  34 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:51 UTC (permalink / raw)
  To: linux-kernel, Will Deacon, Mark Rutland, Marc Zyngier, linux-arm-kernel
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

The function searches used_mask for a bit in a for-loop bit by bit.
We can do it faster by using atomic find_and_set_bit().

The comment on the function says that it searches for the first free
counter, but obviously for_each_set_bit() searches for the first set
counter. The following test_and_set_bit() tries to set an already set
bit, which is weird.

This patch, by using find_and_set_bit(), fixes this automatically.

Fixes: a639027a1be1 ("drivers/perf: Add Apple icestorm/firestorm CPU PMU driver")
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/perf/apple_m1_cpu_pmu.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/perf/apple_m1_cpu_pmu.c b/drivers/perf/apple_m1_cpu_pmu.c
index cd2de44b61b9..2d50670ffb01 100644
--- a/drivers/perf/apple_m1_cpu_pmu.c
+++ b/drivers/perf/apple_m1_cpu_pmu.c
@@ -447,12 +447,8 @@ static int m1_pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	 * counting on the PMU at any given time, and by placing the
 	 * most constraining events first.
 	 */
-	for_each_set_bit(idx, &affinity, M1_PMU_NR_COUNTERS) {
-		if (!test_and_set_bit(idx, cpuc->used_mask))
-			return idx;
-	}
-
-	return -EAGAIN;
+	idx = find_and_set_bit(cpuc->used_mask, M1_PMU_NR_COUNTERS);
+	return idx < M1_PMU_NR_COUNTERS ? idx : -EAGAIN;
 }
 
 static void m1_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
-- 
2.39.2


* [PATCH 32/34] m68k: rework get_mmu_context()
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (30 preceding siblings ...)
  2023-11-18 15:51 ` [PATCH 31/34] drivers/perf: optimize m1_pmu_get_event_idx() by using find_bit() API Yury Norov
@ 2023-11-18 15:51 ` Yury Norov
  2023-11-19 19:29   ` Geert Uytterhoeven
  2023-11-21 14:39   ` Greg Ungerer
  2023-11-18 15:51 ` [PATCH 33/34] microblaze: " Yury Norov
                   ` (2 subsequent siblings)
  34 siblings, 2 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:51 UTC (permalink / raw)
  To: linux-kernel, Geert Uytterhoeven, Hugh Dickins, Andrew Morton,
	Yury Norov, linux-m68k
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

get_mmu_context() open-codes the atomic find_and_set_bit_wrap(). Switch
it to the dedicated function.
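
find_and_set_bit_wrap() searches from the given offset to the end of the map and then wraps around to bit 0, which is the search order the removed `if (ctx > LAST_CONTEXT) ctx = 0;` logic produced by hand. A userspace sketch of that order, assuming a one-word map, offset < nbits and C11 atomics; find_and_set_bit_wrap_word() is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for test_and_set_bit() on a one-word bitmap. */
static bool test_and_set_bit(unsigned long nr, _Atomic unsigned long *word)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Claim the first clear bit in the order offset, offset + 1, ...,
 * nbits - 1, 0, ..., offset - 1. Returns nbits if none is clear.
 * Assumes nbits fits in one unsigned long and offset < nbits. */
static unsigned long find_and_set_bit_wrap_word(_Atomic unsigned long *word,
						unsigned long nbits,
						unsigned long offset)
{
	for (;;) {
		unsigned long val = atomic_load(word);
		unsigned long bit = nbits;

		for (unsigned long i = 0; i < nbits; i++) {
			unsigned long b = (offset + i) % nbits;

			if (!((val >> b) & 1)) {
				bit = b;
				break;
			}
		}
		if (bit == nbits)
			return nbits;		/* map is full */
		if (!test_and_set_bit(bit, word))
			return bit;
		/* Lost the race for @bit: rescan from the same offset. */
	}
}
```

A full map yields nbits (LAST_CONTEXT + 1 in the patch), which is why the patched get_mmu_context() wraps the call in `do { ... } while (ctx > LAST_CONTEXT)`.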

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/m68k/include/asm/mmu_context.h | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/m68k/include/asm/mmu_context.h b/arch/m68k/include/asm/mmu_context.h
index 141bbdfad960..0419ad87a1c1 100644
--- a/arch/m68k/include/asm/mmu_context.h
+++ b/arch/m68k/include/asm/mmu_context.h
@@ -35,12 +35,11 @@ static inline void get_mmu_context(struct mm_struct *mm)
 		atomic_inc(&nr_free_contexts);
 		steal_context();
 	}
-	ctx = next_mmu_context;
-	while (test_and_set_bit(ctx, context_map)) {
-		ctx = find_next_zero_bit(context_map, LAST_CONTEXT+1, ctx);
-		if (ctx > LAST_CONTEXT)
-			ctx = 0;
-	}
+
+	do {
+		ctx = find_and_set_bit_wrap(context_map, LAST_CONTEXT + 1, next_mmu_context);
+	} while (ctx > LAST_CONTEXT);
+
 	next_mmu_context = (ctx + 1) & LAST_CONTEXT;
 	mm->context = ctx;
 	context_mm[ctx] = mm;
-- 
2.39.2


* [PATCH 33/34] microblaze: rework get_mmu_context()
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (31 preceding siblings ...)
  2023-11-18 15:51 ` [PATCH 32/34] m68k: rework get_mmu_context() Yury Norov
@ 2023-11-18 15:51 ` Yury Norov
  2023-11-18 15:51 ` [PATCH 34/34] sh: rework ilsel_enable() Yury Norov
  2023-11-18 16:18 ` [PATCH 00/34] biops: add atomig find_bit() operations Bart Van Assche
  34 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:51 UTC (permalink / raw)
  To: linux-kernel, Michal Simek, Yury Norov
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

Fix the open-coded find_and_set_bit_wrap(), which also suppresses a
potential KCSAN warning.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/microblaze/include/asm/mmu_context_mm.h | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/microblaze/include/asm/mmu_context_mm.h b/arch/microblaze/include/asm/mmu_context_mm.h
index c2c77f708455..209c3a62353a 100644
--- a/arch/microblaze/include/asm/mmu_context_mm.h
+++ b/arch/microblaze/include/asm/mmu_context_mm.h
@@ -82,12 +82,11 @@ static inline void get_mmu_context(struct mm_struct *mm)
 		return;
 	while (atomic_dec_if_positive(&nr_free_contexts) < 0)
 		steal_context();
-	ctx = next_mmu_context;
-	while (test_and_set_bit(ctx, context_map)) {
-		ctx = find_next_zero_bit(context_map, LAST_CONTEXT+1, ctx);
-		if (ctx > LAST_CONTEXT)
-			ctx = 0;
-	}
+
+	do {
+		ctx = find_and_set_bit_wrap(context_map, LAST_CONTEXT + 1, next_mmu_context);
+	} while (ctx > LAST_CONTEXT);
+
 	next_mmu_context = (ctx + 1) & LAST_CONTEXT;
 	mm->context = ctx;
 	context_mm[ctx] = mm;
-- 
2.39.2


* [PATCH 34/34] sh: rework ilsel_enable()
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (32 preceding siblings ...)
  2023-11-18 15:51 ` [PATCH 33/34] microblaze: " Yury Norov
@ 2023-11-18 15:51 ` Yury Norov
  2023-11-18 16:15   ` John Paul Adrian Glaubitz
  2023-11-18 16:18 ` [PATCH 00/34] biops: add atomig find_bit() operations Bart Van Assche
  34 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-11-18 15:51 UTC (permalink / raw)
  To: linux-kernel, Yoshinori Sato, Rich Felker,
	John Paul Adrian Glaubitz, Yury Norov, linux-sh
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

Fix the open-coded find_and_set_bit(), which also suppresses a
potential KCSAN warning.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/sh/boards/mach-x3proto/ilsel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/sh/boards/mach-x3proto/ilsel.c b/arch/sh/boards/mach-x3proto/ilsel.c
index f0d5eb41521a..7fadc479a80b 100644
--- a/arch/sh/boards/mach-x3proto/ilsel.c
+++ b/arch/sh/boards/mach-x3proto/ilsel.c
@@ -99,8 +99,8 @@ int ilsel_enable(ilsel_source_t set)
 	}
 
 	do {
-		bit = find_first_zero_bit(&ilsel_level_map, ILSEL_LEVELS);
-	} while (test_and_set_bit(bit, &ilsel_level_map));
+		bit = find_and_set_bit(&ilsel_level_map, ILSEL_LEVELS);
+	} while (bit >= ILSEL_LEVELS);
 
 	__ilsel_enable(set, bit);
 
-- 
2.39.2


* Re: [PATCH 34/34] sh: rework ilsel_enable()
  2023-11-18 15:51 ` [PATCH 34/34] sh: rework ilsel_enable() Yury Norov
@ 2023-11-18 16:15   ` John Paul Adrian Glaubitz
  2023-11-21 13:43     ` Yury Norov
  0 siblings, 1 reply; 68+ messages in thread
From: John Paul Adrian Glaubitz @ 2023-11-18 16:15 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Yoshinori Sato, Rich Felker, linux-sh
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

Hello Yury!

On Sat, 2023-11-18 at 07:51 -0800, Yury Norov wrote:
> Fix opencoded find_and_set_bit(), which also suppresses potential
> KCSAN warning.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  arch/sh/boards/mach-x3proto/ilsel.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/sh/boards/mach-x3proto/ilsel.c b/arch/sh/boards/mach-x3proto/ilsel.c
> index f0d5eb41521a..7fadc479a80b 100644
> --- a/arch/sh/boards/mach-x3proto/ilsel.c
> +++ b/arch/sh/boards/mach-x3proto/ilsel.c
> @@ -99,8 +99,8 @@ int ilsel_enable(ilsel_source_t set)
>  	}
>  
>  	do {
> -		bit = find_first_zero_bit(&ilsel_level_map, ILSEL_LEVELS);
> -	} while (test_and_set_bit(bit, &ilsel_level_map));
> +		bit = find_and_set_bit(&ilsel_level_map, ILSEL_LEVELS);
> +	} while (bit >= ILSEL_LEVELS);
>  
>  	__ilsel_enable(set, bit);
>  

The subject should mention the subsystem, i.e. "sh: mach-x3proto:".

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

* Re: [PATCH 00/34] biops: add atomig find_bit() operations
  2023-11-18 15:50 [PATCH 00/34] biops: add atomig find_bit() operations Yury Norov
                   ` (33 preceding siblings ...)
  2023-11-18 15:51 ` [PATCH 34/34] sh: rework ilsel_enable() Yury Norov
@ 2023-11-18 16:18 ` Bart Van Assche
  2023-11-18 19:06   ` Sergey Shtylyov
  34 siblings, 1 reply; 68+ messages in thread
From: Bart Van Assche @ 2023-11-18 16:18 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sathya Prakash Veerichetty,
	Sean Christopherson, Shuai Xue, Stanislaw Gruszka,
	Steven Rostedt, Thomas Bogendoerfer, Thomas Gleixner,
	Valentin Schneider, Vitaly Kuznetsov, Wenjia Zhang, Will Deacon,
	Yoshinori Sato, GR-QLogic-Storage-Upstream, alsa-devel, ath10k,
	dmaengine, iommu, kvm, linux-arm-kernel, linux-arm-msm,
	linux-block, linux-bluetooth, linux-hyperv, linux-m68k,
	linux-media, linux-mips, linux-net-drivers, linux-pci,
	linux-rdma, linux-s390, linux-scsi, linux-serial, linux-sh,
	linux-sound, linux-usb, linux-wireless, linuxppc-dev,
	mpi3mr-linuxdrv.pdl, netdev, sparclinux, x86
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

On 11/18/23 07:50, Yury Norov wrote:
> Add helpers around test_and_{set,clear}_bit() that allow to search for
> clear or set bits and flip them atomically.

There is a typo in the subject: shouldn't "atomig" be changed
into "atomic"?

Thanks,

Bart.


* Re: [PATCH 01/34] lib/find: add atomic find_bit() primitives
  2023-11-18 15:50 ` [PATCH 01/34] lib/find: add atomic find_bit() primitives Yury Norov
@ 2023-11-18 16:23   ` Bart Van Assche
  0 siblings, 0 replies; 68+ messages in thread
From: Bart Van Assche @ 2023-11-18 16:23 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sathya Prakash Veerichetty,
	Sean Christopherson, Shuai Xue, Stanislaw Gruszka,
	Steven Rostedt, Thomas Bogendoerfer, Thomas Gleixner,
	Valentin Schneider, Vitaly Kuznetsov, Wenjia Zhang, Will Deacon,
	Yoshinori Sato, GR-QLogic-Storage-Upstream, alsa-devel, ath10k,
	dmaengine, iommu, kvm, linux-arm-kernel, linux-arm-msm,
	linux-block, linux-bluetooth, linux-hyperv, linux-m68k,
	linux-media, linux-mips, linux-net-drivers, linux-pci,
	linux-rdma, linux-s390, linux-scsi, linux-serial, linux-sh,
	linux-sound, linux-usb, linux-wireless, linuxppc-dev,
	mpi3mr-linuxdrv.pdl, netdev, sparclinux, x86
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

On 11/18/23 07:50, Yury Norov wrote:
> Add helpers around test_and_{set,clear}_bit() that allow to search for
> clear or set bits and flip them atomically.

Have you considered adding KUnit tests for the new functions?

Thanks,

Bart.


* Re: [PATCH 15/34] scsi: use atomic find_bit() API where appropriate
  2023-11-18 15:50 ` [PATCH 15/34] scsi: use atomic find_bit() API where appropriate Yury Norov
@ 2023-11-18 16:30   ` Bart Van Assche
  0 siblings, 0 replies; 68+ messages in thread
From: Bart Van Assche @ 2023-11-18 16:30 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Sathya Prakash Veerichetty,
	Kashyap Desai, Sumit Saxena, Sreekanth Reddy,
	James E.J. Bottomley, Martin K. Petersen, Nilesh Javali,
	Manish Rangankar, GR-QLogic-Storage-Upstream,
	mpi3mr-linuxdrv.pdl, linux-scsi
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

On 11/18/23 07:50, Yury Norov wrote:
>   drivers/scsi/mpi3mr/mpi3mr_os.c | 21 ++++++---------------
>   drivers/scsi/qedi/qedi_main.c   |  9 +--------
>   drivers/scsi/scsi_lib.c         |  5 ++---
>   3 files changed, 9 insertions(+), 26 deletions(-)

One patch for each of the above source files please. mpi3mr and qedi are
both SCSI drivers. scsi_lib.c is a source file from the SCSI core.

> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index cf3864f72093..4460a37f4864 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2499,9 +2499,8 @@ void scsi_evt_thread(struct work_struct *work)
>   
>   	sdev = container_of(work, struct scsi_device, event_work);
>   
> -	for (evt_type = SDEV_EVT_FIRST; evt_type <= SDEV_EVT_LAST; evt_type++)
> -		if (test_and_clear_bit(evt_type, sdev->pending_events))
> -			sdev_evt_send_simple(sdev, evt_type, GFP_KERNEL);
> +	for_each_test_and_clear_bit(evt_type, sdev->pending_events, SDEV_EVT_LAST)
> +		sdev_evt_send_simple(sdev, evt_type, GFP_KERNEL);

Hmm ... the original code iterates over the range 1 .. SDEV_EVT_LAST
while the new code iterates over the range 0 .. SDEV_EVT_LAST - 1. 
Please fix this.
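
To make the range mismatch concrete, here is a userspace sketch with C11 atomics standing in for test_and_clear_bit(), EVT_FIRST = 1 as in the range above and a hypothetical EVT_LAST = 8 standing in for SDEV_EVT_LAST. Walking [0, EVT_LAST) leaves event EVT_LAST pending, while walking [EVT_FIRST, EVT_LAST + 1) handles the same events as the original loop.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define EVT_FIRST 1	/* matches the 1 .. SDEV_EVT_LAST range above */
#define EVT_LAST  8	/* hypothetical value for SDEV_EVT_LAST */

/* Userspace stand-in for test_and_clear_bit() on a one-word bitmap. */
static bool test_and_clear_bit(unsigned int nr, _Atomic unsigned long *word)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_and(word, ~mask) & mask) != 0;
}

/* Atomically clear each pending event in [start, end) and return a mask
 * of the events that were found set (i.e. would have been sent). */
static unsigned long drain_events(_Atomic unsigned long *pending,
				  unsigned int start, unsigned int end)
{
	unsigned long handled = 0;

	for (unsigned int evt = start; evt < end; evt++)
		if (test_and_clear_bit(evt, pending))
			handled |= 1UL << evt;
	return handled;
}
```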

Thanks,

Bart.


* RE: [PATCH 14/34] PCI: hv: switch hv_get_dom_num() to use atomic find_bit()
  2023-11-18 15:50 ` [PATCH 14/34] PCI: hv: switch hv_get_dom_num() to use atomic find_bit() Yury Norov
@ 2023-11-18 17:59   ` Michael Kelley
  0 siblings, 0 replies; 68+ messages in thread
From: Michael Kelley @ 2023-11-18 17:59 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	linux-hyperv, linux-pci
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

From: Yury Norov <yury.norov@gmail.com> Sent: Saturday, November 18, 2023 7:51 AM
> 
> The function traverses bitmap with for_each_clear_bit() just to allocate
> a bit atomically. We can do it better with a dedicated find_and_set_bit().
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  drivers/pci/controller/pci-hyperv.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 30c7dfeccb16..033b1fb7f4eb 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -3605,12 +3605,9 @@ static u16 hv_get_dom_num(u16 dom)
>  	if (test_and_set_bit(dom, hvpci_dom_map) == 0)
>  		return dom;
> 
> -	for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
> -		if (test_and_set_bit(i, hvpci_dom_map) == 0)
> -			return i;
> -	}
> +	i = find_and_set_bit(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
> 
> -	return HVPCI_DOM_INVALID;
> +	return i < HVPCI_DOM_MAP_SIZE ? i : HVPCI_DOM_INVALID;
>  }
> 
>  /**
> --
> 2.39.2

Reviewed-by: Michael Kelley <mhklinux@outlook.com>

* Re: [PATCH 31/34] drivers/perf: optimize m1_pmu_get_event_idx() by using find_bit() API
  2023-11-18 15:51 ` [PATCH 31/34] drivers/perf: optimize m1_pmu_get_event_idx() by using find_bit() API Yury Norov
@ 2023-11-18 18:40   ` Marc Zyngier
  2023-11-18 18:45     ` Yury Norov
  0 siblings, 1 reply; 68+ messages in thread
From: Marc Zyngier @ 2023-11-18 18:40 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Will Deacon, Mark Rutland, linux-arm-kernel,
	Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

On Sat, 18 Nov 2023 15:51:02 +0000,
Yury Norov <yury.norov@gmail.com> wrote:
> 
> The function searches used_mask for a bit in a for-loop bit by bit.
> We can do it faster by using atomic find_and_set_bit().

Sure, let's do things fast. Correctness is overrated anyway.

> 
> The comment to the function says that it searches for the first free
> counter, but obviously for_each_set_bit() searches for the first set
> counter.

No it doesn't. It iterates over the counters the event can count on.

> The following test_and_set_bit() tries to enable already set
> bit, which is weird.

Maybe you could try to actually read the code?

> 
> This patch, by using find_and_set_bit(), fixes this automatically.

This doesn't fix anything, but instead actively breaks the driver.

> 
> Fixes: a639027a1be1 ("drivers/perf: Add Apple icestorm/firestorm CPU PMU driver")
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  drivers/perf/apple_m1_cpu_pmu.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/perf/apple_m1_cpu_pmu.c b/drivers/perf/apple_m1_cpu_pmu.c
> index cd2de44b61b9..2d50670ffb01 100644
> --- a/drivers/perf/apple_m1_cpu_pmu.c
> +++ b/drivers/perf/apple_m1_cpu_pmu.c
> @@ -447,12 +447,8 @@ static int m1_pmu_get_event_idx(struct pmu_hw_events *cpuc,
>  	 * counting on the PMU at any given time, and by placing the
>  	 * most constraining events first.
>  	 */
> -	for_each_set_bit(idx, &affinity, M1_PMU_NR_COUNTERS) {
> -		if (!test_and_set_bit(idx, cpuc->used_mask))
> -			return idx;
> -	}
> -
> -	return -EAGAIN;
> +	idx = find_and_set_bit(cpuc->used_mask, M1_PMU_NR_COUNTERS);
> +	return idx < M1_PMU_NR_COUNTERS ? idx : -EAGAIN;

So now you're picking any possible counter, irrespective of the
possible affinity of the event. This is great.

	M.

-- 
Without deviation from the norm, progress is not possible.
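Marc's objection in miniature: a hypothetical sketch (C11 atomics, single-word masks, invented sketch_* names, 10 counters assumed) contrasting the original loop, which only considers counters that are set in the event's affinity mask, with a plain find-and-set over used_mask. With affinity 0xc (counters 2 and 3 only) and nothing in use, the first returns 2, while the second returns counter 0, which the event cannot use.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_NR_COUNTERS 10	/* hypothetical counter count */

/* Claim counter idx in *used; nonzero if this caller set the bit. */
static int sketch_claim(_Atomic unsigned long *used, int idx)
{
	unsigned long bit = 1UL << idx;

	return !(atomic_fetch_or(used, bit) & bit);
}

/* Original shape: only counters present in `affinity` are candidates. */
static int sketch_get_idx_affine(unsigned long affinity, _Atomic unsigned long *used)
{
	assert((affinity >> SKETCH_NR_COUNTERS) == 0);
	for (int idx = 0; idx < SKETCH_NR_COUNTERS; idx++) {
		if (!(affinity & (1UL << idx)))
			continue;	/* the event cannot count on this counter */
		if (sketch_claim(used, idx))
			return idx;
	}
	return -1;		/* stands in for -EAGAIN */
}

/* Proposed shape: first free counter anywhere, affinity ignored. */
static int sketch_get_idx_any(_Atomic unsigned long *used)
{
	for (int idx = 0; idx < SKETCH_NR_COUNTERS; idx++)
		if (sketch_claim(used, idx))
			return idx;
	return -1;
}
```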


* Re: [PATCH 31/34] drivers/perf: optimize m1_pmu_get_event_idx() by using find_bit() API
  2023-11-18 18:40   ` Marc Zyngier
@ 2023-11-18 18:45     ` Yury Norov
  0 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-18 18:45 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-kernel, Will Deacon, Mark Rutland, linux-arm-kernel,
	Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

On Sat, Nov 18, 2023 at 06:40:43PM +0000, Marc Zyngier wrote:
> On Sat, 18 Nov 2023 15:51:02 +0000,
> Yury Norov <yury.norov@gmail.com> wrote:
> > 
> > The function searches used_mask for a bit in a for-loop bit by bit.
> > We can do it faster by using atomic find_and_set_bit().
> 
> Sure, let's do things fast. Correctness is overrated anyway.
> 
> > 
> > The comment to the function says that it searches for the first free
> > counter, but obviously for_each_set_bit() searches for the first set
> > counter.
> 
> No it doesn't. It iterates over the counters the event can count on.
> 
> > The following test_and_set_bit() tries to enable already set
> > bit, which is weird.
> 
> Maybe you could try to actually read the code?
> 
> > 
> > This patch, by using find_and_set_bit(), fixes this automatically.
> 
> This doesn't fix anything, but instead actively breaks the driver.
> 
> > 
> > Fixes: a639027a1be1 ("drivers/perf: Add Apple icestorm/firestorm CPU PMU driver")
> > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > ---
> >  drivers/perf/apple_m1_cpu_pmu.c | 8 ++------
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/perf/apple_m1_cpu_pmu.c b/drivers/perf/apple_m1_cpu_pmu.c
> > index cd2de44b61b9..2d50670ffb01 100644
> > --- a/drivers/perf/apple_m1_cpu_pmu.c
> > +++ b/drivers/perf/apple_m1_cpu_pmu.c
> > @@ -447,12 +447,8 @@ static int m1_pmu_get_event_idx(struct pmu_hw_events *cpuc,
> >  	 * counting on the PMU at any given time, and by placing the
> >  	 * most constraining events first.
> >  	 */
> > -	for_each_set_bit(idx, &affinity, M1_PMU_NR_COUNTERS) {
> > -		if (!test_and_set_bit(idx, cpuc->used_mask))
> > -			return idx;
> > -	}
> > -
> > -	return -EAGAIN;
> > +	idx = find_and_set_bit(cpuc->used_mask, M1_PMU_NR_COUNTERS);
> > +	return idx < M1_PMU_NR_COUNTERS ? idx : -EAGAIN;
> 
> So now you're picking any possible counter, irrespective of the
> possible affinity of the event. This is great.

Ok, I'll drop the patch. Sorry.


* Re: [PATCH 00/34] biops: add atomig find_bit() operations
  2023-11-18 16:18 ` [PATCH 00/34] biops: add atomig find_bit() operations Bart Van Assche
@ 2023-11-18 19:06   ` Sergey Shtylyov
  0 siblings, 0 replies; 68+ messages in thread
From: Sergey Shtylyov @ 2023-11-18 19:06 UTC (permalink / raw)
  To: Bart Van Assche, Yury Norov, linux-kernel, David S. Miller,
	H. Peter Anvin, James E.J. Bottomley, K. Y. Srinivasan,
	Md. Haris Iqbal, Akinobu Mita, Andrew Morton, Bjorn Andersson,
	Borislav Petkov, Chaitanya Kulkarni, Christian Brauner,
	Damien Le Moal, Dave Hansen, David Disseldorp, Edward Cree,
	Eric Dumazet, Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sathya Prakash Veerichetty,
	Sean Christopherson, Shuai Xue, Stanislaw Gruszka,
	Steven Rostedt, Thomas Bogendoerfer, Thomas Gleixner,
	Valentin Schneider, Vitaly Kuznetsov, Wenjia Zhang, Will Deacon,
	Yoshinori Sato, GR-QLogic-Storage-Upstream, alsa-devel, ath10k,
	dmaengine, iommu, kvm, linux-arm-kernel, linux-arm-msm,
	linux-block, linux-bluetooth, linux-hyperv, linux-m68k,
	linux-media, linux-mips, linux-net-drivers, linux-pci,
	linux-rdma, linux-s390, linux-scsi, linux-serial, linux-sh,
	linux-sound, linux-usb, linux-wireless, linuxppc-dev,
	mpi3mr-linuxdrv.pdl, netdev, sparclinux, x86
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

On 11/18/23 7:18 PM, Bart Van Assche wrote:
[...]
>> Add helpers around test_and_{set,clear}_bit() that allow to search for
>> clear or set bits and flip them atomically.
> 
> There is a typo in the subject: shouldn't "atomig" be changed
> into "atomic"?

   And "biops" to "bitops"? :-)

> Thanks,
> 
> Bart.

MBR, Sergey


* Re: [PATCH 32/34] m68k: rework get_mmu_context()
  2023-11-18 15:51 ` [PATCH 32/34] m68k: rework get_mmu_context() Yury Norov
@ 2023-11-19 19:29   ` Geert Uytterhoeven
  2023-11-21 14:39   ` Greg Ungerer
  1 sibling, 0 replies; 68+ messages in thread
From: Geert Uytterhoeven @ 2023-11-19 19:29 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Hugh Dickins, Andrew Morton, linux-m68k, Jan Kara,
	Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Greg Ungerer

Hi Yuri,

Thanks for your patch!

On Sat, Nov 18, 2023 at 4:51 PM Yury Norov <yury.norov@gmail.com> wrote:
> ALSA code opencodes atomic find_and_set_bit_wrap(). Switch it to

ALSA?

> dedicated function.
>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>

The rest LGTM, but as it's Coldfire code, I'd like to defer to Greg.

Gr{oetje,eeting}s,

                        Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
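Assuming find_and_set_bit_wrap() follows the find_next_bit_wrap() convention (start at an offset, wrap around to bit 0, return the size when nothing is free), its behaviour can be sketched as below. This is a hypothetical single-word userspace version using C11 atomics, not the helper from the series, and it says nothing about the actual get_mmu_context() changes.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Claim the first clear bit at or after `offset`, wrapping around to
 * bit 0; return it, or nbits if the map is full. Single-word map and a
 * bit-by-bit scan, for clarity rather than speed.
 */
static size_t sketch_find_and_set_bit_wrap(_Atomic unsigned long *map,
					   size_t nbits, size_t offset)
{
	assert(nbits <= sizeof(unsigned long) * CHAR_BIT && offset < nbits);
	for (size_t n = 0; n < nbits; n++) {
		size_t nr = (offset + n) % nbits;
		unsigned long bit = 1UL << nr;

		/* Skip bits that look taken; the atomic RMW below settles races. */
		if (atomic_load(map) & bit)
			continue;
		if (!(atomic_fetch_or(map, bit) & bit))
			return nr;
	}
	return nbits;
}
```

Starting from offset 6 in an 8-bit map, successive calls hand out 6, 7 and then wrap to 0.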


* Re: [PATCH 12/34] wifi: intel: use atomic find_bit() API where appropriate
  2023-11-18 15:50 ` [PATCH 12/34] wifi: intel: use atomic find_bit() API where appropriate Yury Norov
@ 2023-11-19 19:58   ` Johannes Berg
  2023-11-21 16:36     ` Yury Norov
  0 siblings, 1 reply; 68+ messages in thread
From: Johannes Berg @ 2023-11-19 19:58 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Stanislaw Gruszka, Kalle Valo,
	Gregory Greenman, Hans de Goede, Kees Cook, Miri Korenblit,
	linux-wireless
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

On Sat, 2023-11-18 at 07:50 -0800, Yury Norov wrote:
> iwlegacy and iwlwifi code opencodes atomic bit allocation/traversing by
> using loops. 

That's really just due to being lazy though; it could use a non-atomic
__test_and_set_bit(), which would be just fine in all of this, since there's
always a mutex held around it that protects the data.

Not that this means the helper is _wrong_; it's just unnecessary. And
you don't have non-atomic versions of these, do you?

johannes
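Johannes's alternative, sketched: when a mutex already serializes every access to the bitmap, a plain non-atomic read-modify-write does the same job as the atomic helper without locked instructions. The function below is hypothetical (as Johannes notes, the series has no non-atomic variant) and assumes a single-word bitmap; the locking is left to the caller and only described in the comment.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Non-atomic find-and-set: a plain read-modify-write with no atomics.
 * Correct only while the caller serializes every access to *map,
 * e.g. by holding the mutex that already protects the driver state.
 */
static size_t sketch_find_and_set_bit_nonatomic(unsigned long *map, size_t nbits)
{
	assert(nbits <= sizeof(unsigned long) * CHAR_BIT);
	for (size_t nr = 0; nr < nbits; nr++) {
		unsigned long bit = 1UL << nr;

		if (!(*map & bit)) {
			*map |= bit;
			return nr;
		}
	}
	return nbits;
}
```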



* Re: [PATCH 29/34] net: smc: fix opencoded find_and_set_bit() in smc_wr_tx_get_free_slot_index()
  2023-11-18 15:51 ` [PATCH 29/34] net: smc: fix opencoded find_and_set_bit() in smc_wr_tx_get_free_slot_index() Yury Norov
@ 2023-11-20  8:43   ` Alexandra Winter
  2023-11-21 13:41     ` Yury Norov
  2023-11-20  9:56   ` Tony Lu
  1 sibling, 1 reply; 68+ messages in thread
From: Alexandra Winter @ 2023-11-20  8:43 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Karsten Graul, Wenjia Zhang,
	Jan Karcher, D. Wythe, Tony Lu, Wen Gu, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov



On 18.11.23 16:51, Yury Norov wrote:
> The function opencodes find_and_set_bit() with a for_each() loop. Fix
> it, and make the whole function a simple almost one-liner.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  net/smc/smc_wr.c | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
> index 0021065a600a..b6f0cfc52788 100644
> --- a/net/smc/smc_wr.c
> +++ b/net/smc/smc_wr.c
> @@ -170,15 +170,11 @@ void smc_wr_tx_cq_handler(struct ib_cq *ib_cq, void *cq_context)
>  
>  static inline int smc_wr_tx_get_free_slot_index(struct smc_link *link, u32 *idx)
>  {
> -	*idx = link->wr_tx_cnt;
>  	if (!smc_link_sendable(link))
>  		return -ENOLINK;
> -	for_each_clear_bit(*idx, link->wr_tx_mask, link->wr_tx_cnt) {
> -		if (!test_and_set_bit(*idx, link->wr_tx_mask))
> -			return 0;
> -	}
> -	*idx = link->wr_tx_cnt;
> -	return -EBUSY;
> +
> +	*idx = find_and_set_bit(link->wr_tx_mask, link->wr_tx_cnt);
> +	return *idx < link->wr_tx_cnt ? 0 : -EBUSY;
>  }
>  
>  /**


My understanding is that you can omit the lines with
> -	*idx = link->wr_tx_cnt;
because they only apply to the error paths and you checked that the calling function
does not use the idx variable in the error cases. Do I understand this correctly?

If so the removal of these 2 lines is not related to your change of using find_and_set_bit(),
do I understand that correctly?

If so, it may be worth mentioning that in the commit message.
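One way to read Alexandra's point, as a hypothetical sketch (struct and function names invented, a one-word slot mask assumed): if callers only look at idx after a 0 return, an allocator can write *idx on success alone, and the error-path assignments of link->wr_tx_cnt become dead code, independent of the find_and_set_bit() conversion.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for struct smc_link. */
struct sketch_link {
	bool sendable;
	uint32_t wr_tx_cnt;			/* number of TX slots */
	_Atomic unsigned long wr_tx_mask;	/* bit set = slot in use */
};

/*
 * Returns 0 and stores the claimed slot in *idx, or a negative errno.
 * *idx is written only on success, so callers must not read it otherwise.
 */
static int sketch_get_free_slot_index(struct sketch_link *link, uint32_t *idx)
{
	assert(link->wr_tx_cnt <= sizeof(unsigned long) * CHAR_BIT);
	if (!link->sendable)
		return -ENOLINK;
	for (uint32_t i = 0; i < link->wr_tx_cnt; i++) {
		unsigned long bit = 1UL << i;

		if (!(atomic_fetch_or(&link->wr_tx_mask, bit) & bit)) {
			*idx = i;
			return 0;
		}
	}
	return -EBUSY;
}
```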


* Re: [PATCH 29/34] net: smc: fix opencoded find_and_set_bit() in smc_wr_tx_get_free_slot_index()
  2023-11-18 15:51 ` [PATCH 29/34] net: smc: fix opencoded find_and_set_bit() in smc_wr_tx_get_free_slot_index() Yury Norov
  2023-11-20  8:43   ` Alexandra Winter
@ 2023-11-20  9:56   ` Tony Lu
  1 sibling, 0 replies; 68+ messages in thread
From: Tony Lu @ 2023-11-20  9:56 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe,
	Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-s390, netdev, Jan Kara, Mirsad Todorovac,
	Matthew Wilcox, Rasmus Villemoes, Andy Shevchenko,
	Maxim Kuvyrkov, Alexey Klimov

The prefix tag and subject imply that it is a bugfix. First, I think it
should be treated as a new feature, with a net-next tag. Also, please use
net/smc as the prefix.

Thanks,
Tony Lu

On Sat, Nov 18, 2023 at 07:51:00AM -0800, Yury Norov wrote:
> The function opencodes find_and_set_bit() with a for_each() loop. Fix
> it, and make the whole function a simple almost one-liner.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  net/smc/smc_wr.c | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
> index 0021065a600a..b6f0cfc52788 100644
> --- a/net/smc/smc_wr.c
> +++ b/net/smc/smc_wr.c
> @@ -170,15 +170,11 @@ void smc_wr_tx_cq_handler(struct ib_cq *ib_cq, void *cq_context)
>  
>  static inline int smc_wr_tx_get_free_slot_index(struct smc_link *link, u32 *idx)
>  {
> -	*idx = link->wr_tx_cnt;
>  	if (!smc_link_sendable(link))
>  		return -ENOLINK;
> -	for_each_clear_bit(*idx, link->wr_tx_mask, link->wr_tx_cnt) {
> -		if (!test_and_set_bit(*idx, link->wr_tx_mask))
> -			return 0;
> -	}
> -	*idx = link->wr_tx_cnt;
> -	return -EBUSY;
> +
> +	*idx = find_and_set_bit(link->wr_tx_mask, link->wr_tx_cnt);
> +	return *idx < link->wr_tx_cnt ? 0 : -EBUSY;
>  }
>  
>  /**
> -- 
> 2.39.2


* Re: [PATCH 04/34] sched: add cpumask_find_and_set() and use it in __mm_cid_get()
  2023-11-18 15:50 ` [PATCH 04/34] sched: add cpumask_find_and_set() and use it in __mm_cid_get() Yury Norov
@ 2023-11-20 11:31   ` Peter Zijlstra
  2023-11-20 16:17     ` Mathieu Desnoyers
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2023-11-20 11:31 UTC (permalink / raw)
  To: Yury Norov, mathieu.desnoyers
  Cc: linux-kernel, Andy Shevchenko, Rasmus Villemoes, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Maxim Kuvyrkov, Alexey Klimov

On Sat, Nov 18, 2023 at 07:50:35AM -0800, Yury Norov wrote:
> __mm_cid_get() uses a __mm_cid_try_get() helper to atomically acquire a
> bit in the mm cid mask. Now that we have atomic find_and_set_bit(), we can
> easily extend it to cpumasks and use it in the scheduler code.
> 
> __mm_cid_try_get() has an infinite loop, which may delay forward
> progress of __mm_cid_get() when the mask is dense. The new
> cpumask_find_and_set() doesn't poll the mask indefinitely and returns as
> soon as nothing is found after the first pass, allowing the caller to
> acquire the lock and set use_cid_lock sooner, if needed.

Mathieu, I forgot again, but the comment deletion seems to suggest you did
this on purpose...

> cpumask_find_and_set() treats the cid mask as a volatile region of memory,
> which it actually is in this case. So if it changes while a search is in
> progress, KCSAN won't fire a warning on it.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  include/linux/cpumask.h | 12 ++++++++++
>  kernel/sched/sched.h    | 52 ++++++++++++-----------------------------
>  2 files changed, 27 insertions(+), 37 deletions(-)
> 
> diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> index cfb545841a2c..c2acced8be4e 100644
> --- a/include/linux/cpumask.h
> +++ b/include/linux/cpumask.h
> @@ -271,6 +271,18 @@ unsigned int cpumask_next_and(int n, const struct cpumask *src1p,
>  		small_cpumask_bits, n + 1);
>  }
>  
> +/**
> + * cpumask_find_and_set - find the first unset cpu in a cpumask and
> + *			  set it atomically
> + * @srcp: the cpumask pointer
> + *
> + * Return: >= nr_cpu_ids if nothing is found.
> + */
> +static inline unsigned int cpumask_find_and_set(volatile struct cpumask *srcp)
> +{
> +	return find_and_set_bit(cpumask_bits(srcp), small_cpumask_bits);
> +}
> +
>  /**
>   * for_each_cpu - iterate over every cpu in a mask
>   * @cpu: the (optionally unsigned) integer iterator
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 2e5a95486a42..b2f095a9fc40 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -3345,28 +3345,6 @@ static inline void mm_cid_put(struct mm_struct *mm)
>  	__mm_cid_put(mm, mm_cid_clear_lazy_put(cid));
>  }
>  
> -static inline int __mm_cid_try_get(struct mm_struct *mm)
> -{
> -	struct cpumask *cpumask;
> -	int cid;
> -
> -	cpumask = mm_cidmask(mm);
> -	/*
> -	 * Retry finding first zero bit if the mask is temporarily
> -	 * filled. This only happens during concurrent remote-clear
> -	 * which owns a cid without holding a rq lock.
> -	 */
> -	for (;;) {
> -		cid = cpumask_first_zero(cpumask);
> -		if (cid < nr_cpu_ids)
> -			break;
> -		cpu_relax();
> -	}
> -	if (cpumask_test_and_set_cpu(cid, cpumask))
> -		return -1;
> -	return cid;
> -}
> -
>  /*
>   * Save a snapshot of the current runqueue time of this cpu
>   * with the per-cpu cid value, allowing to estimate how recently it was used.
> @@ -3381,25 +3359,25 @@ static inline void mm_cid_snapshot_time(struct rq *rq, struct mm_struct *mm)
>  
>  static inline int __mm_cid_get(struct rq *rq, struct mm_struct *mm)
>  {
> +	struct cpumask *cpumask = mm_cidmask(mm);
>  	int cid;
>  
> -	/*
> -	 * All allocations (even those using the cid_lock) are lock-free. If
> -	 * use_cid_lock is set, hold the cid_lock to perform cid allocation to
> -	 * guarantee forward progress.
> -	 */
> +	/* All allocations (even those using the cid_lock) are lock-free. */
>  	if (!READ_ONCE(use_cid_lock)) {
> -		cid = __mm_cid_try_get(mm);
> -		if (cid >= 0)
> +		cid = cpumask_find_and_set(cpumask);
> +		if (cid < nr_cpu_ids)
>  			goto end;
> -		raw_spin_lock(&cid_lock);
> -	} else {
> -		raw_spin_lock(&cid_lock);
> -		cid = __mm_cid_try_get(mm);
> -		if (cid >= 0)
> -			goto unlock;
>  	}
>  
> +	/*
> +	 * If use_cid_lock is set, hold the cid_lock to perform cid
> +	 * allocation to guarantee forward progress.
> +	 */
> +	raw_spin_lock(&cid_lock);
> +	cid = cpumask_find_and_set(cpumask);
> +	if (cid < nr_cpu_ids)
> +		goto unlock;
> +
>  	/*
>  	 * cid concurrently allocated. Retry while forcing following
>  	 * allocations to use the cid_lock to ensure forward progress.
> @@ -3415,9 +3393,9 @@ static inline int __mm_cid_get(struct rq *rq, struct mm_struct *mm)
>  	 * all newcoming allocations observe the use_cid_lock flag set.
>  	 */
>  	do {
> -		cid = __mm_cid_try_get(mm);
> +		cid = cpumask_find_and_set(cpumask);
>  		cpu_relax();
> -	} while (cid < 0);
> +	} while (cid >= nr_cpu_ids);
>  	/*
>  	 * Allocate before clearing use_cid_lock. Only care about
>  	 * program order because this is for forward progress.
> -- 
> 2.39.2
> 


* Re: [PATCH 21/34] usb: cdc-acm: optimize acm_softint()
  2023-11-18 15:50 ` [PATCH 21/34] usb: cdc-acm: optimize acm_softint() Yury Norov
@ 2023-11-20 11:39   ` Oliver Neukum
  0 siblings, 0 replies; 68+ messages in thread
From: Oliver Neukum @ 2023-11-20 11:39 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Oliver Neukum, Greg Kroah-Hartman, linux-usb
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov



On 18.11.23 16:50, Yury Norov wrote:
> acm_softint() uses a for-loop to traverse the urbs_in_error_delay bitmap
> bit by bit to find and clear set bits.
> 
> We can do it better by using for_each_test_and_clear_bit(), because it
> doesn't test already clear bits.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Oliver Neukum <oneukum@suse.com>
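The for_each_test_and_clear_bit() semantics, as a hypothetical single-word userspace sketch: each iteration jumps straight to the next set bit in a snapshot and reports it only if this caller's atomic clear actually flipped it, so already-clear bits are never tested one by one. The macro and helper names are invented, and the real iterator in the series may differ in detail.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Return the first bit >= start that was set and that this caller
 * cleared atomically, or nbits if there is none. Single-word bitmap.
 */
static size_t sketch_next_test_and_clear(_Atomic unsigned long *map,
					 size_t nbits, size_t start)
{
	assert(nbits <= sizeof(unsigned long) * CHAR_BIT);
	while (start < nbits) {
		/* Snapshot, ignoring bits below start. */
		unsigned long pending = atomic_load(map) & (~0UL << start);
		unsigned long bit;
		size_t nr;

		if (!pending)
			break;
		nr = (size_t)__builtin_ctzl(pending);	/* GCC/Clang builtin */
		if (nr >= nbits)
			break;
		bit = 1UL << nr;
		if (atomic_fetch_and(map, ~bit) & bit)
			return nr;	/* this caller cleared it */
		start = nr + 1;		/* cleared by someone else: move on */
	}
	return nbits;
}

#define sketch_for_each_test_and_clear_bit(nr, map, nbits)		\
	for ((nr) = sketch_next_test_and_clear((map), (nbits), 0);	\
	     (nr) < (nbits);						\
	     (nr) = sketch_next_test_and_clear((map), (nbits), (nr) + 1))
```

With bits 1, 2 and 4 pending, the loop body runs exactly three times and leaves the bitmap empty.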


* Re: [PATCH 13/34] KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers()
  2023-11-18 15:50 ` [PATCH 13/34] KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers() Yury Norov
@ 2023-11-20 14:26   ` Vitaly Kuznetsov
  2023-11-21 13:35     ` Yury Norov
  0 siblings, 1 reply; 68+ messages in thread
From: Vitaly Kuznetsov @ 2023-11-20 14:26 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, kvm
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

Yury Norov <yury.norov@gmail.com> writes:

> The function traverses stimer_pending_bitmap in a for-loop bit by bit.
> We can do it faster by using atomic for_each_test_and_clear_bit().
>
> While here, refactor the logic by decreasing indentation level
> and dropping 2nd check for stimer->config.enable.
>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  arch/x86/kvm/hyperv.c | 39 +++++++++++++++++++--------------------
>  1 file changed, 19 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 238afd7335e4..460e300b558b 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -870,27 +870,26 @@ void kvm_hv_process_stimers(struct kvm_vcpu *vcpu)
>  	if (!hv_vcpu)
>  		return;
>  
> -	for (i = 0; i < ARRAY_SIZE(hv_vcpu->stimer); i++)
> -		if (test_and_clear_bit(i, hv_vcpu->stimer_pending_bitmap)) {
> -			stimer = &hv_vcpu->stimer[i];
> -			if (stimer->config.enable) {
> -				exp_time = stimer->exp_time;
> -
> -				if (exp_time) {
> -					time_now =
> -						get_time_ref_counter(vcpu->kvm);
> -					if (time_now >= exp_time)
> -						stimer_expiration(stimer);
> -				}
> -
> -				if ((stimer->config.enable) &&
> -				    stimer->count) {
> -					if (!stimer->msg_pending)
> -						stimer_start(stimer);
> -				} else
> -					stimer_cleanup(stimer);
> -			}
> +	for_each_test_and_clear_bit(i, hv_vcpu->stimer_pending_bitmap,
> +					ARRAY_SIZE(hv_vcpu->stimer)) {
> +		stimer = &hv_vcpu->stimer[i];
> +		if (!stimer->config.enable)
> +			continue;
> +
> +		exp_time = stimer->exp_time;
> +
> +		if (exp_time) {
> +			time_now = get_time_ref_counter(vcpu->kvm);
> +			if (time_now >= exp_time)
> +				stimer_expiration(stimer);
>  		}
> +
> +		if (stimer->count) {

You can't drop 'stimer->config.enable' check here as stimer_expiration()
call above actually changes it. This is done on purpose: oneshot timers
fire only once so 'config.enable' is reset to 0.

> +			if (!stimer->msg_pending)
> +				stimer_start(stimer);
> +		} else
> +			stimer_cleanup(stimer);
> +	}
>  }
>  
>  void kvm_hv_vcpu_uninit(struct kvm_vcpu *vcpu)

-- 
Vitaly
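Vitaly's point reduced to a hypothetical model (the struct and functions below are invented and keep only the fields that matter): because expiry can clear enable for a oneshot timer, the restart decision has to re-read enable after the expiry call rather than rely on the check at the top of the loop body.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for a Hyper-V synthetic timer. */
struct sketch_stimer {
	bool enable;
	bool periodic;
	uint64_t count;
	uint64_t exp_time;
	int started, cleaned;	/* bookkeeping for the example only */
};

/* The expiry side effect described above: a oneshot timer disables itself. */
static void sketch_expire(struct sketch_stimer *t)
{
	assert(t->enable);
	if (!t->periodic)
		t->enable = false;
}

static void sketch_process(struct sketch_stimer *t, uint64_t now)
{
	if (!t->enable)
		return;
	if (t->exp_time && now >= t->exp_time)
		sketch_expire(t);
	/* Must re-read enable: sketch_expire() may have just cleared it. */
	if (t->enable && t->count)
		t->started++;	/* stands in for stimer_start() */
	else
		t->cleaned++;	/* stands in for stimer_cleanup() */
}
```

Dropping `t->enable &&` from the second check would restart the expired oneshot timer instead of cleaning it up.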



* Re: [PATCH 09/34] dmaengine: idxd: optimize perfmon_assign_event()
  2023-11-18 15:50 ` [PATCH 09/34] dmaengine: idxd: optimize perfmon_assign_event() Yury Norov
@ 2023-11-20 15:34   ` Dave Jiang
  2023-11-24 12:15   ` Vinod Koul
  1 sibling, 0 replies; 68+ messages in thread
From: Dave Jiang @ 2023-11-20 15:34 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Fenghua Yu, Vinod Koul, dmaengine
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Fenghua Yu,
	tom.zanussi



On 11/18/23 08:50, Yury Norov wrote:
> The function searches used_mask for a clear bit in a for-loop, bit by bit.
> We can do it faster by using atomic find_and_set_bit().
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  drivers/dma/idxd/perfmon.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
> index fdda6d604262..4dd9c0d979c3 100644
> --- a/drivers/dma/idxd/perfmon.c
> +++ b/drivers/dma/idxd/perfmon.c
> @@ -134,13 +134,9 @@ static void perfmon_assign_hw_event(struct idxd_pmu *idxd_pmu,
>  static int perfmon_assign_event(struct idxd_pmu *idxd_pmu,
>  				struct perf_event *event)
>  {
> -	int i;
> -
> -	for (i = 0; i < IDXD_PMU_EVENT_MAX; i++)
> -		if (!test_and_set_bit(i, idxd_pmu->used_mask))
> -			return i;
> +	int i = find_and_set_bit(idxd_pmu->used_mask, IDXD_PMU_EVENT_MAX);
>  
> -	return -EINVAL;
> +	return i < IDXD_PMU_EVENT_MAX ? i : -EINVAL;
>  }
>  
>  /*


* Re: [PATCH 30/34] ALSA: use atomic find_bit() functions where applicable
  2023-11-18 15:51 ` [PATCH 30/34] ALSA: use atomic find_bit() functions where applicable Yury Norov
@ 2023-11-20 15:57   ` Takashi Iwai
  0 siblings, 0 replies; 68+ messages in thread
From: Takashi Iwai @ 2023-11-20 15:57 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Jaroslav Kysela, Takashi Iwai, Daniel Mack,
	Cezary Rojewski, Kai Vehmanen, Kees Cook, linux-sound,
	alsa-devel, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

On Sat, 18 Nov 2023 16:51:01 +0100,
Yury Norov wrote:
> 
> ALSA code tests each bit in bitmaps in a for() loop. Switch it to
> dedicated atomic find_bit() API.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>

Through a quick glance, both changes look OK.
Feel free to take my ack

Acked-by: Takashi Iwai <tiwai@suse.de>


thanks,

Takashi


* Re: [PATCH 04/34] sched: add cpumask_find_and_set() and use it in __mm_cid_get()
  2023-11-20 11:31   ` Peter Zijlstra
@ 2023-11-20 16:17     ` Mathieu Desnoyers
  2023-11-21 13:31       ` Yury Norov
  0 siblings, 1 reply; 68+ messages in thread
From: Mathieu Desnoyers @ 2023-11-20 16:17 UTC (permalink / raw)
  To: Peter Zijlstra, Yury Norov
  Cc: linux-kernel, Andy Shevchenko, Rasmus Villemoes, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Maxim Kuvyrkov, Alexey Klimov

On 2023-11-20 06:31, Peter Zijlstra wrote:
> On Sat, Nov 18, 2023 at 07:50:35AM -0800, Yury Norov wrote:
>> __mm_cid_get() uses a __mm_cid_try_get() helper to atomically acquire a
>> bit in the mm cid mask. Now that we have atomic find_and_set_bit(), we can
>> easily extend it to cpumasks and use it in the scheduler code.
>>
>> __mm_cid_try_get() has an infinite loop, which may delay forward
>> progress of __mm_cid_get() when the mask is dense. The new
>> cpumask_find_and_set() doesn't poll the mask indefinitely and returns as
>> soon as nothing is found after the first pass, allowing the caller to
>> acquire the lock and set use_cid_lock sooner, if needed.
> 
> Mathieu, I forgot again, but the comment deletion seems to suggest you did
> this on purpose...

See comments below.

> 
>> cpumask_find_and_set() treats the cid mask as a volatile region of memory,
>> which it actually is in this case. So if it changes while a search is in
>> progress, KCSAN won't fire a warning on it.
>>
>> Signed-off-by: Yury Norov <yury.norov@gmail.com>
>> ---
>>   include/linux/cpumask.h | 12 ++++++++++
>>   kernel/sched/sched.h    | 52 ++++++++++++-----------------------------
>>   2 files changed, 27 insertions(+), 37 deletions(-)
>>
>> diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
>> index cfb545841a2c..c2acced8be4e 100644
>> --- a/include/linux/cpumask.h
>> +++ b/include/linux/cpumask.h
>> @@ -271,6 +271,18 @@ unsigned int cpumask_next_and(int n, const struct cpumask *src1p,
>>   		small_cpumask_bits, n + 1);
>>   }
>>   
>> +/**
>> + * cpumask_find_and_set - find the first unset cpu in a cpumask and
>> + *			  set it atomically
>> + * @srcp: the cpumask pointer
>> + *
>> + * Return: >= nr_cpu_ids if nothing is found.
>> + */
>> +static inline unsigned int cpumask_find_and_set(volatile struct cpumask *srcp)
>> +{
>> +	return find_and_set_bit(cpumask_bits(srcp), small_cpumask_bits);
>> +}
>> +
>>   /**
>>    * for_each_cpu - iterate over every cpu in a mask
>>    * @cpu: the (optionally unsigned) integer iterator
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index 2e5a95486a42..b2f095a9fc40 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -3345,28 +3345,6 @@ static inline void mm_cid_put(struct mm_struct *mm)
>>   	__mm_cid_put(mm, mm_cid_clear_lazy_put(cid));
>>   }
>>   
>> -static inline int __mm_cid_try_get(struct mm_struct *mm)
>> -{
>> -	struct cpumask *cpumask;
>> -	int cid;
>> -
>> -	cpumask = mm_cidmask(mm);
>> -	/*
>> -	 * Retry finding first zero bit if the mask is temporarily
>> -	 * filled. This only happens during concurrent remote-clear
>> -	 * which owns a cid without holding a rq lock.
>> -	 */
>> -	for (;;) {
>> -		cid = cpumask_first_zero(cpumask);
>> -		if (cid < nr_cpu_ids)
>> -			break;
>> -		cpu_relax();
>> -	}
>> -	if (cpumask_test_and_set_cpu(cid, cpumask))
>> -		return -1;

This was split into find / test_and_set on purpose because follow-up
patches I have (implementing NUMA-aware mm_cid) need to scan two
cpumasks in parallel (with "and" and "and_not" operators).
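A hypothetical sketch of why the find and test_and_set steps are kept separate here: the candidate is chosen from a combination of two masks (allowed & ~used in this sketch, single words, invented names), which a single-mask find_and_set_bit() cannot express, and only then claimed atomically. The actual NUMA-aware patches are not shown in this thread, so the exact mask combination is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/*
 * Pick the lowest cid allowed by `allowed` and not already in `*used`,
 * then claim it with a separate atomic test-and-set. Returns -1 if
 * nothing is free or the candidate was taken concurrently; the caller
 * decides whether to retry. Single-word masks, nbits < bits per long.
 */
static int sketch_try_get_andnot(unsigned long allowed,
				 _Atomic unsigned long *used, unsigned int nbits)
{
	assert(nbits > 0 && nbits < sizeof(unsigned long) * CHAR_BIT);
	unsigned long candidates = allowed & ~atomic_load(used) & ((1UL << nbits) - 1);
	unsigned long bit;
	int cid;

	if (!candidates)
		return -1;
	cid = __builtin_ctzl(candidates);	/* GCC/Clang builtin */
	bit = 1UL << cid;
	if (atomic_fetch_or(used, bit) & bit)
		return -1;			/* lost a race */
	return cid;
}
```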

Moreover, the "mask full" scenario only happens while a concurrent
remote-clear temporarily owns a cid without rq lock. See
sched_mm_cid_remote_clear():

         /*
          * The cid is unused, so it can be unset.
          * Disable interrupts to keep the window of cid ownership without rq
          * lock small.
          */
         local_irq_save(flags);
         if (try_cmpxchg(&pcpu_cid->cid, &lazy_cid, MM_CID_UNSET))
                 __mm_cid_put(mm, cid);
         local_irq_restore(flags);

The proposed patch here turns this scenario into something heavier
(setting use_cid_lock) rather than just retrying. I guess the
question to ask here is whether it is theoretically possible for
__mm_cid_try_get() to fail to make forward progress if we have a high
rate of sched_mm_cid_remote_clear(). If we decide that this is indeed
a possible progress-failure scenario, then it makes sense to fall back
to use_cid_lock as soon as a full mask is encountered.

However, removing the __mm_cid_try_get() helper will make it harder to
integrate the following numa-awareness patches I have on top.

I am not against using cpumask_find_and_set, but can we keep the
__mm_cid_try_get() helper to facilitate integration of future work?
We just have to make it use cpumask_find_and_set, which should be
easy.
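What keeping the helper could look like, as a hypothetical single-word sketch with C11 atomics: __mm_cid_try_get() survives with its "-1 means fall back" contract, but is built on a find-and-set primitive. Note that this version returns -1 immediately on a full mask, as in Yury's patch, rather than spinning as the old helper did; which behaviour is right is exactly the forward-progress question raised in this reply. All names are invented.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define SKETCH_NR_CPU_IDS 8u	/* hypothetical stand-in for nr_cpu_ids */

/* Single-word stand-in for cpumask_find_and_set(): >= nr means "full". */
static unsigned int sketch_cpumask_find_and_set(_Atomic unsigned long *mask,
						unsigned int nr)
{
	assert(nr <= sizeof(unsigned long) * CHAR_BIT);
	for (;;) {
		unsigned long free_bits = ~atomic_load(mask);
		unsigned int cpu;

		if (!free_bits)
			return nr;
		cpu = (unsigned int)__builtin_ctzl(free_bits);
		if (cpu >= nr)
			return nr;
		if (!(atomic_fetch_or(mask, 1UL << cpu) & (1UL << cpu)))
			return cpu;
		/* Lost a race for this cpu: rescan. */
	}
}

/* The helper kept, but built on find-and-set; -1 tells the caller to fall back. */
static int sketch_mm_cid_try_get(_Atomic unsigned long *cidmask)
{
	unsigned int cid = sketch_cpumask_find_and_set(cidmask, SKETCH_NR_CPU_IDS);

	return cid < SKETCH_NR_CPU_IDS ? (int)cid : -1;
}
```

Keeping the `cid >= 0` convention means __mm_cid_get() and any later NUMA-aware variant need not change their checks.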

>> -	return cid;
>> -}
>> -
>>   /*
>>    * Save a snapshot of the current runqueue time of this cpu
>>    * with the per-cpu cid value, allowing to estimate how recently it was used.
>> @@ -3381,25 +3359,25 @@ static inline void mm_cid_snapshot_time(struct rq *rq, struct mm_struct *mm)
>>   
>>   static inline int __mm_cid_get(struct rq *rq, struct mm_struct *mm)
>>   {
>> +	struct cpumask *cpumask = mm_cidmask(mm);
>>   	int cid;
>>   
>> -	/*
>> -	 * All allocations (even those using the cid_lock) are lock-free. If
>> -	 * use_cid_lock is set, hold the cid_lock to perform cid allocation to
>> -	 * guarantee forward progress.
>> -	 */
>> +	/* All allocations (even those using the cid_lock) are lock-free. */
>>   	if (!READ_ONCE(use_cid_lock)) {
>> -		cid = __mm_cid_try_get(mm);
>> -		if (cid >= 0)
>> +		cid = cpumask_find_and_set(cpumask);
>> +		if (cid < nr_cpu_ids)
>>   			goto end;
>> -		raw_spin_lock(&cid_lock);
>> -	} else {
>> -		raw_spin_lock(&cid_lock);
>> -		cid = __mm_cid_try_get(mm);
>> -		if (cid >= 0)
>> -			goto unlock;
>>   	}
>>   
>> +	/*
>> +	 * If use_cid_lock is set, hold the cid_lock to perform cid
>> +	 * allocation to guarantee forward progress.
>> +	 */
>> +	raw_spin_lock(&cid_lock);
>> +	cid = cpumask_find_and_set(cpumask);
>> +	if (cid < nr_cpu_ids)
>> +		goto unlock;

In the !use_cid_lock case where we already failed a lookup above, this change
ends up doing another attempt at lookup before setting the use_cid_lock and
attempting again until success. I am not sure what the motivation is for changing
the code flow here.

General comment about the rest of the series: please review code comments for
typos.

Thanks,

Mathieu

>> +
>>   	/*
>>   	 * cid concurrently allocated. Retry while forcing following
>>   	 * allocations to use the cid_lock to ensure forward progress.
>> @@ -3415,9 +3393,9 @@ static inline int __mm_cid_get(struct rq *rq, struct mm_struct *mm)
>>   	 * all newcoming allocations observe the use_cid_lock flag set.
>>   	 */
>>   	do {
>> -		cid = __mm_cid_try_get(mm);
>> +		cid = cpumask_find_and_set(cpumask);
>>   		cpu_relax();
>> -	} while (cid < 0);
>> +	} while (cid >= nr_cpu_ids);
>>   	/*
>>   	 * Allocate before clearing use_cid_lock. Only care about
>>   	 * program order because this is for forward progress.
>> -- 
>> 2.39.2
>>

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 04/34] sched: add cpumask_find_and_set() and use it in __mm_cid_get()
  2023-11-20 16:17     ` Mathieu Desnoyers
@ 2023-11-21 13:31       ` Yury Norov
  2023-11-21 13:44         ` Mathieu Desnoyers
  0 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-11-21 13:31 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, linux-kernel, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Jan Kara,
	Mirsad Todorovac, Matthew Wilcox, Maxim Kuvyrkov, Alexey Klimov

On Mon, Nov 20, 2023 at 11:17:32AM -0500, Mathieu Desnoyers wrote:

...

> > > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > > index 2e5a95486a42..b2f095a9fc40 100644
> > > --- a/kernel/sched/sched.h
> > > +++ b/kernel/sched/sched.h
> > > @@ -3345,28 +3345,6 @@ static inline void mm_cid_put(struct mm_struct *mm)
> > >   	__mm_cid_put(mm, mm_cid_clear_lazy_put(cid));
> > >   }
> > > -static inline int __mm_cid_try_get(struct mm_struct *mm)
> > > -{
> > > -	struct cpumask *cpumask;
> > > -	int cid;
> > > -
> > > -	cpumask = mm_cidmask(mm);
> > > -	/*
> > > -	 * Retry finding first zero bit if the mask is temporarily
> > > -	 * filled. This only happens during concurrent remote-clear
> > > -	 * which owns a cid without holding a rq lock.
> > > -	 */
> > > -	for (;;) {
> > > -		cid = cpumask_first_zero(cpumask);
> > > -		if (cid < nr_cpu_ids)
> > > -			break;
> > > -		cpu_relax();
> > > -	}
> > > -	if (cpumask_test_and_set_cpu(cid, cpumask))
> > > -		return -1;
> 
> This was split in find / test_and_set on purpose because following
> patches I have (implementing numa-aware mm_cid) have a scan which
> needs to scan sets of two cpumasks in parallel (with "and" and
> "and_not" operators).
> 
> Moreover, the "mask full" scenario only happens while a concurrent
> remote-clear temporarily owns a cid without rq lock. See
> sched_mm_cid_remote_clear():
> 
>         /*
>          * The cid is unused, so it can be unset.
>          * Disable interrupts to keep the window of cid ownership without rq
>          * lock small.
>          */
>         local_irq_save(flags);
>         if (try_cmpxchg(&pcpu_cid->cid, &lazy_cid, MM_CID_UNSET))
>                 __mm_cid_put(mm, cid);
>         local_irq_restore(flags);
> 
> The proposed patch here turns this scenario into something heavier
> (setting the use_cid_lock) rather than just retrying. I guess the
> question to ask here is whether it is theoretically possible to cause
> __mm_cid_try_get() to fail to have forward progress if we have a high
> rate of sched_mm_cid_remote_clear. If we decide that this is indeed
> a possible progress-failure scenario, then it makes sense to fallback
> to use_cid_lock as soon as a full mask is encountered.
> 
> However, removing the __mm_cid_try_get() helper will make it harder to
> integrate the following numa-awareness patches I have on top.
> 
> I am not against using cpumask_find_and_set, but can we keep the
> __mm_cid_try_get() helper to facilitate integration of future work?
> We just have to make it use cpumask_find_and_set, which should be
> easy.

Sure, I can. Can you point me to the work you mention here?

* Re: [PATCH 13/34] KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers()
  2023-11-20 14:26   ` Vitaly Kuznetsov
@ 2023-11-21 13:35     ` Yury Norov
  0 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-21 13:35 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, kvm, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

On Mon, Nov 20, 2023 at 03:26:08PM +0100, Vitaly Kuznetsov wrote:
> Yury Norov <yury.norov@gmail.com> writes:
> 
> > The function traverses stimer_pending_bitmap in a for-loop bit by bit.
> > We can do it faster by using atomic find_and_set_bit().
> >
> > While here, refactor the logic by decreasing indentation level
> > and dropping 2nd check for stimer->config.enable.
> >
> > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > ---
> >  arch/x86/kvm/hyperv.c | 39 +++++++++++++++++++--------------------
> >  1 file changed, 19 insertions(+), 20 deletions(-)
> >
> > diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> > index 238afd7335e4..460e300b558b 100644
> > --- a/arch/x86/kvm/hyperv.c
> > +++ b/arch/x86/kvm/hyperv.c
> > @@ -870,27 +870,26 @@ void kvm_hv_process_stimers(struct kvm_vcpu *vcpu)
> >  	if (!hv_vcpu)
> >  		return;
> >  
> > -	for (i = 0; i < ARRAY_SIZE(hv_vcpu->stimer); i++)
> > -		if (test_and_clear_bit(i, hv_vcpu->stimer_pending_bitmap)) {
> > -			stimer = &hv_vcpu->stimer[i];
> > -			if (stimer->config.enable) {
> > -				exp_time = stimer->exp_time;
> > -
> > -				if (exp_time) {
> > -					time_now =
> > -						get_time_ref_counter(vcpu->kvm);
> > -					if (time_now >= exp_time)
> > -						stimer_expiration(stimer);
> > -				}
> > -
> > -				if ((stimer->config.enable) &&
> > -				    stimer->count) {
> > -					if (!stimer->msg_pending)
> > -						stimer_start(stimer);
> > -				} else
> > -					stimer_cleanup(stimer);
> > -			}
> > +	for_each_test_and_clear_bit(i, hv_vcpu->stimer_pending_bitmap,
> > +					ARRAY_SIZE(hv_vcpu->stimer)) {
> > +		stimer = &hv_vcpu->stimer[i];
> > +		if (!stimer->config.enable)
> > +			continue;
> > +
> > +		exp_time = stimer->exp_time;
> > +
> > +		if (exp_time) {
> > +			time_now = get_time_ref_counter(vcpu->kvm);
> > +			if (time_now >= exp_time)
> > +				stimer_expiration(stimer);
> >  		}
> > +
> > +		if (stimer->count) {
> 
> You can't drop 'stimer->config.enable' check here as stimer_expiration()
> call above actually changes it. This is done on purpose: oneshot timers
> fire only once so 'config.enable' is reset to 0.

Ok, I see. Will fix in v2
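Below is a reduced userspace model of why the second `config.enable` check is needed. It assumes, as described in the review, that expiring a one-shot timer clears its enable bit. The struct and function names are hypothetical. The `msg_pending` check and the real time comparison are omitted; the `expired` argument stands for `time_now >= exp_time`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for a Hyper-V synthetic timer. */
struct model_stimer {
	bool enable;		/* models config.enable */
	bool periodic;		/* models config.periodic */
	unsigned int count;
	bool started;		/* set where stimer_start() would run */
	bool cleaned_up;	/* set where stimer_cleanup() would run */
};

/* Models only the side effect discussed above: one-shot timers disable themselves. */
static void model_stimer_expiration(struct model_stimer *t)
{
	if (!t->periodic)
		t->enable = false;
}

static void model_process_stimer(struct model_stimer *t, bool expired)
{
	assert(t);
	if (!t->enable)
		return;
	if (expired)
		model_stimer_expiration(t);
	/* enable must be re-read here: expiration may have just cleared it. */
	if (t->enable && t->count)
		t->started = true;
	else
		t->cleaned_up = true;
}
```

If the re-check were dropped, an expired one-shot timer with a non-zero `count` would be restarted instead of cleaned up.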

* Re: [PATCH 29/34] net: smc: fix opencoded find_and_set_bit() in smc_wr_tx_get_free_slot_index()
  2023-11-20  8:43   ` Alexandra Winter
@ 2023-11-21 13:41     ` Yury Norov
  2023-11-21 15:39       ` Alexandra Winter
  0 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-11-21 13:41 UTC (permalink / raw)
  To: Alexandra Winter
  Cc: linux-kernel, Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe,
	Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-s390, netdev, Jan Kara, Mirsad Todorovac,
	Matthew Wilcox, Rasmus Villemoes, Andy Shevchenko,
	Maxim Kuvyrkov, Alexey Klimov

On Mon, Nov 20, 2023 at 09:43:54AM +0100, Alexandra Winter wrote:
> 
> 
> On 18.11.23 16:51, Yury Norov wrote:
> > The function opencodes find_and_set_bit() with a for_each() loop. Fix
> > it, and make the whole function a simple almost one-liner.
> > 
> > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > ---
> >  net/smc/smc_wr.c | 10 +++-------
> >  1 file changed, 3 insertions(+), 7 deletions(-)
> > 
> > diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
> > index 0021065a600a..b6f0cfc52788 100644
> > --- a/net/smc/smc_wr.c
> > +++ b/net/smc/smc_wr.c
> > @@ -170,15 +170,11 @@ void smc_wr_tx_cq_handler(struct ib_cq *ib_cq, void *cq_context)
> >  
> >  static inline int smc_wr_tx_get_free_slot_index(struct smc_link *link, u32 *idx)
> >  {
> > -	*idx = link->wr_tx_cnt;
> >  	if (!smc_link_sendable(link))
> >  		return -ENOLINK;
> > -	for_each_clear_bit(*idx, link->wr_tx_mask, link->wr_tx_cnt) {
> > -		if (!test_and_set_bit(*idx, link->wr_tx_mask))
> > -			return 0;
> > -	}
> > -	*idx = link->wr_tx_cnt;
> > -	return -EBUSY;
> > +
> > +	*idx = find_and_set_bit(link->wr_tx_mask, link->wr_tx_cnt);
> > +	return *idx < link->wr_tx_cnt ? 0 : -EBUSY;
> >  }
> >  
> >  /**
> 
> 
> My understanding is that you can omit the lines with
> > -	*idx = link->wr_tx_cnt;
> because they only apply to the error paths and you checked that the calling function
> does not use the idx variable in the error cases. Do I understand this correctly?
> 
> If so the removal of these 2 lines is not related to your change of using find_and_set_bit(),
> do I understand that correctly?
> 
> If so, it may be worth mentioning that in the commit message.

I'll add:

        If find_and_set_bit() doesn't acquire a bit, it returns
        ->wr_tx_cnt, and so explicit initialization of *idx with
        the same value is unneeded.

Makes sense?

* Re: [PATCH 34/34] sh: rework ilsel_enable()
  2023-11-18 16:15   ` John Paul Adrian Glaubitz
@ 2023-11-21 13:43     ` Yury Norov
  0 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-21 13:43 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz
  Cc: linux-kernel, Yoshinori Sato, Rich Felker, linux-sh, Jan Kara,
	Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

On Sat, Nov 18, 2023 at 05:15:57PM +0100, John Paul Adrian Glaubitz wrote:
> Hello Yury!
> 
> On Sat, 2023-11-18 at 07:51 -0800, Yury Norov wrote:
> > Fix opencoded find_and_set_bit(), which also suppresses potential
> > KCSAN warning.
> > 
> > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > ---
> >  arch/sh/boards/mach-x3proto/ilsel.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/sh/boards/mach-x3proto/ilsel.c b/arch/sh/boards/mach-x3proto/ilsel.c
> > index f0d5eb41521a..7fadc479a80b 100644
> > --- a/arch/sh/boards/mach-x3proto/ilsel.c
> > +++ b/arch/sh/boards/mach-x3proto/ilsel.c
> > @@ -99,8 +99,8 @@ int ilsel_enable(ilsel_source_t set)
> >  	}
> >  
> >  	do {
> > -		bit = find_first_zero_bit(&ilsel_level_map, ILSEL_LEVELS);
> > -	} while (test_and_set_bit(bit, &ilsel_level_map));
> > +		bit = find_and_set_bit(&ilsel_level_map, ILSEL_LEVELS);
> > +	} while (bit >= ILSEL_LEVELS);
> >  
> >  	__ilsel_enable(set, bit);
> >  
> 
> The subject should mention the subsystem, i.e. "sh: mach-x3proto:".

OK, will do in v2

* Re: [PATCH 04/34] sched: add cpumask_find_and_set() and use it in __mm_cid_get()
  2023-11-21 13:31       ` Yury Norov
@ 2023-11-21 13:44         ` Mathieu Desnoyers
  2023-11-21 17:00           ` Yury Norov
  0 siblings, 1 reply; 68+ messages in thread
From: Mathieu Desnoyers @ 2023-11-21 13:44 UTC (permalink / raw)
  To: Yury Norov
  Cc: Peter Zijlstra, linux-kernel, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Jan Kara,
	Mirsad Todorovac, Matthew Wilcox, Maxim Kuvyrkov, Alexey Klimov

On 2023-11-21 08:31, Yury Norov wrote:
> On Mon, Nov 20, 2023 at 11:17:32AM -0500, Mathieu Desnoyers wrote:
> 
[...]
> 
> Sure, I can. Can you point me to the work you mention here?

It would have to be updated now, but here is the last version that was posted:

https://lore.kernel.org/lkml/20221122203932.231377-1-mathieu.desnoyers@efficios.com/

Especially those patches:

2022-11-22 20:39 ` [PATCH 22/30] lib: Implement find_{first,next,nth}_notandnot_bit, find_first_andnot_bit Mathieu Desnoyers
2022-11-22 20:39 ` [PATCH 23/30] cpumask: Implement cpumask_{first,next}_{not,}andnot Mathieu Desnoyers
2022-11-22 20:39 ` [PATCH 24/30] sched: NUMA-aware per-memory-map concurrency ID Mathieu Desnoyers
2022-11-22 20:39 ` [PATCH 25/30] rseq: Extend struct rseq with per-memory-map NUMA-aware Concurrency ID Mathieu Desnoyers
2022-11-22 20:39 ` [PATCH 26/30] selftests/rseq: x86: Implement rseq_load_u32_u32 Mathieu Desnoyers
2022-11-22 20:39 ` [PATCH 27/30] selftests/rseq: Implement mm_numa_cid accessors in headers Mathieu Desnoyers
2022-11-22 20:39 ` [PATCH 28/30] selftests/rseq: Implement numa node id vs mm_numa_cid invariant test Mathieu Desnoyers
2022-11-22 20:39 ` [PATCH 29/30] selftests/rseq: Implement mm_numa_cid tests Mathieu Desnoyers
2022-11-22 20:39 ` [PATCH 30/30] tracing/rseq: Add mm_numa_cid field to rseq_update Mathieu Desnoyers

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [PATCH 32/34] m68k: rework get_mmu_context()
  2023-11-18 15:51 ` [PATCH 32/34] m68k: rework get_mmu_context() Yury Norov
  2023-11-19 19:29   ` Geert Uytterhoeven
@ 2023-11-21 14:39   ` Greg Ungerer
  1 sibling, 0 replies; 68+ messages in thread
From: Greg Ungerer @ 2023-11-21 14:39 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Geert Uytterhoeven, Hugh Dickins,
	Andrew Morton, linux-m68k
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

Hi Yury,

On 19/11/23 01:51, Yury Norov wrote:
> ALSA code opencodes atomic find_and_set_bit_wrap(). Switch it to
   ^^^^
m68k?


> dedicated function.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>

Looks ok to me:

Acked-by: Greg Ungerer <gerg@linux-m68k.org>

Regards
Greg



> ---
>   arch/m68k/include/asm/mmu_context.h | 11 +++++------
>   1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/m68k/include/asm/mmu_context.h b/arch/m68k/include/asm/mmu_context.h
> index 141bbdfad960..0419ad87a1c1 100644
> --- a/arch/m68k/include/asm/mmu_context.h
> +++ b/arch/m68k/include/asm/mmu_context.h
> @@ -35,12 +35,11 @@ static inline void get_mmu_context(struct mm_struct *mm)
>   		atomic_inc(&nr_free_contexts);
>   		steal_context();
>   	}
> -	ctx = next_mmu_context;
> -	while (test_and_set_bit(ctx, context_map)) {
> -		ctx = find_next_zero_bit(context_map, LAST_CONTEXT+1, ctx);
> -		if (ctx > LAST_CONTEXT)
> -			ctx = 0;
> -	}
> +
> +	do {
> +		ctx = find_and_set_bit_wrap(context_map, LAST_CONTEXT + 1, next_mmu_context);
> +	} while (ctx > LAST_CONTEXT);
> +
>   	next_mmu_context = (ctx + 1) & LAST_CONTEXT;
>   	mm->context = ctx;
>   	context_mm[ctx] = mm;
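The following userspace model illustrates the semantics the new loop appears to rely on. The search covers `[offset, nbits)` first and then `[0, offset)`, returning `nbits` when nothing is free. These semantics are an assumption inferred from the `ctx > LAST_CONTEXT` loop condition and are not taken from the library implementation. The hypothetical `model_alloc_ctx()` mirrors the patched `get_mmu_context()` logic with a small context count. Its `(ctx + 1) & last` step assumes `last + 1` is a power of two, as the original code does.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static unsigned long model_low_mask(unsigned int nbits)
{
	return nbits >= MODEL_BITS_PER_LONG ? ~0UL : (1UL << nbits) - 1;
}

/*
 * Model of find_and_set_bit_wrap(addr, nbits, offset): claim the first clear
 * bit in [offset, nbits), else in [0, offset); return nbits if none is clear.
 */
static unsigned int model_find_and_set_bit_wrap(atomic_ulong *word, unsigned int nbits,
						unsigned int offset)
{
	assert(nbits > 0 && nbits <= MODEL_BITS_PER_LONG && offset < nbits);
	for (;;) {
		unsigned long clear = ~atomic_load(word) & model_low_mask(nbits);
		unsigned long upper = clear & ~model_low_mask(offset);
		unsigned long pick = upper ? upper : clear;	/* wrap if needed */

		if (pick == 0)
			return nbits;
		unsigned int bit = (unsigned int)__builtin_ctzl(pick);
		unsigned long mask = 1UL << bit;
		if (!(atomic_fetch_or(word, mask) & mask))
			return bit;
	}
}

static unsigned int model_next_ctx;	/* stands in for next_mmu_context */

/* Mirrors the patched loop; "last" plays the role of LAST_CONTEXT. */
static unsigned int model_alloc_ctx(atomic_ulong *map, unsigned int last)
{
	unsigned int ctx;

	do {
		ctx = model_find_and_set_bit_wrap(map, last + 1, model_next_ctx);
	} while (ctx > last);
	model_next_ctx = (ctx + 1) & last;
	return ctx;
}
```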

* Re: [PATCH 29/34] net: smc: fix opencoded find_and_set_bit() in smc_wr_tx_get_free_slot_index()
  2023-11-21 13:41     ` Yury Norov
@ 2023-11-21 15:39       ` Alexandra Winter
  0 siblings, 0 replies; 68+ messages in thread
From: Alexandra Winter @ 2023-11-21 15:39 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe,
	Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-s390, netdev, Jan Kara, Mirsad Todorovac,
	Matthew Wilcox, Rasmus Villemoes, Andy Shevchenko,
	Maxim Kuvyrkov, Alexey Klimov



On 21.11.23 14:41, Yury Norov wrote:
> On Mon, Nov 20, 2023 at 09:43:54AM +0100, Alexandra Winter wrote:
>>
>>
>> On 18.11.23 16:51, Yury Norov wrote:
>>> The function opencodes find_and_set_bit() with a for_each() loop. Fix
>>> it, and make the whole function a simple almost one-liner.
>>>
>>> Signed-off-by: Yury Norov <yury.norov@gmail.com>
>>> ---
>>>  net/smc/smc_wr.c | 10 +++-------
>>>  1 file changed, 3 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
>>> index 0021065a600a..b6f0cfc52788 100644
>>> --- a/net/smc/smc_wr.c
>>> +++ b/net/smc/smc_wr.c
>>> @@ -170,15 +170,11 @@ void smc_wr_tx_cq_handler(struct ib_cq *ib_cq, void *cq_context)
>>>  
>>>  static inline int smc_wr_tx_get_free_slot_index(struct smc_link *link, u32 *idx)
>>>  {
>>> -	*idx = link->wr_tx_cnt;
>>>  	if (!smc_link_sendable(link))
>>>  		return -ENOLINK;
>>> -	for_each_clear_bit(*idx, link->wr_tx_mask, link->wr_tx_cnt) {
>>> -		if (!test_and_set_bit(*idx, link->wr_tx_mask))
>>> -			return 0;
>>> -	}
>>> -	*idx = link->wr_tx_cnt;
>>> -	return -EBUSY;
>>> +
>>> +	*idx = find_and_set_bit(link->wr_tx_mask, link->wr_tx_cnt);
>>> +	return *idx < link->wr_tx_cnt ? 0 : -EBUSY;
>>>  }
>>>  
>>>  /**
>>
>>
>> My understanding is that you can omit the lines with
>>> -	*idx = link->wr_tx_cnt;
>> because they only apply to the error paths and you checked that the calling function
>> does not use the idx variable in the error cases. Do I understand this correctly?
>>
>> If so the removal of these 2 lines is not related to your change of using find_and_set_bit(),
>> do I understand that correctly?
>>
>> If so, it may be worth mentioning that in the commit message.
> 
> I'll add:
> 
>         If find_and_set_bit() doesn't acquire a bit, it returns
>         ->wr_tx_cnt, and so explicit initialization of *idx with
>         the same value is unneeded.
> 
> Makes sense?
> 

Makes sense for the -EBUSY case, thank you.
It does not explain that you also removed the line for the -ENOLINK case
(which is OK, because the caller has also initialized it to link->wr_tx_cnt).
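This is a userspace model of the two error paths discussed here. It assumes a single-word `wr_tx_mask` and C11 atomics; `struct model_link` and the `model_*` functions are hypothetical. On `-ENOLINK`, the function leaves `*idx` untouched, so the caller sees its own initialization to `wr_tx_cnt` (per the review). On `-EBUSY`, the search result itself equals `wr_tx_cnt`.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static unsigned long model_low_mask(unsigned int nbits)
{
	return nbits >= MODEL_BITS_PER_LONG ? ~0UL : (1UL << nbits) - 1;
}

/* Single-word model of find_and_set_bit(); returns nbits if nothing is free. */
static unsigned int model_find_and_set_bit(atomic_ulong *word, unsigned int nbits)
{
	assert(nbits > 0 && nbits <= MODEL_BITS_PER_LONG);
	for (;;) {
		unsigned long clear = ~atomic_load(word) & model_low_mask(nbits);
		if (clear == 0)
			return nbits;
		unsigned int bit = (unsigned int)__builtin_ctzl(clear);
		unsigned long mask = 1UL << bit;
		if (!(atomic_fetch_or(word, mask) & mask))
			return bit;
	}
}

/* Hypothetical stand-in for the struct smc_link fields used here. */
struct model_link {
	bool sendable;			/* models smc_link_sendable() */
	atomic_ulong wr_tx_mask;
	unsigned int wr_tx_cnt;
};

static int model_get_free_slot_index(struct model_link *link, unsigned int *idx)
{
	if (!link->sendable)
		return -ENOLINK;	/* *idx untouched: caller's value survives */

	*idx = model_find_and_set_bit(&link->wr_tx_mask, link->wr_tx_cnt);
	/* On failure the search result itself is wr_tx_cnt. */
	return *idx < link->wr_tx_cnt ? 0 : -EBUSY;
}
```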

* Re: [PATCH 07/34] perf/arm: optimize opencoded atomic find_bit() API
  2023-11-18 15:50 ` [PATCH 07/34] perf/arm: optimize opencoded atomic find_bit() API Yury Norov
@ 2023-11-21 15:53   ` Will Deacon
  2023-11-21 16:16     ` Yury Norov
  0 siblings, 1 reply; 68+ messages in thread
From: Will Deacon @ 2023-11-21 15:53 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Mark Rutland, linux-arm-kernel, Jan Kara,
	Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

On Sat, Nov 18, 2023 at 07:50:38AM -0800, Yury Norov wrote:
> Switch subsystem to use atomic find_bit() or atomic iterators as
> appropriate.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  drivers/perf/arm-cci.c        | 23 +++++------------------
>  drivers/perf/arm-ccn.c        | 10 ++--------
>  drivers/perf/arm_dmc620_pmu.c |  9 ++-------
>  drivers/perf/arm_pmuv3.c      |  8 ++------
>  4 files changed, 11 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
> index 61de861eaf91..70fbf9d09d37 100644
> --- a/drivers/perf/arm-cci.c
> +++ b/drivers/perf/arm-cci.c
> @@ -320,12 +320,8 @@ static int cci400_get_event_idx(struct cci_pmu *cci_pmu,
>  		return CCI400_PMU_CYCLE_CNTR_IDX;
>  	}
>  
> -	for (idx = CCI400_PMU_CNTR0_IDX; idx <= CCI_PMU_CNTR_LAST(cci_pmu); ++idx)
> -		if (!test_and_set_bit(idx, hw->used_mask))
> -			return idx;
> -
> -	/* No counters available */
> -	return -EAGAIN;
> +	idx = find_and_set_bit(hw->used_mask, CCI_PMU_CNTR_LAST(cci_pmu) + 1);

CCI400_PMU_CNTR0_IDX is defined as 1, so isn't this wrong?

[...]

> diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
> index 30cea6859574..e41c84dabc3e 100644
> --- a/drivers/perf/arm_dmc620_pmu.c
> +++ b/drivers/perf/arm_dmc620_pmu.c
> @@ -303,13 +303,8 @@ static int dmc620_get_event_idx(struct perf_event *event)
>  		end_idx = DMC620_PMU_MAX_COUNTERS;
>  	}
>  
> -	for (idx = start_idx; idx < end_idx; ++idx) {
> -		if (!test_and_set_bit(idx, dmc620_pmu->used_mask))
> -			return idx;
> -	}
> -
> -	/* The counters are all in use. */
> -	return -EAGAIN;
> +	idx = find_and_set_next_bit(dmc620_pmu->used_mask, end_idx, start_idx);

It might just be me, but I'd find this a tonne easier to read if you swapped
the last two arguments around so that the offset came before the limit in
the new function.
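The userspace model below covers both review points. It assumes that `find_and_set_next_bit(addr, nbits, offset)` returns `nbits` when no bit in `[offset, nbits)` is free, which is inferred from the call sites in this patch. Starting the CCI-400 search at `CCI400_PMU_CNTR0_IDX` (1, per the review) keeps bit 0 out of the search. The dmc620 call claims a bit in `[start_idx, end_idx)`, with the limit passed before the offset. All `model_*` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MODEL_CCI400_PMU_CNTR0_IDX 1	/* value quoted in the review */

static unsigned long model_low_mask(unsigned int nbits)
{
	return nbits >= MODEL_BITS_PER_LONG ? ~0UL : (1UL << nbits) - 1;
}

/* Model of find_and_set_next_bit(addr, nbits, offset): claims a bit in [offset, nbits). */
static unsigned int model_find_and_set_next_bit(atomic_ulong *word, unsigned int nbits,
						unsigned int offset)
{
	assert(nbits <= MODEL_BITS_PER_LONG);
	for (;;) {
		unsigned long clear = ~atomic_load(word) & model_low_mask(nbits) &
				      ~model_low_mask(offset);
		if (clear == 0)
			return nbits;
		unsigned int bit = (unsigned int)__builtin_ctzl(clear);
		unsigned long mask = 1UL << bit;
		if (!(atomic_fetch_or(word, mask) & mask))
			return bit;
	}
}

/* CCI-400: counters live in [CNTR0_IDX, cntr_last], so the search starts at 1. */
static int model_cci400_get_event_idx(atomic_ulong *used_mask, unsigned int cntr_last)
{
	unsigned int idx = model_find_and_set_next_bit(used_mask, cntr_last + 1,
						       MODEL_CCI400_PMU_CNTR0_IDX);

	return idx <= cntr_last ? (int)idx : -EAGAIN;
}

/* DMC-620: same argument order as find_next_zero_bit(): limit, then offset. */
static int model_dmc620_get_event_idx(atomic_ulong *used_mask, unsigned int start_idx,
				      unsigned int end_idx)
{
	unsigned int idx = model_find_and_set_next_bit(used_mask, end_idx, start_idx);

	return idx < end_idx ? (int)idx : -EAGAIN;
}
```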

Will

* Re: [PATCH 08/34] drivers/perf: optimize ali_drw_get_counter_idx() by using find_bit()
  2023-11-18 15:50 ` [PATCH 08/34] drivers/perf: optimize ali_drw_get_counter_idx() by using find_bit() Yury Norov
@ 2023-11-21 15:54   ` Will Deacon
  0 siblings, 0 replies; 68+ messages in thread
From: Will Deacon @ 2023-11-21 15:54 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Shuai Xue, Mark Rutland, linux-arm-kernel,
	Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

On Sat, Nov 18, 2023 at 07:50:39AM -0800, Yury Norov wrote:
> The function searches used_mask for a clear bit in a for-loop bit by bit.
> We can do it faster by using atomic find_and_set_bit().
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  drivers/perf/alibaba_uncore_drw_pmu.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/perf/alibaba_uncore_drw_pmu.c b/drivers/perf/alibaba_uncore_drw_pmu.c
> index 19d459a36be5..2a3b7701d568 100644
> --- a/drivers/perf/alibaba_uncore_drw_pmu.c
> +++ b/drivers/perf/alibaba_uncore_drw_pmu.c
> @@ -274,15 +274,9 @@ static const struct attribute_group *ali_drw_pmu_attr_groups[] = {
>  static int ali_drw_get_counter_idx(struct perf_event *event)
>  {
>  	struct ali_drw_pmu *drw_pmu = to_ali_drw_pmu(event->pmu);
> -	int idx;
> +	int idx = find_and_set_bit(drw_pmu->used_mask, ALI_DRW_PMU_COMMON_MAX_COUNTERS);
>  
> -	for (idx = 0; idx < ALI_DRW_PMU_COMMON_MAX_COUNTERS; ++idx) {
> -		if (!test_and_set_bit(idx, drw_pmu->used_mask))
> -			return idx;
> -	}
> -
> -	/* The counters are all in use. */
> -	return -EBUSY;
> +	return idx < ALI_DRW_PMU_COMMON_MAX_COUNTERS ? idx : -EBUSY;
>  }
>  
>  static u64 ali_drw_pmu_read_counter(struct perf_event *event)
> -- 
> 2.39.2

Acked-by: Will Deacon <will@kernel.org>

Will


* Re: [PATCH 07/34] perf/arm: optimize opencoded atomic find_bit() API
  2023-11-21 15:53   ` Will Deacon
@ 2023-11-21 16:16     ` Yury Norov
  2023-11-21 16:17       ` Will Deacon
  0 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-11-21 16:16 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, Mark Rutland, linux-arm-kernel, Jan Kara,
	Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

On Tue, Nov 21, 2023 at 03:53:44PM +0000, Will Deacon wrote:
> On Sat, Nov 18, 2023 at 07:50:38AM -0800, Yury Norov wrote:
> > Switch subsystem to use atomic find_bit() or atomic iterators as
> > appropriate.
> > 
> > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > ---
> >  drivers/perf/arm-cci.c        | 23 +++++------------------
> >  drivers/perf/arm-ccn.c        | 10 ++--------
> >  drivers/perf/arm_dmc620_pmu.c |  9 ++-------
> >  drivers/perf/arm_pmuv3.c      |  8 ++------
> >  4 files changed, 11 insertions(+), 39 deletions(-)
> > 
> > diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
> > index 61de861eaf91..70fbf9d09d37 100644
> > --- a/drivers/perf/arm-cci.c
> > +++ b/drivers/perf/arm-cci.c
> > @@ -320,12 +320,8 @@ static int cci400_get_event_idx(struct cci_pmu *cci_pmu,
> >  		return CCI400_PMU_CYCLE_CNTR_IDX;
> >  	}
> >  
> > -	for (idx = CCI400_PMU_CNTR0_IDX; idx <= CCI_PMU_CNTR_LAST(cci_pmu); ++idx)
> > -		if (!test_and_set_bit(idx, hw->used_mask))
> > -			return idx;
> > -
> > -	/* No counters available */
> > -	return -EAGAIN;
> > +	idx = find_and_set_bit(hw->used_mask, CCI_PMU_CNTR_LAST(cci_pmu) + 1);
> 
> CCI400_PMU_CNTR0_IDX is defined as 1, so isn't this wrong?

You're right. Will fix in v2
 
> [...]
> 
> > diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
> > index 30cea6859574..e41c84dabc3e 100644
> > --- a/drivers/perf/arm_dmc620_pmu.c
> > +++ b/drivers/perf/arm_dmc620_pmu.c
> > @@ -303,13 +303,8 @@ static int dmc620_get_event_idx(struct perf_event *event)
> >  		end_idx = DMC620_PMU_MAX_COUNTERS;
> >  	}
> >  
> > -	for (idx = start_idx; idx < end_idx; ++idx) {
> > -		if (!test_and_set_bit(idx, dmc620_pmu->used_mask))
> > -			return idx;
> > -	}
> > -
> > -	/* The counters are all in use. */
> > -	return -EAGAIN;
> > +	idx = find_and_set_next_bit(dmc620_pmu->used_mask, end_idx, start_idx);
> 
> It might just be me, but I'd find this a tonne easier to read if you swapped
> the last two arguments around so that the offset came before the limit in
> the new function.

I personally agree, but we already have find_next_*_bit(addr, nbits, offset)
functions, and atomic versions of the same with a different argument order
would make things even messier...

Thanks,
        Yury

* Re: [PATCH 07/34] perf/arm: optimize opencoded atomic find_bit() API
  2023-11-21 16:16     ` Yury Norov
@ 2023-11-21 16:17       ` Will Deacon
  0 siblings, 0 replies; 68+ messages in thread
From: Will Deacon @ 2023-11-21 16:17 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Mark Rutland, linux-arm-kernel, Jan Kara,
	Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

On Tue, Nov 21, 2023 at 08:16:13AM -0800, Yury Norov wrote:
> On Tue, Nov 21, 2023 at 03:53:44PM +0000, Will Deacon wrote:
> > On Sat, Nov 18, 2023 at 07:50:38AM -0800, Yury Norov wrote:
> > > Switch subsystem to use atomic find_bit() or atomic iterators as
> > > appropriate.
> > > 
> > > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > > ---
> > >  drivers/perf/arm-cci.c        | 23 +++++------------------
> > >  drivers/perf/arm-ccn.c        | 10 ++--------
> > >  drivers/perf/arm_dmc620_pmu.c |  9 ++-------
> > >  drivers/perf/arm_pmuv3.c      |  8 ++------
> > >  4 files changed, 11 insertions(+), 39 deletions(-)
> > > 
> > > diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
> > > index 61de861eaf91..70fbf9d09d37 100644
> > > --- a/drivers/perf/arm-cci.c
> > > +++ b/drivers/perf/arm-cci.c
> > > @@ -320,12 +320,8 @@ static int cci400_get_event_idx(struct cci_pmu *cci_pmu,
> > >  		return CCI400_PMU_CYCLE_CNTR_IDX;
> > >  	}
> > >  
> > > -	for (idx = CCI400_PMU_CNTR0_IDX; idx <= CCI_PMU_CNTR_LAST(cci_pmu); ++idx)
> > > -		if (!test_and_set_bit(idx, hw->used_mask))
> > > -			return idx;
> > > -
> > > -	/* No counters available */
> > > -	return -EAGAIN;
> > > +	idx = find_and_set_bit(hw->used_mask, CCI_PMU_CNTR_LAST(cci_pmu) + 1);
> > 
> > CCI400_PMU_CNTR0_IDX is defined as 1, so isn't this wrong?
> 
> You're right. Will fix in v2
>  
> > [...]
> > 
> > > diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
> > > index 30cea6859574..e41c84dabc3e 100644
> > > --- a/drivers/perf/arm_dmc620_pmu.c
> > > +++ b/drivers/perf/arm_dmc620_pmu.c
> > > @@ -303,13 +303,8 @@ static int dmc620_get_event_idx(struct perf_event *event)
> > >  		end_idx = DMC620_PMU_MAX_COUNTERS;
> > >  	}
> > >  
> > > -	for (idx = start_idx; idx < end_idx; ++idx) {
> > > -		if (!test_and_set_bit(idx, dmc620_pmu->used_mask))
> > > -			return idx;
> > > -	}
> > > -
> > > -	/* The counters are all in use. */
> > > -	return -EAGAIN;
> > > +	idx = find_and_set_next_bit(dmc620_pmu->used_mask, end_idx, start_idx);
> > 
> > It might just be me, but I'd find this a tonne easier to read if you swapped
> > the last two arguments around so that the offset came before the limit in
> > the new function.
> 
> I personally agree, but we already have find_next_*_bit(addr, nbits, offset)
> functions, and atomic versions of the same with a different argument order
> would make things even messier...

Urgh, and there's loads of them too :(

Will

* Re: [PATCH 12/34] wifi: intel: use atomic find_bit() API where appropriate
  2023-11-19 19:58   ` Johannes Berg
@ 2023-11-21 16:36     ` Yury Norov
  0 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-21 16:36 UTC (permalink / raw)
  To: Johannes Berg
  Cc: linux-kernel, Stanislaw Gruszka, Kalle Valo, Gregory Greenman,
	Hans de Goede, Kees Cook, Miri Korenblit, linux-wireless,
	Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

On Sun, Nov 19, 2023 at 08:58:25PM +0100, Johannes Berg wrote:
> On Sat, 2023-11-18 at 07:50 -0800, Yury Norov wrote:
> > iwlegacy and iwlwifi code opencodes atomic bit allocation/traversing by
> > using loops. 
> 
> That's really just due to being lazy though; a non-atomic
> __test_and_set_bit() would be just fine in all of this, since there's always a
> mutex held around it that protects the data.

Ok, then I'll drop the patch.

> Not that it means that the helper is _wrong_, it's just unnecessary, and
> you don't have non-atomic versions of these, do you?

Not yet. If atomic find_bit() gets merged and a set of potential users
of a non-atomic version turns up, I may need to revisit this and add
those non-atomic functions.
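If non-atomic variants are added later, one possible shape is sketched below as a userspace model. `model_find_and_set_bit_nonatomic()` is a hypothetical name and not an existing kernel API. The caller is assumed to serialize all writers with a lock, as with the mutex Johannes describes. Under that assumption, a plain read-modify-write suffices and no retry loop is needed.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static unsigned long model_low_mask(unsigned int nbits)
{
	return nbits >= MODEL_BITS_PER_LONG ? ~0UL : (1UL << nbits) - 1;
}

/*
 * Hypothetical non-atomic counterpart of find_and_set_bit(): a plain
 * read-modify-write with no retry loop. Only correct when every writer of
 * *word holds the same lock.
 */
static unsigned int model_find_and_set_bit_nonatomic(unsigned long *word, unsigned int nbits)
{
	assert(nbits > 0 && nbits <= MODEL_BITS_PER_LONG);
	unsigned long clear = ~*word & model_low_mask(nbits);

	if (clear == 0)
		return nbits;
	unsigned int bit = (unsigned int)__builtin_ctzl(clear);
	*word |= 1UL << bit;	/* like __set_bit(): not atomic */
	return bit;
}
```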

Thanks,
        Yury

* Re: [PATCH 04/34] sched: add cpumask_find_and_set() and use it in __mm_cid_get()
  2023-11-21 13:44         ` Mathieu Desnoyers
@ 2023-11-21 17:00           ` Yury Norov
  0 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-11-21 17:00 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, linux-kernel, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Jan Kara,
	Mirsad Todorovac, Matthew Wilcox, Maxim Kuvyrkov, Alexey Klimov

On Tue, Nov 21, 2023 at 08:44:17AM -0500, Mathieu Desnoyers wrote:
> On 2023-11-21 08:31, Yury Norov wrote:
> > On Mon, Nov 20, 2023 at 11:17:32AM -0500, Mathieu Desnoyers wrote:
> > 
> [...]
> > 
> > Sure, I can. Can you point me to the work you mention here?
> 
> It would have to be updated now, but here is the last version that was posted:
> 
> https://lore.kernel.org/lkml/20221122203932.231377-1-mathieu.desnoyers@efficios.com/
> 
> Especially those patches:
> 
> 2022-11-22 20:39 ` [PATCH 22/30] lib: Implement find_{first,next,nth}_notandnot_bit, find_first_andnot_bit Mathieu Desnoyers
> 2022-11-22 20:39 ` [PATCH 23/30] cpumask: Implement cpumask_{first,next}_{not,}andnot Mathieu Desnoyers
> 2022-11-22 20:39 ` [PATCH 24/30] sched: NUMA-aware per-memory-map concurrency ID Mathieu Desnoyers
> 2022-11-22 20:39 ` [PATCH 25/30] rseq: Extend struct rseq with per-memory-map NUMA-aware Concurrency ID Mathieu Desnoyers
> 2022-11-22 20:39 ` [PATCH 26/30] selftests/rseq: x86: Implement rseq_load_u32_u32 Mathieu Desnoyers
> 2022-11-22 20:39 ` [PATCH 27/30] selftests/rseq: Implement mm_numa_cid accessors in headers Mathieu Desnoyers
> 2022-11-22 20:39 ` [PATCH 28/30] selftests/rseq: Implement numa node id vs mm_numa_cid invariant test Mathieu Desnoyers
> 2022-11-22 20:39 ` [PATCH 29/30] selftests/rseq: Implement mm_numa_cid tests Mathieu Desnoyers
> 2022-11-22 20:39 ` [PATCH 30/30] tracing/rseq: Add mm_numa_cid field to rseq_update Mathieu Desnoyers

OK, I'll take a look.

Thanks,
Yury

* Re: [PATCH 19/34] sfc: switch to using atomic find_bit() API where appropriate
  2023-11-18 15:50 ` [PATCH 19/34] sfc: switch to using " Yury Norov
@ 2023-11-21 19:46   ` Edward Cree
  0 siblings, 0 replies; 68+ messages in thread
From: Edward Cree @ 2023-11-21 19:46 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Martin Habets, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev,
	linux-net-drivers
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

On 18/11/2023 15:50, Yury Norov wrote:
> SFC code traverses rps_slot_map and rxq_retry_mask bit by bit. We can do
> better with the dedicated atomic find_bit() functions, because they skip
> bits that are already clear instead of testing each one in turn.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>

Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
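The "skip already clear bits" point can be illustrated with a minimal userspace sketch. It assumes C11 <stdatomic.h> and the GCC/Clang builtin __builtin_ctzl. Every sketch_* and SKETCH_* name is hypothetical: this is neither the sfc driver code nor the kernel's find_bit() API. Each call finds the lowest set bit with a word-at-a-time scan and clears it with one atomic fetch-and. If a concurrent clearer got to that bit first, the call rescans.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helpers, not the kernel's BITS_PER_LONG / BITS_TO_LONGS. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) (((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Return the first set bit below @nbits, or @nbits if none; zero words are skipped whole. */
static unsigned long sketch_find_first_bit(_Atomic unsigned long *map, unsigned long nbits)
{
	for (unsigned long i = 0; i < SKETCH_BITS_TO_LONGS(nbits); i++) {
		unsigned long word = atomic_load(&map[i]);

		if (word) {
			unsigned long bit = i * SKETCH_BITS_PER_LONG + __builtin_ctzl(word);

			return bit < nbits ? bit : nbits;
		}
	}
	return nbits;
}

/* Atomically clear bit @nr and return its previous value (like test_and_clear_bit()). */
static int sketch_test_and_clear_bit(unsigned long nr, _Atomic unsigned long *map)
{
	unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);
	unsigned long old = atomic_fetch_and(&map[nr / SKETCH_BITS_PER_LONG], ~mask);

	return (old & mask) != 0;
}

/* Find a set bit and clear it atomically; rescan if another thread cleared it first. */
static unsigned long sketch_find_and_clear_bit(_Atomic unsigned long *map, unsigned long nbits)
{
	unsigned long bit;

	do {
		bit = sketch_find_first_bit(map, nbits);
		if (bit >= nbits)
			return nbits;	/* no set bits left */
	} while (!sketch_test_and_clear_bit(bit, map));

	return bit;
}

/* Drain @map, calling @fn (if non-NULL) once per bit that was set; return the count. */
static unsigned int sketch_drain_mask(_Atomic unsigned long *map, unsigned long nbits,
				      void (*fn)(unsigned long bit))
{
	unsigned long bit;
	unsigned int handled = 0;

	while ((bit = sketch_find_and_clear_bit(map, nbits)) < nbits) {
		if (fn)
			fn(bit);
		handled++;
	}
	return handled;
}
```

Draining a mask this way does one atomic read-modify-write per bit that is actually set, rather than one test_and_clear_bit() call per index.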


* Re: [PATCH 09/34] dmaengine: idxd: optimize perfmon_assign_event()
  2023-11-18 15:50 ` [PATCH 09/34] dmaengine: idxd: optimize perfmon_assign_event() Yury Norov
  2023-11-20 15:34   ` Dave Jiang
@ 2023-11-24 12:15   ` Vinod Koul
  1 sibling, 0 replies; 68+ messages in thread
From: Vinod Koul @ 2023-11-24 12:15 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Fenghua Yu, Dave Jiang, dmaengine, Jan Kara,
	Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov

On 18-11-23, 07:50, Yury Norov wrote:
> The function scans used_mask bit by bit in a for-loop, looking for a clear
> bit to set. We can do this faster with the atomic find_and_set_bit().

Acked-by: Vinod Koul <vkoul@kernel.org>

-- 
~Vinod
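As a rough userspace illustration of the find-and-set pattern, here is a minimal sketch assuming C11 <stdatomic.h> and the GCC/Clang builtin __builtin_ctzl. All sketch_* and SKETCH_* names, including the 4-counter limit, are hypothetical: this is not the kernel's find_and_set_bit() or the idxd driver. It scans for the first clear bit with plain loads, then claims that bit with a single atomic fetch-or, rescanning only if another thread claimed the same bit in between.

```c
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical helpers, not the kernel's BITS_PER_LONG / BITS_TO_LONGS. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) (((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)
#define SKETCH_PMU_EVENT_MAX 4	/* hypothetical number of counters */

/* Return the first clear bit below @nbits, or @nbits if all are set. */
static unsigned long sketch_find_first_zero_bit(_Atomic unsigned long *map, unsigned long nbits)
{
	for (unsigned long i = 0; i < SKETCH_BITS_TO_LONGS(nbits); i++) {
		unsigned long word = ~atomic_load(&map[i]);	/* set bits = free slots */

		if (word) {
			unsigned long bit = i * SKETCH_BITS_PER_LONG + __builtin_ctzl(word);

			return bit < nbits ? bit : nbits;
		}
	}
	return nbits;
}

/* Atomically set bit @nr and return its previous value (like test_and_set_bit()). */
static int sketch_test_and_set_bit(unsigned long nr, _Atomic unsigned long *map)
{
	unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);
	unsigned long old = atomic_fetch_or(&map[nr / SKETCH_BITS_PER_LONG], mask);

	return (old & mask) != 0;
}

/* Find a clear bit and set it atomically; rescan if another thread won the race. */
static unsigned long sketch_find_and_set_bit(_Atomic unsigned long *map, unsigned long nbits)
{
	unsigned long bit;

	do {
		bit = sketch_find_first_zero_bit(map, nbits);
		if (bit >= nbits)
			return nbits;	/* bitmap is full */
	} while (sketch_test_and_set_bit(bit, map));

	return bit;
}

/* Hypothetical analogue of an event-assignment helper: claim a free counter index. */
static int sketch_assign_event(_Atomic unsigned long *used_mask)
{
	unsigned long idx = sketch_find_and_set_bit(used_mask, SKETCH_PMU_EVENT_MAX);

	return idx < SKETCH_PMU_EVENT_MAX ? (int)idx : -EINVAL;
}
```

Compared with calling test_and_set_bit() on each index in turn, this does one atomic read-modify-write per attempt instead of one per occupied index, and the scan skips whole words whose bits are all set.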



Thread overview: 68+ messages
2023-11-18 15:50 [PATCH 00/34] bitops: add atomic find_bit() operations Yury Norov
2023-11-18 15:50 ` [PATCH 01/34] lib/find: add atomic find_bit() primitives Yury Norov
2023-11-18 16:23   ` Bart Van Assche
2023-11-18 15:50 ` [PATCH 02/34] lib/sbitmap: make __sbitmap_get_word() using find_and_set_bit() Yury Norov
2023-11-18 15:50 ` [PATCH 03/34] watch_queue: use atomic find_bit() in post_one_notification() Yury Norov
2023-11-18 15:50 ` [PATCH 04/34] sched: add cpumask_find_and_set() and use it in __mm_cid_get() Yury Norov
2023-11-20 11:31   ` Peter Zijlstra
2023-11-20 16:17     ` Mathieu Desnoyers
2023-11-21 13:31       ` Yury Norov
2023-11-21 13:44         ` Mathieu Desnoyers
2023-11-21 17:00           ` Yury Norov
2023-11-18 15:50 ` [PATCH 05/34] mips: sgi-ip30: rework heart_alloc_int() Yury Norov
2023-11-18 15:50 ` [PATCH 06/34] sparc: fix opencoded find_and_set_bit() in alloc_msi() Yury Norov
2023-11-18 15:50 ` [PATCH 07/34] perf/arm: optimize opencoded atomic find_bit() API Yury Norov
2023-11-21 15:53   ` Will Deacon
2023-11-21 16:16     ` Yury Norov
2023-11-21 16:17       ` Will Deacon
2023-11-18 15:50 ` [PATCH 08/34] drivers/perf: optimize ali_drw_get_counter_idx() by using find_bit() Yury Norov
2023-11-21 15:54   ` Will Deacon
2023-11-18 15:50 ` [PATCH 09/34] dmaengine: idxd: optimize perfmon_assign_event() Yury Norov
2023-11-20 15:34   ` Dave Jiang
2023-11-24 12:15   ` Vinod Koul
2023-11-18 15:50 ` [PATCH 10/34] ath10k: optimize ath10k_snoc_napi_poll() by using find_bit() Yury Norov
2023-11-18 15:50 ` [PATCH 11/34] wifi: rtw88: optimize rtw_pci_tx_kick_off() " Yury Norov
2023-11-18 15:50 ` [PATCH 12/34] wifi: intel: use atomic find_bit() API where appropriate Yury Norov
2023-11-19 19:58   ` Johannes Berg
2023-11-21 16:36     ` Yury Norov
2023-11-18 15:50 ` [PATCH 13/34] KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers() Yury Norov
2023-11-20 14:26   ` Vitaly Kuznetsov
2023-11-21 13:35     ` Yury Norov
2023-11-18 15:50 ` [PATCH 14/34] PCI: hv: switch hv_get_dom_num() to use atomic find_bit() Yury Norov
2023-11-18 17:59   ` Michael Kelley
2023-11-18 15:50 ` [PATCH 15/34] scsi: use atomic find_bit() API where appropriate Yury Norov
2023-11-18 16:30   ` Bart Van Assche
2023-11-18 15:50 ` [PATCH 16/34] powerpc: " Yury Norov
2023-11-18 15:50 ` [PATCH 17/34] iommu: " Yury Norov
2023-11-18 15:50 ` [PATCH 18/34] media: radio-shark: " Yury Norov
2023-11-18 15:50 ` [PATCH 19/34] sfc: switch to using " Yury Norov
2023-11-21 19:46   ` Edward Cree
2023-11-18 15:50 ` [PATCH 20/34] tty: nozomi: optimize interrupt_handler() Yury Norov
2023-11-18 15:50 ` [PATCH 21/34] usb: cdc-acm: optimize acm_softint() Yury Norov
2023-11-20 11:39   ` Oliver Neukum
2023-11-18 15:50 ` [PATCH 22/34] block: null_blk: fix opencoded find_and_set_bit() in get_tag() Yury Norov
2023-11-18 15:50 ` [PATCH 23/34] RDMA/rtrs: fix opencoded find_and_set_bit_lock() in __rtrs_get_permit() Yury Norov
2023-11-18 15:50 ` [PATCH 24/34] mISDN: optimize get_free_devid() Yury Norov
2023-11-18 15:50 ` [PATCH 25/34] media: em28xx: cx231xx: fix opencoded find_and_set_bit() Yury Norov
2023-11-18 15:50 ` [PATCH 26/34] ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get() Yury Norov
2023-11-18 15:50 ` [PATCH 27/34] serial: sc16is7xx: optimize sc16is7xx_alloc_line() Yury Norov
2023-11-18 15:50 ` [PATCH 28/34] bluetooth: optimize cmtp_alloc_block_id() Yury Norov
2023-11-18 15:51 ` [PATCH 29/34] net: smc: fix opencoded find_and_set_bit() in smc_wr_tx_get_free_slot_index() Yury Norov
2023-11-20  8:43   ` Alexandra Winter
2023-11-21 13:41     ` Yury Norov
2023-11-21 15:39       ` Alexandra Winter
2023-11-20  9:56   ` Tony Lu
2023-11-18 15:51 ` [PATCH 30/34] ALSA: use atomic find_bit() functions where applicable Yury Norov
2023-11-20 15:57   ` Takashi Iwai
2023-11-18 15:51 ` [PATCH 31/34] drivers/perf: optimize m1_pmu_get_event_idx() by using find_bit() API Yury Norov
2023-11-18 18:40   ` Marc Zyngier
2023-11-18 18:45     ` Yury Norov
2023-11-18 15:51 ` [PATCH 32/34] m68k: rework get_mmu_context() Yury Norov
2023-11-19 19:29   ` Geert Uytterhoeven
2023-11-21 14:39   ` Greg Ungerer
2023-11-18 15:51 ` [PATCH 33/34] microblaze: " Yury Norov
2023-11-18 15:51 ` [PATCH 34/34] sh: rework ilsel_enable() Yury Norov
2023-11-18 16:15   ` John Paul Adrian Glaubitz
2023-11-21 13:43     ` Yury Norov
2023-11-18 16:18 ` [PATCH 00/34] bitops: add atomic find_bit() operations Bart Van Assche
2023-11-18 19:06   ` Sergey Shtylyov
