All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/35] bitops: add atomic find_bit() operations
@ 2023-12-03 19:23 ` Yury Norov
  0 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:23 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
	Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
	Wenjia Zhang, Will Deacon, Yoshinori Sato,
	GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
	kvm, linux-arm-kernel, linux-arm-msm, linux-block,
	linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
	linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
	linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
	linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
	sparclinux, x86
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

Add helpers around test_and_{set,clear}_bit() that allow to search for
clear or set bits and flip them atomically.

The target patterns may look like this:

	for (idx = 0; idx < nbits; idx++)
		if (test_and_clear_bit(idx, bitmap))
			do_something(idx);

Or like this:

	do {
		bit = find_first_bit(bitmap, nbits);
		if (bit >= nbits)
			return nbits;
	} while (!test_and_clear_bit(bit, bitmap));
	return bit;

In both cases, the opencoded loop may be converted to a single function
or iterator call. Correspondingly:

	for_each_test_and_clear_bit(idx, bitmap, nbits)
		do_something(idx);

Or:
	return find_and_clear_bit(bitmap, nbits);

Obviously, the less routine code people have to write themself, the
less probability to make a mistake.

Those are not only handy helpers but also resolve a non-trivial
issue of using non-atomic find_bit() together with atomic
test_and_{set,clear)_bit().

The trick is that find_bit() implies that the bitmap is a regular
non-volatile piece of memory, and compiler is allowed to use such
optimization techniques like re-fetching memory instead of caching it.

For example, find_first_bit() is implemented like this:

      for (idx = 0; idx * BITS_PER_LONG < sz; idx++) {
              val = addr[idx];
              if (val) {
                      sz = min(idx * BITS_PER_LONG + __ffs(val), sz);
                      break;
              }
      }

On register-memory architectures, like x86, compiler may decide to
access memory twice - first time to compare against 0, and second time
to fetch its value to pass it to __ffs().

When running find_first_bit() on volatile memory, the memory may get
changed in-between, and for instance, it may lead to passing 0 to
__ffs(), which is undefined. This is a potentially dangerous call.

find_and_clear_bit() as a wrapper around test_and_clear_bit()
naturally treats underlying bitmap as a volatile memory and prevents
compiler from such optimizations.

Now that KCSAN is catching exactly this type of situations and warns on
undercover memory modifications. We can use it to reveal improper usage
of find_bit(), and convert it to atomic find_and_*_bit() as appropriate.

The 1st patch of the series adds the following atomic primitives:

	find_and_set_bit(addr, nbits);
	find_and_set_next_bit(addr, nbits, start);
	...

Here find_and_{set,clear} part refers to the corresponding
test_and_{set,clear}_bit function. Suffixes like _wrap or _lock
derive their semantics from corresponding find() or test() functions.

For brevity, the naming omits the fact that we search for zero bit in
find_and_set, and correspondingly search for set bit in find_and_clear
functions.

The patch also adds iterators with atomic semantics, like
for_each_test_and_set_bit(). Here, the naming rule is to simply prefix
corresponding atomic operation with 'for_each'.

This series is a result of discussion [1]. All find_bit() functions imply
exclusive access to the bitmaps. However, KCSAN reports quite a number
of warnings related to find_bit() API. Some of them are not pointing
to real bugs because in many situations people intentionally allow
concurrent bitmap operations.

If so, find_bit() can be annotated such that KCSAN will ignore it:

        bit = data_race(find_first_bit(bitmap, nbits));

This series addresses the other important case where people really need
atomic find ops. As the following patches show, the resulting code
looks safer and more verbose comparing to opencoded loops followed by
atomic bit flips.

In [1] Mirsad reported 2% slowdown in a single-thread search test when
switching find_bit() function to treat bitmaps as volatile arrays. On
the other hand, kernel robot in the same thread reported +3.7% to the
performance of will-it-scale.per_thread_ops test.

Assuming that our compilers are sane and generate better code against
properly annotated data, the above discrepancy doesn't look weird. When
running on non-volatile bitmaps, plain find_bit() outperforms atomic
find_and_bit(), and vice-versa.

So, all users of find_bit() API, where heavy concurrency is expected,
are encouraged to switch to atomic find_and_bit() as appropriate.

The 1st patch of this series adds atomic find_and_bit() API, 2nd adds
a basic test for new API, and all the following patches spread it over
the kernel.

They can be applied separately from each other on per-subsystems basis,
or I can pull them in bitmap tree, as appropriate.

[1] https://lore.kernel.org/lkml/634f5fdf-e236-42cf-be8d-48a581c21660@alu.unizg.hr/T/#m3e7341eb3571753f3acf8fe166f3fb5b2c12e615
---
v1: https://lore.kernel.org/netdev/20231118155105.25678-29-yury.norov@gmail.com/T/
v2:
 - Add a basic test for the new API # Bart Van Assche;
 - Add collected reviewers' tags. Thank you guys!
 - Fix typos where found/pointed by;
 - Drop erroneous patch #v1-31 ("drivers/perf: optimize m1_pmu_get_event_idx()...") @ Marc Zyngier;
 - Drop unneeded patch #v1-12 ("wifi: intel: use atomic find_bit() API...") @ Johannes Berg;
 - Patch #v1-15: split SCSI changes per subsystems @ Bart Van Assche;
 - Patch  #5: keep changes inside __mm_cid_try_get() @ Mathieu Desnoyers;
 - Patch  #8: use find_and_set_next_bit() @ Will Deacon;
 - Patch #13: keep test against stimer->config.enable @ Vitaly Kuznetsov;
 - Patch #15: use find_and_set_next_bit @ Bart Van Assche;
 - Patch #31: edit commit message @ Tony Lu, Alexandra Winter;
 - Patch #35: edit tag @ John Paul Adrian Glaubitz;

Yury Norov (35):
  lib/find: add atomic find_bit() primitives
  lib/find: add test for atomic find_bit() ops
  lib/sbitmap; make __sbitmap_get_word() using find_and_set_bit()
  watch_queue: use atomic find_bit() in post_one_notification()
  sched: add cpumask_find_and_set() and use it in __mm_cid_get()
  mips: sgi-ip30: rework heart_alloc_int()
  sparc: fix opencoded find_and_set_bit() in alloc_msi()
  perf/arm: optimize opencoded atomic find_bit() API
  drivers/perf: optimize ali_drw_get_counter_idx() by using find_bit()
  dmaengine: idxd: optimize perfmon_assign_event()
  ath10k: optimize ath10k_snoc_napi_poll() by using find_bit()
  wifi: rtw88: optimize rtw_pci_tx_kick_off() by using find_bit()
  KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers()
  PCI: hv: switch hv_get_dom_num() to use atomic find_bit()
  scsi: core: use atomic find_bit() API where appropriate
  scsi: mpi3mr: switch to using atomic find_and_set_bit()
  scsi: qedi: rework qedi_get_task_idx()
  powerpc: use atomic find_bit() API where appropriate
  iommu: use atomic find_bit() API where appropriate
  media: radio-shark: use atomic find_bit() API where appropriate
  sfc: switch to using atomic find_bit() API where appropriate
  tty: nozomi: optimize interrupt_handler()
  usb: cdc-acm: optimize acm_softint()
  block: null_blk: fix opencoded find_and_set_bit() in get_tag()
  RDMA/rtrs: fix opencoded find_and_set_bit_lock() in
    __rtrs_get_permit()
  mISDN: optimize get_free_devid()
  media: em28xx: cx231xx: fix opencoded find_and_set_bit()
  ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get()
  serial: sc12is7xx: optimize sc16is7xx_alloc_line()
  bluetooth: optimize cmtp_alloc_block_id()
  net: smc:  use find_and_set_bit() in smc_wr_tx_get_free_slot_index()
  ALSA: use atomic find_bit() functions where applicable
  m68k: rework get_mmu_context()
  microblaze: rework get_mmu_context()
  sh: mach-x3proto: rework ilsel_enable()

 arch/m68k/include/asm/mmu_context.h          |  11 +-
 arch/microblaze/include/asm/mmu_context_mm.h |  11 +-
 arch/mips/sgi-ip30/ip30-irq.c                |  12 +-
 arch/powerpc/mm/book3s32/mmu_context.c       |  10 +-
 arch/powerpc/platforms/pasemi/dma_lib.c      |  45 +--
 arch/powerpc/platforms/powernv/pci-sriov.c   |  12 +-
 arch/sh/boards/mach-x3proto/ilsel.c          |   4 +-
 arch/sparc/kernel/pci_msi.c                  |   9 +-
 arch/x86/kvm/hyperv.c                        |  39 ++-
 drivers/block/null_blk/main.c                |  41 +--
 drivers/dma/idxd/perfmon.c                   |   8 +-
 drivers/infiniband/ulp/rtrs/rtrs-clt.c       |  15 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.h        |  10 +-
 drivers/iommu/msm_iommu.c                    |  18 +-
 drivers/isdn/mISDN/core.c                    |   9 +-
 drivers/media/radio/radio-shark.c            |   5 +-
 drivers/media/radio/radio-shark2.c           |   5 +-
 drivers/media/usb/cx231xx/cx231xx-cards.c    |  16 +-
 drivers/media/usb/em28xx/em28xx-cards.c      |  37 +--
 drivers/net/ethernet/rocker/rocker_ofdpa.c   |  11 +-
 drivers/net/ethernet/sfc/rx_common.c         |   4 +-
 drivers/net/ethernet/sfc/siena/rx_common.c   |   4 +-
 drivers/net/ethernet/sfc/siena/siena_sriov.c |  14 +-
 drivers/net/wireless/ath/ath10k/snoc.c       |   9 +-
 drivers/net/wireless/realtek/rtw88/pci.c     |   5 +-
 drivers/net/wireless/realtek/rtw89/pci.c     |   5 +-
 drivers/pci/controller/pci-hyperv.c          |   7 +-
 drivers/perf/alibaba_uncore_drw_pmu.c        |  10 +-
 drivers/perf/arm-cci.c                       |  24 +-
 drivers/perf/arm-ccn.c                       |  10 +-
 drivers/perf/arm_dmc620_pmu.c                |   9 +-
 drivers/perf/arm_pmuv3.c                     |   8 +-
 drivers/scsi/mpi3mr/mpi3mr_os.c              |  21 +-
 drivers/scsi/qedi/qedi_main.c                |   9 +-
 drivers/scsi/scsi_lib.c                      |   7 +-
 drivers/tty/nozomi.c                         |   5 +-
 drivers/tty/serial/sc16is7xx.c               |   8 +-
 drivers/usb/class/cdc-acm.c                  |   5 +-
 include/linux/cpumask.h                      |  12 +
 include/linux/find.h                         | 293 +++++++++++++++++++
 kernel/sched/sched.h                         |  14 +-
 kernel/watch_queue.c                         |   6 +-
 lib/find_bit.c                               |  85 ++++++
 lib/sbitmap.c                                |  46 +--
 lib/test_bitmap.c                            |  61 ++++
 net/bluetooth/cmtp/core.c                    |  10 +-
 net/smc/smc_wr.c                             |  10 +-
 sound/pci/hda/hda_codec.c                    |   7 +-
 sound/usb/caiaq/audio.c                      |  13 +-
 49 files changed, 629 insertions(+), 420 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 00/35] bitops: add atomic find_bit() operations
@ 2023-12-03 19:23 ` Yury Norov
  0 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:23 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
	Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
	Wenjia Zhang, Will Deacon, Yoshinori Sato,
	GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
	kvm, linux-arm-kernel, linux-arm-msm, linux-block,
	linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
	linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
	linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
	linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
	sparclinux, x86
  Cc: Sergey Shtylyov, Jan Kara, Bart Van Assche, Yury Norov,
	Rasmus Villemoes, Matthew Wilcox, Alexey Klimov, Maxim Kuvyrkov,
	Andy Shevchenko, Mirsad Todorovac

Add helpers around test_and_{set,clear}_bit() that allow to search for
clear or set bits and flip them atomically.

The target patterns may look like this:

	for (idx = 0; idx < nbits; idx++)
		if (test_and_clear_bit(idx, bitmap))
			do_something(idx);

Or like this:

	do {
		bit = find_first_bit(bitmap, nbits);
		if (bit >= nbits)
			return nbits;
	} while (!test_and_clear_bit(bit, bitmap));
	return bit;

In both cases, the opencoded loop may be converted to a single function
or iterator call. Correspondingly:

	for_each_test_and_clear_bit(idx, bitmap, nbits)
		do_something(idx);

Or:
	return find_and_clear_bit(bitmap, nbits);

Obviously, the less routine code people have to write themself, the
less probability to make a mistake.

Those are not only handy helpers but also resolve a non-trivial
issue of using non-atomic find_bit() together with atomic
test_and_{set,clear)_bit().

The trick is that find_bit() implies that the bitmap is a regular
non-volatile piece of memory, and compiler is allowed to use such
optimization techniques like re-fetching memory instead of caching it.

For example, find_first_bit() is implemented like this:

      for (idx = 0; idx * BITS_PER_LONG < sz; idx++) {
              val = addr[idx];
              if (val) {
                      sz = min(idx * BITS_PER_LONG + __ffs(val), sz);
                      break;
              }
      }

On register-memory architectures, like x86, compiler may decide to
access memory twice - first time to compare against 0, and second time
to fetch its value to pass it to __ffs().

When running find_first_bit() on volatile memory, the memory may get
changed in-between, and for instance, it may lead to passing 0 to
__ffs(), which is undefined. This is a potentially dangerous call.

find_and_clear_bit() as a wrapper around test_and_clear_bit()
naturally treats underlying bitmap as a volatile memory and prevents
compiler from such optimizations.

Now that KCSAN is catching exactly this type of situations and warns on
undercover memory modifications. We can use it to reveal improper usage
of find_bit(), and convert it to atomic find_and_*_bit() as appropriate.

The 1st patch of the series adds the following atomic primitives:

	find_and_set_bit(addr, nbits);
	find_and_set_next_bit(addr, nbits, start);
	...

Here find_and_{set,clear} part refers to the corresponding
test_and_{set,clear}_bit function. Suffixes like _wrap or _lock
derive their semantics from corresponding find() or test() functions.

For brevity, the naming omits the fact that we search for zero bit in
find_and_set, and correspondingly search for set bit in find_and_clear
functions.

The patch also adds iterators with atomic semantics, like
for_each_test_and_set_bit(). Here, the naming rule is to simply prefix
corresponding atomic operation with 'for_each'.

This series is a result of discussion [1]. All find_bit() functions imply
exclusive access to the bitmaps. However, KCSAN reports quite a number
of warnings related to find_bit() API. Some of them are not pointing
to real bugs because in many situations people intentionally allow
concurrent bitmap operations.

If so, find_bit() can be annotated such that KCSAN will ignore it:

        bit = data_race(find_first_bit(bitmap, nbits));

This series addresses the other important case where people really need
atomic find ops. As the following patches show, the resulting code
looks safer and more verbose comparing to opencoded loops followed by
atomic bit flips.

In [1] Mirsad reported 2% slowdown in a single-thread search test when
switching find_bit() function to treat bitmaps as volatile arrays. On
the other hand, kernel robot in the same thread reported +3.7% to the
performance of will-it-scale.per_thread_ops test.

Assuming that our compilers are sane and generate better code against
properly annotated data, the above discrepancy doesn't look weird. When
running on non-volatile bitmaps, plain find_bit() outperforms atomic
find_and_bit(), and vice-versa.

So, all users of find_bit() API, where heavy concurrency is expected,
are encouraged to switch to atomic find_and_bit() as appropriate.

The 1st patch of this series adds atomic find_and_bit() API, 2nd adds
a basic test for new API, and all the following patches spread it over
the kernel.

They can be applied separately from each other on per-subsystems basis,
or I can pull them in bitmap tree, as appropriate.

[1] https://lore.kernel.org/lkml/634f5fdf-e236-42cf-be8d-48a581c21660@alu.unizg.hr/T/#m3e7341eb3571753f3acf8fe166f3fb5b2c12e615
---
v1: https://lore.kernel.org/netdev/20231118155105.25678-29-yury.norov@gmail.com/T/
v2:
 - Add a basic test for the new API # Bart Van Assche;
 - Add collected reviewers' tags. Thank you guys!
 - Fix typos where found/pointed by;
 - Drop erroneous patch #v1-31 ("drivers/perf: optimize m1_pmu_get_event_idx()...") @ Marc Zyngier;
 - Drop unneeded patch #v1-12 ("wifi: intel: use atomic find_bit() API...") @ Johannes Berg;
 - Patch #v1-15: split SCSI changes per subsystems @ Bart Van Assche;
 - Patch  #5: keep changes inside __mm_cid_try_get() @ Mathieu Desnoyers;
 - Patch  #8: use find_and_set_next_bit() @ Will Deacon;
 - Patch #13: keep test against stimer->config.enable @ Vitaly Kuznetsov;
 - Patch #15: use find_and_set_next_bit @ Bart Van Assche;
 - Patch #31: edit commit message @ Tony Lu, Alexandra Winter;
 - Patch #35: edit tag @ John Paul Adrian Glaubitz;

Yury Norov (35):
  lib/find: add atomic find_bit() primitives
  lib/find: add test for atomic find_bit() ops
  lib/sbitmap; make __sbitmap_get_word() using find_and_set_bit()
  watch_queue: use atomic find_bit() in post_one_notification()
  sched: add cpumask_find_and_set() and use it in __mm_cid_get()
  mips: sgi-ip30: rework heart_alloc_int()
  sparc: fix opencoded find_and_set_bit() in alloc_msi()
  perf/arm: optimize opencoded atomic find_bit() API
  drivers/perf: optimize ali_drw_get_counter_idx() by using find_bit()
  dmaengine: idxd: optimize perfmon_assign_event()
  ath10k: optimize ath10k_snoc_napi_poll() by using find_bit()
  wifi: rtw88: optimize rtw_pci_tx_kick_off() by using find_bit()
  KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers()
  PCI: hv: switch hv_get_dom_num() to use atomic find_bit()
  scsi: core: use atomic find_bit() API where appropriate
  scsi: mpi3mr: switch to using atomic find_and_set_bit()
  scsi: qedi: rework qedi_get_task_idx()
  powerpc: use atomic find_bit() API where appropriate
  iommu: use atomic find_bit() API where appropriate
  media: radio-shark: use atomic find_bit() API where appropriate
  sfc: switch to using atomic find_bit() API where appropriate
  tty: nozomi: optimize interrupt_handler()
  usb: cdc-acm: optimize acm_softint()
  block: null_blk: fix opencoded find_and_set_bit() in get_tag()
  RDMA/rtrs: fix opencoded find_and_set_bit_lock() in
    __rtrs_get_permit()
  mISDN: optimize get_free_devid()
  media: em28xx: cx231xx: fix opencoded find_and_set_bit()
  ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get()
  serial: sc12is7xx: optimize sc16is7xx_alloc_line()
  bluetooth: optimize cmtp_alloc_block_id()
  net: smc:  use find_and_set_bit() in smc_wr_tx_get_free_slot_index()
  ALSA: use atomic find_bit() functions where applicable
  m68k: rework get_mmu_context()
  microblaze: rework get_mmu_context()
  sh: mach-x3proto: rework ilsel_enable()

 arch/m68k/include/asm/mmu_context.h          |  11 +-
 arch/microblaze/include/asm/mmu_context_mm.h |  11 +-
 arch/mips/sgi-ip30/ip30-irq.c                |  12 +-
 arch/powerpc/mm/book3s32/mmu_context.c       |  10 +-
 arch/powerpc/platforms/pasemi/dma_lib.c      |  45 +--
 arch/powerpc/platforms/powernv/pci-sriov.c   |  12 +-
 arch/sh/boards/mach-x3proto/ilsel.c          |   4 +-
 arch/sparc/kernel/pci_msi.c                  |   9 +-
 arch/x86/kvm/hyperv.c                        |  39 ++-
 drivers/block/null_blk/main.c                |  41 +--
 drivers/dma/idxd/perfmon.c                   |   8 +-
 drivers/infiniband/ulp/rtrs/rtrs-clt.c       |  15 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.h        |  10 +-
 drivers/iommu/msm_iommu.c                    |  18 +-
 drivers/isdn/mISDN/core.c                    |   9 +-
 drivers/media/radio/radio-shark.c            |   5 +-
 drivers/media/radio/radio-shark2.c           |   5 +-
 drivers/media/usb/cx231xx/cx231xx-cards.c    |  16 +-
 drivers/media/usb/em28xx/em28xx-cards.c      |  37 +--
 drivers/net/ethernet/rocker/rocker_ofdpa.c   |  11 +-
 drivers/net/ethernet/sfc/rx_common.c         |   4 +-
 drivers/net/ethernet/sfc/siena/rx_common.c   |   4 +-
 drivers/net/ethernet/sfc/siena/siena_sriov.c |  14 +-
 drivers/net/wireless/ath/ath10k/snoc.c       |   9 +-
 drivers/net/wireless/realtek/rtw88/pci.c     |   5 +-
 drivers/net/wireless/realtek/rtw89/pci.c     |   5 +-
 drivers/pci/controller/pci-hyperv.c          |   7 +-
 drivers/perf/alibaba_uncore_drw_pmu.c        |  10 +-
 drivers/perf/arm-cci.c                       |  24 +-
 drivers/perf/arm-ccn.c                       |  10 +-
 drivers/perf/arm_dmc620_pmu.c                |   9 +-
 drivers/perf/arm_pmuv3.c                     |   8 +-
 drivers/scsi/mpi3mr/mpi3mr_os.c              |  21 +-
 drivers/scsi/qedi/qedi_main.c                |   9 +-
 drivers/scsi/scsi_lib.c                      |   7 +-
 drivers/tty/nozomi.c                         |   5 +-
 drivers/tty/serial/sc16is7xx.c               |   8 +-
 drivers/usb/class/cdc-acm.c                  |   5 +-
 include/linux/cpumask.h                      |  12 +
 include/linux/find.h                         | 293 +++++++++++++++++++
 kernel/sched/sched.h                         |  14 +-
 kernel/watch_queue.c                         |   6 +-
 lib/find_bit.c                               |  85 ++++++
 lib/sbitmap.c                                |  46 +--
 lib/test_bitmap.c                            |  61 ++++
 net/bluetooth/cmtp/core.c                    |  10 +-
 net/smc/smc_wr.c                             |  10 +-
 sound/pci/hda/hda_codec.c                    |   7 +-
 sound/usb/caiaq/audio.c                      |  13 +-
 49 files changed, 629 insertions(+), 420 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 01/35] lib/find: add atomic find_bit() primitives
  2023-12-03 19:23 ` Yury Norov
@ 2023-12-03 19:23   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:23 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
	Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
	Wenjia Zhang, Will Deacon, Yoshinori Sato,
	GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
	kvm, linux-arm-kernel, linux-arm-msm, linux-block,
	linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
	linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
	linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
	linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
	sparclinux, x86
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

Add helpers around test_and_{set,clear}_bit() that allow to search for
clear or set bits and flip them atomically.

The target patterns may look like this:

	for (idx = 0; idx < nbits; idx++)
		if (test_and_clear_bit(idx, bitmap))
			do_something(idx);

Or like this:

	do {
		bit = find_first_bit(bitmap, nbits);
		if (bit >= nbits)
			return nbits;
	} while (!test_and_clear_bit(bit, bitmap));
	return bit;

In both cases, the opencoded loop may be converted to a single function
or iterator call. Correspondingly:

	for_each_test_and_clear_bit(idx, bitmap, nbits)
		do_something(idx);

Or:
	return find_and_clear_bit(bitmap, nbits);

Obviously, the less routine code people have to write themself, the
less probability to make a mistake.

Those are not only handy helpers but also resolve a non-trivial
issue of using non-atomic find_bit() together with atomic
test_and_{set,clear)_bit().

The trick is that find_bit() implies that the bitmap is a regular
non-volatile piece of memory, and compiler is allowed to use such
optimization techniques like re-fetching memory instead of caching it.

For example, find_first_bit() is implemented like this:

      for (idx = 0; idx * BITS_PER_LONG < sz; idx++) {
              val = addr[idx];
              if (val) {
                      sz = min(idx * BITS_PER_LONG + __ffs(val), sz);
                      break;
              }
      }

On register-memory architectures, like x86, compiler may decide to
access memory twice - first time to compare against 0, and second time
to fetch its value to pass it to __ffs().

When running find_first_bit() on volatile memory, the memory may get
changed in-between, and for instance, it may lead to passing 0 to
__ffs(), which is undefined. This is a potentially dangerous call.

find_and_clear_bit() as a wrapper around test_and_clear_bit()
naturally treats underlying bitmap as a volatile memory and prevents
compiler from such optimizations.

Now that KCSAN is catching exactly this type of situations and warns on
undercover memory modifications. We can use it to reveal improper usage
of find_bit(), and convert it to atomic find_and_*_bit() as appropriate.

The 1st patch of the series adds the following atomic primitives:

	find_and_set_bit(addr, nbits);
	find_and_set_next_bit(addr, nbits, start);
	...

Here find_and_{set,clear} part refers to the corresponding
test_and_{set,clear}_bit function. Suffixes like _wrap or _lock
derive their semantics from corresponding find() or test() functions.

For brevity, the naming omits the fact that we search for zero bit in
find_and_set, and correspondingly search for set bit in find_and_clear
functions.

The patch also adds iterators with atomic semantics, like
for_each_test_and_set_bit(). Here, the naming rule is to simply prefix
corresponding atomic operation with 'for_each'.

All users of find_bit() API, where heavy concurrency is expected,
are encouraged to switch to atomic find_and_bit() as appropriate.

CC: Bart Van Assche <bvanassche@acm.org>
CC: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/find.h | 293 +++++++++++++++++++++++++++++++++++++++++++
 lib/find_bit.c       |  85 +++++++++++++
 2 files changed, 378 insertions(+)

diff --git a/include/linux/find.h b/include/linux/find.h
index 5e4f39ef2e72..79b0e2589725 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -32,6 +32,16 @@ extern unsigned long _find_first_and_bit(const unsigned long *addr1,
 extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
 extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
 
+unsigned long _find_and_set_bit(volatile unsigned long *addr, unsigned long nbits);
+unsigned long _find_and_set_next_bit(volatile unsigned long *addr, unsigned long nbits,
+				unsigned long start);
+unsigned long _find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits);
+unsigned long _find_and_set_next_bit_lock(volatile unsigned long *addr, unsigned long nbits,
+					  unsigned long start);
+unsigned long _find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits);
+unsigned long _find_and_clear_next_bit(volatile unsigned long *addr, unsigned long nbits,
+				unsigned long start);
+
 #ifdef __BIG_ENDIAN
 unsigned long _find_first_zero_bit_le(const unsigned long *addr, unsigned long size);
 unsigned long _find_next_zero_bit_le(const  unsigned long *addr, unsigned
@@ -460,6 +470,267 @@ unsigned long __for_each_wrap(const unsigned long *bitmap, unsigned long size,
 	return bit < start ? bit : size;
 }
 
+/**
+ * find_and_set_bit - Find a zero bit and set it atomically
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the bitmap. It's also not
+ * guaranteed that if @nbits is returned, the bitmap is empty.
+ *
+ * The function does guarantee that if returned value is in range [0 .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or @nbits if no bits found
+ */
+static inline
+unsigned long find_and_set_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr | ~GENMASK(nbits - 1, 0);
+			if (val == ~0UL)
+				return nbits;
+			ret = ffz(val);
+		} while (test_and_set_bit(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_set_bit(addr, nbits);
+}
+
+
+/**
+ * find_and_set_next_bit - Find a zero bit and set it, starting from @offset
+ * @addr: The address to base the search on
+ * @nbits: The bitmap nbits in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the bitmap, starting from @offset.
+ * It's also not guaranteed that if @nbits is returned, the bitmap is empty.
+ *
+ * The function does guarantee that if returned value is in range [@offset .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or @nbits if no bits found
+ */
+static inline
+unsigned long find_and_set_next_bit(volatile unsigned long *addr,
+				    unsigned long nbits, unsigned long offset)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr | ~GENMASK(nbits - 1, offset);
+			if (val == ~0UL)
+				return nbits;
+			ret = ffz(val);
+		} while (test_and_set_bit(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_set_next_bit(addr, nbits, offset);
+}
+
+/**
+ * find_and_set_bit_wrap - find and set bit starting at @offset, wrapping around zero
+ * @addr: The first address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * Returns: the bit number for the next clear bit, or first clear bit up to @offset,
+ * while atomically setting it. If no bits are found, returns @nbits.
+ */
+static inline
+unsigned long find_and_set_bit_wrap(volatile unsigned long *addr,
+					unsigned long nbits, unsigned long offset)
+{
+	unsigned long bit = find_and_set_next_bit(addr, nbits, offset);
+
+	if (bit < nbits || offset == 0)
+		return bit;
+
+	bit = find_and_set_bit(addr, offset);
+	return bit < offset ? bit : nbits;
+}
+
+/**
+ * find_and_set_bit_lock - find a zero bit, then set it atomically with lock
+ * @addr: The address to base the search on
+ * @nbits: The bitmap nbits in bits
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the bitmap. It's also not
+ * guaranteed that if @nbits is returned, the bitmap is empty.
+ *
+ * The function does guarantee that if returned value is in range [0 .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or @nbits if no bits found
+ */
+static inline
+unsigned long find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr | ~GENMASK(nbits - 1, 0);
+			if (val == ~0UL)
+				return nbits;
+			ret = ffz(val);
+		} while (test_and_set_bit_lock(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_set_bit_lock(addr, nbits);
+}
+
+/**
+ * find_and_set_next_bit_lock - find a zero bit and set it atomically with lock
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the range. It's also not
+ * guaranteed that if @nbits is returned, the bitmap is empty.
+ *
+ * The function does guarantee that if returned value is in range [@offset .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or @nbits if no bits found
+ */
+static inline
+unsigned long find_and_set_next_bit_lock(volatile unsigned long *addr,
+					 unsigned long nbits, unsigned long offset)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr | ~GENMASK(nbits - 1, offset);
+			if (val == ~0UL)
+				return nbits;
+			ret = ffz(val);
+		} while (test_and_set_bit_lock(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_set_next_bit_lock(addr, nbits, offset);
+}
+
+/**
+ * find_and_set_bit_wrap_lock - find zero bit starting at @ofset and set it
+ *				with lock, and wrap around zero if nothing found
+ * @addr: The first address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * Returns: the bit number for the next set bit, or first set bit up to @offset
+ * If no bits are set, returns @nbits.
+ */
+static inline
+unsigned long find_and_set_bit_wrap_lock(volatile unsigned long *addr,
+					unsigned long nbits, unsigned long offset)
+{
+	unsigned long bit = find_and_set_next_bit_lock(addr, nbits, offset);
+
+	if (bit < nbits || offset == 0)
+		return bit;
+
+	bit = find_and_set_bit_lock(addr, offset);
+	return bit < offset ? bit : nbits;
+}
+
+/**
+ * find_and_clear_bit - Find a set bit and clear it atomically
+ * @addr: The address to base the search on
+ * @nbits: The bitmap nbits in bits
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the bitmap. It's also not
+ * guaranteed that if @nbits is returned, the bitmap is empty.
+ *
+ * The function does guarantee that if returned value is in range [0 .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and cleared bit, or @nbits if no bits found
+ */
+static inline unsigned long find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr & GENMASK(nbits - 1, 0);
+			if (val == 0)
+				return nbits;
+			ret = __ffs(val);
+		} while (!test_and_clear_bit(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_clear_bit(addr, nbits);
+}
+
+/**
+ * find_and_clear_next_bit - Find a set bit next after @offset, and clear it atomically
+ * @addr: The address to base the search on
+ * @nbits: The bitmap nbits in bits
+ * @offset: bit offset at which to start searching
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the range It's also not
+ * guaranteed that if @nbits is returned, there's no set bits after @offset.
+ *
+ * The function does guarantee that if returned value is in range [@offset .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and cleared bit, or @nbits if no bits found
+ */
+static inline
+unsigned long find_and_clear_next_bit(volatile unsigned long *addr,
+					unsigned long nbits, unsigned long offset)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr & GENMASK(nbits - 1, offset);
+			if (val == 0)
+				return nbits;
+			ret = __ffs(val);
+		} while (!test_and_clear_bit(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_clear_next_bit(addr, nbits, offset);
+}
+
 /**
  * find_next_clump8 - find next 8-bit clump with set bits in a memory region
  * @clump: location to store copy of found clump
@@ -577,6 +848,28 @@ unsigned long find_next_bit_le(const void *addr, unsigned
 #define for_each_set_bit_from(bit, addr, size) \
 	for (; (bit) = find_next_bit((addr), (size), (bit)), (bit) < (size); (bit)++)
 
+/* same as for_each_set_bit() but atomically clears each found bit */
+#define for_each_test_and_clear_bit(bit, addr, size) \
+	for ((bit) = 0; \
+	     (bit) = find_and_clear_next_bit((addr), (size), (bit)), (bit) < (size); \
+	     (bit)++)
+
+/* same as for_each_set_bit_from() but atomically clears each found bit */
+#define for_each_test_and_clear_bit_from(bit, addr, size) \
+	for (; (bit) = find_and_clear_next_bit((addr), (size), (bit)), (bit) < (size); (bit)++)
+
+/* same as for_each_clear_bit() but atomically sets each found bit */
+#define for_each_test_and_set_bit(bit, addr, size) \
+	for ((bit) = 0; \
+	     (bit) = find_and_set_next_bit((addr), (size), (bit)), (bit) < (size); \
+	     (bit)++)
+
+/* same as for_each_clear_bit_from() but atomically clears each found bit */
+#define for_each_test_and_set_bit_from(bit, addr, size) \
+	for (; \
+	     (bit) = find_and_set_next_bit((addr), (size), (bit)), (bit) < (size); \
+	     (bit)++)
+
 #define for_each_clear_bit(bit, addr, size) \
 	for ((bit) = 0;									\
 	     (bit) = find_next_zero_bit((addr), (size), (bit)), (bit) < (size);		\
diff --git a/lib/find_bit.c b/lib/find_bit.c
index 32f99e9a670e..c9b6b9f96610 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -116,6 +116,91 @@ unsigned long _find_first_and_bit(const unsigned long *addr1,
 EXPORT_SYMBOL(_find_first_and_bit);
 #endif
 
+unsigned long _find_and_set_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_FIRST_BIT(~addr[idx], /* nop */, nbits);
+		if (bit >= nbits)
+			return nbits;
+	} while (test_and_set_bit(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_set_bit);
+
+unsigned long _find_and_set_next_bit(volatile unsigned long *addr,
+				     unsigned long nbits, unsigned long start)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_NEXT_BIT(~addr[idx], /* nop */, nbits, start);
+		if (bit >= nbits)
+			return nbits;
+	} while (test_and_set_bit(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_set_next_bit);
+
+unsigned long _find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_FIRST_BIT(~addr[idx], /* nop */, nbits);
+		if (bit >= nbits)
+			return nbits;
+	} while (test_and_set_bit_lock(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_set_bit_lock);
+
+unsigned long _find_and_set_next_bit_lock(volatile unsigned long *addr,
+					  unsigned long nbits, unsigned long start)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_NEXT_BIT(~addr[idx], /* nop */, nbits, start);
+		if (bit >= nbits)
+			return nbits;
+	} while (test_and_set_bit_lock(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_set_next_bit_lock);
+
+unsigned long _find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_FIRST_BIT(addr[idx], /* nop */, nbits);
+		if (bit >= nbits)
+			return nbits;
+	} while (!test_and_clear_bit(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_clear_bit);
+
+unsigned long _find_and_clear_next_bit(volatile unsigned long *addr,
+					unsigned long nbits, unsigned long start)
+{
+	do {
+		start =  FIND_NEXT_BIT(addr[idx], /* nop */, nbits, start);
+		if (start >= nbits)
+			return nbits;
+	} while (!test_and_clear_bit(start, addr));
+
+	return start;
+}
+EXPORT_SYMBOL(_find_and_clear_next_bit);
+
 #ifndef find_first_zero_bit
 /*
  * Find the first cleared bit in a memory region.
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 01/35] lib/find: add atomic find_bit() primitives
@ 2023-12-03 19:23   ` Yury Norov
  0 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:23 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
	Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
	Wenjia Zhang, Will Deacon, Yoshinori Sato,
	GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
	kvm, linux-arm-kernel, linux-arm-msm, linux-block,
	linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
	linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
	linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
	linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
	sparclinux, x86
  Cc: Sergey Shtylyov, Jan Kara, Bart Van Assche, Yury Norov,
	Rasmus Villemoes, Matthew Wilcox, Alexey Klimov, Maxim Kuvyrkov,
	Andy Shevchenko, Mirsad Todorovac

Add helpers around test_and_{set,clear}_bit() that allow to search for
clear or set bits and flip them atomically.

The target patterns may look like this:

	for (idx = 0; idx < nbits; idx++)
		if (test_and_clear_bit(idx, bitmap))
			do_something(idx);

Or like this:

	do {
		bit = find_first_bit(bitmap, nbits);
		if (bit >= nbits)
			return nbits;
	} while (!test_and_clear_bit(bit, bitmap));
	return bit;

In both cases, the opencoded loop may be converted to a single function
or iterator call. Correspondingly:

	for_each_test_and_clear_bit(idx, bitmap, nbits)
		do_something(idx);

Or:
	return find_and_clear_bit(bitmap, nbits);

Obviously, the less routine code people have to write themself, the
less probability to make a mistake.

Those are not only handy helpers but also resolve a non-trivial
issue of using non-atomic find_bit() together with atomic
test_and_{set,clear)_bit().

The trick is that find_bit() implies that the bitmap is a regular
non-volatile piece of memory, and compiler is allowed to use such
optimization techniques like re-fetching memory instead of caching it.

For example, find_first_bit() is implemented like this:

      for (idx = 0; idx * BITS_PER_LONG < sz; idx++) {
              val = addr[idx];
              if (val) {
                      sz = min(idx * BITS_PER_LONG + __ffs(val), sz);
                      break;
              }
      }

On register-memory architectures, like x86, compiler may decide to
access memory twice - first time to compare against 0, and second time
to fetch its value to pass it to __ffs().

When running find_first_bit() on volatile memory, the memory may get
changed in-between, and for instance, it may lead to passing 0 to
__ffs(), which is undefined. This is a potentially dangerous call.

find_and_clear_bit() as a wrapper around test_and_clear_bit()
naturally treats underlying bitmap as a volatile memory and prevents
compiler from such optimizations.

Now that KCSAN is catching exactly this type of situations and warns on
undercover memory modifications. We can use it to reveal improper usage
of find_bit(), and convert it to atomic find_and_*_bit() as appropriate.

The 1st patch of the series adds the following atomic primitives:

	find_and_set_bit(addr, nbits);
	find_and_set_next_bit(addr, nbits, start);
	...

Here find_and_{set,clear} part refers to the corresponding
test_and_{set,clear}_bit function. Suffixes like _wrap or _lock
derive their semantics from corresponding find() or test() functions.

For brevity, the naming omits the fact that we search for zero bit in
find_and_set, and correspondingly search for set bit in find_and_clear
functions.

The patch also adds iterators with atomic semantics, like
for_each_test_and_set_bit(). Here, the naming rule is to simply prefix
corresponding atomic operation with 'for_each'.

All users of find_bit() API, where heavy concurrency is expected,
are encouraged to switch to atomic find_and_bit() as appropriate.

CC: Bart Van Assche <bvanassche@acm.org>
CC: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/find.h | 293 +++++++++++++++++++++++++++++++++++++++++++
 lib/find_bit.c       |  85 +++++++++++++
 2 files changed, 378 insertions(+)

diff --git a/include/linux/find.h b/include/linux/find.h
index 5e4f39ef2e72..79b0e2589725 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -32,6 +32,16 @@ extern unsigned long _find_first_and_bit(const unsigned long *addr1,
 extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
 extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
 
+unsigned long _find_and_set_bit(volatile unsigned long *addr, unsigned long nbits);
+unsigned long _find_and_set_next_bit(volatile unsigned long *addr, unsigned long nbits,
+				unsigned long start);
+unsigned long _find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits);
+unsigned long _find_and_set_next_bit_lock(volatile unsigned long *addr, unsigned long nbits,
+					  unsigned long start);
+unsigned long _find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits);
+unsigned long _find_and_clear_next_bit(volatile unsigned long *addr, unsigned long nbits,
+				unsigned long start);
+
 #ifdef __BIG_ENDIAN
 unsigned long _find_first_zero_bit_le(const unsigned long *addr, unsigned long size);
 unsigned long _find_next_zero_bit_le(const  unsigned long *addr, unsigned
@@ -460,6 +470,267 @@ unsigned long __for_each_wrap(const unsigned long *bitmap, unsigned long size,
 	return bit < start ? bit : size;
 }
 
+/**
+ * find_and_set_bit - Find a zero bit and set it atomically
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the bitmap. It's also not
+ * guaranteed that if @nbits is returned, the bitmap is empty.
+ *
+ * The function does guarantee that if returned value is in range [0 .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or @nbits if no bits found
+ */
+static inline
+unsigned long find_and_set_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr | ~GENMASK(nbits - 1, 0);
+			if (val == ~0UL)
+				return nbits;
+			ret = ffz(val);
+		} while (test_and_set_bit(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_set_bit(addr, nbits);
+}
+
+
+/**
+ * find_and_set_next_bit - Find a zero bit and set it, starting from @offset
+ * @addr: The address to base the search on
+ * @nbits: The bitmap nbits in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the bitmap, starting from @offset.
+ * It's also not guaranteed that if @nbits is returned, the bitmap is empty.
+ *
+ * The function does guarantee that if returned value is in range [@offset .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or @nbits if no bits found
+ */
+static inline
+unsigned long find_and_set_next_bit(volatile unsigned long *addr,
+				    unsigned long nbits, unsigned long offset)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr | ~GENMASK(nbits - 1, offset);
+			if (val == ~0UL)
+				return nbits;
+			ret = ffz(val);
+		} while (test_and_set_bit(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_set_next_bit(addr, nbits, offset);
+}
+
+/**
+ * find_and_set_bit_wrap - find and set bit starting at @offset, wrapping around zero
+ * @addr: The first address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * Returns: the bit number for the next clear bit, or first clear bit up to @offset,
+ * while atomically setting it. If no bits are found, returns @nbits.
+ */
+static inline
+unsigned long find_and_set_bit_wrap(volatile unsigned long *addr,
+					unsigned long nbits, unsigned long offset)
+{
+	unsigned long bit = find_and_set_next_bit(addr, nbits, offset);
+
+	if (bit < nbits || offset == 0)
+		return bit;
+
+	bit = find_and_set_bit(addr, offset);
+	return bit < offset ? bit : nbits;
+}
+
+/**
+ * find_and_set_bit_lock - find a zero bit, then set it atomically with lock
+ * @addr: The address to base the search on
+ * @nbits: The bitmap nbits in bits
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the bitmap. It's also not
+ * guaranteed that if @nbits is returned, the bitmap is empty.
+ *
+ * The function does guarantee that if returned value is in range [0 .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or @nbits if no bits found
+ */
+static inline
+unsigned long find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr | ~GENMASK(nbits - 1, 0);
+			if (val == ~0UL)
+				return nbits;
+			ret = ffz(val);
+		} while (test_and_set_bit_lock(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_set_bit_lock(addr, nbits);
+}
+
+/**
+ * find_and_set_next_bit_lock - find a zero bit and set it atomically with lock
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the range. It's also not
+ * guaranteed that if @nbits is returned, the bitmap is empty.
+ *
+ * The function does guarantee that if returned value is in range [@offset .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or @nbits if no bits found
+ */
+static inline
+unsigned long find_and_set_next_bit_lock(volatile unsigned long *addr,
+					 unsigned long nbits, unsigned long offset)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr | ~GENMASK(nbits - 1, offset);
+			if (val == ~0UL)
+				return nbits;
+			ret = ffz(val);
+		} while (test_and_set_bit_lock(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_set_next_bit_lock(addr, nbits, offset);
+}
+
+/**
+ * find_and_set_bit_wrap_lock - find zero bit starting at @ofset and set it
+ *				with lock, and wrap around zero if nothing found
+ * @addr: The first address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * Returns: the bit number for the next set bit, or first set bit up to @offset
+ * If no bits are set, returns @nbits.
+ */
+static inline
+unsigned long find_and_set_bit_wrap_lock(volatile unsigned long *addr,
+					unsigned long nbits, unsigned long offset)
+{
+	unsigned long bit = find_and_set_next_bit_lock(addr, nbits, offset);
+
+	if (bit < nbits || offset == 0)
+		return bit;
+
+	bit = find_and_set_bit_lock(addr, offset);
+	return bit < offset ? bit : nbits;
+}
+
+/**
+ * find_and_clear_bit - Find a set bit and clear it atomically
+ * @addr: The address to base the search on
+ * @nbits: The bitmap nbits in bits
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the bitmap. It's also not
+ * guaranteed that if @nbits is returned, the bitmap is empty.
+ *
+ * The function does guarantee that if returned value is in range [0 .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and cleared bit, or @nbits if no bits found
+ */
+static inline unsigned long find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr & GENMASK(nbits - 1, 0);
+			if (val == 0)
+				return nbits;
+			ret = __ffs(val);
+		} while (!test_and_clear_bit(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_clear_bit(addr, nbits);
+}
+
+/**
+ * find_and_clear_next_bit - Find a set bit next after @offset, and clear it atomically
+ * @addr: The address to base the search on
+ * @nbits: The bitmap nbits in bits
+ * @offset: bit offset at which to start searching
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the range It's also not
+ * guaranteed that if @nbits is returned, there's no set bits after @offset.
+ *
+ * The function does guarantee that if returned value is in range [@offset .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and cleared bit, or @nbits if no bits found
+ */
+static inline
+unsigned long find_and_clear_next_bit(volatile unsigned long *addr,
+					unsigned long nbits, unsigned long offset)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr & GENMASK(nbits - 1, offset);
+			if (val == 0)
+				return nbits;
+			ret = __ffs(val);
+		} while (!test_and_clear_bit(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_clear_next_bit(addr, nbits, offset);
+}
+
 /**
  * find_next_clump8 - find next 8-bit clump with set bits in a memory region
  * @clump: location to store copy of found clump
@@ -577,6 +848,28 @@ unsigned long find_next_bit_le(const void *addr, unsigned
 #define for_each_set_bit_from(bit, addr, size) \
 	for (; (bit) = find_next_bit((addr), (size), (bit)), (bit) < (size); (bit)++)
 
+/* same as for_each_set_bit() but atomically clears each found bit */
+#define for_each_test_and_clear_bit(bit, addr, size) \
+	for ((bit) = 0; \
+	     (bit) = find_and_clear_next_bit((addr), (size), (bit)), (bit) < (size); \
+	     (bit)++)
+
+/* same as for_each_set_bit_from() but atomically clears each found bit */
+#define for_each_test_and_clear_bit_from(bit, addr, size) \
+	for (; (bit) = find_and_clear_next_bit((addr), (size), (bit)), (bit) < (size); (bit)++)
+
+/* same as for_each_clear_bit() but atomically sets each found bit */
+#define for_each_test_and_set_bit(bit, addr, size) \
+	for ((bit) = 0; \
+	     (bit) = find_and_set_next_bit((addr), (size), (bit)), (bit) < (size); \
+	     (bit)++)
+
+/* same as for_each_clear_bit_from() but atomically clears each found bit */
+#define for_each_test_and_set_bit_from(bit, addr, size) \
+	for (; \
+	     (bit) = find_and_set_next_bit((addr), (size), (bit)), (bit) < (size); \
+	     (bit)++)
+
 #define for_each_clear_bit(bit, addr, size) \
 	for ((bit) = 0;									\
 	     (bit) = find_next_zero_bit((addr), (size), (bit)), (bit) < (size);		\
diff --git a/lib/find_bit.c b/lib/find_bit.c
index 32f99e9a670e..c9b6b9f96610 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -116,6 +116,91 @@ unsigned long _find_first_and_bit(const unsigned long *addr1,
 EXPORT_SYMBOL(_find_first_and_bit);
 #endif
 
+unsigned long _find_and_set_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_FIRST_BIT(~addr[idx], /* nop */, nbits);
+		if (bit >= nbits)
+			return nbits;
+	} while (test_and_set_bit(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_set_bit);
+
+unsigned long _find_and_set_next_bit(volatile unsigned long *addr,
+				     unsigned long nbits, unsigned long start)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_NEXT_BIT(~addr[idx], /* nop */, nbits, start);
+		if (bit >= nbits)
+			return nbits;
+	} while (test_and_set_bit(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_set_next_bit);
+
+unsigned long _find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_FIRST_BIT(~addr[idx], /* nop */, nbits);
+		if (bit >= nbits)
+			return nbits;
+	} while (test_and_set_bit_lock(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_set_bit_lock);
+
+unsigned long _find_and_set_next_bit_lock(volatile unsigned long *addr,
+					  unsigned long nbits, unsigned long start)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_NEXT_BIT(~addr[idx], /* nop */, nbits, start);
+		if (bit >= nbits)
+			return nbits;
+	} while (test_and_set_bit_lock(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_set_next_bit_lock);
+
+unsigned long _find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_FIRST_BIT(addr[idx], /* nop */, nbits);
+		if (bit >= nbits)
+			return nbits;
+	} while (!test_and_clear_bit(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_clear_bit);
+
+unsigned long _find_and_clear_next_bit(volatile unsigned long *addr,
+					unsigned long nbits, unsigned long start)
+{
+	do {
+		start =  FIND_NEXT_BIT(addr[idx], /* nop */, nbits, start);
+		if (start >= nbits)
+			return nbits;
+	} while (!test_and_clear_bit(start, addr));
+
+	return start;
+}
+EXPORT_SYMBOL(_find_and_clear_next_bit);
+
 #ifndef find_first_zero_bit
 /*
  * Find the first cleared bit in a memory region.
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 02/35] lib/find: add test for atomic find_bit() ops
  2023-12-03 19:23 ` Yury Norov
@ 2023-12-03 19:32   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
	Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
	Wenjia Zhang, Will Deacon, Yoshinori Sato,
	GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
	kvm, linux-arm-kernel, linux-arm-msm, linux-block,
	linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
	linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
	linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
	linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
	sparclinux, x86
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

Add basic functionality test for new API.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/test_bitmap.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 65f22c2578b0..277e1ca9fd28 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -221,6 +221,65 @@ static void __init test_zero_clear(void)
 	expect_eq_pbl("", bmap, 1024);
 }
 
+static void __init test_find_and_bit(void)
+{
+	unsigned long w, w_part, bit, cnt = 0;
+	DECLARE_BITMAP(bmap, EXP1_IN_BITS);
+
+	/*
+	 * Test find_and_clear{_next}_bit() and corresponding
+	 * iterators
+	 */
+	bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+	w = bitmap_weight(bmap, EXP1_IN_BITS);
+
+	for_each_test_and_clear_bit(bit, bmap, EXP1_IN_BITS)
+		cnt++;
+
+	expect_eq_uint(w, cnt);
+	expect_eq_uint(0, bitmap_weight(bmap, EXP1_IN_BITS));
+
+	bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+	w = bitmap_weight(bmap, EXP1_IN_BITS);
+	w_part = bitmap_weight(bmap, EXP1_IN_BITS / 3);
+
+	cnt = 0;
+	bit = EXP1_IN_BITS / 3;
+	for_each_test_and_clear_bit_from(bit, bmap, EXP1_IN_BITS)
+		cnt++;
+
+	expect_eq_uint(bitmap_weight(bmap, EXP1_IN_BITS), bitmap_weight(bmap, EXP1_IN_BITS / 3));
+	expect_eq_uint(w_part, bitmap_weight(bmap, EXP1_IN_BITS));
+	expect_eq_uint(w - w_part, cnt);
+
+	/*
+	 * Test find_and_set{_next}_bit() and corresponding
+	 * iterators
+	 */
+	bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+	w = bitmap_weight(bmap, EXP1_IN_BITS);
+	cnt = 0;
+
+	for_each_test_and_set_bit(bit, bmap, EXP1_IN_BITS)
+		cnt++;
+
+	expect_eq_uint(EXP1_IN_BITS - w, cnt);
+	expect_eq_uint(EXP1_IN_BITS, bitmap_weight(bmap, EXP1_IN_BITS));
+
+	bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+	w = bitmap_weight(bmap, EXP1_IN_BITS);
+	w_part = bitmap_weight(bmap, EXP1_IN_BITS / 3);
+	cnt = 0;
+
+	bit = EXP1_IN_BITS / 3;
+	for_each_test_and_set_bit_from(bit, bmap, EXP1_IN_BITS)
+		cnt++;
+
+	expect_eq_uint(EXP1_IN_BITS - bitmap_weight(bmap, EXP1_IN_BITS),
+			EXP1_IN_BITS / 3 - bitmap_weight(bmap, EXP1_IN_BITS / 3));
+	expect_eq_uint(EXP1_IN_BITS * 2 / 3 - (w - w_part), cnt);
+}
+
 static void __init test_find_nth_bit(void)
 {
 	unsigned long b, bit, cnt = 0;
@@ -1273,6 +1332,8 @@ static void __init selftest(void)
 	test_for_each_clear_bitrange_from();
 	test_for_each_set_clump8();
 	test_for_each_set_bit_wrap();
+
+	test_find_and_bit();
 }
 
 KSTM_MODULE_LOADERS(test_bitmap);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 02/35] lib/find: add test for atomic find_bit() ops
@ 2023-12-03 19:32   ` Yury Norov
  0 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
	Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
	Wenjia Zhang, Will Deacon, Yoshinori Sato,
	GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
	kvm, linux-arm-kernel, linux-arm-msm, linux-block,
	linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
	linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
	linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
	linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
	sparclinux, x86
  Cc: Sergey Shtylyov, Jan Kara, Bart Van Assche, Yury Norov,
	Rasmus Villemoes, Matthew Wilcox, Alexey Klimov, Maxim Kuvyrkov,
	Andy Shevchenko, Mirsad Todorovac

Add basic functionality test for new API.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/test_bitmap.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 65f22c2578b0..277e1ca9fd28 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -221,6 +221,65 @@ static void __init test_zero_clear(void)
 	expect_eq_pbl("", bmap, 1024);
 }
 
+static void __init test_find_and_bit(void)
+{
+	unsigned long w, w_part, bit, cnt = 0;
+	DECLARE_BITMAP(bmap, EXP1_IN_BITS);
+
+	/*
+	 * Test find_and_clear{_next}_bit() and corresponding
+	 * iterators
+	 */
+	bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+	w = bitmap_weight(bmap, EXP1_IN_BITS);
+
+	for_each_test_and_clear_bit(bit, bmap, EXP1_IN_BITS)
+		cnt++;
+
+	expect_eq_uint(w, cnt);
+	expect_eq_uint(0, bitmap_weight(bmap, EXP1_IN_BITS));
+
+	bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+	w = bitmap_weight(bmap, EXP1_IN_BITS);
+	w_part = bitmap_weight(bmap, EXP1_IN_BITS / 3);
+
+	cnt = 0;
+	bit = EXP1_IN_BITS / 3;
+	for_each_test_and_clear_bit_from(bit, bmap, EXP1_IN_BITS)
+		cnt++;
+
+	expect_eq_uint(bitmap_weight(bmap, EXP1_IN_BITS), bitmap_weight(bmap, EXP1_IN_BITS / 3));
+	expect_eq_uint(w_part, bitmap_weight(bmap, EXP1_IN_BITS));
+	expect_eq_uint(w - w_part, cnt);
+
+	/*
+	 * Test find_and_set{_next}_bit() and corresponding
+	 * iterators
+	 */
+	bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+	w = bitmap_weight(bmap, EXP1_IN_BITS);
+	cnt = 0;
+
+	for_each_test_and_set_bit(bit, bmap, EXP1_IN_BITS)
+		cnt++;
+
+	expect_eq_uint(EXP1_IN_BITS - w, cnt);
+	expect_eq_uint(EXP1_IN_BITS, bitmap_weight(bmap, EXP1_IN_BITS));
+
+	bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+	w = bitmap_weight(bmap, EXP1_IN_BITS);
+	w_part = bitmap_weight(bmap, EXP1_IN_BITS / 3);
+	cnt = 0;
+
+	bit = EXP1_IN_BITS / 3;
+	for_each_test_and_set_bit_from(bit, bmap, EXP1_IN_BITS)
+		cnt++;
+
+	expect_eq_uint(EXP1_IN_BITS - bitmap_weight(bmap, EXP1_IN_BITS),
+			EXP1_IN_BITS / 3 - bitmap_weight(bmap, EXP1_IN_BITS / 3));
+	expect_eq_uint(EXP1_IN_BITS * 2 / 3 - (w - w_part), cnt);
+}
+
 static void __init test_find_nth_bit(void)
 {
 	unsigned long b, bit, cnt = 0;
@@ -1273,6 +1332,8 @@ static void __init selftest(void)
 	test_for_each_clear_bitrange_from();
 	test_for_each_set_clump8();
 	test_for_each_set_bit_wrap();
+
+	test_find_and_bit();
 }
 
 KSTM_MODULE_LOADERS(test_bitmap);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 03/35] lib/sbitmap; make __sbitmap_get_word() using find_and_set_bit()
  2023-12-03 19:32   ` Yury Norov
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  2023-12-04 18:22     ` Jan Kara
  2023-12-04 18:40     ` Jens Axboe
  -1 siblings, 2 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Jens Axboe, linux-block
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

__sbitmap_get_word() opencodes either find_and_set_bit_wrap(), or
find_and_set_next_bit() depending on hint and wrap parameters.

Switch it to use the atomic find_bit() API. While here, simplify
sbitmap_find_bit_in_word(), which calls it.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/sbitmap.c | 46 ++++++++--------------------------------------
 1 file changed, 8 insertions(+), 38 deletions(-)

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index d0a5081dfd12..b21aebd07fd6 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -133,38 +133,11 @@ void sbitmap_resize(struct sbitmap *sb, unsigned int depth)
 }
 EXPORT_SYMBOL_GPL(sbitmap_resize);
 
-static int __sbitmap_get_word(unsigned long *word, unsigned long depth,
+static inline int __sbitmap_get_word(unsigned long *word, unsigned long depth,
 			      unsigned int hint, bool wrap)
 {
-	int nr;
-
-	/* don't wrap if starting from 0 */
-	wrap = wrap && hint;
-
-	while (1) {
-		nr = find_next_zero_bit(word, depth, hint);
-		if (unlikely(nr >= depth)) {
-			/*
-			 * We started with an offset, and we didn't reset the
-			 * offset to 0 in a failure case, so start from 0 to
-			 * exhaust the map.
-			 */
-			if (hint && wrap) {
-				hint = 0;
-				continue;
-			}
-			return -1;
-		}
-
-		if (!test_and_set_bit_lock(nr, word))
-			break;
-
-		hint = nr + 1;
-		if (hint >= depth - 1)
-			hint = 0;
-	}
-
-	return nr;
+	return wrap ? find_and_set_bit_wrap_lock(word, depth, hint) :
+			find_and_set_next_bit_lock(word, depth, hint);
 }
 
 static int sbitmap_find_bit_in_word(struct sbitmap_word *map,
@@ -175,15 +148,12 @@ static int sbitmap_find_bit_in_word(struct sbitmap_word *map,
 	int nr;
 
 	do {
-		nr = __sbitmap_get_word(&map->word, depth,
-					alloc_hint, wrap);
-		if (nr != -1)
-			break;
-		if (!sbitmap_deferred_clear(map))
-			break;
-	} while (1);
+		nr = __sbitmap_get_word(&map->word, depth, alloc_hint, wrap);
+		if (nr < depth)
+			return nr;
+	} while (sbitmap_deferred_clear(map));
 
-	return nr;
+	return -1;
 }
 
 static int sbitmap_find_bit(struct sbitmap *sb,
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 04/35] watch_queue: use atomic find_bit() in post_one_notification()
  2023-12-03 19:32   ` Yury Norov
  (?)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Christian Brauner, David Howells, Siddh Raman Pant,
	Yury Norov, Dave Airlie, David Disseldorp, Philipp Stanner,
	Nick Alcock
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

post_one_notification() searches for a set bit in wqueue->notes_bitmap,
and after some housekeeping work clears it, firing a BUG() if someone
else cleared the bit in-between.

We can allocate the bit atomically with an atomic find_and_clear_bit(),
and remove the BUG() possibility entirely.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 kernel/watch_queue.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c
index 778b4056700f..07edd4a2b463 100644
--- a/kernel/watch_queue.c
+++ b/kernel/watch_queue.c
@@ -112,7 +112,7 @@ static bool post_one_notification(struct watch_queue *wqueue,
 	if (pipe_full(head, tail, pipe->ring_size))
 		goto lost;
 
-	note = find_first_bit(wqueue->notes_bitmap, wqueue->nr_notes);
+	note = find_and_clear_bit(wqueue->notes_bitmap, wqueue->nr_notes);
 	if (note >= wqueue->nr_notes)
 		goto lost;
 
@@ -133,10 +133,6 @@ static bool post_one_notification(struct watch_queue *wqueue,
 	buf->flags = PIPE_BUF_FLAG_WHOLE;
 	smp_store_release(&pipe->head, head + 1); /* vs pipe_read() */
 
-	if (!test_and_clear_bit(note, wqueue->notes_bitmap)) {
-		spin_unlock_irq(&pipe->rd_wait.lock);
-		BUG();
-	}
 	wake_up_interruptible_sync_poll_locked(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
 	done = true;
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 05/35] sched: add cpumask_find_and_set() and use it in __mm_cid_get()
  2023-12-03 19:32   ` Yury Norov
                     ` (2 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Yury Norov, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Maxim Kuvyrkov,
	Alexey Klimov, Bart Van Assche, Sergey Shtylyov,
	Mathieu Desnoyers

__mm_cid_get() uses __mm_cid_try_get() helper to atomically acquire a
bit in mm cid mask. Now that we have atomic find_and_set_bit(), we can
easily extend it to cpumasks and use in the scheduler code.

cpumask_find_and_set() considers cid mask as a volatile region of memory,
as it actually is in this case. So, if it's changed while search is in
progress, KCSAN wouldn't fire warning on it.

CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/cpumask.h | 12 ++++++++++++
 kernel/sched/sched.h    | 14 +++++---------
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index cfb545841a2c..c2acced8be4e 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -271,6 +271,18 @@ unsigned int cpumask_next_and(int n, const struct cpumask *src1p,
 		small_cpumask_bits, n + 1);
 }
 
+/**
+ * cpumask_find_and_set - find the first unset cpu in a cpumask and
+ *			  set it atomically
+ * @srcp: the cpumask pointer
+ *
+ * Return: >= nr_cpu_ids if nothing is found.
+ */
+static inline unsigned int cpumask_find_and_set(volatile struct cpumask *srcp)
+{
+	return find_and_set_bit(cpumask_bits(srcp), small_cpumask_bits);
+}
+
 /**
  * for_each_cpu - iterate over every cpu in a mask
  * @cpu: the (optionally unsigned) integer iterator
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2e5a95486a42..2ce9112de89b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3347,23 +3347,19 @@ static inline void mm_cid_put(struct mm_struct *mm)
 
 static inline int __mm_cid_try_get(struct mm_struct *mm)
 {
-	struct cpumask *cpumask;
-	int cid;
+	struct cpumask *cpumask = mm_cidmask(mm);
+	int cid = nr_cpu_ids;
 
-	cpumask = mm_cidmask(mm);
 	/*
 	 * Retry finding first zero bit if the mask is temporarily
 	 * filled. This only happens during concurrent remote-clear
 	 * which owns a cid without holding a rq lock.
 	 */
-	for (;;) {
-		cid = cpumask_first_zero(cpumask);
-		if (cid < nr_cpu_ids)
-			break;
+	while (cid >= nr_cpu_ids) {
+		cid = cpumask_find_and_set(cpumask);
 		cpu_relax();
 	}
-	if (cpumask_test_and_set_cpu(cid, cpumask))
-		return -1;
+
 	return cid;
 }
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 06/35] mips: sgi-ip30: rework heart_alloc_int()
  2023-12-03 19:32   ` Yury Norov
                     ` (3 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Thomas Bogendoerfer, Yury Norov, linux-mips
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

heart_alloc_int() opencodes find_and_set_bit(). Switch it to using the
dedicated function, and make an nice one-liner.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/mips/sgi-ip30/ip30-irq.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/mips/sgi-ip30/ip30-irq.c b/arch/mips/sgi-ip30/ip30-irq.c
index 423c32cb66ed..3c4d4e947817 100644
--- a/arch/mips/sgi-ip30/ip30-irq.c
+++ b/arch/mips/sgi-ip30/ip30-irq.c
@@ -28,17 +28,9 @@ static DEFINE_PER_CPU(unsigned long, irq_enable_mask);
 
 static inline int heart_alloc_int(void)
 {
-	int bit;
+	int bit = find_and_set_bit(heart_irq_map, HEART_NUM_IRQS);
 
-again:
-	bit = find_first_zero_bit(heart_irq_map, HEART_NUM_IRQS);
-	if (bit >= HEART_NUM_IRQS)
-		return -ENOSPC;
-
-	if (test_and_set_bit(bit, heart_irq_map))
-		goto again;
-
-	return bit;
+	return bit < HEART_NUM_IRQS ? bit : -ENOSPC;
 }
 
 static void ip30_error_irq(struct irq_desc *desc)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 07/35] sparc: fix opencoded find_and_set_bit() in alloc_msi()
  2023-12-03 19:32   ` Yury Norov
                     ` (4 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, Rob Herring, Sam Ravnborg,
	Yury Norov, sparclinux
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

alloc_msi() opencodes find_and_clear_bit(). Switch it to using the
dedicated function, and make an nice one-liner.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/sparc/kernel/pci_msi.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/sparc/kernel/pci_msi.c b/arch/sparc/kernel/pci_msi.c
index fc7402948b7b..91105c788d1d 100644
--- a/arch/sparc/kernel/pci_msi.c
+++ b/arch/sparc/kernel/pci_msi.c
@@ -96,14 +96,9 @@ static u32 pick_msiq(struct pci_pbm_info *pbm)
 
 static int alloc_msi(struct pci_pbm_info *pbm)
 {
-	int i;
-
-	for (i = 0; i < pbm->msi_num; i++) {
-		if (!test_and_set_bit(i, pbm->msi_bitmap))
-			return i + pbm->msi_first;
-	}
+	int i = find_and_set_bit(pbm->msi_bitmap, pbm->msi_num);
 
-	return -ENOENT;
+	return i < pbm->msi_num ? i + pbm->msi_first : -ENOENT;
 }
 
 static void free_msi(struct pci_pbm_info *pbm, int msi_num)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 08/35] perf/arm: optimize opencoded atomic find_bit() API
  2023-12-03 19:32   ` Yury Norov
@ 2023-12-03 19:32     ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Will Deacon, Mark Rutland, linux-arm-kernel
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

Switch subsystem to use atomic find_bit() or atomic iterators as
appropriate.

CC: Will Deacon <will@kernel.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/perf/arm-cci.c        | 24 ++++++------------------
 drivers/perf/arm-ccn.c        | 10 ++--------
 drivers/perf/arm_dmc620_pmu.c |  9 ++-------
 drivers/perf/arm_pmuv3.c      |  8 ++------
 4 files changed, 12 insertions(+), 39 deletions(-)

diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 61de861eaf91..cb15b4cee5f7 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -320,12 +320,9 @@ static int cci400_get_event_idx(struct cci_pmu *cci_pmu,
 		return CCI400_PMU_CYCLE_CNTR_IDX;
 	}
 
-	for (idx = CCI400_PMU_CNTR0_IDX; idx <= CCI_PMU_CNTR_LAST(cci_pmu); ++idx)
-		if (!test_and_set_bit(idx, hw->used_mask))
-			return idx;
-
-	/* No counters available */
-	return -EAGAIN;
+	idx = find_and_set_next_bit(hw->used_mask, CCI_PMU_CNTR_LAST(cci_pmu) + 1,
+							CCI400_PMU_CNTR0_IDX);
+	return idx < CCI_PMU_CNTR_LAST(cci_pmu) + 1 ? idx : -EAGAIN;
 }
 
 static int cci400_validate_hw_event(struct cci_pmu *cci_pmu, unsigned long hw_event)
@@ -802,13 +799,8 @@ static int pmu_get_event_idx(struct cci_pmu_hw_events *hw, struct perf_event *ev
 	if (cci_pmu->model->get_event_idx)
 		return cci_pmu->model->get_event_idx(cci_pmu, hw, cci_event);
 
-	/* Generic code to find an unused idx from the mask */
-	for (idx = 0; idx <= CCI_PMU_CNTR_LAST(cci_pmu); idx++)
-		if (!test_and_set_bit(idx, hw->used_mask))
-			return idx;
-
-	/* No counters available */
-	return -EAGAIN;
+	idx = find_and_set_bit(hw->used_mask, CCI_PMU_CNTR_LAST(cci_pmu) + 1);
+	return idx < CCI_PMU_CNTR_LAST(cci_pmu) + 1 ? idx : -EAGAIN;
 }
 
 static int pmu_map_event(struct perf_event *event)
@@ -861,12 +853,8 @@ static void pmu_free_irq(struct cci_pmu *cci_pmu)
 {
 	int i;
 
-	for (i = 0; i < cci_pmu->nr_irqs; i++) {
-		if (!test_and_clear_bit(i, &cci_pmu->active_irqs))
-			continue;
-
+	for_each_test_and_clear_bit(i, &cci_pmu->active_irqs, cci_pmu->nr_irqs)
 		free_irq(cci_pmu->irqs[i], cci_pmu);
-	}
 }
 
 static u32 pmu_read_counter(struct perf_event *event)
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 728d13d8e98a..d657701b1f23 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -589,15 +589,9 @@ static const struct attribute_group *arm_ccn_pmu_attr_groups[] = {
 
 static int arm_ccn_pmu_alloc_bit(unsigned long *bitmap, unsigned long size)
 {
-	int bit;
-
-	do {
-		bit = find_first_zero_bit(bitmap, size);
-		if (bit >= size)
-			return -EAGAIN;
-	} while (test_and_set_bit(bit, bitmap));
+	int bit = find_and_set_bit(bitmap, size);
 
-	return bit;
+	return bit < size ? bit : -EAGAIN;
 }
 
 /* All RN-I and RN-D nodes have identical PMUs */
diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
index 30cea6859574..e41c84dabc3e 100644
--- a/drivers/perf/arm_dmc620_pmu.c
+++ b/drivers/perf/arm_dmc620_pmu.c
@@ -303,13 +303,8 @@ static int dmc620_get_event_idx(struct perf_event *event)
 		end_idx = DMC620_PMU_MAX_COUNTERS;
 	}
 
-	for (idx = start_idx; idx < end_idx; ++idx) {
-		if (!test_and_set_bit(idx, dmc620_pmu->used_mask))
-			return idx;
-	}
-
-	/* The counters are all in use. */
-	return -EAGAIN;
+	idx = find_and_set_next_bit(dmc620_pmu->used_mask, end_idx, start_idx);
+	return idx < end_idx ? idx : -EAGAIN;
 }
 
 static inline
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 6ca7be05229c..f046ad9e71f1 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -825,13 +825,9 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
 				    struct arm_pmu *cpu_pmu)
 {
-	int idx;
+	int idx = find_and_set_next_bit(cpuc->used_mask, cpu_pmu->num_events, ARMV8_IDX_COUNTER0);
 
-	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; idx++) {
-		if (!test_and_set_bit(idx, cpuc->used_mask))
-			return idx;
-	}
-	return -EAGAIN;
+	return idx < cpu_pmu->num_events ? idx : -EAGAIN;
 }
 
 static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 08/35] perf/arm: optimize opencoded atomic find_bit() API
@ 2023-12-03 19:32     ` Yury Norov
  0 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Will Deacon, Mark Rutland, linux-arm-kernel
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

Switch subsystem to use atomic find_bit() or atomic iterators as
appropriate.

CC: Will Deacon <will@kernel.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/perf/arm-cci.c        | 24 ++++++------------------
 drivers/perf/arm-ccn.c        | 10 ++--------
 drivers/perf/arm_dmc620_pmu.c |  9 ++-------
 drivers/perf/arm_pmuv3.c      |  8 ++------
 4 files changed, 12 insertions(+), 39 deletions(-)

diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 61de861eaf91..cb15b4cee5f7 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -320,12 +320,9 @@ static int cci400_get_event_idx(struct cci_pmu *cci_pmu,
 		return CCI400_PMU_CYCLE_CNTR_IDX;
 	}
 
-	for (idx = CCI400_PMU_CNTR0_IDX; idx <= CCI_PMU_CNTR_LAST(cci_pmu); ++idx)
-		if (!test_and_set_bit(idx, hw->used_mask))
-			return idx;
-
-	/* No counters available */
-	return -EAGAIN;
+	idx = find_and_set_next_bit(hw->used_mask, CCI_PMU_CNTR_LAST(cci_pmu) + 1,
+							CCI400_PMU_CNTR0_IDX);
+	return idx < CCI_PMU_CNTR_LAST(cci_pmu) + 1 ? idx : -EAGAIN;
 }
 
 static int cci400_validate_hw_event(struct cci_pmu *cci_pmu, unsigned long hw_event)
@@ -802,13 +799,8 @@ static int pmu_get_event_idx(struct cci_pmu_hw_events *hw, struct perf_event *ev
 	if (cci_pmu->model->get_event_idx)
 		return cci_pmu->model->get_event_idx(cci_pmu, hw, cci_event);
 
-	/* Generic code to find an unused idx from the mask */
-	for (idx = 0; idx <= CCI_PMU_CNTR_LAST(cci_pmu); idx++)
-		if (!test_and_set_bit(idx, hw->used_mask))
-			return idx;
-
-	/* No counters available */
-	return -EAGAIN;
+	idx = find_and_set_bit(hw->used_mask, CCI_PMU_CNTR_LAST(cci_pmu) + 1);
+	return idx < CCI_PMU_CNTR_LAST(cci_pmu) + 1 ? idx : -EAGAIN;
 }
 
 static int pmu_map_event(struct perf_event *event)
@@ -861,12 +853,8 @@ static void pmu_free_irq(struct cci_pmu *cci_pmu)
 {
 	int i;
 
-	for (i = 0; i < cci_pmu->nr_irqs; i++) {
-		if (!test_and_clear_bit(i, &cci_pmu->active_irqs))
-			continue;
-
+	for_each_test_and_clear_bit(i, &cci_pmu->active_irqs, cci_pmu->nr_irqs)
 		free_irq(cci_pmu->irqs[i], cci_pmu);
-	}
 }
 
 static u32 pmu_read_counter(struct perf_event *event)
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 728d13d8e98a..d657701b1f23 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -589,15 +589,9 @@ static const struct attribute_group *arm_ccn_pmu_attr_groups[] = {
 
 static int arm_ccn_pmu_alloc_bit(unsigned long *bitmap, unsigned long size)
 {
-	int bit;
-
-	do {
-		bit = find_first_zero_bit(bitmap, size);
-		if (bit >= size)
-			return -EAGAIN;
-	} while (test_and_set_bit(bit, bitmap));
+	int bit = find_and_set_bit(bitmap, size);
 
-	return bit;
+	return bit < size ? bit : -EAGAIN;
 }
 
 /* All RN-I and RN-D nodes have identical PMUs */
diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
index 30cea6859574..e41c84dabc3e 100644
--- a/drivers/perf/arm_dmc620_pmu.c
+++ b/drivers/perf/arm_dmc620_pmu.c
@@ -303,13 +303,8 @@ static int dmc620_get_event_idx(struct perf_event *event)
 		end_idx = DMC620_PMU_MAX_COUNTERS;
 	}
 
-	for (idx = start_idx; idx < end_idx; ++idx) {
-		if (!test_and_set_bit(idx, dmc620_pmu->used_mask))
-			return idx;
-	}
-
-	/* The counters are all in use. */
-	return -EAGAIN;
+	idx = find_and_set_next_bit(dmc620_pmu->used_mask, end_idx, start_idx);
+	return idx < end_idx ? idx : -EAGAIN;
 }
 
 static inline
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 6ca7be05229c..f046ad9e71f1 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -825,13 +825,9 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
 				    struct arm_pmu *cpu_pmu)
 {
-	int idx;
+	int idx = find_and_set_next_bit(cpuc->used_mask, cpu_pmu->num_events, ARMV8_IDX_COUNTER0);
 
-	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; idx++) {
-		if (!test_and_set_bit(idx, cpuc->used_mask))
-			return idx;
-	}
-	return -EAGAIN;
+	return idx < cpu_pmu->num_events ? idx : -EAGAIN;
 }
 
 static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
-- 
2.40.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 09/35] drivers/perf: optimize ali_drw_get_counter_idx() by using find_bit()
  2023-12-03 19:32   ` Yury Norov
@ 2023-12-03 19:32     ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Shuai Xue, Will Deacon, Mark Rutland, linux-arm-kernel
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

The function searches used_mask for a set bit in a for-loop bit by bit.
We can do it faster by using atomic find_and_set_bit().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Will Deacon <will@kernel.org>
---
 drivers/perf/alibaba_uncore_drw_pmu.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/perf/alibaba_uncore_drw_pmu.c b/drivers/perf/alibaba_uncore_drw_pmu.c
index 19d459a36be5..2a3b7701d568 100644
--- a/drivers/perf/alibaba_uncore_drw_pmu.c
+++ b/drivers/perf/alibaba_uncore_drw_pmu.c
@@ -274,15 +274,9 @@ static const struct attribute_group *ali_drw_pmu_attr_groups[] = {
 static int ali_drw_get_counter_idx(struct perf_event *event)
 {
 	struct ali_drw_pmu *drw_pmu = to_ali_drw_pmu(event->pmu);
-	int idx;
+	int idx = find_and_set_bit(drw_pmu->used_mask, ALI_DRW_PMU_COMMON_MAX_COUNTERS);
 
-	for (idx = 0; idx < ALI_DRW_PMU_COMMON_MAX_COUNTERS; ++idx) {
-		if (!test_and_set_bit(idx, drw_pmu->used_mask))
-			return idx;
-	}
-
-	/* The counters are all in use. */
-	return -EBUSY;
+	return idx < ALI_DRW_PMU_COMMON_MAX_COUNTERS ? idx : -EBUSY;
 }
 
 static u64 ali_drw_pmu_read_counter(struct perf_event *event)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 09/35] drivers/perf: optimize ali_drw_get_counter_idx() by using find_bit()
@ 2023-12-03 19:32     ` Yury Norov
  0 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Shuai Xue, Will Deacon, Mark Rutland, linux-arm-kernel
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

The function searches used_mask for a set bit in a for-loop bit by bit.
We can do it faster by using atomic find_and_set_bit().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Will Deacon <will@kernel.org>
---
 drivers/perf/alibaba_uncore_drw_pmu.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/perf/alibaba_uncore_drw_pmu.c b/drivers/perf/alibaba_uncore_drw_pmu.c
index 19d459a36be5..2a3b7701d568 100644
--- a/drivers/perf/alibaba_uncore_drw_pmu.c
+++ b/drivers/perf/alibaba_uncore_drw_pmu.c
@@ -274,15 +274,9 @@ static const struct attribute_group *ali_drw_pmu_attr_groups[] = {
 static int ali_drw_get_counter_idx(struct perf_event *event)
 {
 	struct ali_drw_pmu *drw_pmu = to_ali_drw_pmu(event->pmu);
-	int idx;
+	int idx = find_and_set_bit(drw_pmu->used_mask, ALI_DRW_PMU_COMMON_MAX_COUNTERS);
 
-	for (idx = 0; idx < ALI_DRW_PMU_COMMON_MAX_COUNTERS; ++idx) {
-		if (!test_and_set_bit(idx, drw_pmu->used_mask))
-			return idx;
-	}
-
-	/* The counters are all in use. */
-	return -EBUSY;
+	return idx < ALI_DRW_PMU_COMMON_MAX_COUNTERS ? idx : -EBUSY;
 }
 
 static u64 ali_drw_pmu_read_counter(struct perf_event *event)
-- 
2.40.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 10/35] dmaengine: idxd: optimize perfmon_assign_event()
  2023-12-03 19:32   ` Yury Norov
                     ` (7 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Fenghua Yu, Dave Jiang, Vinod Koul, dmaengine
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

The function searches used_mask for a set bit in a for-loop bit by bit.
We can do it faster by using atomic find_and_set_bit().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
---
 drivers/dma/idxd/perfmon.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
index fdda6d604262..4dd9c0d979c3 100644
--- a/drivers/dma/idxd/perfmon.c
+++ b/drivers/dma/idxd/perfmon.c
@@ -134,13 +134,9 @@ static void perfmon_assign_hw_event(struct idxd_pmu *idxd_pmu,
 static int perfmon_assign_event(struct idxd_pmu *idxd_pmu,
 				struct perf_event *event)
 {
-	int i;
-
-	for (i = 0; i < IDXD_PMU_EVENT_MAX; i++)
-		if (!test_and_set_bit(i, idxd_pmu->used_mask))
-			return i;
+	int i = find_and_set_bit(idxd_pmu->used_mask, IDXD_PMU_EVENT_MAX);
 
-	return -EINVAL;
+	return i < IDXD_PMU_EVENT_MAX ? i : -EINVAL;
 }
 
 /*
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 11/35] ath10k: optimize ath10k_snoc_napi_poll() by using find_bit()
  2023-12-03 19:32   ` Yury Norov
                     ` (8 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Kalle Valo, Jeff Johnson, ath10k, linux-wireless
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

ath10k_snoc_napi_poll() traverses pending_ce_irqs bitmap bit by bit.
We can do it faster by using for_each_test_and_clear_bit() iterator.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/net/wireless/ath/ath10k/snoc.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index 2c39bad7ebfb..a1db5a973780 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -1237,11 +1237,10 @@ static int ath10k_snoc_napi_poll(struct napi_struct *ctx, int budget)
 		return done;
 	}
 
-	for (ce_id = 0; ce_id < CE_COUNT; ce_id++)
-		if (test_and_clear_bit(ce_id, ar_snoc->pending_ce_irqs)) {
-			ath10k_ce_per_engine_service(ar, ce_id);
-			ath10k_ce_enable_interrupt(ar, ce_id);
-		}
+	for_each_test_and_clear_bit(ce_id, ar_snoc->pending_ce_irqs, CE_COUNT) {
+		ath10k_ce_per_engine_service(ar, ce_id);
+		ath10k_ce_enable_interrupt(ar, ce_id);
+	}
 
 	done = ath10k_htt_txrx_compl_task(ar, budget);
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 12/35] wifi: rtw88: optimize rtw_pci_tx_kick_off() by using find_bit()
  2023-12-03 19:32   ` Yury Norov
                     ` (9 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Ping-Ke Shih, Kalle Valo, linux-wireless
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

rtw_pci_tx_kick_off() traverses tx_queued bitmap bit by bit. We can do it
faster by using atomic for_each_test_and_clear_bit() iterator.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/net/wireless/realtek/rtw88/pci.c | 5 ++---
 drivers/net/wireless/realtek/rtw89/pci.c | 5 +----
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index 2bfc0e822b8d..a0d69c75a381 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -789,9 +789,8 @@ static void rtw_pci_tx_kick_off(struct rtw_dev *rtwdev)
 	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
 	enum rtw_tx_queue_type queue;
 
-	for (queue = 0; queue < RTK_MAX_TX_QUEUE_NUM; queue++)
-		if (test_and_clear_bit(queue, rtwpci->tx_queued))
-			rtw_pci_tx_kick_off_queue(rtwdev, queue);
+	for_each_test_and_clear_bit(queue, rtwpci->tx_queued, RTK_MAX_TX_QUEUE_NUM)
+		rtw_pci_tx_kick_off_queue(rtwdev, queue);
 }
 
 static int rtw_pci_tx_write_data(struct rtw_dev *rtwdev,
diff --git a/drivers/net/wireless/realtek/rtw89/pci.c b/drivers/net/wireless/realtek/rtw89/pci.c
index 14ddb0d39e63..184d41b774d7 100644
--- a/drivers/net/wireless/realtek/rtw89/pci.c
+++ b/drivers/net/wireless/realtek/rtw89/pci.c
@@ -1077,10 +1077,7 @@ static void rtw89_pci_tx_kick_off_pending(struct rtw89_dev *rtwdev)
 	struct rtw89_pci_tx_ring *tx_ring;
 	int txch;
 
-	for (txch = 0; txch < RTW89_TXCH_NUM; txch++) {
-		if (!test_and_clear_bit(txch, rtwpci->kick_map))
-			continue;
-
+	for_each_test_and_clear_bit(txch, rtwpci->kick_map, RTW89_TXCH_NUM) {
 		tx_ring = &rtwpci->tx_rings[txch];
 		__rtw89_pci_tx_kick_off(rtwdev, tx_ring);
 	}
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 13/35] KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers()
  2023-12-03 19:32   ` Yury Norov
                     ` (10 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  2023-12-04  9:53     ` Vitaly Kuznetsov
  -1 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Vitaly Kuznetsov, Sean Christopherson,
	Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, kvm
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

The function traverses stimer_pending_bitmap in a for-loop bit by bit.
We can do it faster by using atomic find_and_set_bit().

While here, refactor the logic by decreasing indentation level.

CC: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/x86/kvm/hyperv.c | 39 +++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 238afd7335e4..a0e45d20d451 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -870,27 +870,26 @@ void kvm_hv_process_stimers(struct kvm_vcpu *vcpu)
 	if (!hv_vcpu)
 		return;
 
-	for (i = 0; i < ARRAY_SIZE(hv_vcpu->stimer); i++)
-		if (test_and_clear_bit(i, hv_vcpu->stimer_pending_bitmap)) {
-			stimer = &hv_vcpu->stimer[i];
-			if (stimer->config.enable) {
-				exp_time = stimer->exp_time;
-
-				if (exp_time) {
-					time_now =
-						get_time_ref_counter(vcpu->kvm);
-					if (time_now >= exp_time)
-						stimer_expiration(stimer);
-				}
-
-				if ((stimer->config.enable) &&
-				    stimer->count) {
-					if (!stimer->msg_pending)
-						stimer_start(stimer);
-				} else
-					stimer_cleanup(stimer);
-			}
+	for_each_test_and_clear_bit(i, hv_vcpu->stimer_pending_bitmap,
+					ARRAY_SIZE(hv_vcpu->stimer)) {
+		stimer = &hv_vcpu->stimer[i];
+		if (!stimer->config.enable)
+			continue;
+
+		exp_time = stimer->exp_time;
+
+		if (exp_time) {
+			time_now = get_time_ref_counter(vcpu->kvm);
+			if (time_now >= exp_time)
+				stimer_expiration(stimer);
 		}
+
+		if (stimer->config.enable && stimer->count) {
+			if (!stimer->msg_pending)
+				stimer_start(stimer);
+		} else
+			stimer_cleanup(stimer);
+	}
 }
 
 void kvm_hv_vcpu_uninit(struct kvm_vcpu *vcpu)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 14/35] PCI: hv: switch hv_get_dom_num() to use atomic find_bit()
  2023-12-03 19:32   ` Yury Norov
                     ` (11 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  2023-12-04  5:58     ` Wei Liu
  2023-12-04 19:14     ` Bjorn Helgaas
  -1 siblings, 2 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Bjorn Helgaas, linux-hyperv, linux-pci
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov, Michael Kelley

The function traverses bitmap with for_each_clear_bit() just to allocate
a bit atomically. We can do it better with a dedicated find_and_set_bit().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
---
 drivers/pci/controller/pci-hyperv.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 30c7dfeccb16..033b1fb7f4eb 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3605,12 +3605,9 @@ static u16 hv_get_dom_num(u16 dom)
 	if (test_and_set_bit(dom, hvpci_dom_map) == 0)
 		return dom;
 
-	for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
-		if (test_and_set_bit(i, hvpci_dom_map) == 0)
-			return i;
-	}
+	i = find_and_set_bit(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
 
-	return HVPCI_DOM_INVALID;
+	return i < HVPCI_DOM_MAP_SIZE ? i : HVPCI_DOM_INVALID;
 }
 
 /**
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 15/35] scsi: core: use atomic find_bit() API where appropriate
  2023-12-03 19:32   ` Yury Norov
                     ` (12 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Sathya Prakash Veerichetty, Kashyap Desai,
	Sumit Saxena, Sreekanth Reddy, James E.J. Bottomley,
	Martin K. Petersen, Nilesh Javali, Manish Rangankar,
	GR-QLogic-Storage-Upstream, mpi3mr-linuxdrv.pdl, linux-scsi
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

A plain loop in scsi_evt_thread() opencodes optimized atomic bit traversing
macro. Switch it to using the dedicated iterator.

CC: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/scsi/scsi_lib.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index cf3864f72093..a4c5c9b4bfc9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2494,14 +2494,13 @@ static void scsi_evt_emit(struct scsi_device *sdev, struct scsi_event *evt)
 void scsi_evt_thread(struct work_struct *work)
 {
 	struct scsi_device *sdev;
-	enum scsi_device_event evt_type;
+	enum scsi_device_event evt_type = SDEV_EVT_FIRST;
 	LIST_HEAD(event_list);
 
 	sdev = container_of(work, struct scsi_device, event_work);
 
-	for (evt_type = SDEV_EVT_FIRST; evt_type <= SDEV_EVT_LAST; evt_type++)
-		if (test_and_clear_bit(evt_type, sdev->pending_events))
-			sdev_evt_send_simple(sdev, evt_type, GFP_KERNEL);
+	for_each_test_and_clear_bit_from(evt_type, sdev->pending_events, SDEV_EVT_LAST + 1)
+		sdev_evt_send_simple(sdev, evt_type, GFP_KERNEL);
 
 	while (1) {
 		struct scsi_event *evt;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 16/35] scsi: mpi3mr: switch to using atomic find_and_set_bit()
  2023-12-03 19:32   ` Yury Norov
                     ` (13 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Sathya Prakash Veerichetty, Kashyap Desai,
	Sumit Saxena, Sreekanth Reddy, James E.J. Bottomley,
	Martin K. Petersen, Nilesh Javali, Manish Rangankar,
	GR-QLogic-Storage-Upstream, mpi3mr-linuxdrv.pdl, linux-scsi
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

mpi3mr_dev_rmhs_send_tm() and mpi3mr_send_event_ack() opencode
find_and_set_bit(). Switch them to using dedicated functions.

CC: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/scsi/mpi3mr/mpi3mr_os.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/mpi3mr/mpi3mr_os.c b/drivers/scsi/mpi3mr/mpi3mr_os.c
index 040031eb0c12..11139a2008fd 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_os.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_os.c
@@ -2276,13 +2276,9 @@ static void mpi3mr_dev_rmhs_send_tm(struct mpi3mr_ioc *mrioc, u16 handle,
 	if (drv_cmd)
 		goto issue_cmd;
 	do {
-		cmd_idx = find_first_zero_bit(mrioc->devrem_bitmap,
-		    MPI3MR_NUM_DEVRMCMD);
-		if (cmd_idx < MPI3MR_NUM_DEVRMCMD) {
-			if (!test_and_set_bit(cmd_idx, mrioc->devrem_bitmap))
-				break;
-			cmd_idx = MPI3MR_NUM_DEVRMCMD;
-		}
+		cmd_idx = find_and_set_bit(mrioc->devrem_bitmap, MPI3MR_NUM_DEVRMCMD);
+		if (cmd_idx < MPI3MR_NUM_DEVRMCMD)
+			break;
 	} while (retrycount--);
 
 	if (cmd_idx >= MPI3MR_NUM_DEVRMCMD) {
@@ -2417,14 +2413,9 @@ static void mpi3mr_send_event_ack(struct mpi3mr_ioc *mrioc, u8 event,
 	    "sending event ack in the top half for event(0x%02x), event_ctx(0x%08x)\n",
 	    event, event_ctx);
 	do {
-		cmd_idx = find_first_zero_bit(mrioc->evtack_cmds_bitmap,
-		    MPI3MR_NUM_EVTACKCMD);
-		if (cmd_idx < MPI3MR_NUM_EVTACKCMD) {
-			if (!test_and_set_bit(cmd_idx,
-			    mrioc->evtack_cmds_bitmap))
-				break;
-			cmd_idx = MPI3MR_NUM_EVTACKCMD;
-		}
+		cmd_idx = find_and_set_bit(mrioc->evtack_cmds_bitmap, MPI3MR_NUM_EVTACKCMD);
+		if (cmd_idx < MPI3MR_NUM_EVTACKCMD)
+			break;
 	} while (retrycount--);
 
 	if (cmd_idx >= MPI3MR_NUM_EVTACKCMD) {
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 17/35] scsi: qedi: rework qedi_get_task_idx()
  2023-12-03 19:32   ` Yury Norov
                     ` (14 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Sathya Prakash Veerichetty, Kashyap Desai,
	Sumit Saxena, Sreekanth Reddy, James E.J. Bottomley,
	Martin K. Petersen, Nilesh Javali, Manish Rangankar,
	GR-QLogic-Storage-Upstream, mpi3mr-linuxdrv.pdl, linux-scsi
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

Simplify the function by using a dedicated find_and_set_bit().

CC: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/scsi/qedi/qedi_main.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index cd0180b1f5b9..2f940c6898ef 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -1824,20 +1824,13 @@ int qedi_get_task_idx(struct qedi_ctx *qedi)
 {
 	s16 tmp_idx;
 
-again:
-	tmp_idx = find_first_zero_bit(qedi->task_idx_map,
-				      MAX_ISCSI_TASK_ENTRIES);
+	tmp_idx = find_and_set_bit(qedi->task_idx_map, MAX_ISCSI_TASK_ENTRIES);
 
 	if (tmp_idx >= MAX_ISCSI_TASK_ENTRIES) {
 		QEDI_ERR(&qedi->dbg_ctx, "FW task context pool is full.\n");
 		tmp_idx = -1;
-		goto err_idx;
 	}
 
-	if (test_and_set_bit(tmp_idx, qedi->task_idx_map))
-		goto again;
-
-err_idx:
 	return tmp_idx;
 }
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 18/35] powerpc: use atomic find_bit() API where appropriate
  2023-12-03 19:32   ` Yury Norov
@ 2023-12-03 19:32     ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Yury Norov, Colin Ian King, linuxppc-dev
  Cc: Sergey Shtylyov, Jan Kara, Bart Van Assche, Rasmus Villemoes,
	Matthew Wilcox, Alexey Klimov, Mirsad Todorovac, Andy Shevchenko,
	Maxim Kuvyrkov

Use find_and_{set,clear}_bit() where appropriate and simplify the logic.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/powerpc/mm/book3s32/mmu_context.c     | 10 ++---
 arch/powerpc/platforms/pasemi/dma_lib.c    | 45 +++++-----------------
 arch/powerpc/platforms/powernv/pci-sriov.c | 12 ++----
 3 files changed, 17 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/mm/book3s32/mmu_context.c b/arch/powerpc/mm/book3s32/mmu_context.c
index 1922f9a6b058..7db19f173c2e 100644
--- a/arch/powerpc/mm/book3s32/mmu_context.c
+++ b/arch/powerpc/mm/book3s32/mmu_context.c
@@ -50,13 +50,11 @@ static unsigned long context_map[LAST_CONTEXT / BITS_PER_LONG + 1];
 
 unsigned long __init_new_context(void)
 {
-	unsigned long ctx = next_mmu_context;
+	unsigned long ctx;
 
-	while (test_and_set_bit(ctx, context_map)) {
-		ctx = find_next_zero_bit(context_map, LAST_CONTEXT+1, ctx);
-		if (ctx > LAST_CONTEXT)
-			ctx = 0;
-	}
+	ctx = find_and_set_next_bit(context_map, LAST_CONTEXT + 1, next_mmu_context);
+	if (ctx > LAST_CONTEXT)
+		ctx = 0;
 	next_mmu_context = (ctx + 1) & LAST_CONTEXT;
 
 	return ctx;
diff --git a/arch/powerpc/platforms/pasemi/dma_lib.c b/arch/powerpc/platforms/pasemi/dma_lib.c
index 1be1f18f6f09..906dabee0132 100644
--- a/arch/powerpc/platforms/pasemi/dma_lib.c
+++ b/arch/powerpc/platforms/pasemi/dma_lib.c
@@ -118,14 +118,9 @@ static int pasemi_alloc_tx_chan(enum pasemi_dmachan_type type)
 		limit = MAX_TXCH;
 		break;
 	}
-retry:
-	bit = find_next_bit(txch_free, MAX_TXCH, start);
-	if (bit >= limit)
-		return -ENOSPC;
-	if (!test_and_clear_bit(bit, txch_free))
-		goto retry;
-
-	return bit;
+
+	bit = find_and_clear_next_bit(txch_free, MAX_TXCH, start);
+	return bit < limit ? bit : -ENOSPC;
 }
 
 static void pasemi_free_tx_chan(int chan)
@@ -136,15 +131,9 @@ static void pasemi_free_tx_chan(int chan)
 
 static int pasemi_alloc_rx_chan(void)
 {
-	int bit;
-retry:
-	bit = find_first_bit(rxch_free, MAX_RXCH);
-	if (bit >= MAX_TXCH)
-		return -ENOSPC;
-	if (!test_and_clear_bit(bit, rxch_free))
-		goto retry;
-
-	return bit;
+	int bit = find_and_clear_bit(rxch_free, MAX_RXCH);
+
+	return bit < MAX_TXCH ? bit : -ENOSPC;
 }
 
 static void pasemi_free_rx_chan(int chan)
@@ -374,16 +363,9 @@ EXPORT_SYMBOL(pasemi_dma_free_buf);
  */
 int pasemi_dma_alloc_flag(void)
 {
-	int bit;
+	int bit = find_and_clear_bit(flags_free, MAX_FLAGS);
 
-retry:
-	bit = find_first_bit(flags_free, MAX_FLAGS);
-	if (bit >= MAX_FLAGS)
-		return -ENOSPC;
-	if (!test_and_clear_bit(bit, flags_free))
-		goto retry;
-
-	return bit;
+	return bit < MAX_FLAGS ? bit : -ENOSPC;
 }
 EXPORT_SYMBOL(pasemi_dma_alloc_flag);
 
@@ -439,16 +421,9 @@ EXPORT_SYMBOL(pasemi_dma_clear_flag);
  */
 int pasemi_dma_alloc_fun(void)
 {
-	int bit;
-
-retry:
-	bit = find_first_bit(fun_free, MAX_FLAGS);
-	if (bit >= MAX_FLAGS)
-		return -ENOSPC;
-	if (!test_and_clear_bit(bit, fun_free))
-		goto retry;
+	int bit = find_and_clear_bit(fun_free, MAX_FLAGS);
 
-	return bit;
+	return bit < MAX_FLAGS ? bit : -ENOSPC;
 }
 EXPORT_SYMBOL(pasemi_dma_alloc_fun);
 
diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c
index 59882da3e742..640e387e6d83 100644
--- a/arch/powerpc/platforms/powernv/pci-sriov.c
+++ b/arch/powerpc/platforms/powernv/pci-sriov.c
@@ -397,18 +397,12 @@ static int64_t pnv_ioda_map_m64_single(struct pnv_phb *phb,
 
 static int pnv_pci_alloc_m64_bar(struct pnv_phb *phb, struct pnv_iov_data *iov)
 {
-	int win;
+	int win = find_and_set_bit(&phb->ioda.m64_bar_alloc, phb->ioda.m64_bar_idx + 1);
 
-	do {
-		win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
-				phb->ioda.m64_bar_idx + 1, 0);
-
-		if (win >= phb->ioda.m64_bar_idx + 1)
-			return -1;
-	} while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
+	if (win >= phb->ioda.m64_bar_idx + 1)
+		return -1;
 
 	set_bit(win, iov->used_m64_bar_mask);
-
 	return win;
 }
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 18/35] powerpc: use atomic find_bit() API where appropriate
@ 2023-12-03 19:32     ` Yury Norov
  0 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Yury Norov, Colin Ian King, linuxppc-dev
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

Use find_and_{set,clear}_bit() where appropriate and simplify the logic.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/powerpc/mm/book3s32/mmu_context.c     | 10 ++---
 arch/powerpc/platforms/pasemi/dma_lib.c    | 45 +++++-----------------
 arch/powerpc/platforms/powernv/pci-sriov.c | 12 ++----
 3 files changed, 17 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/mm/book3s32/mmu_context.c b/arch/powerpc/mm/book3s32/mmu_context.c
index 1922f9a6b058..7db19f173c2e 100644
--- a/arch/powerpc/mm/book3s32/mmu_context.c
+++ b/arch/powerpc/mm/book3s32/mmu_context.c
@@ -50,13 +50,11 @@ static unsigned long context_map[LAST_CONTEXT / BITS_PER_LONG + 1];
 
 unsigned long __init_new_context(void)
 {
-	unsigned long ctx = next_mmu_context;
+	unsigned long ctx;
 
-	while (test_and_set_bit(ctx, context_map)) {
-		ctx = find_next_zero_bit(context_map, LAST_CONTEXT+1, ctx);
-		if (ctx > LAST_CONTEXT)
-			ctx = 0;
-	}
+	ctx = find_and_set_next_bit(context_map, LAST_CONTEXT + 1, next_mmu_context);
+	if (ctx > LAST_CONTEXT)
+		ctx = 0;
 	next_mmu_context = (ctx + 1) & LAST_CONTEXT;
 
 	return ctx;
diff --git a/arch/powerpc/platforms/pasemi/dma_lib.c b/arch/powerpc/platforms/pasemi/dma_lib.c
index 1be1f18f6f09..906dabee0132 100644
--- a/arch/powerpc/platforms/pasemi/dma_lib.c
+++ b/arch/powerpc/platforms/pasemi/dma_lib.c
@@ -118,14 +118,9 @@ static int pasemi_alloc_tx_chan(enum pasemi_dmachan_type type)
 		limit = MAX_TXCH;
 		break;
 	}
-retry:
-	bit = find_next_bit(txch_free, MAX_TXCH, start);
-	if (bit >= limit)
-		return -ENOSPC;
-	if (!test_and_clear_bit(bit, txch_free))
-		goto retry;
-
-	return bit;
+
+	bit = find_and_clear_next_bit(txch_free, MAX_TXCH, start);
+	return bit < limit ? bit : -ENOSPC;
 }
 
 static void pasemi_free_tx_chan(int chan)
@@ -136,15 +131,9 @@ static void pasemi_free_tx_chan(int chan)
 
 static int pasemi_alloc_rx_chan(void)
 {
-	int bit;
-retry:
-	bit = find_first_bit(rxch_free, MAX_RXCH);
-	if (bit >= MAX_TXCH)
-		return -ENOSPC;
-	if (!test_and_clear_bit(bit, rxch_free))
-		goto retry;
-
-	return bit;
+	int bit = find_and_clear_bit(rxch_free, MAX_RXCH);
+
+	return bit < MAX_TXCH ? bit : -ENOSPC;
 }
 
 static void pasemi_free_rx_chan(int chan)
@@ -374,16 +363,9 @@ EXPORT_SYMBOL(pasemi_dma_free_buf);
  */
 int pasemi_dma_alloc_flag(void)
 {
-	int bit;
+	int bit = find_and_clear_bit(flags_free, MAX_FLAGS);
 
-retry:
-	bit = find_first_bit(flags_free, MAX_FLAGS);
-	if (bit >= MAX_FLAGS)
-		return -ENOSPC;
-	if (!test_and_clear_bit(bit, flags_free))
-		goto retry;
-
-	return bit;
+	return bit < MAX_FLAGS ? bit : -ENOSPC;
 }
 EXPORT_SYMBOL(pasemi_dma_alloc_flag);
 
@@ -439,16 +421,9 @@ EXPORT_SYMBOL(pasemi_dma_clear_flag);
  */
 int pasemi_dma_alloc_fun(void)
 {
-	int bit;
-
-retry:
-	bit = find_first_bit(fun_free, MAX_FLAGS);
-	if (bit >= MAX_FLAGS)
-		return -ENOSPC;
-	if (!test_and_clear_bit(bit, fun_free))
-		goto retry;
+	int bit = find_and_clear_bit(fun_free, MAX_FLAGS);
 
-	return bit;
+	return bit < MAX_FLAGS ? bit : -ENOSPC;
 }
 EXPORT_SYMBOL(pasemi_dma_alloc_fun);
 
diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c
index 59882da3e742..640e387e6d83 100644
--- a/arch/powerpc/platforms/powernv/pci-sriov.c
+++ b/arch/powerpc/platforms/powernv/pci-sriov.c
@@ -397,18 +397,12 @@ static int64_t pnv_ioda_map_m64_single(struct pnv_phb *phb,
 
 static int pnv_pci_alloc_m64_bar(struct pnv_phb *phb, struct pnv_iov_data *iov)
 {
-	int win;
+	int win = find_and_set_bit(&phb->ioda.m64_bar_alloc, phb->ioda.m64_bar_idx + 1);
 
-	do {
-		win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
-				phb->ioda.m64_bar_idx + 1, 0);
-
-		if (win >= phb->ioda.m64_bar_idx + 1)
-			return -1;
-	} while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
+	if (win >= phb->ioda.m64_bar_idx + 1)
+		return -1;
 
 	set_bit(win, iov->used_m64_bar_mask);
-
 	return win;
 }
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 19/35] iommu: use atomic find_bit() API where appropriate
  2023-12-03 19:32   ` Yury Norov
@ 2023-12-03 19:32     ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Will Deacon, Robin Murphy, Joerg Roedel,
	Andy Gross, Bjorn Andersson, Konrad Dybcio, Yury Norov,
	linux-arm-kernel, iommu, linux-arm-msm
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

Switch opencoded find_and_set_next_bit() in __arm_smmu_alloc_bitmap()
and msm_iommu_alloc_ctx() to use dedicated API, and make them nice
one-liner wrappers.

While here, refactor msm_iommu_attach_dev() and msm_iommu_alloc_ctx()
so that error codes don't mismatch.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu.h | 10 ++--------
 drivers/iommu/msm_iommu.c             | 18 ++++--------------
 2 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 703fd5817ec1..004a4704ebf1 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -453,15 +453,9 @@ struct arm_smmu_impl {
 
 static inline int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
 {
-	int idx;
+	int idx = find_and_set_next_bit(map, end, start);
 
-	do {
-		idx = find_next_zero_bit(map, end, start);
-		if (idx == end)
-			return -ENOSPC;
-	} while (test_and_set_bit(idx, map));
-
-	return idx;
+	return idx < end ? idx : -ENOSPC;
 }
 
 static inline void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n)
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index f86af9815d6f..67124f4228b1 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -185,17 +185,9 @@ static const struct iommu_flush_ops msm_iommu_flush_ops = {
 	.tlb_add_page = __flush_iotlb_page,
 };
 
-static int msm_iommu_alloc_ctx(unsigned long *map, int start, int end)
+static int msm_iommu_alloc_ctx(struct msm_iommu_dev *iommu)
 {
-	int idx;
-
-	do {
-		idx = find_next_zero_bit(map, end, start);
-		if (idx == end)
-			return -ENOSPC;
-	} while (test_and_set_bit(idx, map));
-
-	return idx;
+	return find_and_set_bit(iommu->context_map, iommu->ncb);
 }
 
 static void msm_iommu_free_ctx(unsigned long *map, int idx)
@@ -418,10 +410,8 @@ static int msm_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
 					ret = -EEXIST;
 					goto fail;
 				}
-				master->num =
-					msm_iommu_alloc_ctx(iommu->context_map,
-							    0, iommu->ncb);
-				if (IS_ERR_VALUE(master->num)) {
+				master->num = msm_iommu_alloc_ctx(iommu);
+				if (master->num >= iommu->ncb) {
 					ret = -ENODEV;
 					goto fail;
 				}
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 19/35] iommu: use atomic find_bit() API where appropriate
@ 2023-12-03 19:32     ` Yury Norov
  0 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Will Deacon, Robin Murphy, Joerg Roedel,
	Andy Gross, Bjorn Andersson, Konrad Dybcio, Yury Norov,
	linux-arm-kernel, iommu, linux-arm-msm
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

Switch opencoded find_and_set_next_bit() in __arm_smmu_alloc_bitmap()
and msm_iommu_alloc_ctx() to use dedicated API, and make them nice
one-liner wrappers.

While here, refactor msm_iommu_attach_dev() and msm_iommu_alloc_ctx()
so that error codes don't mismatch.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu.h | 10 ++--------
 drivers/iommu/msm_iommu.c             | 18 ++++--------------
 2 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 703fd5817ec1..004a4704ebf1 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -453,15 +453,9 @@ struct arm_smmu_impl {
 
 static inline int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
 {
-	int idx;
+	int idx = find_and_set_next_bit(map, end, start);
 
-	do {
-		idx = find_next_zero_bit(map, end, start);
-		if (idx == end)
-			return -ENOSPC;
-	} while (test_and_set_bit(idx, map));
-
-	return idx;
+	return idx < end ? idx : -ENOSPC;
 }
 
 static inline void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n)
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index f86af9815d6f..67124f4228b1 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -185,17 +185,9 @@ static const struct iommu_flush_ops msm_iommu_flush_ops = {
 	.tlb_add_page = __flush_iotlb_page,
 };
 
-static int msm_iommu_alloc_ctx(unsigned long *map, int start, int end)
+static int msm_iommu_alloc_ctx(struct msm_iommu_dev *iommu)
 {
-	int idx;
-
-	do {
-		idx = find_next_zero_bit(map, end, start);
-		if (idx == end)
-			return -ENOSPC;
-	} while (test_and_set_bit(idx, map));
-
-	return idx;
+	return find_and_set_bit(iommu->context_map, iommu->ncb);
 }
 
 static void msm_iommu_free_ctx(unsigned long *map, int idx)
@@ -418,10 +410,8 @@ static int msm_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
 					ret = -EEXIST;
 					goto fail;
 				}
-				master->num =
-					msm_iommu_alloc_ctx(iommu->context_map,
-							    0, iommu->ncb);
-				if (IS_ERR_VALUE(master->num)) {
+				master->num = msm_iommu_alloc_ctx(iommu);
+				if (master->num >= iommu->ncb) {
 					ret = -ENODEV;
 					goto fail;
 				}
-- 
2.40.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 20/35] media: radio-shark: use atomic find_bit() API where appropriate
  2023-12-03 19:32   ` Yury Norov
                     ` (17 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  2023-12-04  8:43     ` Hans Verkuil
  -1 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Hans Verkuil, Mauro Carvalho Chehab, linux-media
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

Despite that it's only 2- or 3-bit maps, convert for-loop followed by
test_bit() to for_each_test_and_clear_bit() as it makes the code cleaner.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/media/radio/radio-shark.c  | 5 +----
 drivers/media/radio/radio-shark2.c | 5 +----
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/media/radio/radio-shark.c b/drivers/media/radio/radio-shark.c
index 127a3be0e0f0..0c50b3a9623e 100644
--- a/drivers/media/radio/radio-shark.c
+++ b/drivers/media/radio/radio-shark.c
@@ -158,10 +158,7 @@ static void shark_led_work(struct work_struct *work)
 		container_of(work, struct shark_device, led_work);
 	int i, res, brightness, actual_len;
 
-	for (i = 0; i < 3; i++) {
-		if (!test_and_clear_bit(i, &shark->brightness_new))
-			continue;
-
+	for_each_test_and_clear_bit(i, &shark->brightness_new, 3) {
 		brightness = atomic_read(&shark->brightness[i]);
 		memset(shark->transfer_buffer, 0, TB_LEN);
 		if (i != RED_LED) {
diff --git a/drivers/media/radio/radio-shark2.c b/drivers/media/radio/radio-shark2.c
index f1c5c0a6a335..d9ef241e1778 100644
--- a/drivers/media/radio/radio-shark2.c
+++ b/drivers/media/radio/radio-shark2.c
@@ -145,10 +145,7 @@ static void shark_led_work(struct work_struct *work)
 		container_of(work, struct shark_device, led_work);
 	int i, res, brightness, actual_len;
 
-	for (i = 0; i < 2; i++) {
-		if (!test_and_clear_bit(i, &shark->brightness_new))
-			continue;
-
+	for_each_test_and_clear_bit(i, &shark->brightness_new, 2) {
 		brightness = atomic_read(&shark->brightness[i]);
 		memset(shark->transfer_buffer, 0, TB_LEN);
 		shark->transfer_buffer[0] = 0x83 + i;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 21/35] sfc: switch to using atomic find_bit() API where appropriate
  2023-12-03 19:32   ` Yury Norov
                     ` (18 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Edward Cree, Martin Habets, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Yury Norov, netdev,
	linux-net-drivers
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

SFC code traverses rps_slot_map and rxq_retry_mask bit by bit. We can do
it better by using dedicated atomic find_bit() functions, because they
skip already clear bits.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
---
 drivers/net/ethernet/sfc/rx_common.c         |  4 +---
 drivers/net/ethernet/sfc/siena/rx_common.c   |  4 +---
 drivers/net/ethernet/sfc/siena/siena_sriov.c | 14 ++++++--------
 3 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
index d2f35ee15eff..0112968b3fe7 100644
--- a/drivers/net/ethernet/sfc/rx_common.c
+++ b/drivers/net/ethernet/sfc/rx_common.c
@@ -950,9 +950,7 @@ int efx_filter_rfs(struct net_device *net_dev, const struct sk_buff *skb,
 	int rc;
 
 	/* find a free slot */
-	for (slot_idx = 0; slot_idx < EFX_RPS_MAX_IN_FLIGHT; slot_idx++)
-		if (!test_and_set_bit(slot_idx, &efx->rps_slot_map))
-			break;
+	slot_idx = find_and_set_bit(&efx->rps_slot_map, EFX_RPS_MAX_IN_FLIGHT);
 	if (slot_idx >= EFX_RPS_MAX_IN_FLIGHT)
 		return -EBUSY;
 
diff --git a/drivers/net/ethernet/sfc/siena/rx_common.c b/drivers/net/ethernet/sfc/siena/rx_common.c
index 4579f43484c3..160b16aa7486 100644
--- a/drivers/net/ethernet/sfc/siena/rx_common.c
+++ b/drivers/net/ethernet/sfc/siena/rx_common.c
@@ -958,9 +958,7 @@ int efx_siena_filter_rfs(struct net_device *net_dev, const struct sk_buff *skb,
 	int rc;
 
 	/* find a free slot */
-	for (slot_idx = 0; slot_idx < EFX_RPS_MAX_IN_FLIGHT; slot_idx++)
-		if (!test_and_set_bit(slot_idx, &efx->rps_slot_map))
-			break;
+	slot_idx = find_and_set_bit(&efx->rps_slot_map, EFX_RPS_MAX_IN_FLIGHT);
 	if (slot_idx >= EFX_RPS_MAX_IN_FLIGHT)
 		return -EBUSY;
 
diff --git a/drivers/net/ethernet/sfc/siena/siena_sriov.c b/drivers/net/ethernet/sfc/siena/siena_sriov.c
index 8353c15dc233..554b799288b8 100644
--- a/drivers/net/ethernet/sfc/siena/siena_sriov.c
+++ b/drivers/net/ethernet/sfc/siena/siena_sriov.c
@@ -722,14 +722,12 @@ static int efx_vfdi_fini_all_queues(struct siena_vf *vf)
 					     efx_vfdi_flush_wake(vf),
 					     timeout);
 		rxqs_count = 0;
-		for (index = 0; index < count; ++index) {
-			if (test_and_clear_bit(index, vf->rxq_retry_mask)) {
-				atomic_dec(&vf->rxq_retry_count);
-				MCDI_SET_ARRAY_DWORD(
-					inbuf, FLUSH_RX_QUEUES_IN_QID_OFST,
-					rxqs_count, vf_offset + index);
-				rxqs_count++;
-			}
+		for_each_test_and_clear_bit(index, vf->rxq_retry_mask, count) {
+			atomic_dec(&vf->rxq_retry_count);
+			MCDI_SET_ARRAY_DWORD(
+				inbuf, FLUSH_RX_QUEUES_IN_QID_OFST,
+				rxqs_count, vf_offset + index);
+			rxqs_count++;
 		}
 	}
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 22/35] tty: nozomi: optimize interrupt_handler()
  2023-12-03 19:32   ` Yury Norov
                     ` (19 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Jiri Slaby, linux-serial
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

In exit path of interrupt_handler(), dc->flip map is traversed bit by
bit to find and clear set bits and call tty_flip_buffer_push() for
corresponding ports.

We can do it better by using for_each_test_and_clear_bit(), because it
skips already clear bits.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/tty/nozomi.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/nozomi.c b/drivers/tty/nozomi.c
index 02cd40147b3a..de0503247391 100644
--- a/drivers/tty/nozomi.c
+++ b/drivers/tty/nozomi.c
@@ -1220,9 +1220,8 @@ static irqreturn_t interrupt_handler(int irq, void *dev_id)
 exit_handler:
 	spin_unlock(&dc->spin_mutex);
 
-	for (a = 0; a < NOZOMI_MAX_PORTS; a++)
-		if (test_and_clear_bit(a, &dc->flip))
-			tty_flip_buffer_push(&dc->port[a].port);
+	for_each_test_and_clear_bit(a, &dc->flip, NOZOMI_MAX_PORTS)
+		tty_flip_buffer_push(&dc->port[a].port);
 
 	return IRQ_HANDLED;
 none:
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 23/35] usb: cdc-acm: optimize acm_softint()
  2023-12-03 19:32   ` Yury Norov
                     ` (20 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Oliver Neukum, Greg Kroah-Hartman, linux-usb
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

acm_softint(), uses for-loop to traverse urbs_in_error_delay bitmap
bit by bit to find and clear set bits.

We can do it better by using for_each_test_and_clear_bit(), because it
doesn't test already clear bits.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Oliver Neukum <oneukum@suse.com>
---
 drivers/usb/class/cdc-acm.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index a1f4e1ead97f..8664b63050b0 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -613,9 +613,8 @@ static void acm_softint(struct work_struct *work)
 	}
 
 	if (test_and_clear_bit(ACM_ERROR_DELAY, &acm->flags)) {
-		for (i = 0; i < acm->rx_buflimit; i++)
-			if (test_and_clear_bit(i, &acm->urbs_in_error_delay))
-				acm_submit_read_urb(acm, i, GFP_KERNEL);
+		for_each_test_and_clear_bit(i, &acm->urbs_in_error_delay, acm->rx_buflimit)
+			acm_submit_read_urb(acm, i, GFP_KERNEL);
 	}
 
 	if (test_and_clear_bit(EVENT_TTY_WAKEUP, &acm->flags))
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 24/35] block: null_blk: fix opencoded find_and_set_bit() in get_tag()
  2023-12-03 19:32   ` Yury Norov
                     ` (21 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  2023-12-04 18:26     ` Jan Kara
  2023-12-05  2:39     ` Chengming Zhou
  -1 siblings, 2 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Jens Axboe, Damien Le Moal, Chaitanya Kulkarni,
	Ming Lei, Johannes Thumshirn, Chengming Zhou, Nitesh Shetty,
	Akinobu Mita, Shin'ichiro Kawasaki, Yury Norov, linux-block
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

get_tag() opencodes find_and_set_bit(). Switch the code to use the
dedicated function, and get rid of get_tag entirely.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/block/null_blk/main.c | 41 +++++++++++------------------------
 1 file changed, 13 insertions(+), 28 deletions(-)

diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 3021d58ca51c..671dbb9ab928 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -760,19 +760,6 @@ static void put_tag(struct nullb_queue *nq, unsigned int tag)
 		wake_up(&nq->wait);
 }
 
-static unsigned int get_tag(struct nullb_queue *nq)
-{
-	unsigned int tag;
-
-	do {
-		tag = find_first_zero_bit(nq->tag_map, nq->queue_depth);
-		if (tag >= nq->queue_depth)
-			return -1U;
-	} while (test_and_set_bit_lock(tag, nq->tag_map));
-
-	return tag;
-}
-
 static void free_cmd(struct nullb_cmd *cmd)
 {
 	put_tag(cmd->nq, cmd->tag);
@@ -782,24 +769,22 @@ static enum hrtimer_restart null_cmd_timer_expired(struct hrtimer *timer);
 
 static struct nullb_cmd *__alloc_cmd(struct nullb_queue *nq)
 {
+	unsigned int tag = find_and_set_bit_lock(nq->tag_map, nq->queue_depth);
 	struct nullb_cmd *cmd;
-	unsigned int tag;
-
-	tag = get_tag(nq);
-	if (tag != -1U) {
-		cmd = &nq->cmds[tag];
-		cmd->tag = tag;
-		cmd->error = BLK_STS_OK;
-		cmd->nq = nq;
-		if (nq->dev->irqmode == NULL_IRQ_TIMER) {
-			hrtimer_init(&cmd->timer, CLOCK_MONOTONIC,
-				     HRTIMER_MODE_REL);
-			cmd->timer.function = null_cmd_timer_expired;
-		}
-		return cmd;
+
+	if (tag >= nq->queue_depth)
+		return NULL;
+
+	cmd = &nq->cmds[tag];
+	cmd->tag = tag;
+	cmd->error = BLK_STS_OK;
+	cmd->nq = nq;
+	if (nq->dev->irqmode == NULL_IRQ_TIMER) {
+		hrtimer_init(&cmd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+		cmd->timer.function = null_cmd_timer_expired;
 	}
 
-	return NULL;
+	return cmd;
 }
 
 static struct nullb_cmd *alloc_cmd(struct nullb_queue *nq, struct bio *bio)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 25/35] RDMA/rtrs: fix opencoded find_and_set_bit_lock() in __rtrs_get_permit()
  2023-12-03 19:32   ` Yury Norov
                     ` (22 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Md. Haris Iqbal, Jack Wang, Jason Gunthorpe,
	Leon Romanovsky, linux-rdma
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

The function opencodes find_and_set_bit_lock() with a while-loop polling
on test_and_set_bit_lock(). Use the dedicated function instead.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 07261523c554..2f3b0ad42e8a 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -72,18 +72,9 @@ __rtrs_get_permit(struct rtrs_clt_sess *clt, enum rtrs_clt_con_type con_type)
 	struct rtrs_permit *permit;
 	int bit;
 
-	/*
-	 * Adapted from null_blk get_tag(). Callers from different cpus may
-	 * grab the same bit, since find_first_zero_bit is not atomic.
-	 * But then the test_and_set_bit_lock will fail for all the
-	 * callers but one, so that they will loop again.
-	 * This way an explicit spinlock is not required.
-	 */
-	do {
-		bit = find_first_zero_bit(clt->permits_map, max_depth);
-		if (bit >= max_depth)
-			return NULL;
-	} while (test_and_set_bit_lock(bit, clt->permits_map));
+	bit = find_and_set_bit_lock(clt->permits_map, max_depth);
+	if (bit >= max_depth)
+		return NULL;
 
 	permit = get_permit(clt, bit);
 	WARN_ON(permit->mem_id != bit);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 26/35] mISDN: optimize get_free_devid()
  2023-12-03 19:32   ` Yury Norov
                     ` (23 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Karsten Keil, netdev
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

get_free_devid() traverses each bit in device_ids in an open-coded loop.
We can do it faster by using dedicated find_and_set_bit().

It makes the whole function a nice one-liner, and because MAX_DEVICE_ID
is a small constant-time value (63), on 64-bit platforms find_and_set_bit()
call will be optimized to:

	ffs();
	test_and_set_bit().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/isdn/mISDN/core.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/isdn/mISDN/core.c b/drivers/isdn/mISDN/core.c
index ab8513a7acd5..3f97db006cf3 100644
--- a/drivers/isdn/mISDN/core.c
+++ b/drivers/isdn/mISDN/core.c
@@ -197,14 +197,9 @@ get_mdevice_count(void)
 static int
 get_free_devid(void)
 {
-	u_int	i;
+	u_int i = find_and_set_bit((u_long *)&device_ids, MAX_DEVICE_ID + 1);
 
-	for (i = 0; i <= MAX_DEVICE_ID; i++)
-		if (!test_and_set_bit(i, (u_long *)&device_ids))
-			break;
-	if (i > MAX_DEVICE_ID)
-		return -EBUSY;
-	return i;
+	return i <= MAX_DEVICE_ID ? i : -EBUSY;
 }
 
 int
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 27/35] media: em28xx: cx231xx: fix opencoded find_and_set_bit()
  2023-12-03 19:32   ` Yury Norov
                     ` (24 preceding siblings ...)
  (?)
@ 2023-12-03 19:32   ` Yury Norov
  2023-12-04  8:39     ` Hans Verkuil
  -1 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:32 UTC (permalink / raw)
  To: linux-kernel, Mauro Carvalho Chehab, Yury Norov, linux-media
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

Functions in the media/usb drivers opencode find_and_set_bit() by
polling on a found bit in a while-loop.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/media/usb/cx231xx/cx231xx-cards.c | 16 ++++------
 drivers/media/usb/em28xx/em28xx-cards.c   | 37 +++++++++--------------
 2 files changed, 21 insertions(+), 32 deletions(-)

diff --git a/drivers/media/usb/cx231xx/cx231xx-cards.c b/drivers/media/usb/cx231xx/cx231xx-cards.c
index 92efe6c1f47b..b314603932d7 100644
--- a/drivers/media/usb/cx231xx/cx231xx-cards.c
+++ b/drivers/media/usb/cx231xx/cx231xx-cards.c
@@ -1708,16 +1708,12 @@ static int cx231xx_usb_probe(struct usb_interface *interface,
 		return -ENODEV;
 
 	/* Check to see next free device and mark as used */
-	do {
-		nr = find_first_zero_bit(&cx231xx_devused, CX231XX_MAXBOARDS);
-		if (nr >= CX231XX_MAXBOARDS) {
-			/* No free device slots */
-			dev_err(d,
-				"Supports only %i devices.\n",
-				CX231XX_MAXBOARDS);
-			return -ENOMEM;
-		}
-	} while (test_and_set_bit(nr, &cx231xx_devused));
+	nr = find_and_set_bit(&cx231xx_devused, CX231XX_MAXBOARDS);
+	if (nr >= CX231XX_MAXBOARDS) {
+		/* No free device slots */
+		dev_err(d, "Supports only %i devices.\n", CX231XX_MAXBOARDS);
+		return -ENOMEM;
+	}
 
 	udev = usb_get_dev(interface_to_usbdev(interface));
 
diff --git a/drivers/media/usb/em28xx/em28xx-cards.c b/drivers/media/usb/em28xx/em28xx-cards.c
index 4d037c92af7c..af4809fe74a8 100644
--- a/drivers/media/usb/em28xx/em28xx-cards.c
+++ b/drivers/media/usb/em28xx/em28xx-cards.c
@@ -3684,17 +3684,14 @@ static int em28xx_duplicate_dev(struct em28xx *dev)
 		return -ENOMEM;
 	}
 	/* Check to see next free device and mark as used */
-	do {
-		nr = find_first_zero_bit(em28xx_devused, EM28XX_MAXBOARDS);
-		if (nr >= EM28XX_MAXBOARDS) {
-			/* No free device slots */
-			dev_warn(&dev->intf->dev, ": Supports only %i em28xx boards.\n",
-				 EM28XX_MAXBOARDS);
-			kfree(sec_dev);
-			dev->dev_next = NULL;
-			return -ENOMEM;
-		}
-	} while (test_and_set_bit(nr, em28xx_devused));
+	nr = find_and_set_bit(em28xx_devused, EM28XX_MAXBOARDS);
+	if (nr >= EM28XX_MAXBOARDS) {
+		/* No free device slots */
+		dev_warn(&dev->intf->dev, ": Supports only %i em28xx boards.\n", EM28XX_MAXBOARDS);
+		kfree(sec_dev);
+		dev->dev_next = NULL;
+		return -ENOMEM;
+	}
 	sec_dev->devno = nr;
 	snprintf(sec_dev->name, 28, "em28xx #%d", nr);
 	sec_dev->dev_next = NULL;
@@ -3827,17 +3824,13 @@ static int em28xx_usb_probe(struct usb_interface *intf,
 	udev = usb_get_dev(interface_to_usbdev(intf));
 
 	/* Check to see next free device and mark as used */
-	do {
-		nr = find_first_zero_bit(em28xx_devused, EM28XX_MAXBOARDS);
-		if (nr >= EM28XX_MAXBOARDS) {
-			/* No free device slots */
-			dev_err(&intf->dev,
-				"Driver supports up to %i em28xx boards.\n",
-			       EM28XX_MAXBOARDS);
-			retval = -ENOMEM;
-			goto err_no_slot;
-		}
-	} while (test_and_set_bit(nr, em28xx_devused));
+	nr = find_and_set_bit(em28xx_devused, EM28XX_MAXBOARDS);
+	if (nr >= EM28XX_MAXBOARDS) {
+		/* No free device slots */
+		dev_err(&intf->dev, "Driver supports up to %i em28xx boards.\n", EM28XX_MAXBOARDS);
+		retval = -ENOMEM;
+		goto err_no_slot;
+	}
 
 	/* Don't register audio interfaces */
 	if (intf->altsetting[0].desc.bInterfaceClass == USB_CLASS_AUDIO) {
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 28/35] ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get()
  2023-12-03 19:32   ` Yury Norov
                     ` (25 preceding siblings ...)
  (?)
@ 2023-12-03 19:33   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:33 UTC (permalink / raw)
  To: linux-kernel, Jiri Pirko, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

Optimize ofdpa_port_internal_vlan_id_get() by using find_and_set_bit(),
instead of polling every bit from bitmap in a for-loop.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/net/ethernet/rocker/rocker_ofdpa.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker_ofdpa.c b/drivers/net/ethernet/rocker/rocker_ofdpa.c
index 826990459fa4..449be8af7ffc 100644
--- a/drivers/net/ethernet/rocker/rocker_ofdpa.c
+++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c
@@ -2249,14 +2249,11 @@ static __be16 ofdpa_port_internal_vlan_id_get(struct ofdpa_port *ofdpa_port,
 	found = entry;
 	hash_add(ofdpa->internal_vlan_tbl, &found->entry, found->ifindex);
 
-	for (i = 0; i < OFDPA_N_INTERNAL_VLANS; i++) {
-		if (test_and_set_bit(i, ofdpa->internal_vlan_bitmap))
-			continue;
+	i = find_and_set_bit(ofdpa->internal_vlan_bitmap, OFDPA_N_INTERNAL_VLANS);
+	if (i < OFDPA_N_INTERNAL_VLANS)
 		found->vlan_id = htons(OFDPA_INTERNAL_VLAN_ID_BASE + i);
-		goto found;
-	}
-
-	netdev_err(ofdpa_port->dev, "Out of internal VLAN IDs\n");
+	else
+		netdev_err(ofdpa_port->dev, "Out of internal VLAN IDs\n");
 
 found:
 	found->ref_count++;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 29/35] serial: sc12is7xx: optimize sc16is7xx_alloc_line()
  2023-12-03 19:32   ` Yury Norov
                     ` (26 preceding siblings ...)
  (?)
@ 2023-12-03 19:33   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:33 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Jiri Slaby, Hugo Villeneuve,
	Lech Perczak, Ilpo Järvinen, Andy Shevchenko,
	Uwe Kleine-König, Thomas Gleixner, Hui Wang, Isaac True,
	Yury Norov, linux-serial
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

Instead of polling every bit in sc16is7xx_lines, switch it to using a
dedicated find_and_set_bit(), and make the function a simple one-liner.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/tty/serial/sc16is7xx.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index db2bb1c0d36c..6a463988d5e0 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -427,15 +427,9 @@ static void sc16is7xx_port_update(struct uart_port *port, u8 reg,
 
 static int sc16is7xx_alloc_line(void)
 {
-	int i;
-
 	BUILD_BUG_ON(SC16IS7XX_MAX_DEVS > BITS_PER_LONG);
 
-	for (i = 0; i < SC16IS7XX_MAX_DEVS; i++)
-		if (!test_and_set_bit(i, &sc16is7xx_lines))
-			break;
-
-	return i;
+	return find_and_set_bit(&sc16is7xx_lines, SC16IS7XX_MAX_DEVS);
 }
 
 static void sc16is7xx_power(struct uart_port *port, int on)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 30/35] bluetooth: optimize cmtp_alloc_block_id()
  2023-12-03 19:32   ` Yury Norov
                     ` (27 preceding siblings ...)
  (?)
@ 2023-12-03 19:33   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:33 UTC (permalink / raw)
  To: linux-kernel, Karsten Keil, Marcel Holtmann, Johan Hedberg,
	Luiz Augusto von Dentz, Yury Norov, netdev, linux-bluetooth
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

Instead of polling every bit in blockids, switch it to using a
dedicated find_and_set_bit(), and make the function a simple one-liner.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 net/bluetooth/cmtp/core.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/net/bluetooth/cmtp/core.c b/net/bluetooth/cmtp/core.c
index 90d130588a3e..b1330acbbff3 100644
--- a/net/bluetooth/cmtp/core.c
+++ b/net/bluetooth/cmtp/core.c
@@ -88,15 +88,9 @@ static void __cmtp_copy_session(struct cmtp_session *session, struct cmtp_connin
 
 static inline int cmtp_alloc_block_id(struct cmtp_session *session)
 {
-	int i, id = -1;
+	int id = find_and_set_bit(&session->blockids, 16);
 
-	for (i = 0; i < 16; i++)
-		if (!test_and_set_bit(i, &session->blockids)) {
-			id = i;
-			break;
-		}
-
-	return id;
+	return id < 16 ? id : -1;
 }
 
 static inline void cmtp_free_block_id(struct cmtp_session *session, int id)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 31/35] net: smc: use find_and_set_bit() in smc_wr_tx_get_free_slot_index()
  2023-12-03 19:32   ` Yury Norov
                     ` (28 preceding siblings ...)
  (?)
@ 2023-12-03 19:33   ` Yury Norov
  2023-12-04  9:40     ` Alexandra Winter
  -1 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:33 UTC (permalink / raw)
  To: linux-kernel, Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe,
	Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-s390, netdev
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov, Alexandra Winter

The function opencodes find_and_set_bit() with a for_each() loop. Use
it, and make the whole function a simple almost one-liner.

While here, drop explicit initialization of *idx, because it's already
initialized by the caller in case of ENOLINK, or set properly with
->wr_tx_mask, if nothing is found, in case of EBUSY.

CC: Tony Lu <tonylu@linux.alibaba.com>
CC: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 net/smc/smc_wr.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 0021065a600a..b6f0cfc52788 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -170,15 +170,11 @@ void smc_wr_tx_cq_handler(struct ib_cq *ib_cq, void *cq_context)
 
 static inline int smc_wr_tx_get_free_slot_index(struct smc_link *link, u32 *idx)
 {
-	*idx = link->wr_tx_cnt;
 	if (!smc_link_sendable(link))
 		return -ENOLINK;
-	for_each_clear_bit(*idx, link->wr_tx_mask, link->wr_tx_cnt) {
-		if (!test_and_set_bit(*idx, link->wr_tx_mask))
-			return 0;
-	}
-	*idx = link->wr_tx_cnt;
-	return -EBUSY;
+
+	*idx = find_and_set_bit(link->wr_tx_mask, link->wr_tx_cnt);
+	return *idx < link->wr_tx_cnt ? 0 : -EBUSY;
 }
 
 /**
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 32/35] ALSA: use atomic find_bit() functions where applicable
  2023-12-03 19:32   ` Yury Norov
                     ` (29 preceding siblings ...)
  (?)
@ 2023-12-03 19:33   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:33 UTC (permalink / raw)
  To: linux-kernel, Jaroslav Kysela, Takashi Iwai, Daniel Mack,
	Cezary Rojewski, Kai Vehmanen, Yury Norov, Kees Cook,
	linux-sound, alsa-devel
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov, Takashi Iwai

ALSA code tests each bit in bitmaps in a for() loop. Switch it to
dedicated atomic find_bit() API.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
---
 sound/pci/hda/hda_codec.c |  7 +++----
 sound/usb/caiaq/audio.c   | 13 +++++--------
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/sound/pci/hda/hda_codec.c b/sound/pci/hda/hda_codec.c
index 01718b1fc9a7..29254005f394 100644
--- a/sound/pci/hda/hda_codec.c
+++ b/sound/pci/hda/hda_codec.c
@@ -3275,10 +3275,9 @@ static int get_empty_pcm_device(struct hda_bus *bus, unsigned int type)
 
 #ifdef CONFIG_SND_DYNAMIC_MINORS
 	/* non-fixed slots starting from 10 */
-	for (i = 10; i < 32; i++) {
-		if (!test_and_set_bit(i, bus->pcm_dev_bits))
-			return i;
-	}
+	i = find_and_set_next_bit(bus->pcm_dev_bits, 32, 10);
+	if (i < 32)
+		return i;
 #endif
 
 	dev_warn(bus->card->dev, "Too many %s devices\n",
diff --git a/sound/usb/caiaq/audio.c b/sound/usb/caiaq/audio.c
index 4981753652a7..74dfcf32b439 100644
--- a/sound/usb/caiaq/audio.c
+++ b/sound/usb/caiaq/audio.c
@@ -610,7 +610,7 @@ static void read_completed(struct urb *urb)
 	struct snd_usb_caiaq_cb_info *info = urb->context;
 	struct snd_usb_caiaqdev *cdev;
 	struct device *dev;
-	struct urb *out = NULL;
+	struct urb *out;
 	int i, frame, len, send_it = 0, outframe = 0;
 	unsigned long flags;
 	size_t offset = 0;
@@ -625,17 +625,14 @@ static void read_completed(struct urb *urb)
 		return;
 
 	/* find an unused output urb that is unused */
-	for (i = 0; i < N_URBS; i++)
-		if (test_and_set_bit(i, &cdev->outurb_active_mask) == 0) {
-			out = cdev->data_urbs_out[i];
-			break;
-		}
-
-	if (!out) {
+	i = find_and_set_bit(&cdev->outurb_active_mask, N_URBS);
+	if (i >= N_URBS) {
 		dev_err(dev, "Unable to find an output urb to use\n");
 		goto requeue;
 	}
 
+	out = cdev->data_urbs_out[i];
+
 	/* read the recently received packet and send back one which has
 	 * the same layout */
 	for (frame = 0; frame < FRAMES_PER_URB; frame++) {
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 33/35] m68k: rework get_mmu_context()
  2023-12-03 19:32   ` Yury Norov
                     ` (30 preceding siblings ...)
  (?)
@ 2023-12-03 19:33   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:33 UTC (permalink / raw)
  To: linux-kernel, Geert Uytterhoeven, Hugh Dickins, Andrew Morton,
	Yury Norov, linux-m68k
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov, Greg Ungerer

get_mmu_context() opencodes atomic find_and_set_bit_wrap(). Switch
it to dedicated function.

CC: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
---
 arch/m68k/include/asm/mmu_context.h | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/m68k/include/asm/mmu_context.h b/arch/m68k/include/asm/mmu_context.h
index 141bbdfad960..0419ad87a1c1 100644
--- a/arch/m68k/include/asm/mmu_context.h
+++ b/arch/m68k/include/asm/mmu_context.h
@@ -35,12 +35,11 @@ static inline void get_mmu_context(struct mm_struct *mm)
 		atomic_inc(&nr_free_contexts);
 		steal_context();
 	}
-	ctx = next_mmu_context;
-	while (test_and_set_bit(ctx, context_map)) {
-		ctx = find_next_zero_bit(context_map, LAST_CONTEXT+1, ctx);
-		if (ctx > LAST_CONTEXT)
-			ctx = 0;
-	}
+
+	do {
+		ctx = find_and_set_bit_wrap(context_map, LAST_CONTEXT + 1, next_mmu_context);
+	} while (ctx > LAST_CONTEXT);
+
 	next_mmu_context = (ctx + 1) & LAST_CONTEXT;
 	mm->context = ctx;
 	context_mm[ctx] = mm;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 34/35] microblaze: rework get_mmu_context()
  2023-12-03 19:32   ` Yury Norov
                     ` (31 preceding siblings ...)
  (?)
@ 2023-12-03 19:33   ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:33 UTC (permalink / raw)
  To: linux-kernel, Michal Simek, Yury Norov
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

Fix opencoded find_and_set_bit_wrap(), which also suppresses potential
KCSAN warning.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/microblaze/include/asm/mmu_context_mm.h | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/microblaze/include/asm/mmu_context_mm.h b/arch/microblaze/include/asm/mmu_context_mm.h
index c2c77f708455..209c3a62353a 100644
--- a/arch/microblaze/include/asm/mmu_context_mm.h
+++ b/arch/microblaze/include/asm/mmu_context_mm.h
@@ -82,12 +82,11 @@ static inline void get_mmu_context(struct mm_struct *mm)
 		return;
 	while (atomic_dec_if_positive(&nr_free_contexts) < 0)
 		steal_context();
-	ctx = next_mmu_context;
-	while (test_and_set_bit(ctx, context_map)) {
-		ctx = find_next_zero_bit(context_map, LAST_CONTEXT+1, ctx);
-		if (ctx > LAST_CONTEXT)
-			ctx = 0;
-	}
+
+	do {
+		ctx = find_and_set_bit_wrap(context_map, LAST_CONTEXT + 1, next_mmu_context);
+	} while (ctx > LAST_CONTEXT);
+
 	next_mmu_context = (ctx + 1) & LAST_CONTEXT;
 	mm->context = ctx;
 	context_mm[ctx] = mm;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 35/35] sh: mach-x3proto: rework ilsel_enable()
  2023-12-03 19:32   ` Yury Norov
                     ` (32 preceding siblings ...)
  (?)
@ 2023-12-03 19:33   ` Yury Norov
  2024-01-19  8:48     ` Geert Uytterhoeven
  -1 siblings, 1 reply; 68+ messages in thread
From: Yury Norov @ 2023-12-03 19:33 UTC (permalink / raw)
  To: linux-kernel, Yoshinori Sato, Rich Felker,
	John Paul Adrian Glaubitz, Yury Norov, linux-sh
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

Fix opencoded find_and_set_bit(), which also suppresses potential
KCSAN warning.

CC: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/sh/boards/mach-x3proto/ilsel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/sh/boards/mach-x3proto/ilsel.c b/arch/sh/boards/mach-x3proto/ilsel.c
index f0d5eb41521a..7fadc479a80b 100644
--- a/arch/sh/boards/mach-x3proto/ilsel.c
+++ b/arch/sh/boards/mach-x3proto/ilsel.c
@@ -99,8 +99,8 @@ int ilsel_enable(ilsel_source_t set)
 	}
 
 	do {
-		bit = find_first_zero_bit(&ilsel_level_map, ILSEL_LEVELS);
-	} while (test_and_set_bit(bit, &ilsel_level_map));
+		bit = find_and_set_bit(&ilsel_level_map, ILSEL_LEVELS);
+	} while (bit >= ILSEL_LEVELS);
 
 	__ilsel_enable(set, bit);
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* RE: bitops: add atomic find_bit() operations
  2023-12-03 19:23   ` Yury Norov
  (?)
@ 2023-12-03 19:56   ` bluez.test.bot
  -1 siblings, 0 replies; 68+ messages in thread
From: bluez.test.bot @ 2023-12-03 19:56 UTC (permalink / raw)
  To: linux-bluetooth, yury.norov

[-- Attachment #1: Type: text/plain, Size: 8333 bytes --]

This is automated email and please do not reply to this email!

Dear submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
This is a CI test results with your patch series:
PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=806363

---Test result---

Test Summary:
CheckPatch                    FAIL      2.51 seconds
GitLint                       FAIL      1.18 seconds
SubjectPrefix                 FAIL      0.57 seconds
BuildKernel                   PASS      28.39 seconds
CheckAllWarning               PASS      30.04 seconds
CheckSparse                   PASS      35.78 seconds
CheckSmatch                   PASS      99.11 seconds
BuildKernel32                 PASS      26.63 seconds
TestRunnerSetup               PASS      415.33 seconds
TestRunner_l2cap-tester       PASS      23.06 seconds
TestRunner_iso-tester         PASS      46.68 seconds
TestRunner_bnep-tester        PASS      10.39 seconds
TestRunner_mgmt-tester        PASS      174.49 seconds
TestRunner_rfcomm-tester      PASS      10.92 seconds
TestRunner_sco-tester         PASS      14.36 seconds
TestRunner_ioctl-tester       PASS      12.19 seconds
TestRunner_mesh-tester        PASS      8.83 seconds
TestRunner_smp-tester         PASS      9.83 seconds
TestRunner_userchan-tester    PASS      7.44 seconds
IncrementalBuild              PASS      36.18 seconds

Details
##############################
Test: CheckPatch - FAIL
Desc: Run checkpatch.pl script
Output:
[v2,01/35] lib/find: add atomic find_bit() primitives
WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#269: FILE: include/linux/find.h:35:
+unsigned long _find_and_set_bit(volatile unsigned long *addr, unsigned long nbits);

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#270: FILE: include/linux/find.h:36:
+unsigned long _find_and_set_next_bit(volatile unsigned long *addr, unsigned long nbits,

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#272: FILE: include/linux/find.h:38:
+unsigned long _find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits);

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#273: FILE: include/linux/find.h:39:
+unsigned long _find_and_set_next_bit_lock(volatile unsigned long *addr, unsigned long nbits,

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#275: FILE: include/linux/find.h:41:
+unsigned long _find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits);

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#276: FILE: include/linux/find.h:42:
+unsigned long _find_and_clear_next_bit(volatile unsigned long *addr, unsigned long nbits,

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#303: FILE: include/linux/find.h:490:
+unsigned long find_and_set_bit(volatile unsigned long *addr, unsigned long nbits)

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#340: FILE: include/linux/find.h:527:
+unsigned long find_and_set_next_bit(volatile unsigned long *addr,

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#369: FILE: include/linux/find.h:556:
+unsigned long find_and_set_bit_wrap(volatile unsigned long *addr,

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#398: FILE: include/linux/find.h:585:
+unsigned long find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits)

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#434: FILE: include/linux/find.h:621:
+unsigned long find_and_set_next_bit_lock(volatile unsigned long *addr,

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#464: FILE: include/linux/find.h:651:
+unsigned long find_and_set_bit_wrap_lock(volatile unsigned long *addr,

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#492: FILE: include/linux/find.h:679:
+static inline unsigned long find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits)

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#528: FILE: include/linux/find.h:715:
+unsigned long find_and_clear_next_bit(volatile unsigned long *addr,

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#587: FILE: lib/find_bit.c:119:
+unsigned long _find_and_set_bit(volatile unsigned long *addr, unsigned long nbits)

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#601: FILE: lib/find_bit.c:133:
+unsigned long _find_and_set_next_bit(volatile unsigned long *addr,

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#616: FILE: lib/find_bit.c:148:
+unsigned long _find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits)

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#630: FILE: lib/find_bit.c:162:
+unsigned long _find_and_set_next_bit_lock(volatile unsigned long *addr,

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#645: FILE: lib/find_bit.c:177:
+unsigned long _find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits)

WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#659: FILE: lib/find_bit.c:191:
+unsigned long _find_and_clear_next_bit(volatile unsigned long *addr,

total: 0 errors, 20 warnings, 402 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

/github/workspace/src/src/13477429.patch has style problems, please review.

NOTE: Ignored message types: UNKNOWN_COMMIT_ID

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.


##############################
Test: GitLint - FAIL
Desc: Run gitlint
Output:
[v2,01/35] lib/find: add atomic find_bit() primitives

WARNING: I3 - ignore-body-lines: gitlint will be switching from using Python regex 'match' (match beginning) to 'search' (match anywhere) semantics. Please review your ignore-body-lines.regex option accordingly. To remove this warning, set general.regex-style-search=True. More details: https://jorisroovers.github.io/gitlint/configuration/#regex-style-search
8: B3 Line contains hard tab characters (\t): "	for (idx = 0; idx < nbits; idx++)"
9: B3 Line contains hard tab characters (\t): "		if (test_and_clear_bit(idx, bitmap))"
10: B3 Line contains hard tab characters (\t): "			do_something(idx);"
14: B3 Line contains hard tab characters (\t): "	do {"
15: B3 Line contains hard tab characters (\t): "		bit = find_first_bit(bitmap, nbits);"
16: B3 Line contains hard tab characters (\t): "		if (bit >= nbits)"
17: B3 Line contains hard tab characters (\t): "			return nbits;"
18: B3 Line contains hard tab characters (\t): "	} while (!test_and_clear_bit(bit, bitmap));"
19: B3 Line contains hard tab characters (\t): "	return bit;"
24: B3 Line contains hard tab characters (\t): "	for_each_test_and_clear_bit(idx, bitmap, nbits)"
25: B3 Line contains hard tab characters (\t): "		do_something(idx);"
28: B3 Line contains hard tab characters (\t): "	return find_and_clear_bit(bitmap, nbits);"
69: B3 Line contains hard tab characters (\t): "	find_and_set_bit(addr, nbits);"
70: B3 Line contains hard tab characters (\t): "	find_and_set_next_bit(addr, nbits, start);"
71: B3 Line contains hard tab characters (\t): "	..."
##############################
Test: SubjectPrefix - FAIL
Desc: Check subject contains "Bluetooth" prefix
Output:
"Bluetooth: " prefix is not specified in the subject
"Bluetooth: " prefix is not specified in the subject
"Bluetooth: " prefix is not specified in the subject


---
Regards,
Linux Bluetooth


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 14/35] PCI: hv: switch hv_get_dom_num() to use atomic find_bit()
  2023-12-03 19:32   ` [PATCH v2 14/35] PCI: hv: switch hv_get_dom_num() to use atomic find_bit() Yury Norov
@ 2023-12-04  5:58     ` Wei Liu
  2023-12-04 19:14     ` Bjorn Helgaas
  1 sibling, 0 replies; 68+ messages in thread
From: Wei Liu @ 2023-12-04  5:58 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Bjorn Helgaas, linux-hyperv, linux-pci, Jan Kara,
	Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov, Michael Kelley

On Sun, Dec 03, 2023 at 11:32:46AM -0800, Yury Norov wrote:
> The function traverses bitmap with for_each_clear_bit() just to allocate
> a bit atomically. We can do it better with a dedicated find_and_set_bit().
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> Reviewed-by: Michael Kelley <mhklinux@outlook.com>

Acked-by: Wei Liu <wei.liu@kernel.org>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 27/35] media: em28xx: cx231xx: fix opencoded find_and_set_bit()
  2023-12-03 19:32   ` [PATCH v2 27/35] media: em28xx: cx231xx: fix opencoded find_and_set_bit() Yury Norov
@ 2023-12-04  8:39     ` Hans Verkuil
  2023-12-04 13:05       ` Andy Shevchenko
  0 siblings, 1 reply; 68+ messages in thread
From: Hans Verkuil @ 2023-12-04  8:39 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Mauro Carvalho Chehab, linux-media
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

On 03/12/2023 20:32, Yury Norov wrote:
> Functions in the media/usb drivers opencode find_and_set_bit() by
> polling on a found bit in a while-loop.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>

Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>

Feel free to take this yourself through the bitmap tree.

It's a nice improvement.

Regards,

	Hans

> ---
>  drivers/media/usb/cx231xx/cx231xx-cards.c | 16 ++++------
>  drivers/media/usb/em28xx/em28xx-cards.c   | 37 +++++++++--------------
>  2 files changed, 21 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/media/usb/cx231xx/cx231xx-cards.c b/drivers/media/usb/cx231xx/cx231xx-cards.c
> index 92efe6c1f47b..b314603932d7 100644
> --- a/drivers/media/usb/cx231xx/cx231xx-cards.c
> +++ b/drivers/media/usb/cx231xx/cx231xx-cards.c
> @@ -1708,16 +1708,12 @@ static int cx231xx_usb_probe(struct usb_interface *interface,
>  		return -ENODEV;
>  
>  	/* Check to see next free device and mark as used */
> -	do {
> -		nr = find_first_zero_bit(&cx231xx_devused, CX231XX_MAXBOARDS);
> -		if (nr >= CX231XX_MAXBOARDS) {
> -			/* No free device slots */
> -			dev_err(d,
> -				"Supports only %i devices.\n",
> -				CX231XX_MAXBOARDS);
> -			return -ENOMEM;
> -		}
> -	} while (test_and_set_bit(nr, &cx231xx_devused));
> +	nr = find_and_set_bit(&cx231xx_devused, CX231XX_MAXBOARDS);
> +	if (nr >= CX231XX_MAXBOARDS) {
> +		/* No free device slots */
> +		dev_err(d, "Supports only %i devices.\n", CX231XX_MAXBOARDS);
> +		return -ENOMEM;
> +	}
>  
>  	udev = usb_get_dev(interface_to_usbdev(interface));
>  
> diff --git a/drivers/media/usb/em28xx/em28xx-cards.c b/drivers/media/usb/em28xx/em28xx-cards.c
> index 4d037c92af7c..af4809fe74a8 100644
> --- a/drivers/media/usb/em28xx/em28xx-cards.c
> +++ b/drivers/media/usb/em28xx/em28xx-cards.c
> @@ -3684,17 +3684,14 @@ static int em28xx_duplicate_dev(struct em28xx *dev)
>  		return -ENOMEM;
>  	}
>  	/* Check to see next free device and mark as used */
> -	do {
> -		nr = find_first_zero_bit(em28xx_devused, EM28XX_MAXBOARDS);
> -		if (nr >= EM28XX_MAXBOARDS) {
> -			/* No free device slots */
> -			dev_warn(&dev->intf->dev, ": Supports only %i em28xx boards.\n",
> -				 EM28XX_MAXBOARDS);
> -			kfree(sec_dev);
> -			dev->dev_next = NULL;
> -			return -ENOMEM;
> -		}
> -	} while (test_and_set_bit(nr, em28xx_devused));
> +	nr = find_and_set_bit(em28xx_devused, EM28XX_MAXBOARDS);
> +	if (nr >= EM28XX_MAXBOARDS) {
> +		/* No free device slots */
> +		dev_warn(&dev->intf->dev, ": Supports only %i em28xx boards.\n", EM28XX_MAXBOARDS);
> +		kfree(sec_dev);
> +		dev->dev_next = NULL;
> +		return -ENOMEM;
> +	}
>  	sec_dev->devno = nr;
>  	snprintf(sec_dev->name, 28, "em28xx #%d", nr);
>  	sec_dev->dev_next = NULL;
> @@ -3827,17 +3824,13 @@ static int em28xx_usb_probe(struct usb_interface *intf,
>  	udev = usb_get_dev(interface_to_usbdev(intf));
>  
>  	/* Check to see next free device and mark as used */
> -	do {
> -		nr = find_first_zero_bit(em28xx_devused, EM28XX_MAXBOARDS);
> -		if (nr >= EM28XX_MAXBOARDS) {
> -			/* No free device slots */
> -			dev_err(&intf->dev,
> -				"Driver supports up to %i em28xx boards.\n",
> -			       EM28XX_MAXBOARDS);
> -			retval = -ENOMEM;
> -			goto err_no_slot;
> -		}
> -	} while (test_and_set_bit(nr, em28xx_devused));
> +	nr = find_and_set_bit(em28xx_devused, EM28XX_MAXBOARDS);
> +	if (nr >= EM28XX_MAXBOARDS) {
> +		/* No free device slots */
> +		dev_err(&intf->dev, "Driver supports up to %i em28xx boards.\n", EM28XX_MAXBOARDS);
> +		retval = -ENOMEM;
> +		goto err_no_slot;
> +	}
>  
>  	/* Don't register audio interfaces */
>  	if (intf->altsetting[0].desc.bInterfaceClass == USB_CLASS_AUDIO) {


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 20/35] media: radio-shark: use atomic find_bit() API where appropriate
  2023-12-03 19:32   ` [PATCH v2 20/35] media: radio-shark: " Yury Norov
@ 2023-12-04  8:43     ` Hans Verkuil
  0 siblings, 0 replies; 68+ messages in thread
From: Hans Verkuil @ 2023-12-04  8:43 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Mauro Carvalho Chehab, linux-media
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

On 03/12/2023 20:32, Yury Norov wrote:
> Despite that it's only 2- or 3-bit maps, convert for-loop followed by
> test_bit() to for_each_test_and_clear_bit() as it makes the code cleaner.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>

Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>

You can take this patch yourself as well.

Regards,

	Hans

> ---
>  drivers/media/radio/radio-shark.c  | 5 +----
>  drivers/media/radio/radio-shark2.c | 5 +----
>  2 files changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/media/radio/radio-shark.c b/drivers/media/radio/radio-shark.c
> index 127a3be0e0f0..0c50b3a9623e 100644
> --- a/drivers/media/radio/radio-shark.c
> +++ b/drivers/media/radio/radio-shark.c
> @@ -158,10 +158,7 @@ static void shark_led_work(struct work_struct *work)
>  		container_of(work, struct shark_device, led_work);
>  	int i, res, brightness, actual_len;
>  
> -	for (i = 0; i < 3; i++) {
> -		if (!test_and_clear_bit(i, &shark->brightness_new))
> -			continue;
> -
> +	for_each_test_and_clear_bit(i, &shark->brightness_new, 3) {
>  		brightness = atomic_read(&shark->brightness[i]);
>  		memset(shark->transfer_buffer, 0, TB_LEN);
>  		if (i != RED_LED) {
> diff --git a/drivers/media/radio/radio-shark2.c b/drivers/media/radio/radio-shark2.c
> index f1c5c0a6a335..d9ef241e1778 100644
> --- a/drivers/media/radio/radio-shark2.c
> +++ b/drivers/media/radio/radio-shark2.c
> @@ -145,10 +145,7 @@ static void shark_led_work(struct work_struct *work)
>  		container_of(work, struct shark_device, led_work);
>  	int i, res, brightness, actual_len;
>  
> -	for (i = 0; i < 2; i++) {
> -		if (!test_and_clear_bit(i, &shark->brightness_new))
> -			continue;
> -
> +	for_each_test_and_clear_bit(i, &shark->brightness_new, 2) {
>  		brightness = atomic_read(&shark->brightness[i]);
>  		memset(shark->transfer_buffer, 0, TB_LEN);
>  		shark->transfer_buffer[0] = 0x83 + i;


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 31/35] net: smc: use find_and_set_bit() in smc_wr_tx_get_free_slot_index()
  2023-12-03 19:33   ` [PATCH v2 31/35] net: smc: use find_and_set_bit() in smc_wr_tx_get_free_slot_index() Yury Norov
@ 2023-12-04  9:40     ` Alexandra Winter
  2023-12-11 22:34       ` Yury Norov
  0 siblings, 1 reply; 68+ messages in thread
From: Alexandra Winter @ 2023-12-04  9:40 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Karsten Graul, Wenjia Zhang,
	Jan Karcher, D. Wythe, Tony Lu, Wen Gu, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov



On 03.12.23 20:33, Yury Norov wrote:
> The function opencodes find_and_set_bit() with a for_each() loop. Use
> it, and make the whole function a simple almost one-liner.
> 
> While here, drop explicit initialization of *idx, because it's already
> initialized by the caller in case of ENOLINK, or set properly with
> ->wr_tx_mask, if nothing is found, in case of EBUSY.
> 
> CC: Tony Lu <tonylu@linux.alibaba.com>
> CC: Alexandra Winter <wintera@linux.ibm.com>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---

Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>


Thanks a lot for the great helper function!
I guess the top-level maintainers will figure out, how this series best finds its way upstream.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 13/35] KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers()
  2023-12-03 19:32   ` [PATCH v2 13/35] KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers() Yury Norov
@ 2023-12-04  9:53     ` Vitaly Kuznetsov
  2023-12-04 16:00       ` Sean Christopherson
  0 siblings, 1 reply; 68+ messages in thread
From: Vitaly Kuznetsov @ 2023-12-04  9:53 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, kvm
  Cc: Yury Norov, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

Yury Norov <yury.norov@gmail.com> writes:

> The function traverses stimer_pending_bitmap in a for-loop bit by bit.
> We can do it faster by using atomic find_and_set_bit().
>
> While here, refactor the logic by decreasing indentation level.
>
> CC: Vitaly Kuznetsov <vkuznets@redhat.com>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  arch/x86/kvm/hyperv.c | 39 +++++++++++++++++++--------------------
>  1 file changed, 19 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 238afd7335e4..a0e45d20d451 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -870,27 +870,26 @@ void kvm_hv_process_stimers(struct kvm_vcpu *vcpu)
>  	if (!hv_vcpu)
>  		return;
>  
> -	for (i = 0; i < ARRAY_SIZE(hv_vcpu->stimer); i++)
> -		if (test_and_clear_bit(i, hv_vcpu->stimer_pending_bitmap)) {
> -			stimer = &hv_vcpu->stimer[i];
> -			if (stimer->config.enable) {
> -				exp_time = stimer->exp_time;
> -
> -				if (exp_time) {
> -					time_now =
> -						get_time_ref_counter(vcpu->kvm);
> -					if (time_now >= exp_time)
> -						stimer_expiration(stimer);
> -				}
> -
> -				if ((stimer->config.enable) &&
> -				    stimer->count) {
> -					if (!stimer->msg_pending)
> -						stimer_start(stimer);
> -				} else
> -					stimer_cleanup(stimer);
> -			}
> +	for_each_test_and_clear_bit(i, hv_vcpu->stimer_pending_bitmap,
> +					ARRAY_SIZE(hv_vcpu->stimer)) {
> +		stimer = &hv_vcpu->stimer[i];
> +		if (!stimer->config.enable)
> +			continue;
> +
> +		exp_time = stimer->exp_time;
> +
> +		if (exp_time) {
> +			time_now = get_time_ref_counter(vcpu->kvm);
> +			if (time_now >= exp_time)
> +				stimer_expiration(stimer);
>  		}
> +
> +		if (stimer->config.enable && stimer->count) {
> +			if (!stimer->msg_pending)
> +				stimer_start(stimer);
> +		} else
> +			stimer_cleanup(stimer);

Minor nitpick: it's better (and afair required by coding style) to use
'{}' for both branches here:

	if (stimer->config.enable && stimer->count) {
		if (!stimer->msg_pending)
			stimer_start(stimer);
	} else {
		stimer_cleanup(stimer);
	}

> +	}
>  }
>  
>  void kvm_hv_vcpu_uninit(struct kvm_vcpu *vcpu)

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 27/35] media: em28xx: cx231xx: fix opencoded find_and_set_bit()
  2023-12-04  8:39     ` Hans Verkuil
@ 2023-12-04 13:05       ` Andy Shevchenko
  2023-12-04 13:09         ` Hans Verkuil
  0 siblings, 1 reply; 68+ messages in thread
From: Andy Shevchenko @ 2023-12-04 13:05 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Yury Norov, linux-kernel, Mauro Carvalho Chehab, linux-media,
	Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche, Sergey Shtylyov

On Mon, Dec 04, 2023 at 09:39:59AM +0100, Hans Verkuil wrote:
> On 03/12/2023 20:32, Yury Norov wrote:
> > Functions in the media/usb drivers opencode find_and_set_bit() by
> > polling on a found bit in a while-loop.

...

> It's a nice improvement.

Wouldn't it be even nicer to utilise IDA framework?

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 00/35] bitops: add atomic find_bit() operations
  2023-12-03 19:23 ` Yury Norov
@ 2023-12-04 13:07   ` Andy Shevchenko
  -1 siblings, 0 replies; 68+ messages in thread
From: Andy Shevchenko @ 2023-12-04 13:07 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
	Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
	Wenjia Zhang, Will Deacon, Yoshinori Sato,
	GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
	kvm, linux-arm-kernel, linux-arm-msm, linux-block,
	linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
	linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
	linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
	linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
	sparclinux, x86, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

On Sun, Dec 03, 2023 at 11:23:47AM -0800, Yury Norov wrote:
> Add helpers around test_and_{set,clear}_bit() that allow to search for
> clear or set bits and flip them atomically.
> 
> The target patterns may look like this:
> 
> 	for (idx = 0; idx < nbits; idx++)
> 		if (test_and_clear_bit(idx, bitmap))
> 			do_something(idx);
> 
> Or like this:
> 
> 	do {
> 		bit = find_first_bit(bitmap, nbits);
> 		if (bit >= nbits)
> 			return nbits;
> 	} while (!test_and_clear_bit(bit, bitmap));
> 	return bit;
> 
> In both cases, the opencoded loop may be converted to a single function
> or iterator call. Correspondingly:
> 
> 	for_each_test_and_clear_bit(idx, bitmap, nbits)
> 		do_something(idx);
> 
> Or:
> 	return find_and_clear_bit(bitmap, nbits);
> 
> Obviously, the less routine code people have to write themself, the
> less probability to make a mistake.
> 
> Those are not only handy helpers but also resolve a non-trivial
> issue of using non-atomic find_bit() together with atomic
> test_and_{set,clear)_bit().
> 
> The trick is that find_bit() implies that the bitmap is a regular
> non-volatile piece of memory, and compiler is allowed to use such
> optimization techniques like re-fetching memory instead of caching it.
> 
> For example, find_first_bit() is implemented like this:
> 
>       for (idx = 0; idx * BITS_PER_LONG < sz; idx++) {
>               val = addr[idx];
>               if (val) {
>                       sz = min(idx * BITS_PER_LONG + __ffs(val), sz);
>                       break;
>               }
>       }
> 
> On register-memory architectures, like x86, compiler may decide to
> access memory twice - first time to compare against 0, and second time
> to fetch its value to pass it to __ffs().
> 
> When running find_first_bit() on volatile memory, the memory may get
> changed in-between, and for instance, it may lead to passing 0 to
> __ffs(), which is undefined. This is a potentially dangerous call.
> 
> find_and_clear_bit() as a wrapper around test_and_clear_bit()
> naturally treats underlying bitmap as a volatile memory and prevents
> compiler from such optimizations.
> 
> Now that KCSAN is catching exactly this type of situations and warns on
> undercover memory modifications. We can use it to reveal improper usage
> of find_bit(), and convert it to atomic find_and_*_bit() as appropriate.
> 
> The 1st patch of the series adds the following atomic primitives:
> 
> 	find_and_set_bit(addr, nbits);
> 	find_and_set_next_bit(addr, nbits, start);
> 	...
> 
> Here find_and_{set,clear} part refers to the corresponding
> test_and_{set,clear}_bit function. Suffixes like _wrap or _lock
> derive their semantics from corresponding find() or test() functions.
> 
> For brevity, the naming omits the fact that we search for zero bit in
> find_and_set, and correspondingly search for set bit in find_and_clear
> functions.
> 
> The patch also adds iterators with atomic semantics, like
> for_each_test_and_set_bit(). Here, the naming rule is to simply prefix
> corresponding atomic operation with 'for_each'.
> 
> This series is a result of discussion [1]. All find_bit() functions imply
> exclusive access to the bitmaps. However, KCSAN reports quite a number
> of warnings related to find_bit() API. Some of them are not pointing
> to real bugs because in many situations people intentionally allow
> concurrent bitmap operations.
> 
> If so, find_bit() can be annotated such that KCSAN will ignore it:
> 
>         bit = data_race(find_first_bit(bitmap, nbits));
> 
> This series addresses the other important case where people really need
> atomic find ops. As the following patches show, the resulting code
> looks safer and more verbose comparing to opencoded loops followed by
> atomic bit flips.
> 
> In [1] Mirsad reported 2% slowdown in a single-thread search test when
> switching find_bit() function to treat bitmaps as volatile arrays. On
> the other hand, kernel robot in the same thread reported +3.7% to the
> performance of will-it-scale.per_thread_ops test.
> 
> Assuming that our compilers are sane and generate better code against
> properly annotated data, the above discrepancy doesn't look weird. When
> running on non-volatile bitmaps, plain find_bit() outperforms atomic
> find_and_bit(), and vice-versa.

...

In some cases the better improvements can be achieved by switching
the (very) old code to utilise IDA framework.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 00/35] bitops: add atomic find_bit() operations
@ 2023-12-04 13:07   ` Andy Shevchenko
  0 siblings, 0 replies; 68+ messages in thread
From: Andy Shevchenko @ 2023-12-04 13:07 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-sh, Jaroslav Kysela, Hans Verkuil, Md. Haris Iqbal,
	K. Y. Srinivasan, Bart Van Assche, Geert Uytterhoeven,
	Jiri Pirko, Christian Brauner, Nicholas Piggin, Alexey Klimov,
	Sergey Shtylyov, Thomas Gleixner, Karsten Keil,
	Greg Kroah-Hartman, linux-usb, linux-wireless, linux-kernel,
	GR-QLogic-Storage-Upstream, Andrew Morton, Mark Rutland,
	alsa-devel, Dave Hansen, Eric Dumazet, Gregory Greenman,
	linux-s390, Valentin Schneider, Leon Romanovsky, Will Deacon,
	mpi3mr-linuxdrv.pdl, Hugh Dickins, iommu, Martin Habets,
	linux-media, Stanislaw Gruszka, linux-arm-msm, Wenjia Zhang,
	linux-m68k, linux-arm-kernel, Sean Christopherson, Oliver Neukum,
	Mirsad Todorovac, linux-pci, Rasmus Villemoes, linux-hyperv,
	Matthew Wilcox, Jiri Slaby, Rob Herring, linux-rdma,
	Damien Le Moal, ath10k, David Disseldorp, Paolo Abeni,
	Fenghua Yu, Kees Cook, James E.J. Bot tomley, Akinobu Mita,
	Steven Rostedt, Borislav Petkov, Mauro Carvalho Chehab,
	Thomas Bogendoerfer, Edward Cree, Shuai Xue, netdev,
	David S. Miller, Rich Felker, Jan Kara, kvm, Peter Zijlstra,
	H. Peter Anvin, sparclinux, Ping-Ke Shih, linux-scsi,
	linux-net-drivers, x86, Jason Gunthorpe, Ingo Molnar,
	linux-serial, Jakub Kicinski, Chaitanya Kulkarni, Kalle Valo,
	linux-block, Hans de Goede, linux-sound, Maxim Kuvyrkov,
	Jen s Axboe, Michal Simek, Yoshinori Sato, Robin Murphy,
	Bjorn Andersson, linux-mips, linux-bluetooth, dmaengine,
	Paolo Bonzini, Vitaly Kuznetsov, linuxppc-dev, Karsten Graul

On Sun, Dec 03, 2023 at 11:23:47AM -0800, Yury Norov wrote:
> Add helpers around test_and_{set,clear}_bit() that allow to search for
> clear or set bits and flip them atomically.
> 
> The target patterns may look like this:
> 
> 	for (idx = 0; idx < nbits; idx++)
> 		if (test_and_clear_bit(idx, bitmap))
> 			do_something(idx);
> 
> Or like this:
> 
> 	do {
> 		bit = find_first_bit(bitmap, nbits);
> 		if (bit >= nbits)
> 			return nbits;
> 	} while (!test_and_clear_bit(bit, bitmap));
> 	return bit;
> 
> In both cases, the opencoded loop may be converted to a single function
> or iterator call. Correspondingly:
> 
> 	for_each_test_and_clear_bit(idx, bitmap, nbits)
> 		do_something(idx);
> 
> Or:
> 	return find_and_clear_bit(bitmap, nbits);
> 
> Obviously, the less routine code people have to write themself, the
> less probability to make a mistake.
> 
> Those are not only handy helpers but also resolve a non-trivial
> issue of using non-atomic find_bit() together with atomic
> test_and_{set,clear)_bit().
> 
> The trick is that find_bit() implies that the bitmap is a regular
> non-volatile piece of memory, and compiler is allowed to use such
> optimization techniques like re-fetching memory instead of caching it.
> 
> For example, find_first_bit() is implemented like this:
> 
>       for (idx = 0; idx * BITS_PER_LONG < sz; idx++) {
>               val = addr[idx];
>               if (val) {
>                       sz = min(idx * BITS_PER_LONG + __ffs(val), sz);
>                       break;
>               }
>       }
> 
> On register-memory architectures, like x86, compiler may decide to
> access memory twice - first time to compare against 0, and second time
> to fetch its value to pass it to __ffs().
> 
> When running find_first_bit() on volatile memory, the memory may get
> changed in-between, and for instance, it may lead to passing 0 to
> __ffs(), which is undefined. This is a potentially dangerous call.
> 
> find_and_clear_bit() as a wrapper around test_and_clear_bit()
> naturally treats underlying bitmap as a volatile memory and prevents
> compiler from such optimizations.
> 
> Now that KCSAN is catching exactly this type of situations and warns on
> undercover memory modifications. We can use it to reveal improper usage
> of find_bit(), and convert it to atomic find_and_*_bit() as appropriate.
> 
> The 1st patch of the series adds the following atomic primitives:
> 
> 	find_and_set_bit(addr, nbits);
> 	find_and_set_next_bit(addr, nbits, start);
> 	...
> 
> Here find_and_{set,clear} part refers to the corresponding
> test_and_{set,clear}_bit function. Suffixes like _wrap or _lock
> derive their semantics from corresponding find() or test() functions.
> 
> For brevity, the naming omits the fact that we search for zero bit in
> find_and_set, and correspondingly search for set bit in find_and_clear
> functions.
> 
> The patch also adds iterators with atomic semantics, like
> for_each_test_and_set_bit(). Here, the naming rule is to simply prefix
> corresponding atomic operation with 'for_each'.
> 
> This series is a result of discussion [1]. All find_bit() functions imply
> exclusive access to the bitmaps. However, KCSAN reports quite a number
> of warnings related to find_bit() API. Some of them are not pointing
> to real bugs because in many situations people intentionally allow
> concurrent bitmap operations.
> 
> If so, find_bit() can be annotated such that KCSAN will ignore it:
> 
>         bit = data_race(find_first_bit(bitmap, nbits));
> 
> This series addresses the other important case where people really need
> atomic find ops. As the following patches show, the resulting code
> looks safer and more verbose comparing to opencoded loops followed by
> atomic bit flips.
> 
> In [1] Mirsad reported 2% slowdown in a single-thread search test when
> switching find_bit() function to treat bitmaps as volatile arrays. On
> the other hand, kernel robot in the same thread reported +3.7% to the
> performance of will-it-scale.per_thread_ops test.
> 
> Assuming that our compilers are sane and generate better code against
> properly annotated data, the above discrepancy doesn't look weird. When
> running on non-volatile bitmaps, plain find_bit() outperforms atomic
> find_and_bit(), and vice-versa.

...

In some cases the better improvements can be achieved by switching
the (very) old code to utilise IDA framework.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 27/35] media: em28xx: cx231xx: fix opencoded find_and_set_bit()
  2023-12-04 13:05       ` Andy Shevchenko
@ 2023-12-04 13:09         ` Hans Verkuil
  0 siblings, 0 replies; 68+ messages in thread
From: Hans Verkuil @ 2023-12-04 13:09 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Yury Norov, linux-kernel, Mauro Carvalho Chehab, linux-media,
	Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche, Sergey Shtylyov

On 04/12/2023 14:05, Andy Shevchenko wrote:
> On Mon, Dec 04, 2023 at 09:39:59AM +0100, Hans Verkuil wrote:
>> On 03/12/2023 20:32, Yury Norov wrote:
>>> Functions in the media/usb drivers opencode find_and_set_bit() by
>>> polling on a found bit in a while-loop.
> 
> ...
> 
>> It's a nice improvement.
> 
> Wouldn't it be even nicer to utilise IDA framework?
> 

Not worth the effort IMHO.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 13/35] KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers()
  2023-12-04  9:53     ` Vitaly Kuznetsov
@ 2023-12-04 16:00       ` Sean Christopherson
  0 siblings, 0 replies; 68+ messages in thread
From: Sean Christopherson @ 2023-12-04 16:00 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Yury Norov, linux-kernel, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	kvm, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

On Mon, Dec 04, 2023, Vitaly Kuznetsov wrote:
> > -	for (i = 0; i < ARRAY_SIZE(hv_vcpu->stimer); i++)
> > -		if (test_and_clear_bit(i, hv_vcpu->stimer_pending_bitmap)) {
> > -			stimer = &hv_vcpu->stimer[i];
> > -			if (stimer->config.enable) {
> > -				exp_time = stimer->exp_time;
> > -
> > -				if (exp_time) {
> > -					time_now =
> > -						get_time_ref_counter(vcpu->kvm);
> > -					if (time_now >= exp_time)
> > -						stimer_expiration(stimer);
> > -				}
> > -
> > -				if ((stimer->config.enable) &&
> > -				    stimer->count) {
> > -					if (!stimer->msg_pending)
> > -						stimer_start(stimer);
> > -				} else
> > -					stimer_cleanup(stimer);
> > -			}
> > +	for_each_test_and_clear_bit(i, hv_vcpu->stimer_pending_bitmap,
> > +					ARRAY_SIZE(hv_vcpu->stimer)) {

Another nit, please align the indendation:

	for_each_test_and_clear_bit(i, hv_vcpu->stimer_pending_bitmap,
				    ARRAY_SIZE(hv_vcpu->stimer)) {

> > +		stimer = &hv_vcpu->stimer[i];
> > +		if (!stimer->config.enable)
> > +			continue;
> > +
> > +		exp_time = stimer->exp_time;
> > +
> > +		if (exp_time) {
> > +			time_now = get_time_ref_counter(vcpu->kvm);
> > +			if (time_now >= exp_time)
> > +				stimer_expiration(stimer);
> >  		}
> > +
> > +		if (stimer->config.enable && stimer->count) {
> > +			if (!stimer->msg_pending)
> > +				stimer_start(stimer);
> > +		} else
> > +			stimer_cleanup(stimer);
> 
> Minor nitpick: it's better (and afair required by coding style) to use
> '{}' for both branches here:

Yeah, it's a hard requirement in KVM x86.

> 
> 	if (stimer->config.enable && stimer->count) {
> 		if (!stimer->msg_pending)
> 			stimer_start(stimer);
> 	} else {
> 		stimer_cleanup(stimer);
> 	}
> 
> > +	}
> >  }
> >  
> >  void kvm_hv_vcpu_uninit(struct kvm_vcpu *vcpu)
> 
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> 
> -- 
> Vitaly
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 03/35] lib/sbitmap; make __sbitmap_get_word() using find_and_set_bit()
  2023-12-03 19:32   ` [PATCH v2 03/35] lib/sbitmap; make __sbitmap_get_word() using find_and_set_bit() Yury Norov
@ 2023-12-04 18:22     ` Jan Kara
  2023-12-04 18:40     ` Jens Axboe
  1 sibling, 0 replies; 68+ messages in thread
From: Jan Kara @ 2023-12-04 18:22 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Jens Axboe, linux-block, Jan Kara,
	Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

On Sun 03-12-23 11:32:35, Yury Norov wrote:
> __sbitmap_get_word() opencodes either find_and_set_bit_wrap(), or
> find_and_set_next_bit() depending on hint and wrap parameters.
> 
> Switch it to use the atomic find_bit() API. While here, simplify
> sbitmap_find_bit_in_word(), which calls it.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  lib/sbitmap.c | 46 ++++++++--------------------------------------
>  1 file changed, 8 insertions(+), 38 deletions(-)
> 
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index d0a5081dfd12..b21aebd07fd6 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -133,38 +133,11 @@ void sbitmap_resize(struct sbitmap *sb, unsigned int depth)
>  }
>  EXPORT_SYMBOL_GPL(sbitmap_resize);
>  
> -static int __sbitmap_get_word(unsigned long *word, unsigned long depth,
> +static inline int __sbitmap_get_word(unsigned long *word, unsigned long depth,
>  			      unsigned int hint, bool wrap)
>  {
> -	int nr;
> -
> -	/* don't wrap if starting from 0 */
> -	wrap = wrap && hint;
> -
> -	while (1) {
> -		nr = find_next_zero_bit(word, depth, hint);
> -		if (unlikely(nr >= depth)) {
> -			/*
> -			 * We started with an offset, and we didn't reset the
> -			 * offset to 0 in a failure case, so start from 0 to
> -			 * exhaust the map.
> -			 */
> -			if (hint && wrap) {
> -				hint = 0;
> -				continue;
> -			}
> -			return -1;
> -		}
> -
> -		if (!test_and_set_bit_lock(nr, word))
> -			break;
> -
> -		hint = nr + 1;
> -		if (hint >= depth - 1)
> -			hint = 0;
> -	}
> -
> -	return nr;
> +	return wrap ? find_and_set_bit_wrap_lock(word, depth, hint) :
> +			find_and_set_next_bit_lock(word, depth, hint);
>  }
>  
>  static int sbitmap_find_bit_in_word(struct sbitmap_word *map,
> @@ -175,15 +148,12 @@ static int sbitmap_find_bit_in_word(struct sbitmap_word *map,
>  	int nr;
>  
>  	do {
> -		nr = __sbitmap_get_word(&map->word, depth,
> -					alloc_hint, wrap);
> -		if (nr != -1)
> -			break;
> -		if (!sbitmap_deferred_clear(map))
> -			break;
> -	} while (1);
> +		nr = __sbitmap_get_word(&map->word, depth, alloc_hint, wrap);
> +		if (nr < depth)
> +			return nr;
> +	} while (sbitmap_deferred_clear(map));
>  
> -	return nr;
> +	return -1;
>  }
>  
>  static int sbitmap_find_bit(struct sbitmap *sb,
> -- 
> 2.40.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 24/35] block: null_blk: fix opencoded find_and_set_bit() in get_tag()
  2023-12-03 19:32   ` [PATCH v2 24/35] block: null_blk: fix opencoded find_and_set_bit() in get_tag() Yury Norov
@ 2023-12-04 18:26     ` Jan Kara
  2023-12-05  2:39     ` Chengming Zhou
  1 sibling, 0 replies; 68+ messages in thread
From: Jan Kara @ 2023-12-04 18:26 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Jens Axboe, Damien Le Moal, Chaitanya Kulkarni,
	Ming Lei, Johannes Thumshirn, Chengming Zhou, Nitesh Shetty,
	Akinobu Mita, Shin'ichiro Kawasaki, linux-block, Jan Kara,
	Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

On Sun 03-12-23 11:32:56, Yury Norov wrote:
> get_tag() opencodes find_and_set_bit(). Switch the code to use the
> dedicated function, and get rid of get_tag entirely.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  drivers/block/null_blk/main.c | 41 +++++++++++------------------------
>  1 file changed, 13 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> index 3021d58ca51c..671dbb9ab928 100644
> --- a/drivers/block/null_blk/main.c
> +++ b/drivers/block/null_blk/main.c
> @@ -760,19 +760,6 @@ static void put_tag(struct nullb_queue *nq, unsigned int tag)
>  		wake_up(&nq->wait);
>  }
>  
> -static unsigned int get_tag(struct nullb_queue *nq)
> -{
> -	unsigned int tag;
> -
> -	do {
> -		tag = find_first_zero_bit(nq->tag_map, nq->queue_depth);
> -		if (tag >= nq->queue_depth)
> -			return -1U;
> -	} while (test_and_set_bit_lock(tag, nq->tag_map));
> -
> -	return tag;
> -}
> -
>  static void free_cmd(struct nullb_cmd *cmd)
>  {
>  	put_tag(cmd->nq, cmd->tag);
> @@ -782,24 +769,22 @@ static enum hrtimer_restart null_cmd_timer_expired(struct hrtimer *timer);
>  
>  static struct nullb_cmd *__alloc_cmd(struct nullb_queue *nq)
>  {
> +	unsigned int tag = find_and_set_bit_lock(nq->tag_map, nq->queue_depth);
>  	struct nullb_cmd *cmd;
> -	unsigned int tag;
> -
> -	tag = get_tag(nq);
> -	if (tag != -1U) {
> -		cmd = &nq->cmds[tag];
> -		cmd->tag = tag;
> -		cmd->error = BLK_STS_OK;
> -		cmd->nq = nq;
> -		if (nq->dev->irqmode == NULL_IRQ_TIMER) {
> -			hrtimer_init(&cmd->timer, CLOCK_MONOTONIC,
> -				     HRTIMER_MODE_REL);
> -			cmd->timer.function = null_cmd_timer_expired;
> -		}
> -		return cmd;
> +
> +	if (tag >= nq->queue_depth)
> +		return NULL;
> +
> +	cmd = &nq->cmds[tag];
> +	cmd->tag = tag;
> +	cmd->error = BLK_STS_OK;
> +	cmd->nq = nq;
> +	if (nq->dev->irqmode == NULL_IRQ_TIMER) {
> +		hrtimer_init(&cmd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +		cmd->timer.function = null_cmd_timer_expired;
>  	}
>  
> -	return NULL;
> +	return cmd;
>  }
>  
>  static struct nullb_cmd *alloc_cmd(struct nullb_queue *nq, struct bio *bio)
> -- 
> 2.40.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 03/35] lib/sbitmap; make __sbitmap_get_word() using find_and_set_bit()
  2023-12-03 19:32   ` [PATCH v2 03/35] lib/sbitmap; make __sbitmap_get_word() using find_and_set_bit() Yury Norov
  2023-12-04 18:22     ` Jan Kara
@ 2023-12-04 18:40     ` Jens Axboe
  1 sibling, 0 replies; 68+ messages in thread
From: Jens Axboe @ 2023-12-04 18:40 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, linux-block
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

On 12/3/23 12:32 PM, Yury Norov wrote:
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index d0a5081dfd12..b21aebd07fd6 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -133,38 +133,11 @@ void sbitmap_resize(struct sbitmap *sb, unsigned int depth)
>  }
>  EXPORT_SYMBOL_GPL(sbitmap_resize);
>  
> -static int __sbitmap_get_word(unsigned long *word, unsigned long depth,
> +static inline int __sbitmap_get_word(unsigned long *word, unsigned long depth,
>  			      unsigned int hint, bool wrap)
>  {
> -	int nr;
> -
> -	/* don't wrap if starting from 0 */
> -	wrap = wrap && hint;
> -
> -	while (1) {
> -		nr = find_next_zero_bit(word, depth, hint);
> -		if (unlikely(nr >= depth)) {
> -			/*
> -			 * We started with an offset, and we didn't reset the
> -			 * offset to 0 in a failure case, so start from 0 to
> -			 * exhaust the map.
> -			 */
> -			if (hint && wrap) {
> -				hint = 0;
> -				continue;
> -			}
> -			return -1;
> -		}
> -
> -		if (!test_and_set_bit_lock(nr, word))
> -			break;
> -
> -		hint = nr + 1;
> -		if (hint >= depth - 1)
> -			hint = 0;
> -	}
> -
> -	return nr;
> +	return wrap ? find_and_set_bit_wrap_lock(word, depth, hint) :
> +			find_and_set_next_bit_lock(word, depth, hint);
>  }

Please make this:

	if (wrap)
		return find_and_set_bit_wrap_lock(word, depth, hint) :
	return find_and_set_next_bit_lock(word, depth, hint);

so this is a lot more readable.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 00/35] bitops: add atomic find_bit() operations
  2023-12-03 19:23 ` Yury Norov
@ 2023-12-04 18:51   ` Jan Kara
  -1 siblings, 0 replies; 68+ messages in thread
From: Jan Kara @ 2023-12-04 18:51 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
	Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
	Wenjia Zhang, Will Deacon, Yoshinori Sato,
	GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
	kvm, linux-arm-kernel, linux-arm-msm, linux-block,
	linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
	linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
	linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
	linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
	sparclinux, x86, Jan Kara, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

Hello Yury!

On Sun 03-12-23 11:23:47, Yury Norov wrote:
> Add helpers around test_and_{set,clear}_bit() that allow to search for
> clear or set bits and flip them atomically.
> 
> The target patterns may look like this:
> 
> 	for (idx = 0; idx < nbits; idx++)
> 		if (test_and_clear_bit(idx, bitmap))
> 			do_something(idx);
> 
> Or like this:
> 
> 	do {
> 		bit = find_first_bit(bitmap, nbits);
> 		if (bit >= nbits)
> 			return nbits;
> 	} while (!test_and_clear_bit(bit, bitmap));
> 	return bit;
> 
> In both cases, the opencoded loop may be converted to a single function
> or iterator call. Correspondingly:
> 
> 	for_each_test_and_clear_bit(idx, bitmap, nbits)
> 		do_something(idx);
> 
> Or:
> 	return find_and_clear_bit(bitmap, nbits);

These are fine cleanups but they actually don't address the case that has
triggered all these changes - namely the xarray use of find_next_bit() in
xas_find_chunk().

...
> This series is a result of discussion [1]. All find_bit() functions imply
> exclusive access to the bitmaps. However, KCSAN reports quite a number
> of warnings related to find_bit() API. Some of them are not pointing
> to real bugs because in many situations people intentionally allow
> concurrent bitmap operations.
> 
> If so, find_bit() can be annotated such that KCSAN will ignore it:
> 
>         bit = data_race(find_first_bit(bitmap, nbits));

No, this is not a correct thing to do. If concurrent bitmap changes can
happen, find_first_bit() as it is currently implemented isn't ever a safe
choice because it can call __ffs(0) which is dangerous as you properly note
above. I proposed adding READ_ONCE() into find_first_bit() / find_next_bit()
implementation to fix this issue but you disliked that. So other option we
have is adding find_first_bit() and find_next_bit() variants that take
volatile 'addr' and we have to use these in code like xas_find_chunk()
which cannot be converted to your new helpers.

> This series addresses the other important case where people really need
> atomic find ops. As the following patches show, the resulting code
> looks safer and more verbose comparing to opencoded loops followed by
> atomic bit flips.
> 
> In [1] Mirsad reported 2% slowdown in a single-thread search test when
> switching find_bit() function to treat bitmaps as volatile arrays. On
> the other hand, kernel robot in the same thread reported +3.7% to the
> performance of will-it-scale.per_thread_ops test.

It was actually me who reported the regression here [2] but whatever :)

[2] https://lore.kernel.org/all/20231011150252.32737-1-jack@suse.cz

> Assuming that our compilers are sane and generate better code against
> properly annotated data, the above discrepancy doesn't look weird. When
> running on non-volatile bitmaps, plain find_bit() outperforms atomic
> find_and_bit(), and vice-versa.
> 
> So, all users of find_bit() API, where heavy concurrency is expected,
> are encouraged to switch to atomic find_and_bit() as appropriate.

Well, all users where any concurrency can happen should switch. Otherwise
they are prone to the (admittedly mostly theoretical) data race issue.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 00/35] bitops: add atomic find_bit() operations
@ 2023-12-04 18:51   ` Jan Kara
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Kara @ 2023-12-04 18:51 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-sh, Jaroslav Kysela, Hans Verkuil, Md. Haris Iqbal,
	K. Y. Srinivasan, Bart Van Assche, Geert Uytterhoeven,
	Jiri Pirko, Christian Brauner, Nicholas Piggin, Alexey Klimov,
	Sergey Shtylyov, Thomas Gleixner, Karsten Keil,
	Greg Kroah-Hartman, linux-usb, linux-wireless, linux-kernel,
	GR-QLogic-Storage-Upstream, Andrew Morton, Mark Rutland,
	alsa-devel, Dave Hansen, Eric Dumazet, Gregory Greenman,
	linux-s390, Valentin Schneider, Leon Romanovsky, Will Deacon,
	mpi3mr-linuxdrv.pdl, Hugh Dickins, iommu, Martin Habets,
	linux-media, Stanislaw Gruszka, linux-arm-msm, Wenjia Zhang,
	linux-m68k, linux-arm-kernel, Sean Christopherson, Oliver Neukum,
	Mirsad Todorovac, linux-pci, Rasmus Villemoes, linux-hyperv,
	Matthew Wilcox, Jiri Slaby, Rob Herring, linux-rdma,
	Damien Le Moal, ath10k, David Disseldorp, Paolo Abeni,
	Fenghua Yu, Kees Cook, James E.J. Bot tomley, Akinobu Mita,
	Steven Rostedt, Borislav Petkov, Mauro Carvalho Chehab,
	Thomas Bogendoerfer, Edward Cree, Shuai Xue, netdev,
	David S. Miller, Rich Felker, Jan Kara, kvm, Peter Zijlstra,
	H. Peter Anvin, sparclinux, Ping-Ke Shih, linux-scsi,
	linux-net-drivers, x86, Jason Gunthorpe, Ingo Molnar,
	linux-serial, Jakub Kicinski, Chaitanya Kulkarni, Kalle Valo,
	linux-block, Hans de Goede, linux-sound, Andy Shevchenko,
	Maxim Kuvyrkov, Jens Axboe, Michal Simek, Yoshinori Sato,
	Robin Murphy, Bjorn Andersson, linux-mips, linux-bluetooth,
	dmaengine, Paolo Bonzini, Vitaly Kuznetsov, linuxppc-dev,
	Karsten Graul

Hello Yury!

On Sun 03-12-23 11:23:47, Yury Norov wrote:
> Add helpers around test_and_{set,clear}_bit() that allow to search for
> clear or set bits and flip them atomically.
> 
> The target patterns may look like this:
> 
> 	for (idx = 0; idx < nbits; idx++)
> 		if (test_and_clear_bit(idx, bitmap))
> 			do_something(idx);
> 
> Or like this:
> 
> 	do {
> 		bit = find_first_bit(bitmap, nbits);
> 		if (bit >= nbits)
> 			return nbits;
> 	} while (!test_and_clear_bit(bit, bitmap));
> 	return bit;
> 
> In both cases, the opencoded loop may be converted to a single function
> or iterator call. Correspondingly:
> 
> 	for_each_test_and_clear_bit(idx, bitmap, nbits)
> 		do_something(idx);
> 
> Or:
> 	return find_and_clear_bit(bitmap, nbits);

These are fine cleanups but they actually don't address the case that has
triggered all these changes - namely the xarray use of find_next_bit() in
xas_find_chunk().

...
> This series is a result of discussion [1]. All find_bit() functions imply
> exclusive access to the bitmaps. However, KCSAN reports quite a number
> of warnings related to find_bit() API. Some of them are not pointing
> to real bugs because in many situations people intentionally allow
> concurrent bitmap operations.
> 
> If so, find_bit() can be annotated such that KCSAN will ignore it:
> 
>         bit = data_race(find_first_bit(bitmap, nbits));

No, this is not a correct thing to do. If concurrent bitmap changes can
happen, find_first_bit() as it is currently implemented isn't ever a safe
choice because it can call __ffs(0) which is dangerous as you properly note
above. I proposed adding READ_ONCE() into find_first_bit() / find_next_bit()
implementation to fix this issue but you disliked that. So other option we
have is adding find_first_bit() and find_next_bit() variants that take
volatile 'addr' and we have to use these in code like xas_find_chunk()
which cannot be converted to your new helpers.

> This series addresses the other important case where people really need
> atomic find ops. As the following patches show, the resulting code
> looks safer and more verbose comparing to opencoded loops followed by
> atomic bit flips.
> 
> In [1] Mirsad reported 2% slowdown in a single-thread search test when
> switching find_bit() function to treat bitmaps as volatile arrays. On
> the other hand, kernel robot in the same thread reported +3.7% to the
> performance of will-it-scale.per_thread_ops test.

It was actually me who reported the regression here [2] but whatever :)

[2] https://lore.kernel.org/all/20231011150252.32737-1-jack@suse.cz

> Assuming that our compilers are sane and generate better code against
> properly annotated data, the above discrepancy doesn't look weird. When
> running on non-volatile bitmaps, plain find_bit() outperforms atomic
> find_and_bit(), and vice-versa.
> 
> So, all users of find_bit() API, where heavy concurrency is expected,
> are encouraged to switch to atomic find_and_bit() as appropriate.

Well, all users where any concurrency can happen should switch. Otherwise
they are prone to the (admittedly mostly theoretical) data race issue.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 14/35] PCI: hv: switch hv_get_dom_num() to use atomic find_bit()
  2023-12-03 19:32   ` [PATCH v2 14/35] PCI: hv: switch hv_get_dom_num() to use atomic find_bit() Yury Norov
  2023-12-04  5:58     ` Wei Liu
@ 2023-12-04 19:14     ` Bjorn Helgaas
  2023-12-08 18:34       ` Yury Norov
  1 sibling, 1 reply; 68+ messages in thread
From: Bjorn Helgaas @ 2023-12-04 19:14 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Bjorn Helgaas, linux-hyperv, linux-pci, Jan Kara,
	Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov, Michael Kelley

On Sun, Dec 03, 2023 at 11:32:46AM -0800, Yury Norov wrote:
> The function traverses bitmap with for_each_clear_bit() just to allocate
> a bit atomically. We can do it better with a dedicated find_and_set_bit().

No objection from me, but please tweak the subject line to match
previous hv history, i.e., capitalize the first word after the prefix:

  PCI: hv: Use atomic find_and_set_bit()

I think there's value in using similar phrasing across the whole
series.  Some subjects say "optimize xyz()", some say "rework xyz()",
some "rework xyz()", etc.  I think it's more informative to include
the "atomic" and "find_bit()" ideas in the subject than the specific
functions that *use* it.

I also like how some of the other commit logs clearly say what the
patch does, e.g., "Simplify by using dedicated find_and_set_bit()", as
opposed to just "We can do it better ..." which technically doesn't
say what the patch does.

Very nice simplification in all these users, thanks for doing it!

I assume you'll merge these all together since they depend on [01/35],
so:

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> Reviewed-by: Michael Kelley <mhklinux@outlook.com>
> ---
>  drivers/pci/controller/pci-hyperv.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 30c7dfeccb16..033b1fb7f4eb 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -3605,12 +3605,9 @@ static u16 hv_get_dom_num(u16 dom)
>  	if (test_and_set_bit(dom, hvpci_dom_map) == 0)
>  		return dom;
>  
> -	for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
> -		if (test_and_set_bit(i, hvpci_dom_map) == 0)
> -			return i;
> -	}
> +	i = find_and_set_bit(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
>  
> -	return HVPCI_DOM_INVALID;
> +	return i < HVPCI_DOM_MAP_SIZE ? i : HVPCI_DOM_INVALID;
>  }
>  
>  /**
> -- 
> 2.40.1
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 24/35] block: null_blk: fix opencoded find_and_set_bit() in get_tag()
  2023-12-03 19:32   ` [PATCH v2 24/35] block: null_blk: fix opencoded find_and_set_bit() in get_tag() Yury Norov
  2023-12-04 18:26     ` Jan Kara
@ 2023-12-05  2:39     ` Chengming Zhou
  1 sibling, 0 replies; 68+ messages in thread
From: Chengming Zhou @ 2023-12-05  2:39 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Jens Axboe, Damien Le Moal,
	Chaitanya Kulkarni, Ming Lei, Johannes Thumshirn, Nitesh Shetty,
	Akinobu Mita, Shin'ichiro Kawasaki, linux-block
  Cc: Jan Kara, Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov

On 2023/12/4 03:32, Yury Norov wrote:
> get_tag() opencodes find_and_set_bit(). Switch the code to use the
> dedicated function, and get rid of get_tag entirely.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>

Looks good to me!

Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>

> ---
>  drivers/block/null_blk/main.c | 41 +++++++++++------------------------
>  1 file changed, 13 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> index 3021d58ca51c..671dbb9ab928 100644
> --- a/drivers/block/null_blk/main.c
> +++ b/drivers/block/null_blk/main.c
> @@ -760,19 +760,6 @@ static void put_tag(struct nullb_queue *nq, unsigned int tag)
>  		wake_up(&nq->wait);
>  }
>  
> -static unsigned int get_tag(struct nullb_queue *nq)
> -{
> -	unsigned int tag;
> -
> -	do {
> -		tag = find_first_zero_bit(nq->tag_map, nq->queue_depth);
> -		if (tag >= nq->queue_depth)
> -			return -1U;
> -	} while (test_and_set_bit_lock(tag, nq->tag_map));
> -
> -	return tag;
> -}
> -
>  static void free_cmd(struct nullb_cmd *cmd)
>  {
>  	put_tag(cmd->nq, cmd->tag);
> @@ -782,24 +769,22 @@ static enum hrtimer_restart null_cmd_timer_expired(struct hrtimer *timer);
>  
>  static struct nullb_cmd *__alloc_cmd(struct nullb_queue *nq)
>  {
> +	unsigned int tag = find_and_set_bit_lock(nq->tag_map, nq->queue_depth);
>  	struct nullb_cmd *cmd;
> -	unsigned int tag;
> -
> -	tag = get_tag(nq);
> -	if (tag != -1U) {
> -		cmd = &nq->cmds[tag];
> -		cmd->tag = tag;
> -		cmd->error = BLK_STS_OK;
> -		cmd->nq = nq;
> -		if (nq->dev->irqmode == NULL_IRQ_TIMER) {
> -			hrtimer_init(&cmd->timer, CLOCK_MONOTONIC,
> -				     HRTIMER_MODE_REL);
> -			cmd->timer.function = null_cmd_timer_expired;
> -		}
> -		return cmd;
> +
> +	if (tag >= nq->queue_depth)
> +		return NULL;
> +
> +	cmd = &nq->cmds[tag];
> +	cmd->tag = tag;
> +	cmd->error = BLK_STS_OK;
> +	cmd->nq = nq;
> +	if (nq->dev->irqmode == NULL_IRQ_TIMER) {
> +		hrtimer_init(&cmd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +		cmd->timer.function = null_cmd_timer_expired;
>  	}
>  
> -	return NULL;
> +	return cmd;
>  }
>  
>  static struct nullb_cmd *alloc_cmd(struct nullb_queue *nq, struct bio *bio)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 00/35] bitops: add atomic find_bit() operations
  2023-12-04 18:51   ` Jan Kara
@ 2023-12-06  5:22     ` Yury Norov
  -1 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-06  5:22 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
	Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
	Wenjia Zhang, Will Deacon, Yoshinori Sato,
	GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
	kvm, linux-arm-kernel, linux-arm-msm, linux-block,
	linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
	linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
	linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
	linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
	sparclinux, x86, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

On Mon, Dec 04, 2023 at 07:51:01PM +0100, Jan Kara wrote:
> Hello Yury!
> 
> On Sun 03-12-23 11:23:47, Yury Norov wrote:
> > Add helpers around test_and_{set,clear}_bit() that allow to search for
> > clear or set bits and flip them atomically.
> > 
> > The target patterns may look like this:
> > 
> > 	for (idx = 0; idx < nbits; idx++)
> > 		if (test_and_clear_bit(idx, bitmap))
> > 			do_something(idx);
> > 
> > Or like this:
> > 
> > 	do {
> > 		bit = find_first_bit(bitmap, nbits);
> > 		if (bit >= nbits)
> > 			return nbits;
> > 	} while (!test_and_clear_bit(bit, bitmap));
> > 	return bit;
> > 
> > In both cases, the opencoded loop may be converted to a single function
> > or iterator call. Correspondingly:
> > 
> > 	for_each_test_and_clear_bit(idx, bitmap, nbits)
> > 		do_something(idx);
> > 
> > Or:
> > 	return find_and_clear_bit(bitmap, nbits);
> 
> These are fine cleanups but they actually don't address the case that has
> triggered all these changes - namely the xarray use of find_next_bit() in
> xas_find_chunk().
> 
> ...
> > This series is a result of discussion [1]. All find_bit() functions imply
> > exclusive access to the bitmaps. However, KCSAN reports quite a number
> > of warnings related to find_bit() API. Some of them are not pointing
> > to real bugs because in many situations people intentionally allow
> > concurrent bitmap operations.
> > 
> > If so, find_bit() can be annotated such that KCSAN will ignore it:
> > 
> >         bit = data_race(find_first_bit(bitmap, nbits));
> 
> No, this is not a correct thing to do. If concurrent bitmap changes can
> happen, find_first_bit() as it is currently implemented isn't ever a safe
> choice because it can call __ffs(0) which is dangerous as you properly note
> above. I proposed adding READ_ONCE() into find_first_bit() / find_next_bit()
> implementation to fix this issue but you disliked that. So other option we
> have is adding find_first_bit() and find_next_bit() variants that take
> volatile 'addr' and we have to use these in code like xas_find_chunk()
> which cannot be converted to your new helpers.

Here is some examples when concurrent operations with plain find_bit()
are acceptable:

 - two threads running find_*_bit(): safe wrt ffs(0) and returns correct
   value, because underlying bitmap is unchanged;
 - find_next_bit() in parallel with set or clear_bit(), when modifying
   a bit prior to the start bit to search: safe and correct;
 - find_first_bit() in parallel with set_bit(): safe, but may return wrong
   bit number;
 - find_first_zero_bit() in parallel with clear_bit(): same as above.

In last 2 cases find_bit() may not return a correct bit number, but
it may be OK if caller requires any (not exactly first) set or clear
bit, correspondingly.

In such cases, KCSAN may be safely silenced.
 
> > This series addresses the other important case where people really need
> > atomic find ops. As the following patches show, the resulting code
> > looks safer and more verbose comparing to opencoded loops followed by
> > atomic bit flips.
> > 
> > In [1] Mirsad reported 2% slowdown in a single-thread search test when
> > switching find_bit() function to treat bitmaps as volatile arrays. On
> > the other hand, kernel robot in the same thread reported +3.7% to the
> > performance of will-it-scale.per_thread_ops test.
> 
> It was actually me who reported the regression here [2] but whatever :)
> 
> [2] https://lore.kernel.org/all/20231011150252.32737-1-jack@suse.cz

My apologize.

> > Assuming that our compilers are sane and generate better code against
> > properly annotated data, the above discrepancy doesn't look weird. When
> > running on non-volatile bitmaps, plain find_bit() outperforms atomic
> > find_and_bit(), and vice-versa.
> > 
> > So, all users of find_bit() API, where heavy concurrency is expected,
> > are encouraged to switch to atomic find_and_bit() as appropriate.
> 
> Well, all users where any concurrency can happen should switch. Otherwise
> they are prone to the (admittedly mostly theoretical) data race issue.
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 00/35] bitops: add atomic find_bit() operations
@ 2023-12-06  5:22     ` Yury Norov
  0 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-06  5:22 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-sh, Jaroslav Kysela, Hans Verkuil, Md. Haris Iqbal,
	K. Y. Srinivasan, Bart Van Assche, Geert Uytterhoeven,
	Jiri Pirko, Christian Brauner, Nicholas Piggin, Alexey Klimov,
	Sergey Shtylyov, Thomas Gleixner, Karsten Keil,
	Greg Kroah-Hartman, linux-usb, linux-wireless, linux-kernel,
	GR-QLogic-Storage-Upstream, Andrew Morton, Mark Rutland,
	alsa-devel, Dave Hansen, Eric Dumazet, Gregory Greenman,
	linux-s390, Valentin Schneider, Leon Romanovsky, Will Deacon,
	mpi3mr-linuxdrv.pdl, Hugh Dickins, iommu, Martin Habets,
	linux-media, Stanislaw Gruszka, linux-arm-msm, Wenjia Zhang,
	linux-m68k, linux-arm-kernel, Sean Christopherson, Oliver Neukum,
	Mirsad Todorovac, linux-pci, Rasmus Villemoes, linux-hyperv,
	Matthew Wilcox, Jiri Slaby, Rob Herring, linux-rdma,
	Damien Le Moal, ath10k, David Disseldorp, Paolo Abeni,
	Fenghua Yu, Kees Cook, James E.J. Bot tomley, Akinobu Mita,
	Steven Rostedt, Borislav Petkov, Mauro Carvalho Chehab,
	Thomas Bogendoerfer, Edward Cree, Shuai Xue, netdev,
	David S. Miller, Rich Felker, kvm, Peter Zijlstra,
	H. Peter Anvin, sparclinux, Ping-Ke Shih, linux-scsi,
	linux-net-drivers, x86, Jason Gunthorpe, Ingo Molnar,
	linux-serial, Jakub Kicinski, Chaitanya Kulkarni, Kalle Valo,
	linux-block, Hans de Goede, linux-sound, Andy Shevchenko,
	Maxim Kuvyrkov

On Mon, Dec 04, 2023 at 07:51:01PM +0100, Jan Kara wrote:
> Hello Yury!
> 
> On Sun 03-12-23 11:23:47, Yury Norov wrote:
> > Add helpers around test_and_{set,clear}_bit() that allow to search for
> > clear or set bits and flip them atomically.
> > 
> > The target patterns may look like this:
> > 
> > 	for (idx = 0; idx < nbits; idx++)
> > 		if (test_and_clear_bit(idx, bitmap))
> > 			do_something(idx);
> > 
> > Or like this:
> > 
> > 	do {
> > 		bit = find_first_bit(bitmap, nbits);
> > 		if (bit >= nbits)
> > 			return nbits;
> > 	} while (!test_and_clear_bit(bit, bitmap));
> > 	return bit;
> > 
> > In both cases, the opencoded loop may be converted to a single function
> > or iterator call. Correspondingly:
> > 
> > 	for_each_test_and_clear_bit(idx, bitmap, nbits)
> > 		do_something(idx);
> > 
> > Or:
> > 	return find_and_clear_bit(bitmap, nbits);
> 
> These are fine cleanups but they actually don't address the case that has
> triggered all these changes - namely the xarray use of find_next_bit() in
> xas_find_chunk().
> 
> ...
> > This series is a result of discussion [1]. All find_bit() functions imply
> > exclusive access to the bitmaps. However, KCSAN reports quite a number
> > of warnings related to find_bit() API. Some of them are not pointing
> > to real bugs because in many situations people intentionally allow
> > concurrent bitmap operations.
> > 
> > If so, find_bit() can be annotated such that KCSAN will ignore it:
> > 
> >         bit = data_race(find_first_bit(bitmap, nbits));
> 
> No, this is not a correct thing to do. If concurrent bitmap changes can
> happen, find_first_bit() as it is currently implemented isn't ever a safe
> choice because it can call __ffs(0) which is dangerous as you properly note
> above. I proposed adding READ_ONCE() into find_first_bit() / find_next_bit()
> implementation to fix this issue but you disliked that. So other option we
> have is adding find_first_bit() and find_next_bit() variants that take
> volatile 'addr' and we have to use these in code like xas_find_chunk()
> which cannot be converted to your new helpers.

Here is some examples when concurrent operations with plain find_bit()
are acceptable:

 - two threads running find_*_bit(): safe wrt ffs(0) and returns correct
   value, because underlying bitmap is unchanged;
 - find_next_bit() in parallel with set or clear_bit(), when modifying
   a bit prior to the start bit to search: safe and correct;
 - find_first_bit() in parallel with set_bit(): safe, but may return wrong
   bit number;
 - find_first_zero_bit() in parallel with clear_bit(): same as above.

In last 2 cases find_bit() may not return a correct bit number, but
it may be OK if caller requires any (not exactly first) set or clear
bit, correspondingly.

In such cases, KCSAN may be safely silenced.
 
> > This series addresses the other important case where people really need
> > atomic find ops. As the following patches show, the resulting code
> > looks safer and more verbose comparing to opencoded loops followed by
> > atomic bit flips.
> > 
> > In [1] Mirsad reported 2% slowdown in a single-thread search test when
> > switching find_bit() function to treat bitmaps as volatile arrays. On
> > the other hand, kernel robot in the same thread reported +3.7% to the
> > performance of will-it-scale.per_thread_ops test.
> 
> It was actually me who reported the regression here [2] but whatever :)
> 
> [2] https://lore.kernel.org/all/20231011150252.32737-1-jack@suse.cz

My apologize.

> > Assuming that our compilers are sane and generate better code against
> > properly annotated data, the above discrepancy doesn't look weird. When
> > running on non-volatile bitmaps, plain find_bit() outperforms atomic
> > find_and_bit(), and vice-versa.
> > 
> > So, all users of find_bit() API, where heavy concurrency is expected,
> > are encouraged to switch to atomic find_and_bit() as appropriate.
> 
> Well, all users where any concurrency can happen should switch. Otherwise
> they are prone to the (admittedly mostly theoretical) data race issue.
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 00/35] bitops: add atomic find_bit() operations
  2023-12-06  5:22     ` Yury Norov
@ 2023-12-07  9:10       ` Jan Kara
  -1 siblings, 0 replies; 68+ messages in thread
From: Jan Kara @ 2023-12-07  9:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: Jan Kara, linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
	Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
	Wenjia Zhang, Will Deacon, Yoshinori Sato,
	GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
	kvm, linux-arm-kernel, linux-arm-msm, linux-block,
	linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
	linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
	linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
	linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
	sparclinux, x86, Mirsad Todorovac, Matthew Wilcox,
	Rasmus Villemoes, Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov,
	Bart Van Assche, Sergey Shtylyov

On Tue 05-12-23 21:22:59, Yury Norov wrote:
> On Mon, Dec 04, 2023 at 07:51:01PM +0100, Jan Kara wrote:
> > > This series is a result of discussion [1]. All find_bit() functions imply
> > > exclusive access to the bitmaps. However, KCSAN reports quite a number
> > > of warnings related to find_bit() API. Some of them are not pointing
> > > to real bugs because in many situations people intentionally allow
> > > concurrent bitmap operations.
> > > 
> > > If so, find_bit() can be annotated such that KCSAN will ignore it:
> > > 
> > >         bit = data_race(find_first_bit(bitmap, nbits));
> > 
> > No, this is not a correct thing to do. If concurrent bitmap changes can
> > happen, find_first_bit() as it is currently implemented isn't ever a safe
> > choice because it can call __ffs(0) which is dangerous as you properly note
> > above. I proposed adding READ_ONCE() into find_first_bit() / find_next_bit()
> > implementation to fix this issue but you disliked that. So other option we
> > have is adding find_first_bit() and find_next_bit() variants that take
> > volatile 'addr' and we have to use these in code like xas_find_chunk()
> > which cannot be converted to your new helpers.
> 
> Here is some examples when concurrent operations with plain find_bit()
> are acceptable:
> 
>  - two threads running find_*_bit(): safe wrt ffs(0) and returns correct
>    value, because underlying bitmap is unchanged;
>  - find_next_bit() in parallel with set or clear_bit(), when modifying
>    a bit prior to the start bit to search: safe and correct;
>  - find_first_bit() in parallel with set_bit(): safe, but may return wrong
>    bit number;
>  - find_first_zero_bit() in parallel with clear_bit(): same as above.
> 
> In last 2 cases find_bit() may not return a correct bit number, but
> it may be OK if caller requires any (not exactly first) set or clear
> bit, correspondingly.
> 
> In such cases, KCSAN may be safely silenced.

True - but these are special cases. In particular the case in xas_find_chunk()
is not any of these special cases. It is using find_next_bit() which is can
be racing with clear_bit(). So what are your plans for such usecase?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 00/35] bitops: add atomic find_bit() operations
@ 2023-12-07  9:10       ` Jan Kara
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Kara @ 2023-12-07  9:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-sh, Jaroslav Kysela, Hans Verkuil, Md. Haris Iqbal,
	K. Y. Srinivasan, Bart Van Assche, Geert Uytterhoeven,
	Jiri Pirko, Christian Brauner, Nicholas Piggin, Alexey Klimov,
	Sergey Shtylyov, Thomas Gleixner, Karsten Keil,
	Greg Kroah-Hartman, linux-usb, linux-wireless, linux-kernel,
	GR-QLogic-Storage-Upstream, Andrew Morton, Mark Rutland,
	alsa-devel, Dave Hansen, Eric Dumazet, Gregory Greenman,
	linux-s390, Valentin Schneider, Leon Romanovsky, Will Deacon,
	mpi3mr-linuxdrv.pdl, Hugh Dickins, iommu, Martin Habets,
	linux-media, Stanislaw Gruszka, linux-arm-msm, Wenjia Zhang,
	linux-m68k, linux-arm-kernel, Sean Christopherson, Oliver Neukum,
	Mirsad Todorovac, linux-pci, Rasmus Villemoes, linux-hyperv,
	Matthew Wilcox, Jiri Slaby, Rob Herring, linux-rdma,
	Damien Le Moal, ath10k, David Disseldorp, Paolo Abeni,
	Fenghua Yu, Kees Cook, James E.J. Bot tomley, Akinobu Mita,
	Steven Rostedt, Borislav Petkov, Mauro Carvalho Chehab,
	Thomas Bogendoerfer, Edward Cree, Shuai Xue, netdev,
	David S. Miller, Rich Felker, Jan Kara, kvm, Peter Zijlstra,
	H. Peter Anvin, sparclinux, Ping-Ke Shih, linux-scsi,
	linux-net-drivers, x86, Jason Gunthorpe, Ingo Molnar,
	linux-serial, Jakub Kicinski, Chaitanya Kulkarni, Kalle Valo,
	linux-block, Hans de Goede, linux-sound, Andy Shevchenko,
	Maxim Kuvyrkov, Jens Axboe, Michal Simek, Yoshinori Sato,
	linuxppc-dev, Bjorn Andersson, linux-mips, linux-bluetooth,
	dmaengine, Paolo Bonzini, Vitaly Kuznetsov, Robin Murphy,
	Karsten Graul

On Tue 05-12-23 21:22:59, Yury Norov wrote:
> On Mon, Dec 04, 2023 at 07:51:01PM +0100, Jan Kara wrote:
> > > This series is a result of discussion [1]. All find_bit() functions imply
> > > exclusive access to the bitmaps. However, KCSAN reports quite a number
> > > of warnings related to find_bit() API. Some of them are not pointing
> > > to real bugs because in many situations people intentionally allow
> > > concurrent bitmap operations.
> > > 
> > > If so, find_bit() can be annotated such that KCSAN will ignore it:
> > > 
> > >         bit = data_race(find_first_bit(bitmap, nbits));
> > 
> > No, this is not a correct thing to do. If concurrent bitmap changes can
> > happen, find_first_bit() as it is currently implemented isn't ever a safe
> > choice because it can call __ffs(0) which is dangerous as you properly note
> > above. I proposed adding READ_ONCE() into find_first_bit() / find_next_bit()
> > implementation to fix this issue but you disliked that. So other option we
> > have is adding find_first_bit() and find_next_bit() variants that take
> > volatile 'addr' and we have to use these in code like xas_find_chunk()
> > which cannot be converted to your new helpers.
> 
> Here is some examples when concurrent operations with plain find_bit()
> are acceptable:
> 
>  - two threads running find_*_bit(): safe wrt ffs(0) and returns correct
>    value, because underlying bitmap is unchanged;
>  - find_next_bit() in parallel with set or clear_bit(), when modifying
>    a bit prior to the start bit to search: safe and correct;
>  - find_first_bit() in parallel with set_bit(): safe, but may return wrong
>    bit number;
>  - find_first_zero_bit() in parallel with clear_bit(): same as above.
> 
> In last 2 cases find_bit() may not return a correct bit number, but
> it may be OK if caller requires any (not exactly first) set or clear
> bit, correspondingly.
> 
> In such cases, KCSAN may be safely silenced.

True - but these are special cases. In particular the case in xas_find_chunk()
is not any of these special cases. It is using find_next_bit() which is can
be racing with clear_bit(). So what are your plans for such usecase?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 14/35] PCI: hv: switch hv_get_dom_num() to use atomic find_bit()
  2023-12-04 19:14     ` Bjorn Helgaas
@ 2023-12-08 18:34       ` Yury Norov
  0 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-08 18:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Bjorn Helgaas, linux-hyperv, linux-pci, Jan Kara,
	Mirsad Todorovac, Matthew Wilcox, Rasmus Villemoes,
	Andy Shevchenko, Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche,
	Sergey Shtylyov, Michael Kelley

On Mon, Dec 04, 2023 at 01:14:27PM -0600, Bjorn Helgaas wrote:
> On Sun, Dec 03, 2023 at 11:32:46AM -0800, Yury Norov wrote:
> > The function traverses bitmap with for_each_clear_bit() just to allocate
> > a bit atomically. We can do it better with a dedicated find_and_set_bit().
> 
> No objection from me, but please tweak the subject line to match
> previous hv history, i.e., capitalize the first word after the prefix:
> 
>   PCI: hv: Use atomic find_and_set_bit()
> 
> I think there's value in using similar phrasing across the whole
> series.  Some subjects say "optimize xyz()", some say "rework xyz()",
> some "rework xyz()", etc.  I think it's more informative to include
> the "atomic" and "find_bit()" ideas in the subject than the specific
> functions that *use* it.
> 
> I also like how some of the other commit logs clearly say what the
> patch does, e.g., "Simplify by using dedicated find_and_set_bit()", as
> opposed to just "We can do it better ..." which technically doesn't
> say what the patch does.
> 
> Very nice simplification in all these users, thanks for doing it!
> 
> I assume you'll merge these all together since they depend on [01/35],
> so:
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Thank you Bjorn!

Now as many people asked to move their subsystems patch together with
#1, I think, if no objections, it's simpler to pull all the series in
bitmap-for-next.

I'm going to align commit messages wording, as you suggested, address
some other comments, and will send v3 this weekend.

Thanks,
Yury

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 31/35] net: smc: use find_and_set_bit() in smc_wr_tx_get_free_slot_index()
  2023-12-04  9:40     ` Alexandra Winter
@ 2023-12-11 22:34       ` Yury Norov
  0 siblings, 0 replies; 68+ messages in thread
From: Yury Norov @ 2023-12-11 22:34 UTC (permalink / raw)
  To: Alexandra Winter
  Cc: linux-kernel, Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe,
	Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-s390, netdev, Jan Kara, Mirsad Todorovac,
	Matthew Wilcox, Rasmus Villemoes, Andy Shevchenko,
	Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche, Sergey Shtylyov

On Mon, Dec 04, 2023 at 10:40:20AM +0100, Alexandra Winter wrote:
> 
> 
> On 03.12.23 20:33, Yury Norov wrote:
> > The function opencodes find_and_set_bit() with a for_each() loop. Use
> > it, and make the whole function a simple almost one-liner.
> > 
> > While here, drop explicit initialization of *idx, because it's already
> > initialized by the caller in case of ENOLINK, or set properly with
> > ->wr_tx_mask, if nothing is found, in case of EBUSY.
> > 
> > CC: Tony Lu <tonylu@linux.alibaba.com>
> > CC: Alexandra Winter <wintera@linux.ibm.com>
> > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > ---
> 
> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
> 
> 
> Thanks a lot for the great helper function!
> I guess the top-level maintainers will figure out, how this series best finds its way upstream.

Thanks, Alexandra. :)

People in this thread say just pick their subsystem patch together
with #1. So, I'm going to send v3 with some minor tweaks, and if
everything is OK, will pull all this in my bitmap-for-next branch.

Thanks,
Yury

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 35/35] sh: mach-x3proto: rework ilsel_enable()
  2023-12-03 19:33   ` [PATCH v2 35/35] sh: mach-x3proto: rework ilsel_enable() Yury Norov
@ 2024-01-19  8:48     ` Geert Uytterhoeven
  0 siblings, 0 replies; 68+ messages in thread
From: Geert Uytterhoeven @ 2024-01-19  8:48 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Yoshinori Sato, Rich Felker,
	John Paul Adrian Glaubitz, linux-sh, Jan Kara, Mirsad Todorovac,
	Matthew Wilcox, Rasmus Villemoes, Andy Shevchenko,
	Maxim Kuvyrkov, Alexey Klimov, Bart Van Assche, Sergey Shtylyov

On Sun, Dec 3, 2023 at 8:34 PM Yury Norov <yury.norov@gmail.com> wrote:
> Fix opencoded find_and_set_bit(), which also suppresses potential
> KCSAN warning.
>
> CC: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>

> --- a/arch/sh/boards/mach-x3proto/ilsel.c
> +++ b/arch/sh/boards/mach-x3proto/ilsel.c
> @@ -99,8 +99,8 @@ int ilsel_enable(ilsel_source_t set)
>         }
>
>         do {
> -               bit = find_first_zero_bit(&ilsel_level_map, ILSEL_LEVELS);
> -       } while (test_and_set_bit(bit, &ilsel_level_map));
> +               bit = find_and_set_bit(&ilsel_level_map, ILSEL_LEVELS);
> +       } while (bit >= ILSEL_LEVELS);
>
>         __ilsel_enable(set, bit);

BTW, I don't think the old code worked as intended: the first time no
free bit is found, bit would have been ILSEL_LEVELS, and
test_and_set_bit() would have returned false, thus terminating the loop,
and continuing with an out-of-range bit value?  Hence to work correctly,
bit ILSEL_LEVELS of ilsel_level_map should have been initialized to one?
Or am I missing something?

The new code does not have that issue.

Anyway, this should probably never happen in real life.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2024-01-19  8:49 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-03 19:23 [PATCH v2 00/35] bitops: add atomic find_bit() operations Yury Norov
2023-12-03 19:23 ` Yury Norov
2023-12-03 19:23 ` [PATCH v2 01/35] lib/find: add atomic find_bit() primitives Yury Norov
2023-12-03 19:23   ` Yury Norov
2023-12-03 19:56   ` bitops: add atomic find_bit() operations bluez.test.bot
2023-12-03 19:32 ` [PATCH v2 02/35] lib/find: add test for atomic find_bit() ops Yury Norov
2023-12-03 19:32   ` Yury Norov
2023-12-03 19:32   ` [PATCH v2 03/35] lib/sbitmap; make __sbitmap_get_word() using find_and_set_bit() Yury Norov
2023-12-04 18:22     ` Jan Kara
2023-12-04 18:40     ` Jens Axboe
2023-12-03 19:32   ` [PATCH v2 04/35] watch_queue: use atomic find_bit() in post_one_notification() Yury Norov
2023-12-03 19:32   ` [PATCH v2 05/35] sched: add cpumask_find_and_set() and use it in __mm_cid_get() Yury Norov
2023-12-03 19:32   ` [PATCH v2 06/35] mips: sgi-ip30: rework heart_alloc_int() Yury Norov
2023-12-03 19:32   ` [PATCH v2 07/35] sparc: fix opencoded find_and_set_bit() in alloc_msi() Yury Norov
2023-12-03 19:32   ` [PATCH v2 08/35] perf/arm: optimize opencoded atomic find_bit() API Yury Norov
2023-12-03 19:32     ` Yury Norov
2023-12-03 19:32   ` [PATCH v2 09/35] drivers/perf: optimize ali_drw_get_counter_idx() by using find_bit() Yury Norov
2023-12-03 19:32     ` Yury Norov
2023-12-03 19:32   ` [PATCH v2 10/35] dmaengine: idxd: optimize perfmon_assign_event() Yury Norov
2023-12-03 19:32   ` [PATCH v2 11/35] ath10k: optimize ath10k_snoc_napi_poll() by using find_bit() Yury Norov
2023-12-03 19:32   ` [PATCH v2 12/35] wifi: rtw88: optimize rtw_pci_tx_kick_off() " Yury Norov
2023-12-03 19:32   ` [PATCH v2 13/35] KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers() Yury Norov
2023-12-04  9:53     ` Vitaly Kuznetsov
2023-12-04 16:00       ` Sean Christopherson
2023-12-03 19:32   ` [PATCH v2 14/35] PCI: hv: switch hv_get_dom_num() to use atomic find_bit() Yury Norov
2023-12-04  5:58     ` Wei Liu
2023-12-04 19:14     ` Bjorn Helgaas
2023-12-08 18:34       ` Yury Norov
2023-12-03 19:32   ` [PATCH v2 15/35] scsi: core: use atomic find_bit() API where appropriate Yury Norov
2023-12-03 19:32   ` [PATCH v2 16/35] scsi: mpi3mr: switch to using atomic find_and_set_bit() Yury Norov
2023-12-03 19:32   ` [PATCH v2 17/35] scsi: qedi: rework qedi_get_task_idx() Yury Norov
2023-12-03 19:32   ` [PATCH v2 18/35] powerpc: use atomic find_bit() API where appropriate Yury Norov
2023-12-03 19:32     ` Yury Norov
2023-12-03 19:32   ` [PATCH v2 19/35] iommu: " Yury Norov
2023-12-03 19:32     ` Yury Norov
2023-12-03 19:32   ` [PATCH v2 20/35] media: radio-shark: " Yury Norov
2023-12-04  8:43     ` Hans Verkuil
2023-12-03 19:32   ` [PATCH v2 21/35] sfc: switch to using " Yury Norov
2023-12-03 19:32   ` [PATCH v2 22/35] tty: nozomi: optimize interrupt_handler() Yury Norov
2023-12-03 19:32   ` [PATCH v2 23/35] usb: cdc-acm: optimize acm_softint() Yury Norov
2023-12-03 19:32   ` [PATCH v2 24/35] block: null_blk: fix opencoded find_and_set_bit() in get_tag() Yury Norov
2023-12-04 18:26     ` Jan Kara
2023-12-05  2:39     ` Chengming Zhou
2023-12-03 19:32   ` [PATCH v2 25/35] RDMA/rtrs: fix opencoded find_and_set_bit_lock() in __rtrs_get_permit() Yury Norov
2023-12-03 19:32   ` [PATCH v2 26/35] mISDN: optimize get_free_devid() Yury Norov
2023-12-03 19:32   ` [PATCH v2 27/35] media: em28xx: cx231xx: fix opencoded find_and_set_bit() Yury Norov
2023-12-04  8:39     ` Hans Verkuil
2023-12-04 13:05       ` Andy Shevchenko
2023-12-04 13:09         ` Hans Verkuil
2023-12-03 19:33   ` [PATCH v2 28/35] ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get() Yury Norov
2023-12-03 19:33   ` [PATCH v2 29/35] serial: sc12is7xx: optimize sc16is7xx_alloc_line() Yury Norov
2023-12-03 19:33   ` [PATCH v2 30/35] bluetooth: optimize cmtp_alloc_block_id() Yury Norov
2023-12-03 19:33   ` [PATCH v2 31/35] net: smc: use find_and_set_bit() in smc_wr_tx_get_free_slot_index() Yury Norov
2023-12-04  9:40     ` Alexandra Winter
2023-12-11 22:34       ` Yury Norov
2023-12-03 19:33   ` [PATCH v2 32/35] ALSA: use atomic find_bit() functions where applicable Yury Norov
2023-12-03 19:33   ` [PATCH v2 33/35] m68k: rework get_mmu_context() Yury Norov
2023-12-03 19:33   ` [PATCH v2 34/35] microblaze: " Yury Norov
2023-12-03 19:33   ` [PATCH v2 35/35] sh: mach-x3proto: rework ilsel_enable() Yury Norov
2024-01-19  8:48     ` Geert Uytterhoeven
2023-12-04 13:07 ` [PATCH v2 00/35] bitops: add atomic find_bit() operations Andy Shevchenko
2023-12-04 13:07   ` Andy Shevchenko
2023-12-04 18:51 ` Jan Kara
2023-12-04 18:51   ` Jan Kara
2023-12-06  5:22   ` Yury Norov
2023-12-06  5:22     ` Yury Norov
2023-12-07  9:10     ` Jan Kara
2023-12-07  9:10       ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.