* [PATCH v4 00/49] bitmap: optimize bitmap_weight() usage
@ 2022-02-10 22:48 Yury Norov
From: Yury Norov @ 2022-02-10 22:48 UTC
  To: Yury Norov, Andy Shevchenko, Rasmus Villemoes, Andrew Morton,
	Michał Mirosław, Greg Kroah-Hartman, Peter Zijlstra,
	David Laight, Joe Perches, Dennis Zhou, Emil Renner Berthing,
	Nicholas Piggin, Matti Vaittinen, Alexey Klimov, linux-kernel

In many cases people call bitmap_weight()-based functions only to compare
the result against a number or expression:

        if (cpumask_weight(mask) > 1)
                do_something();

This may take a considerable amount of time on many-CPU machines because
cpumask_weight() traverses every word of the underlying cpumask
unconditionally.

We can significantly improve on this in many real cases if we stop
traversing the mask as soon as the count of CPUs exceeds 1:

        if (cpumask_weight_gt(mask, 1))
                do_something();
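
The difference between the two patterns above can be sketched in plain
userspace C. This is only an illustration, not the code from the series:
the bitmap is modelled as an array of unsigned long words, and the names
word_weight(), sketch_weight() and sketch_weight_gt() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count set bits in one word (hypothetical helper, not a kernel API). */
static unsigned int word_weight(unsigned long w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Full scan: visits every word, as the weight-then-compare pattern does. */
static unsigned int sketch_weight(const unsigned long *map, size_t nwords)
{
	unsigned int total = 0;

	for (size_t i = 0; i < nwords; i++)
		total += word_weight(map[i]);
	return total;
}

/* Early exit: returns as soon as the running count exceeds 'num'. */
static bool sketch_weight_gt(const unsigned long *map, size_t nwords,
			     unsigned int num)
{
	unsigned int total = 0;

	for (size_t i = 0; i < nwords; i++) {
		total += word_weight(map[i]);
		if (total > num)
			return true;
	}
	return false;
}
```

With a mask whose first word already has two bits set, sketch_weight_gt(map,
nwords, 1) returns after reading one word, while sketch_weight() still walks
the whole array.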

The first part of the series is a cleanup and rework of places where the
bitmap_weight API is used wrongly.

The second part converts cpumask_weight() to cpumask_empty() where the
number to compare with is 0. Ditto for bitmap_weight() and nodes_weight().
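
Why an emptiness test is cheaper than comparing the weight with 0 can be
sketched as follows. This is a userspace illustration under the assumption
that the mask is an array of unsigned long words; sketch_empty() is a
hypothetical name, not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Emptiness check: no bit counting at all, and the loop stops at the
 * first non-zero word. A weight-based check would count bits in every
 * word before comparing the total with 0.
 */
static bool sketch_empty(const unsigned long *map, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++)
		if (map[i])
			return false;
	return true;
}
```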

In the third part of the series, bitmap_weight_cmp() is added together with
bitmap_weight_{eq,gt,ge,lt,le} wrappers on top of it. Corresponding
wrappers for cpumask and nodemask are added as well.
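
One way a single compare helper could back the five wrappers is sketched
below. The signature and the negative/zero/positive return convention are
assumptions made for illustration and may differ from the actual patch; all
sketch_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count set bits in one word (hypothetical helper). */
static unsigned int word_weight(unsigned long w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Assumed convention: negative if weight < num, zero if equal,
 * positive if weight > num. Stops once the count exceeds 'num'.
 */
static int sketch_weight_cmp(const unsigned long *map, size_t nwords,
			     unsigned int num)
{
	unsigned int total = 0;

	for (size_t i = 0; i < nwords; i++) {
		total += word_weight(map[i]);
		if (total > num)
			return 1;
	}
	return total == num ? 0 : -1;
}

static bool sketch_weight_eq(const unsigned long *m, size_t n, unsigned int num)
{
	return sketch_weight_cmp(m, n, num) == 0;
}

static bool sketch_weight_gt(const unsigned long *m, size_t n, unsigned int num)
{
	return sketch_weight_cmp(m, n, num) > 0;
}

static bool sketch_weight_ge(const unsigned long *m, size_t n, unsigned int num)
{
	return sketch_weight_cmp(m, n, num) >= 0;
}

static bool sketch_weight_lt(const unsigned long *m, size_t n, unsigned int num)
{
	return sketch_weight_cmp(m, n, num) < 0;
}

static bool sketch_weight_le(const unsigned long *m, size_t n, unsigned int num)
{
	return sketch_weight_cmp(m, n, num) <= 0;
}
```

Keeping the traversal in one place means each wrapper is a one-line
comparison against zero, and cpumask or nodemask variants could forward to
the same helper.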

Rough usage counts for the new functions, as counted by grep:

	{bitmap,cpumask,nodes}_weight_eq	26
	{bitmap,cpumask,nodes}_weight_ge	25
	{bitmap,cpumask,nodes}_weight_gt	19
	{bitmap,cpumask,nodes}_weight_le	18
	{bitmap,cpumask,nodes}_weight_lt	14

v1: https://lkml.org/lkml/2021/11/27/339
v2: https://lkml.org/lkml/2021/12/18/241
v3: https://lkml.org/lkml/2022/1/27/913
v4: 
 - rebase on next-20220209;
 - exclude patches that are already in next-20220209;
 - drop patches 41, 43, 47, 48 from v3 as they are not performance
   critical;
 - deeply rework iio_simple_dummy_trigger_h (patch #4) and
   qed_rdma_bmap_free (#10), instead of replacing bitmap_weight;
 - use more standard tags.

Yury Norov (49):
  net: dsa: don't use bitmap_weight() in b53_arl_read()
  net: systemport: don't use bitmap_weight() in bcm_sysport_rule_set()
  net: mellanox: fix open-coded for_each_set_bit()
  iio: fix opencoded for_each_set_bit()
  qed: rework qed_rdma_bmap_free()
  nds32: perf: replace bitmap_weight with bitmap_empty where appropriate
  KVM: x86: replace bitmap_weight with bitmap_empty where appropriate
  drm: replace bitmap_weight with bitmap_empty where appropriate
  ice: replace bitmap_weight with bitmap_empty for intel
  octeontx2-pf: replace bitmap_weight with bitmap_empty for Marvell
  qed: replace bitmap_weight with bitmap_empty in qed_roce_stop()
  perf/arm-cci: replace bitmap_weight with bitmap_empty where
    appropriate
  perf tools: replace bitmap_weight with bitmap_empty where appropriate
  arch/alpha: replace cpumask_weight with cpumask_empty where
    appropriate
  arch/ia64: replace cpumask_weight with cpumask_empty where appropriate
  arch/x86: replace cpumask_weight with cpumask_empty where appropriate
  cpufreq: replace cpumask_weight with cpumask_empty where appropriate
  gpu: drm: replace cpumask_weight with cpumask_empty where appropriate
  RDMA/hfi: replace cpumask_weight with cpumask_empty where appropriate
  irq: mips: replace cpumask_weight with cpumask_empty where appropriate
  genirq/affinity: replace cpumask_weight with cpumask_empty where
    appropriate
  sched: replace cpumask_weight with cpumask_empty where appropriate
  clocksource: replace cpumask_weight with cpumask_empty in
    clocksource.c
  mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate
  arch/x86: replace nodes_weight with nodes_empty where appropriate
  bitmap: add bitmap_weight_{cmp, eq, gt, ge, lt, le} functions
  arch/x86: replace bitmap_weight with bitmap_weight_{eq,gt,ge,lt,le}
    where appropriate
  iio: replace bitmap_weight() with bitmap_weight_{eq,gt} where
    appropriate
  memstick: replace bitmap_weight with bitmap_weight_eq where
    appropriate
  ixgbe: replace bitmap_weight with bitmap_weight_eq for intel
  octeontx2-pf: replace bitmap_weight with bitmap_weight_{eq,gt} for
    OcteonTX2
  mlx4: replace bitmap_weight with bitmap_weight_{eq,gt,ge,lt,le} for
    mellanox
  perf: replace bitmap_weight with bitmap_weight_eq for ThunderX2
  media: tegra-video:: replace bitmap_weight with bitmap_weight_le
  cpumask: add cpumask_weight_{eq,gt,ge,lt,le}
  arch/ia64: replace cpumask_weight with cpumask_weight_eq in mm/tlb.c
  arch/mips: replace cpumask_weight with cpumask_weight_{eq, ...} where
    appropriate
  arch/powerpc: replace cpumask_weight with cpumask_weight_{eq, ...}
    where appropriate
  arch/s390: replace cpumask_weight with cpumask_weight_eq where
    appropriate
  firmware: pcsi: replace cpumask_weight with cpumask_weight_eq
  RDMA/hfi1: replace cpumask_weight with cpumask_weight_{eq, ...} where
    appropriate
  scsi: lpfc: replace cpumask_weight with cpumask_weight_gt
  soc/qman: replace cpumask_weight with cpumask_weight_lt
  nodemask: add nodemask_weight_{eq,gt,ge,lt,le}
  ACPI: replace nodes__weight with nodes_weight_ge for numa
  mm/mempolicy: replace nodes_weight with nodes_weight_eq
  nodemask: add num_node_state_eq()
  tools: bitmap: sync bitmap_weight
  MAINTAINERS: add cpumask and nodemask files to BITMAP_API

 MAINTAINERS                                   |  4 +
 arch/alpha/kernel/process.c                   |  2 +-
 arch/ia64/kernel/setup.c                      |  2 +-
 arch/ia64/mm/tlb.c                            |  2 +-
 arch/mips/cavium-octeon/octeon-irq.c          |  4 +-
 arch/mips/kernel/crash.c                      |  2 +-
 arch/nds32/kernel/perf_event_cpu.c            |  2 +-
 arch/powerpc/kernel/smp.c                     |  2 +-
 arch/powerpc/kernel/watchdog.c                |  2 +-
 arch/powerpc/xmon/xmon.c                      |  4 +-
 arch/s390/kernel/perf_cpum_cf.c               |  2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 16 ++--
 arch/x86/kvm/hyperv.c                         |  8 +-
 arch/x86/mm/amdtopology.c                     |  2 +-
 arch/x86/mm/mmio-mod.c                        |  2 +-
 arch/x86/mm/numa_emulation.c                  |  4 +-
 arch/x86/platform/uv/uv_nmi.c                 |  2 +-
 drivers/acpi/numa/srat.c                      |  2 +-
 drivers/cpufreq/qcom-cpufreq-hw.c             |  2 +-
 drivers/cpufreq/scmi-cpufreq.c                |  2 +-
 drivers/firmware/psci/psci_checker.c          |  2 +-
 drivers/gpu/drm/i915/i915_pmu.c               |  2 +-
 drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c      |  2 +-
 drivers/iio/dummy/iio_simple_dummy_buffer.c   | 48 +++++-------
 drivers/iio/industrialio-trigger.c            |  2 +-
 drivers/infiniband/hw/hfi1/affinity.c         | 13 ++--
 drivers/infiniband/hw/qib/qib_file_ops.c      |  2 +-
 drivers/infiniband/hw/qib/qib_iba7322.c       |  2 +-
 drivers/irqchip/irq-bcm6345-l1.c              |  2 +-
 drivers/memstick/core/ms_block.c              |  4 +-
 drivers/net/dsa/b53/b53_common.c              |  6 +-
 drivers/net/ethernet/broadcom/bcmsysport.c    |  6 +-
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  |  4 +-
 .../net/ethernet/intel/ixgbe/ixgbe_sriov.c    |  2 +-
 .../marvell/octeontx2/nic/otx2_ethtool.c      |  2 +-
 .../marvell/octeontx2/nic/otx2_flows.c        |  8 +-
 .../ethernet/marvell/octeontx2/nic/otx2_pf.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx4/cmd.c      | 33 +++-----
 drivers/net/ethernet/mellanox/mlx4/eq.c       |  4 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c       |  4 +-
 drivers/net/ethernet/mellanox/mlx4/main.c     |  2 +-
 drivers/net/ethernet/qlogic/qed/qed_rdma.c    | 45 ++++-------
 drivers/net/ethernet/qlogic/qed/qed_roce.c    |  2 +-
 drivers/perf/arm-cci.c                        |  2 +-
 drivers/perf/arm_pmu.c                        |  4 +-
 drivers/perf/hisilicon/hisi_uncore_pmu.c      |  2 +-
 drivers/perf/thunderx2_pmu.c                  |  4 +-
 drivers/perf/xgene_pmu.c                      |  2 +-
 drivers/scsi/lpfc/lpfc_init.c                 |  2 +-
 drivers/soc/fsl/qbman/qman_test_stash.c       |  2 +-
 drivers/staging/media/tegra-video/vi.c        |  2 +-
 include/linux/bitmap.h                        | 78 +++++++++++++++++++
 include/linux/cpumask.h                       | 50 ++++++++++++
 include/linux/nodemask.h                      | 40 ++++++++++
 kernel/irq/affinity.c                         |  2 +-
 kernel/sched/core.c                           |  2 +-
 kernel/sched/topology.c                       |  2 +-
 kernel/time/clocksource.c                     |  2 +-
 lib/bitmap.c                                  | 21 +++++
 mm/mempolicy.c                                |  2 +-
 mm/page_alloc.c                               |  2 +-
 mm/vmstat.c                                   |  4 +-
 tools/include/linux/bitmap.h                  | 44 +++++++++++
 tools/lib/bitmap.c                            | 20 +++++
 tools/perf/builtin-c2c.c                      |  4 +-
 tools/perf/util/pmu.c                         |  2 +-
 66 files changed, 384 insertions(+), 178 deletions(-)

-- 
2.32.0



end of thread, other threads:[~2022-06-11 13:41 UTC | newest]

Thread overview: 121+ messages
2022-02-10 22:48 [PATCH v4 00/49] bitmap: optimize bitmap_weight() usage Yury Norov
2022-02-10 22:48 ` [PATCH 01/49] net: dsa: don't use bitmap_weight() in b53_arl_read() Yury Norov
2022-02-10 22:48 ` [PATCH 02/49] net: systemport: don't use bitmap_weight() in bcm_sysport_rule_set() Yury Norov
2022-02-10 22:48 ` [PATCH 03/49] net: mellanox: fix open-coded for_each_set_bit() Yury Norov
2022-02-11  9:01   ` David Laight
2022-02-10 22:48 ` [PATCH 04/49] iio: fix opencoded for_each_set_bit() Yury Norov
2022-02-11  8:45   ` Andy Shevchenko
2022-02-11 17:17   ` Christophe JAILLET
2022-06-04 15:41     ` Jonathan Cameron
2022-06-11 13:50       ` Jonathan Cameron
2022-02-10 22:48 ` [RFC PATCH 05/49] qed: rework qed_rdma_bmap_free() Yury Norov
2022-02-11  8:48   ` Andy Shevchenko
2022-02-10 22:48 ` [PATCH 06/49] nds32: perf: replace bitmap_weight with bitmap_empty where appropriate Yury Norov
2022-02-10 22:48 ` [PATCH 07/49] KVM: x86: " Yury Norov
2022-02-11 16:34   ` Sean Christopherson
2022-02-11 17:13   ` Christophe JAILLET
2022-02-11 17:19     ` Sean Christopherson
2022-02-11 17:47       ` Yury Norov
2022-02-10 22:48 ` [PATCH 08/49] drm: " Yury Norov
2022-02-11  2:11   ` Dmitry Baryshkov
2022-02-11  2:11     ` Dmitry Baryshkov
2022-02-10 22:48 ` [PATCH 09/49] ice: replace bitmap_weight with bitmap_empty Yury Norov
2022-02-10 22:48   ` [Intel-wired-lan] " Yury Norov
2022-02-10 22:48 ` [PATCH 10/49] octeontx2-pf: replace bitmap_weight with bitmap_empty where appropriate Yury Norov
2022-02-10 22:48 ` [PATCH 11/49] qed: replace bitmap_weight with bitmap_empty in qed_roce_stop() Yury Norov
2022-02-10 22:48 ` [PATCH 12/49] perf: replace bitmap_weight with bitmap_empty where appropriate Yury Norov
2022-02-10 22:48   ` Yury Norov
2022-02-11 10:25   ` Mark Rutland
2022-02-11 10:25     ` Mark Rutland
2022-02-11 17:59     ` Yury Norov
2022-02-11 17:59       ` Yury Norov
2022-02-11 17:27   ` Christophe JAILLET
2022-02-11 17:27     ` Christophe JAILLET
2022-02-11 23:23     ` Yury Norov
2022-02-11 23:23       ` Yury Norov
2022-02-10 22:48 ` [PATCH 13/49] perf tools: " Yury Norov
2022-02-10 22:48 ` [PATCH 14/49] arch/alpha: replace cpumask_weight with cpumask_empty " Yury Norov
2022-02-10 22:48   ` Yury Norov
2022-02-10 22:48 ` [PATCH 15/49] arch/ia64: " Yury Norov
2022-02-10 22:48   ` Yury Norov
2022-02-10 22:49 ` [PATCH 16/49] arch/x86: " Yury Norov
2022-02-10 22:49   ` [Nouveau] " Yury Norov
2022-04-10 20:42   ` [tip: x86/cleanups] x86: Replace cpumask_weight() with cpumask_empty() " tip-bot2 for Yury Norov
2022-02-10 22:49 ` [PATCH 17/49] cpufreq: replace cpumask_weight with cpumask_empty " Yury Norov
2022-02-10 22:49   ` Yury Norov
2022-02-11  4:30   ` Viresh Kumar
2022-02-11  4:30     ` Viresh Kumar
2022-02-11  5:17     ` Yury Norov
2022-02-11  5:17       ` Yury Norov
2022-02-10 22:49 ` [PATCH 18/49] drm/i915/pmu: " Yury Norov
2022-02-10 22:49   ` [Intel-gfx] " Yury Norov
2022-02-10 22:49 ` [PATCH 19/49] RDMA/hfi: " Yury Norov
2022-02-11 19:10   ` Jason Gunthorpe
2022-02-10 22:49 ` [PATCH 20/49] irq: mips: " Yury Norov
2022-04-10 20:34   ` [tip: irq/core] irqchip/bmips: Replace cpumask_weight() with cpumask_empty() tip-bot2 for Yury Norov
2022-02-10 22:49 ` [PATCH 21/49] genirq/affinity: replace cpumask_weight with cpumask_empty where appropriate Yury Norov
2022-04-10 20:27   ` [tip: irq/core] genirq/affinity: Replace cpumask_weight() with cpumask_empty() " tip-bot2 for Yury Norov
     [not found]     ` <573841649622719@mail.yandex.com>
2022-04-10 21:17       ` Yury Norov
2022-02-10 22:49 ` [PATCH 22/49] sched: replace cpumask_weight with cpumask_empty " Yury Norov
2022-02-11 10:19   ` Peter Zijlstra
2022-02-11 14:19     ` Yury Norov
2022-02-17 18:56   ` [tip: sched/core] " tip-bot2 for Yury Norov
2022-02-10 22:49 ` [PATCH 23/49] clocksource: replace cpumask_weight with cpumask_empty in clocksource.c Yury Norov
2022-04-10 20:35   ` [tip: timers/core] clocksource: Replace cpumask_weight() with cpumask_empty() tip-bot2 for Yury Norov
2022-02-10 22:49 ` [PATCH 24/49] mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate Yury Norov
2022-02-11 10:39   ` Mike Rapoport
2022-02-10 22:49 ` [PATCH 25/49] arch/x86: replace nodes_weight with nodes_empty " Yury Norov
2022-04-10 20:42   ` [tip: x86/cleanups] x86/mm: Replace nodes_weight() with nodes_empty() " tip-bot2 for Yury Norov
2022-02-10 22:49 ` [PATCH 26/49] bitmap: add bitmap_weight_{cmp, eq, gt, ge, lt, le} functions Yury Norov
2022-02-10 22:49 ` [PATCH 27/49] arch/x86: replace bitmap_weight with bitmap_weight_{eq,gt,ge,lt,le} where appropriate Yury Norov
2022-02-10 22:49 ` [PATCH 28/49] iio: replace bitmap_weight() with bitmap_weight_{eq,gt} " Yury Norov
2022-02-10 22:49   ` [Intel-wired-lan] [PATCH 28/49] iio: replace bitmap_weight() with bitmap_weight_{eq, gt} " Yury Norov
2022-02-10 22:49 ` [PATCH 29/49] memstick: replace bitmap_weight with bitmap_weight_eq " Yury Norov
2022-02-17 15:39   ` Ulf Hansson
2022-02-17 16:55     ` Yury Norov
2022-02-22 15:49       ` Ulf Hansson
2022-02-10 22:49 ` [PATCH 30/49] ixgbe: replace bitmap_weight with bitmap_weight_eq Yury Norov
2022-02-10 22:49   ` [Intel-wired-lan] " Yury Norov
2022-02-10 22:49 ` [PATCH 31/49] octeontx2-pf: replace bitmap_weight with bitmap_weight_{eq,gt} Yury Norov
2022-02-10 22:49 ` [PATCH 32/49] mlx4: replace bitmap_weight with bitmap_weight_{eq,gt,ge,lt,le} Yury Norov
2022-02-10 22:49 ` [PATCH 33/49] perf: replace bitmap_weight with bitmap_weight_eq for ThunderX2 Yury Norov
2022-02-10 22:49   ` Yury Norov
2022-02-11 10:30   ` Mark Rutland
2022-02-11 10:30     ` Mark Rutland
2022-02-10 22:49 ` [PATCH 34/49] media: tegra-video: replace bitmap_weight with bitmap_weight_le Yury Norov
2022-04-28  7:31   ` Hans Verkuil
2022-02-10 22:49 ` [PATCH 35/49] cpumask: add cpumask_weight_{eq,gt,ge,lt,le} Yury Norov
2022-02-10 22:49 ` [PATCH 36/49] arch/ia64: replace cpumask_weight with cpumask_weight_eq in mm/tlb.c Yury Norov
2022-02-10 22:49   ` Yury Norov
2022-02-10 22:49 ` [PATCH 37/49] arch/mips: replace cpumask_weight with cpumask_weight_{eq, ...} where appropriate Yury Norov
2022-02-10 22:49 ` [PATCH 38/49] arch/powerpc: " Yury Norov
2022-02-11  4:10   ` Michael Ellerman
2022-02-10 22:49 ` [PATCH 39/49] arch/s390: replace cpumask_weight with cpumask_weight_eq " Yury Norov
2022-02-11  6:54   ` Sven Schnelle
2022-02-11 23:40     ` Yury Norov
2022-02-10 22:49 ` [PATCH 40/49] firmware: pcsi: replace cpumask_weight with cpumask_weight_eq Yury Norov
2022-02-10 22:49   ` Yury Norov
2022-02-11  9:45   ` Sudeep Holla
2022-02-11  9:45     ` Sudeep Holla
2022-02-11 10:32   ` Mark Rutland
2022-02-11 10:32     ` Mark Rutland
2022-02-10 22:49 ` [PATCH 41/49] RDMA/hfi1: replace cpumask_weight with cpumask_weight_{eq, ...} where appropriate Yury Norov
2022-02-11 19:11   ` Jason Gunthorpe
2022-02-10 22:49 ` [PATCH 42/49] scsi: lpfc: replace cpumask_weight with cpumask_weight_gt Yury Norov
2022-02-10 22:49 ` [PATCH 43/49] soc/qman: replace cpumask_weight with cpumask_weight_lt Yury Norov
2022-02-10 22:49   ` Yury Norov
2022-02-10 22:49 ` [PATCH 44/49] nodemask: add nodemask_weight_{eq,gt,ge,lt,le} Yury Norov
2022-02-10 22:49 ` [PATCH 45/49] ACPI: replace nodes__weight with nodes_weight_ge for numa Yury Norov
2022-02-14 19:18   ` Rafael J. Wysocki
2022-02-14 19:34     ` Yury Norov
2022-02-14 19:45       ` Rafael J. Wysocki
2022-02-14 19:55         ` Yury Norov
2022-02-10 22:49 ` [PATCH 46/49] mm/mempolicy: replace nodes_weight with nodes_weight_eq Yury Norov
2022-02-11 10:40   ` Mike Rapoport
2022-02-11 17:44   ` Christophe JAILLET
2022-02-11 19:47     ` Yury Norov
2022-02-10 22:49 ` [PATCH 47/49] nodemask: add num_node_state_eq() Yury Norov
2022-02-11 10:41   ` Mike Rapoport
2022-02-10 22:49 ` [PATCH 48/49] tools: bitmap: sync bitmap_weight Yury Norov
2022-02-10 22:49 ` [PATCH 49/49] MAINTAINERS: add cpumask and nodemask files to BITMAP_API Yury Norov
2022-02-15 23:18 ` [PATCH v4 00/49] bitmap: optimize bitmap_weight() usage Will Deacon
