* [GIT PULL] bitmap patches for v6.8 @ 2024-01-08 17:05 Yury Norov 2024-01-21 21:47 ` Linus Torvalds 0 siblings, 1 reply; 4+ messages in thread From: Yury Norov @ 2024-01-08 17:05 UTC (permalink / raw) To: Linus Torvalds, Linux Kernel Mailing List Cc: Yury Norov, Alexandra Winter, Andy Shevchenko, Bart Van Assche, Bjorn Helgaas, Chengming Zhou, Dave Jiang, Edward Cree, Fenghua Yu, Geert Uytterhoeven, Greg Ungerer, Guanjun, Hans Verkuil, Jan Kara, Jens Axboe, John Paul Adrian Glaubitz, Mathieu Desnoyers, Michael Kelley, Oliver Neukum, Peter Zijlstra, Rasmus Villemoes, Sean Christopherson, Takashi Iwai, Tony Lu, Vinod Koul, Vitaly Kuznetsov, Wei Liu, Wen Gu, Will Deacon The following changes since commit 33cc938e65a98f1d29d0a18403dbbee050dcad9a: Linux 6.7-rc4 (2023-12-03 18:52:56 +0900) are available in the Git repository at: https://github.com/norov/linux.git/ tags/bitmap-for-6.8 for you to fetch changes up to 071ad962baf5e857fd965595421cf6fb588610ed: bitmap: Step down as a reviewer (2024-01-01 12:50:08 -0800) ---------------------------------------------------------------- Hi Linus, Please pull bitmap patches for v6.8. There is a couple of random cleanup patches, and the rest of the request is a series adding atomic find_bit() operations: https://patchwork.kernel.org/project/linux-media/cover/20231212022749.625238-1-yury.norov@gmail.com/ Most of the patches are arch and drivers code where transition to the new API is straightforward. And most of them (~20/35) are reviewed by corresponding maintainers. For less actively supported subsystems I didn't receive a feedback. Many people asked to pull patches for their drivers together with the head patch of the series. And because the transition is quite clean, I decided to move the whole series in this request, including that unreviewed material. Please let me know if it doesn't work for you, and I'll resend it with the only reviewed patches. Thanks, Yury ---------------------------------------------------------------- Andy Shevchenko (1): bitmap: Step down as a reviewer Guanjun (1): lib/find_bit: Fix the code comments about find_next_bit_wrap Yury Norov (36): lib/find: optimize find_*_bit_wrap lib/find: add atomic find_bit() primitives lib/find: add test for atomic find_bit() ops lib/sbitmap; optimize __sbitmap_get_word() by using find_and_set_bit() watch_queue: optimize post_one_notification() by using find_and_clear_bit() sched: add cpumask_find_and_set() and use it in __mm_cid_get() mips: sgi-ip30: optimize heart_alloc_int() by using find_and_set_bit() sparc: optimize alloc_msi() by using find_and_set_bit() perf/arm: use atomic find_bit() API drivers/perf: optimize ali_drw_get_counter_idx() by using find_and_set_bit() dmaengine: idxd: optimize perfmon_assign_event() ath10k: optimize ath10k_snoc_napi_poll() wifi: rtw88: optimize the driver by using atomic iterator KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers() PCI: hv: Optimize hv_get_dom_num() by using find_and_set_bit() scsi: core: optimize scsi_evt_emit() by using an atomic iterator scsi: mpi3mr: optimize the driver by using find_and_set_bit() scsi: qedi: optimize qedi_get_task_idx() by using find_and_set_bit() powerpc: optimize arch code by using atomic find_bit() API iommu: optimize subsystem by using atomic find_bit() API media: radio-shark: optimize the driver by using atomic find_bit() API sfc: optimize the driver by using atomic find_bit() API tty: nozomi: optimize interrupt_handler() usb: cdc-acm: optimize acm_softint() block: null_blk: replace get_tag() with a generic find_and_set_bit_lock() RDMA/rtrs: optimize __rtrs_get_permit() by using find_and_set_bit_lock() mISDN: optimize get_free_devid() media: em28xx: cx231xx: optimize drivers by using find_and_set_bit() ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get() serial: sc12is7xx: optimize sc16is7xx_alloc_line() bluetooth: optimize cmtp_alloc_block_id() net: smc: optimize smc_wr_tx_get_free_slot_index() ALSA: use atomic find_bit() functions where applicable m68k: optimize get_mmu_context() microblaze: optimize get_mmu_context() sh: mach-x3proto: optimize ilsel_enable() MAINTAINERS | 1 - arch/m68k/include/asm/mmu_context.h | 11 +- arch/microblaze/include/asm/mmu_context_mm.h | 11 +- arch/mips/sgi-ip30/ip30-irq.c | 12 +- arch/powerpc/mm/book3s32/mmu_context.c | 10 +- arch/powerpc/platforms/pasemi/dma_lib.c | 45 +--- arch/powerpc/platforms/powernv/pci-sriov.c | 12 +- arch/sh/boards/mach-x3proto/ilsel.c | 4 +- arch/sparc/kernel/pci_msi.c | 9 +- arch/x86/kvm/hyperv.c | 40 ++-- drivers/block/null_blk/main.c | 41 ++-- drivers/dma/idxd/perfmon.c | 8 +- drivers/infiniband/ulp/rtrs/rtrs-clt.c | 15 +- drivers/iommu/arm/arm-smmu/arm-smmu.h | 10 +- drivers/iommu/msm_iommu.c | 18 +- drivers/isdn/mISDN/core.c | 9 +- drivers/media/radio/radio-shark.c | 5 +- drivers/media/radio/radio-shark2.c | 5 +- drivers/media/usb/cx231xx/cx231xx-cards.c | 16 +- drivers/media/usb/em28xx/em28xx-cards.c | 37 ++-- drivers/net/ethernet/rocker/rocker_ofdpa.c | 11 +- drivers/net/ethernet/sfc/rx_common.c | 4 +- drivers/net/ethernet/sfc/siena/rx_common.c | 4 +- drivers/net/ethernet/sfc/siena/siena_sriov.c | 14 +- drivers/net/wireless/ath/ath10k/snoc.c | 9 +- drivers/net/wireless/realtek/rtw88/pci.c | 5 +- drivers/net/wireless/realtek/rtw89/pci.c | 5 +- drivers/pci/controller/pci-hyperv.c | 7 +- drivers/perf/alibaba_uncore_drw_pmu.c | 10 +- drivers/perf/arm-cci.c | 24 +-- drivers/perf/arm-ccn.c | 10 +- drivers/perf/arm_dmc620_pmu.c | 9 +- drivers/perf/arm_pmuv3.c | 8 +- drivers/scsi/mpi3mr/mpi3mr_os.c | 21 +- drivers/scsi/qedi/qedi_main.c | 9 +- drivers/scsi/scsi_lib.c | 7 +- drivers/tty/nozomi.c | 5 +- drivers/tty/serial/sc16is7xx.c | 8 +- drivers/usb/class/cdc-acm.c | 5 +- include/linux/cpumask.h | 12 ++ include/linux/find.h | 301 ++++++++++++++++++++++++++- kernel/sched/sched.h | 14 +- kernel/watch_queue.c | 6 +- lib/find_bit.c | 85 ++++++++ lib/sbitmap.c | 46 +--- lib/test_bitmap.c | 61 ++++++ net/bluetooth/cmtp/core.c | 10 +- net/smc/smc_wr.c | 10 +- sound/pci/hda/hda_codec.c | 7 +- sound/usb/caiaq/audio.c | 13 +- 50 files changed, 635 insertions(+), 424 deletions(-) ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [GIT PULL] bitmap patches for v6.8 2024-01-08 17:05 [GIT PULL] bitmap patches for v6.8 Yury Norov @ 2024-01-21 21:47 ` Linus Torvalds 2024-01-21 23:35 ` Yury Norov 0 siblings, 1 reply; 4+ messages in thread From: Linus Torvalds @ 2024-01-21 21:47 UTC (permalink / raw) To: Yury Norov Cc: Linux Kernel Mailing List, Alexandra Winter, Andy Shevchenko, Bart Van Assche, Bjorn Helgaas, Chengming Zhou, Dave Jiang, Edward Cree, Fenghua Yu, Geert Uytterhoeven, Greg Ungerer, Guanjun, Hans Verkuil, Jan Kara, Jens Axboe, John Paul Adrian Glaubitz, Mathieu Desnoyers, Michael Kelley, Oliver Neukum, Peter Zijlstra, Rasmus Villemoes, Sean Christopherson, Takashi Iwai, Tony Lu, Vinod Koul, Vitaly Kuznetsov, Wei Liu, Wen Gu, Will Deacon So I've left this to be my last pull request, because I hate how our header files are growing, and this part: include/linux/find.h | 301 ++++++++++++++++++++++++++++++- 1 file changed, 297 insertions(+), 4 deletions(-) in particular. Nobody includes <linux/find.h> directly, but indirectly pretty much *every* single kernel C file includes it. Looking at some basic stats of my dependency files in my tree, 4426 of 4526 object files (~98%) depend on find.h because they get it through *some* path that ends up with bitmap.h -> find.h. And honestly, the number of files that actually want the new functions is basically just a tiny handful. It's also not obvious how useful those optimizations are, considering that a lot of the loops are *tiny*. I looked at a few cases, and the size of the bitmap it was iterating over was often in the 2-4 range, sometimes (like RTW89_TXCH_NUM) 13, etc. In radio-shark, you replaced a loop like this for (i = 0; i < 2; i++) { with that for_each_test_and_clear_bit(), and it *really* isn't clear that it was worth it. It sure wasn't performance-critical to begin with. In general, if an "optimization" doesn't have any performance numbers attached to it, is it an optimization at all? So I finally ended up pulling this, but after looking at the patch I went "this is adding more lines than it removes, has no performance numbers, and grows a core header file that is included by absolutely everything by a third". .. and then I decided to just unpull it again. Linus ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [GIT PULL] bitmap patches for v6.8 2024-01-21 21:47 ` Linus Torvalds @ 2024-01-21 23:35 ` Yury Norov 2024-01-22 19:46 ` Jan Kara 0 siblings, 1 reply; 4+ messages in thread From: Yury Norov @ 2024-01-21 23:35 UTC (permalink / raw) To: Linus Torvalds Cc: Linux Kernel Mailing List, Alexandra Winter, Andy Shevchenko, Bart Van Assche, Bjorn Helgaas, Chengming Zhou, Dave Jiang, Edward Cree, Fenghua Yu, Geert Uytterhoeven, Greg Ungerer, Guanjun, Hans Verkuil, Jan Kara, Jens Axboe, John Paul Adrian Glaubitz, Mathieu Desnoyers, Michael Kelley, Oliver Neukum, Peter Zijlstra, Rasmus Villemoes, Sean Christopherson, Takashi Iwai, Tony Lu, Vinod Koul, Vitaly Kuznetsov, Wei Liu, Wen Gu, Will Deacon On Sun, Jan 21, 2024 at 01:47:21PM -0800, Linus Torvalds wrote: > So I've left this to be my last pull request, because I hate how our > header files are growing, and this part: > > include/linux/find.h | 301 ++++++++++++++++++++++++++++++- > 1 file changed, 297 insertions(+), 4 deletions(-) > > in particular. > > Nobody includes <linux/find.h> directly, but indirectly pretty much > *every* single kernel C file includes it. > > Looking at some basic stats of my dependency files in my tree, 4426 of > 4526 object files (~98%) depend on find.h because they get it through > *some* path that ends up with bitmap.h -> find.h. > > And honestly, the number of files that actually want the new functions > is basically just a tiny handful. It's also not obvious how useful > those optimizations are, considering that a lot of the loops are > *tiny*. I looked at a few cases, and the size of the bitmap it was > iterating over was often in the 2-4 range, sometimes (like > RTW89_TXCH_NUM) 13, etc. > > In radio-shark, you replaced a loop like this > > for (i = 0; i < 2; i++) { > > with that for_each_test_and_clear_bit(), and it *really* isn't clear > that it was worth it. It sure wasn't performance-critical to begin > with. > > In general, if an "optimization" doesn't have any performance numbers > attached to it, is it an optimization at all? No, this is not a performance optimization, and I don't claim that. Jan Kara reported some performance improvement, but performance is not the main goal of the series, and I didn't run performance tests for this on myself. The original motivation came from the fact that using non-volatile find_bit() together with volatile test_and_set_bit() may trigger KCSAN warning on concurrent memory access. People wanted to make the whole find API volatile, which is a bad idea for sure. So I had to give them a reasonable alternative. After some thinking I decided that we just need a separate set of volatile find API. Check this for initial discussion: https://lore.kernel.org/lkml/634f5fdf-e236-42cf-be8d-48a581c21660@alu.unizg.hr/T/#m3e7341eb3571753f3acf8fe166f3fb5b2c12e615 It also makes the code cleaner, at least to my taste, and some reviewers' too. And to some degree less bug-prone. For example, ILSEL_LEVELS is just 15, but traversing code in ilsel_enable() is buggy, as Geert spotted. And switching to atomic find() fixes it automatically: https://lore.kernel.org/lkml/CAMuHMdWHxesM-EOOMtrrw3Caz+5Wux35QiKOjvwA=vwQpRe26Q@mail.gmail.com/T/#me53217b32fd5623af6c7eafa5c4ce6d0465f6c58 (His review came just a couple days ago, after I submitted the pull request, so the tag is not there.) > So I finally ended up pulling this, but after looking at the patch I > went "this is adding more lines than it removes, has no performance > numbers, and grows a core header file that is included by absolutely > everything by a third". > > .. and then I decided to just unpull it again. Yes, it's true. Bitmap.h historically includes everything related to bitmaps, and it became too big. I started moving some chunks out of it already. For example, aae06fc1b5a2e splits out string-related bitmap code to a separate bitmap-str.h. That patch was more about human readability, and it keeps the order, so that bitmap.h includes bitmap-str.h, but we can easily turn that around. I wanted to do it some day, but actually nobody complained before now, and I wanted to collect more material to make it a series. I still think that kernel needs atomic find_bit() API. If you change your mind and pull the series, I can make patches that split-out atomic find() declarations to a separate header, not included by bitmap.h, and submit it for -rc2, together with bitmap-str.h rework. Thanks, Yury ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [GIT PULL] bitmap patches for v6.8 2024-01-21 23:35 ` Yury Norov @ 2024-01-22 19:46 ` Jan Kara 0 siblings, 0 replies; 4+ messages in thread From: Jan Kara @ 2024-01-22 19:46 UTC (permalink / raw) To: Yury Norov Cc: Linus Torvalds, Linux Kernel Mailing List, Alexandra Winter, Andy Shevchenko, Bart Van Assche, Bjorn Helgaas, Chengming Zhou, Dave Jiang, Edward Cree, Fenghua Yu, Geert Uytterhoeven, Greg Ungerer, Guanjun, Hans Verkuil, Jan Kara, Jens Axboe, John Paul Adrian Glaubitz, Mathieu Desnoyers, Michael Kelley, Oliver Neukum, Peter Zijlstra, Rasmus Villemoes, Sean Christopherson, Takashi Iwai, Tony Lu, Vinod Koul, Vitaly Kuznetsov, Wei Liu, Wen Gu, Will Deacon On Sun 21-01-24 15:35:18, Yury Norov wrote: > On Sun, Jan 21, 2024 at 01:47:21PM -0800, Linus Torvalds wrote: > > So I've left this to be my last pull request, because I hate how our > > header files are growing, and this part: > > > > include/linux/find.h | 301 ++++++++++++++++++++++++++++++- > > 1 file changed, 297 insertions(+), 4 deletions(-) > > > > in particular. > > > > Nobody includes <linux/find.h> directly, but indirectly pretty much > > *every* single kernel C file includes it. > > > > Looking at some basic stats of my dependency files in my tree, 4426 of > > 4526 object files (~98%) depend on find.h because they get it through > > *some* path that ends up with bitmap.h -> find.h. > > > > And honestly, the number of files that actually want the new functions > > is basically just a tiny handful. It's also not obvious how useful > > those optimizations are, considering that a lot of the loops are > > *tiny*. I looked at a few cases, and the size of the bitmap it was > > iterating over was often in the 2-4 range, sometimes (like > > RTW89_TXCH_NUM) 13, etc. > > > > In radio-shark, you replaced a loop like this > > > > for (i = 0; i < 2; i++) { > > > > with that for_each_test_and_clear_bit(), and it *really* isn't clear > > that it was worth it. It sure wasn't performance-critical to begin > > with. > > > > In general, if an "optimization" doesn't have any performance numbers > > attached to it, is it an optimization at all? > > No, this is not a performance optimization, and I don't claim that. > Jan Kara reported some performance improvement, but performance is not > the main goal of the series, and I didn't run performance tests for > this on myself. > > The original motivation came from the fact that using non-volatile > find_bit() together with volatile test_and_set_bit() may trigger > KCSAN warning on concurrent memory access. Maybe to save Linus some reading: It is not only about KCSAN, it is a real (theoretical) race. In principle the problem is that find_next_bit() does something like: if (*addr) return __ffs(*addr) and if the compiler decides to refetch *addr and we race with concurrent writer bad things can happen. Now I wanted to fix this in a less intrusive way using READ_ONCE() (https://lore.kernel.org/linux-fsdevel/20231011150252.32737-1-jack@suse.cz/) but Yury didn't like that and came up with this series. > People wanted to make the whole find API volatile, which is a bad idea > for sure. So I had to give them a reasonable alternative. After some > thinking I decided that we just need a separate set of volatile find API. > Check this for initial discussion: > > https://lore.kernel.org/lkml/634f5fdf-e236-42cf-be8d-48a581c21660@alu.unizg.hr/T/#m3e7341eb3571753f3acf8fe166f3fb5b2c12e615 > > It also makes the code cleaner, at least to my taste, and some > reviewers' too. And to some degree less bug-prone. For example, > ILSEL_LEVELS is just 15, but traversing code in ilsel_enable() > is buggy, as Geert spotted. And switching to atomic find() fixes > it automatically: > > https://lore.kernel.org/lkml/CAMuHMdWHxesM-EOOMtrrw3Caz+5Wux35QiKOjvwA=vwQpRe26Q@mail.gmail.com/T/#me53217b32fd5623af6c7eafa5c4ce6d0465f6c58 > > (His review came just a couple days ago, after I submitted the pull > request, so the tag is not there.) I would just note that this pull as is still does not address the original issue in xarray when using find_next_bit() and I'm not yet sure how/if you plan to address it. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-01-22 19:46 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-01-08 17:05 [GIT PULL] bitmap patches for v6.8 Yury Norov 2024-01-21 21:47 ` Linus Torvalds 2024-01-21 23:35 ` Yury Norov 2024-01-22 19:46 ` Jan Kara
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).