* [RFC PATCH] bit_spinlock: introduce smp_cond_load_relaxed @ 2018-10-13 6:47 Gao Xiang 2018-10-13 7:04 ` Greg Kroah-Hartman 2018-10-30 6:04 ` [PATCH v2] " Gao Xiang 0 siblings, 2 replies; 19+ messages in thread From: Gao Xiang @ 2018-10-13 6:47 UTC (permalink / raw) To: Philippe Ombredanne, Kate Stewart, Thomas Gleixner, Greg Kroah-Hartman Cc: linux-kernel, Miao Xie, Chao Yu, Gao Xiang It is better to use smp_cond_load_relaxed instead of busy waiting for bit_spinlock. Signed-off-by: Gao Xiang <hsiangkao@aol.com> --- include/linux/bit_spinlock.h | 21 ++++++++------------- 1 file changed, 8 insertions(+), 13 deletions(-) diff --git a/include/linux/bit_spinlock.h b/include/linux/bit_spinlock.h index bbc4730..713efc4 100644 --- a/include/linux/bit_spinlock.h +++ b/include/linux/bit_spinlock.h @@ -15,22 +15,17 @@ */ static inline void bit_spin_lock(int bitnum, unsigned long *addr) { - /* - * Assuming the lock is uncontended, this never enters - * the body of the outer loop. If it is contended, then - * within the inner loop a non-atomic test is used to - * busywait with less bus contention for a good time to - * attempt to acquire the lock bit. - */ - preempt_disable(); #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) - while (unlikely(test_and_set_bit_lock(bitnum, addr))) { - preempt_enable(); - do { - cpu_relax(); - } while (test_bit(bitnum, addr)); + while (1) { + smp_cond_load_relaxed(&addr[BIT_WORD(bitnum)], + !(VAL >> (bitnum & (BITS_PER_LONG-1)))); preempt_disable(); + if (!test_and_set_bit_lock(bitnum, addr)) + break; + preempt_enable(); } +#else + preempt_disable(); #endif __acquire(bitlock); } -- 2.7.4 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [RFC PATCH] bit_spinlock: introduce smp_cond_load_relaxed 2018-10-13 6:47 [RFC PATCH] bit_spinlock: introduce smp_cond_load_relaxed Gao Xiang @ 2018-10-13 7:04 ` Greg Kroah-Hartman 2018-10-13 7:22 ` Gao Xiang 2018-10-30 6:04 ` [PATCH v2] " Gao Xiang 1 sibling, 1 reply; 19+ messages in thread From: Greg Kroah-Hartman @ 2018-10-13 7:04 UTC (permalink / raw) To: Gao Xiang Cc: Philippe Ombredanne, Kate Stewart, Thomas Gleixner, linux-kernel, Miao Xie, Chao Yu On Sat, Oct 13, 2018 at 02:47:29PM +0800, Gao Xiang wrote: > It is better to use smp_cond_load_relaxed instead > of busy waiting for bit_spinlock. Why? I think we need some kind of "proof" that this is true before being able to accept a patch like this, don't you agree? thanks, greg k-h ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH] bit_spinlock: introduce smp_cond_load_relaxed 2018-10-13 7:04 ` Greg Kroah-Hartman @ 2018-10-13 7:22 ` Gao Xiang 2018-10-13 7:30 ` Greg Kroah-Hartman 2018-10-13 7:30 ` Gao Xiang 0 siblings, 2 replies; 19+ messages in thread From: Gao Xiang @ 2018-10-13 7:22 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Philippe Ombredanne, Kate Stewart, Thomas Gleixner, linux-kernel, Miao Xie, Chao Yu Hi Greg, On 2018/10/13 15:04, Greg Kroah-Hartman wrote: > On Sat, Oct 13, 2018 at 02:47:29PM +0800, Gao Xiang wrote: >> It is better to use smp_cond_load_relaxed instead >> of busy waiting for bit_spinlock. > > Why? I think we need some kind of "proof" that this is true before > being able to accept a patch like this, don't you agree? There are some earlier materials which discuss smp_cond_load_*: https://patchwork.kernel.org/patch/10335991/ https://patchwork.kernel.org/patch/10325057/ On ARM64, they implement a function called "cmpwait", which uses hardware instructions to monitor a value change; I think it is more energy-efficient than just doing an open-coded busy loop... And it seems smp_cond_load_* is already used in the current kernel, such as: ./kernel/locking/mcs_spinlock.h ./kernel/locking/qspinlock.c ./kernel/sched/core.c ./kernel/smp.c For other architectures like x86/arm64, I think they could implement smp_cond_load_* later. Thanks, Gao Xiang > > thanks, > > greg k-h > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH] bit_spinlock: introduce smp_cond_load_relaxed 2018-10-13 7:22 ` Gao Xiang @ 2018-10-13 7:30 ` Greg Kroah-Hartman 2018-10-13 7:44 ` Gao Xiang 1 sibling, 1 reply; 19+ messages in thread From: Greg Kroah-Hartman @ 2018-10-13 7:30 UTC (permalink / raw) To: Gao Xiang Cc: Philippe Ombredanne, Kate Stewart, Thomas Gleixner, linux-kernel, Miao Xie, Chao Yu On Sat, Oct 13, 2018 at 03:22:08PM +0800, Gao Xiang wrote: > Hi Greg, > > On 2018/10/13 15:04, Greg Kroah-Hartman wrote: > > On Sat, Oct 13, 2018 at 02:47:29PM +0800, Gao Xiang wrote: > >> It is better to use smp_cond_load_relaxed instead > >> of busy waiting for bit_spinlock. > > > > Why? I think we need some kind of "proof" that this is true before > > being able to accept a patch like this, don't you agree? > > There are some materials which discuss smp_cond_load_* earlier. > https://patchwork.kernel.org/patch/10335991/ > https://patchwork.kernel.org/patch/10325057/ > > In ARM64, they implements a function called "cmpwait", which uses > hardware instructions to monitor a value change, I think it is more > energy efficient than just do a open-code busy loop... > > And it seem smp_cond_load_* is already used in the current kernel, such as: > ./kernel/locking/mcs_spinlock.h > ./kernel/locking/qspinlock.c > ./kernel/sched/core.c > ./kernel/smp.c > > For other architectures like x86/arm64, I think they could implement > smp_cond_load_* later. And have you benchmarked this change to show that it provides any benefit? You need to do that... thanks, greg k-h ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH] bit_spinlock: introduce smp_cond_load_relaxed 2018-10-13 7:30 ` Greg Kroah-Hartman @ 2018-10-13 7:44 ` Gao Xiang 0 siblings, 0 replies; 19+ messages in thread From: Gao Xiang @ 2018-10-13 7:44 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Philippe Ombredanne, Kate Stewart, Thomas Gleixner, linux-kernel, Miao Xie, Chao Yu Hi Greg, On 2018/10/13 15:30, Greg Kroah-Hartman wrote: > On Sat, Oct 13, 2018 at 03:22:08PM +0800, Gao Xiang wrote: >> Hi Greg, >> >> On 2018/10/13 15:04, Greg Kroah-Hartman wrote: >>> On Sat, Oct 13, 2018 at 02:47:29PM +0800, Gao Xiang wrote: >>>> It is better to use smp_cond_load_relaxed instead >>>> of busy waiting for bit_spinlock. >>> >>> Why? I think we need some kind of "proof" that this is true before >>> being able to accept a patch like this, don't you agree? >> >> There are some materials which discuss smp_cond_load_* earlier. >> https://patchwork.kernel.org/patch/10335991/ >> https://patchwork.kernel.org/patch/10325057/ >> >> In ARM64, they implements a function called "cmpwait", which uses >> hardware instructions to monitor a value change, I think it is more >> energy efficient than just do a open-code busy loop... >> >> And it seem smp_cond_load_* is already used in the current kernel, such as: >> ./kernel/locking/mcs_spinlock.h >> ./kernel/locking/qspinlock.c >> ./kernel/sched/core.c >> ./kernel/smp.c >> >> For other architectures like x86/arm64, I think they could implement >> smp_cond_load_* later. > > And have you benchmarked this change to show that it provides any > benifit? > > You need to do that... OK, it is indeed my responsibility to test this patch on ARM64. I will test it later on ARM64 to see whether it makes any performance difference, after I handle the EROFS product landing work (perhaps weeks later; there is a lot of urgent work for the current product that I need to finish...) Or if some warm-hearted folks are interested in it, I'm very happy to see other implementations or comments about that. :) Thanks, Gao Xiang > > thanks, > > greg k-h > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH] bit_spinlock: introduce smp_cond_load_relaxed 2018-10-13 7:22 ` Gao Xiang 2018-10-13 7:30 ` Greg Kroah-Hartman @ 2018-10-13 7:30 ` Gao Xiang 1 sibling, 0 replies; 19+ messages in thread From: Gao Xiang @ 2018-10-13 7:30 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Philippe Ombredanne, Kate Stewart, Thomas Gleixner, linux-kernel, Miao Xie, Chao Yu On 2018/10/13 15:22, Gao Xiang wrote: > For other architectures like x86/arm64, I think they could implement > smp_cond_load_* later. Sorry about that, I meant "amd64". Actually I don't have performance numbers to prove that now. I think it really depends on the detailed hardware implementation of each architecture. In my opinion, it is better to wrap this up rather than open-code it all around... do { cpu_relax(); } while (...); I was just cleaning up the EROFS file system and saw this piece of code (bit_spinlock) by chance. Therefore I wrote a patch to get some ideas about it.... Thanks, Gao Xiang ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed 2018-10-13 6:47 [RFC PATCH] bit_spinlock: introduce smp_cond_load_relaxed Gao Xiang 2018-10-13 7:04 ` Greg Kroah-Hartman @ 2018-10-30 6:04 ` Gao Xiang 2018-10-30 5:52 ` Gao Xiang 2018-11-05 22:49 ` Will Deacon 1 sibling, 2 replies; 19+ messages in thread From: Gao Xiang @ 2018-10-30 6:04 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Philippe Ombredanne, Kate Stewart, Thomas Gleixner, Will Deacon, linux-kernel, Miao Xie, Chao Yu, Gao Xiang It is better to use wrapped smp_cond_load_relaxed instead of open-coded busy waiting for bit_spinlock. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> --- change log v2: - fix the incorrect expression !(VAL >> (bitnum & (BITS_PER_LONG-1))) - the test result is described in the following reply. Thanks, Gao Xiang include/linux/bit_spinlock.h | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/include/linux/bit_spinlock.h b/include/linux/bit_spinlock.h index bbc4730a6505..d5f922b5ffd9 100644 --- a/include/linux/bit_spinlock.h +++ b/include/linux/bit_spinlock.h @@ -15,22 +15,19 @@ */ static inline void bit_spin_lock(int bitnum, unsigned long *addr) { - /* - * Assuming the lock is uncontended, this never enters - * the body of the outer loop. If it is contended, then - * within the inner loop a non-atomic test is used to - * busywait with less bus contention for a good time to - * attempt to acquire the lock bit. 
- */ - preempt_disable(); #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) - while (unlikely(test_and_set_bit_lock(bitnum, addr))) { - preempt_enable(); - do { - cpu_relax(); - } while (test_bit(bitnum, addr)); + const unsigned int bitshift = bitnum & (BITS_PER_LONG - 1); + + while (1) { + smp_cond_load_relaxed(&addr[BIT_WORD(bitnum)], + !((VAL >> bitshift) & 1)); preempt_disable(); + if (!test_and_set_bit_lock(bitnum, addr)) + break; + preempt_enable(); } +#else + preempt_disable(); #endif __acquire(bitlock); } -- 2.17.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed 2018-10-30 6:04 ` [PATCH v2] " Gao Xiang @ 2018-10-30 5:52 ` Gao Xiang 2018-11-05 17:11 ` Gao Xiang 2018-11-05 22:49 ` Will Deacon 1 sibling, 1 reply; 19+ messages in thread From: Gao Xiang @ 2018-10-30 5:52 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Philippe Ombredanne, Kate Stewart, Thomas Gleixner, Will Deacon, linux-kernel, Miao Xie, Chao Yu Hi, On 2018/10/30 14:04, Gao Xiang wrote: > It is better to use wrapped smp_cond_load_relaxed > instead of open-coded busy waiting for bit_spinlock. > > Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> > --- > > change log v2: > - fix the incorrect expression !(VAL >> (bitnum & (BITS_PER_LONG-1))) > - the test result is described in the following reply. > > Thanks, > Gao Xiang Simple test script: #include <linux/kthread.h> #include <linux/bit_spinlock.h> #include <linux/module.h> unsigned long global_lock; int test_thread(void *data) { unsigned long thread_id = (unsigned long)data; int i; u64 start = ktime_get_ns(); for (i = 0; i < 500000; ++i) { bit_spin_lock(0, &global_lock); __asm__("yield"); bit_spin_unlock(0, &global_lock); } pr_err("Thread id: %lu time: %llu\n", thread_id, ktime_get_ns() - start); do_exit(0); } static int __init bitspinlock_test_module_init(void) { int i; for (i = 0; i < 8; ++i) { if (IS_ERR(kthread_run(test_thread, (void *)(unsigned long)i, "thread-%d", i))) pr_err("fail to create thread %d\n", i); } return 0; } static void __exit bitspinlock_test_module_exit(void) { } module_init(bitspinlock_test_module_init); module_exit(bitspinlock_test_module_exit); MODULE_LICENSE("GPL"); ...and tested in the following ARM server environment: Processor: HI1616 (https://en.wikichip.org/wiki/hisilicon/hi16xx/hi1616) Board: HiSilicon D05 Development Board (http://open-estuary.org/d05/) Memory: 512GB Host OS: Ubuntu 18.04.1 LTS (Ubuntu 4.15.0-29.31-generic 4.15.18) QEMU KVM OS: Linux 4.19 + buildroot QEMU KVM cmdline: qemu-system-aarch64 -enable-kvm 
-cpu host -smp 4 -m 256M -kernel Image -M virt,kernel_irqchip=on -nographic -hda rootfs.ext2 -append 'root=/dev/vda console=ttyAMA0 earlycon=pl011,0x9000000' -serial mon:stdio -net none Without this patch: Thread 0 Thread 1 Thread 2 Thread 3 Thread 4 Thread 5 Thread 6 Thread 7 1 1283709480 1271869280 454742480 1173673820 1145643640 1118846920 774616920 1144146140 2 643580180 625143860 576841700 322982340 649987880 585749000 529178880 373374780 3 672307220 847315000 880801860 1039502040 667086380 1033939940 1035381120 1046898300 4 568635580 440547020 737000380 910040880 804543740 712314280 868896880 867049000 5 749107320 726397720 776134480 611970100 756721040 753449440 711691300 609343300 With this patch: Thread 0 Thread 1 Thread 2 Thread 3 Thread 4 Thread 5 Thread 6 Thread 7 1 170327620 196322160 169434180 74723860 178145600 178873460 143843260 70998780 2 166415220 129649200 166161240 175241520 155474460 112811860 157003140 150087420 3 511420780 117655640 598641860 596213720 462888760 430838600 554346300 428035120 4 174520240 156311800 120274280 87465380 172781400 136118620 163728340 63026360 5 153677940 202786860 183626500 140721300 150311360 161266840 168154340 107247460 Thanks, Gao Xiang ^ permalink raw reply [flat|nested] 19+ messages in thread
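Aggregating the forty samples in each table above gives a rough overall picture (plain arithmetic on the quoted numbers; this summary is not part of the original report):

```python
# Per-thread timings in nanoseconds, copied from the tables above
# (5 runs x 8 threads, without and with the patch).
before = [
    [1283709480, 1271869280, 454742480, 1173673820, 1145643640, 1118846920, 774616920, 1144146140],
    [643580180, 625143860, 576841700, 322982340, 649987880, 585749000, 529178880, 373374780],
    [672307220, 847315000, 880801860, 1039502040, 667086380, 1033939940, 1035381120, 1046898300],
    [568635580, 440547020, 737000380, 910040880, 804543740, 712314280, 868896880, 867049000],
    [749107320, 726397720, 776134480, 611970100, 756721040, 753449440, 711691300, 609343300],
]
after = [
    [170327620, 196322160, 169434180, 74723860, 178145600, 178873460, 143843260, 70998780],
    [166415220, 129649200, 166161240, 175241520, 155474460, 112811860, 157003140, 150087420],
    [511420780, 117655640, 598641860, 596213720, 462888760, 430838600, 554346300, 428035120],
    [174520240, 156311800, 120274280, 87465380, 172781400, 136118620, 163728340, 63026360],
    [153677940, 202786860, 183626500, 140721300, 150311360, 161266840, 168154340, 107247460],
]

def mean(rows):
    vals = [v for row in rows for v in row]
    return sum(vals) / len(vals)

m_before, m_after = mean(before), mean(after)
print(f"mean before: {m_before / 1e6:.0f} ms, mean after: {m_after / 1e6:.0f} ms, "
      f"speedup: {m_before / m_after:.1f}x")
```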
* Re: Re: [PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed 2018-10-30 5:52 ` Gao Xiang @ 2018-11-05 17:11 ` Gao Xiang 0 siblings, 0 replies; 19+ messages in thread From: Gao Xiang @ 2018-11-05 17:11 UTC (permalink / raw) To: Gao Xiang, Greg Kroah-Hartman Cc: Philippe Ombredanne, Kate Stewart, Thomas Gleixner, Will Deacon, linux-kernel, Peter Zijlstra, Miao Xie, Chao Yu Hi all, ping... is there someone interested in this patch? It seems bit_spinlock performs much better by using smp_cond_load_relaxed than just do busy-loop in arm64... Thanks, Gao Xiang On 2018/10/30 13:52, Gao Xiang wrote: > Hi, > > On 2018/10/30 14:04, Gao Xiang wrote: >> It is better to use wrapped smp_cond_load_relaxed >> instead of open-coded busy waiting for bit_spinlock. >> >> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> >> --- >> >> change log v2: >> - fix the incorrect expression !(VAL >> (bitnum & (BITS_PER_LONG-1))) >> - the test result is described in the following reply. >> >> Thanks, >> Gao Xiang > > > Simple test script: > #include <linux/kthread.h> > #include <linux/bit_spinlock.h> > #include <linux/module.h> > > unsigned long global_lock; > > int test_thread(void *data) > { > unsigned long thread_id = (unsigned long)data; > int i; > u64 start = ktime_get_ns(); > > for (i = 0; i < 500000; ++i) { > bit_spin_lock(0, &global_lock); > __asm__("yield"); > bit_spin_unlock(0, &global_lock); > } > pr_err("Thread id: %lu time: %llu\n", thread_id, ktime_get_ns() - start); > > do_exit(0); > } > > > static int __init bitspinlock_test_module_init(void) > { > int i; > > for (i = 0; i < 8; ++i) { > if (IS_ERR(kthread_run(test_thread, (void *)(unsigned long)i, "thread-%d", i))) > pr_err("fail to create thread %d\n", i); > } > > return 0; > } > > static void __exit bitspinlock_test_module_exit(void) > { > } > > module_init(bitspinlock_test_module_init); > module_exit(bitspinlock_test_module_exit); > MODULE_LICENSE("GPL"); > > > ...and tested in the following ARM server environment: > > 
Processor: HI1616 (https://en.wikichip.org/wiki/hisilicon/hi16xx/hi1616) > Board: HiSilicon D05 Development Board (http://open-estuary.org/d05/) > Memory: 512GB > Host OS: Ubuntu 18.04.1 LTS (Ubuntu 4.15.0-29.31-generic 4.15.18) > QEMU KVM OS: Linux 4.19 + buildroot > QEMU KVM cmdline: qemu-system-aarch64 -enable-kvm -cpu host -smp 4 -m 256M -kernel Image -M virt,kernel_irqchip=on -nographic -hda rootfs.ext2 -append 'root=/dev/vda console=ttyAMA0 earlycon=pl011,0x9000000' -serial mon:stdio -net none > > Without this patch: > Thread 0 Thread 1 Thread 2 Thread 3 Thread 4 Thread 5 Thread 6 Thread 7 > 1 1283709480 1271869280 454742480 1173673820 1145643640 1118846920 774616920 1144146140 > 2 643580180 625143860 576841700 322982340 649987880 585749000 529178880 373374780 > 3 672307220 847315000 880801860 1039502040 667086380 1033939940 1035381120 1046898300 > 4 568635580 440547020 737000380 910040880 804543740 712314280 868896880 867049000 > 5 749107320 726397720 776134480 611970100 756721040 753449440 711691300 609343300 > > With this patch: > Thread 0 Thread 1 Thread 2 Thread 3 Thread 4 Thread 5 Thread 6 Thread 7 > 1 170327620 196322160 169434180 74723860 178145600 178873460 143843260 70998780 > 2 166415220 129649200 166161240 175241520 155474460 112811860 157003140 150087420 > 3 511420780 117655640 598641860 596213720 462888760 430838600 554346300 428035120 > 4 174520240 156311800 120274280 87465380 172781400 136118620 163728340 63026360 > 5 153677940 202786860 183626500 140721300 150311360 161266840 168154340 107247460 > > Thanks, > Gao Xiang > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed 2018-10-30 6:04 ` [PATCH v2] " Gao Xiang 2018-10-30 5:52 ` Gao Xiang @ 2018-11-05 22:49 ` Will Deacon 2018-11-06 1:45 ` Gao Xiang 2018-11-06 9:06 ` Peter Zijlstra 1 sibling, 2 replies; 19+ messages in thread From: Will Deacon @ 2018-11-05 22:49 UTC (permalink / raw) To: Gao Xiang Cc: Greg Kroah-Hartman, Philippe Ombredanne, Kate Stewart, Thomas Gleixner, linux-kernel, Miao Xie, Chao Yu, peterz [+PeterZ -- please include him on stuff like this] Hi Gao, On Tue, Oct 30, 2018 at 02:04:41PM +0800, Gao Xiang wrote: > It is better to use wrapped smp_cond_load_relaxed > instead of open-coded busy waiting for bit_spinlock. > > Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> > --- > > change log v2: > - fix the incorrect expression !(VAL >> (bitnum & (BITS_PER_LONG-1))) > - the test result is described in the following reply. Please include the results in the commit message, so that this change is justified. > diff --git a/include/linux/bit_spinlock.h b/include/linux/bit_spinlock.h > index bbc4730a6505..d5f922b5ffd9 100644 > --- a/include/linux/bit_spinlock.h > +++ b/include/linux/bit_spinlock.h > @@ -15,22 +15,19 @@ > */ > static inline void bit_spin_lock(int bitnum, unsigned long *addr) > { > - /* > - * Assuming the lock is uncontended, this never enters > - * the body of the outer loop. If it is contended, then > - * within the inner loop a non-atomic test is used to > - * busywait with less bus contention for a good time to > - * attempt to acquire the lock bit. 
> - */ > - preempt_disable(); > #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) > - while (unlikely(test_and_set_bit_lock(bitnum, addr))) { > - preempt_enable(); > - do { > - cpu_relax(); > - } while (test_bit(bitnum, addr)); > + const unsigned int bitshift = bitnum & (BITS_PER_LONG - 1); > + > + while (1) { > + smp_cond_load_relaxed(&addr[BIT_WORD(bitnum)], > + !((VAL >> bitshift) & 1)); > preempt_disable(); > + if (!test_and_set_bit_lock(bitnum, addr)) > + break; > + preempt_enable(); > } > +#else > + preempt_disable(); This appears to introduce a bunch of overhead for the uncontended fastpath. How about the much-simpler-but-completely-untested (tm) patch below? Will --->8 diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h index 3ae021368f48..9de8d3544630 100644 --- a/include/asm-generic/bitops/lock.h +++ b/include/asm-generic/bitops/lock.h @@ -6,6 +6,15 @@ #include <linux/compiler.h> #include <asm/barrier.h> +static inline void spin_until_bit_unlock(unsigned int nr, + volatile unsigned long *p) +{ + unsigned long mask = BIT_MASK(bitnum); + + p += BIT_WORD(nr); + smp_cond_load_relaxed(p, VAL & mask); +} + /** * test_and_set_bit_lock - Set a bit and return its old value, for lock * @nr: Bit to set diff --git a/include/linux/bit_spinlock.h b/include/linux/bit_spinlock.h index bbc4730a6505..d711c62e718c 100644 --- a/include/linux/bit_spinlock.h +++ b/include/linux/bit_spinlock.h @@ -26,9 +26,7 @@ static inline void bit_spin_lock(int bitnum, unsigned long *addr) #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) while (unlikely(test_and_set_bit_lock(bitnum, addr))) { preempt_enable(); - do { - cpu_relax(); - } while (test_bit(bitnum, addr)); + spin_until_bit_unlock(bitnum, addr); preempt_disable(); } #endif ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed 2018-11-05 22:49 ` Will Deacon @ 2018-11-06 1:45 ` Gao Xiang 0 siblings, 0 replies; 19+ messages in thread From: Gao Xiang @ 2018-11-06 1:45 UTC (permalink / raw) To: Will Deacon Cc: Greg Kroah-Hartman, Philippe Ombredanne, Kate Stewart, Thomas Gleixner, linux-kernel, Miao Xie, Chao Yu, peterz Hi Will, On 2018/11/6 6:49, Will Deacon wrote: > Hi Gao, > > On Tue, Oct 30, 2018 at 02:04:41PM +0800, Gao Xiang wrote: >> It is better to use wrapped smp_cond_load_relaxed >> instead of open-coded busy waiting for bit_spinlock. >> >> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> >> --- >> >> change log v2: >> - fix the incorrect expression !(VAL >> (bitnum & (BITS_PER_LONG-1))) >> - the test result is described in the following reply. > Please include the results in the commit message, so that this change is justified. Will add it in the next version... > > This appears to introduce a bunch of overhead for the uncontended fastpath. > How about the much-simpler-but-completely-untested (tm) patch below? Actually I thought of doing it like the following (much simpler indeed) at first... But the current implementation of smp_cond_load_relaxed evaluates the condition immediately, which seems unnecessary here (right after test_and_set_bit_lock rather than after __cmpwait_relaxed...) for (;;) { \ VAL = READ_ONCE(*__PTR); \ if (cond_expr) \ break; \ __cmpwait_relaxed(__PTR, VAL); \ } \ p.s. I'm not sure whether the original uncontended fastpath really works effectively... any ideas about this? Thanks in advance... 
Thanks, Gao Xiang > > Will > > --->8 > > diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h > index 3ae021368f48..9de8d3544630 100644 > --- a/include/asm-generic/bitops/lock.h > +++ b/include/asm-generic/bitops/lock.h > @@ -6,6 +6,15 @@ > #include <linux/compiler.h> > #include <asm/barrier.h> > > +static inline void spin_until_bit_unlock(unsigned int nr, > + volatile unsigned long *p) > +{ > + unsigned long mask = BIT_MASK(bitnum); > + > + p += BIT_WORD(nr); > + smp_cond_load_relaxed(p, VAL & mask); > +} > + > /** > * test_and_set_bit_lock - Set a bit and return its old value, for lock > * @nr: Bit to set > diff --git a/include/linux/bit_spinlock.h b/include/linux/bit_spinlock.h > index bbc4730a6505..d711c62e718c 100644 > --- a/include/linux/bit_spinlock.h > +++ b/include/linux/bit_spinlock.h > @@ -26,9 +26,7 @@ static inline void bit_spin_lock(int bitnum, unsigned long *addr) > #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) > while (unlikely(test_and_set_bit_lock(bitnum, addr))) { > preempt_enable(); > - do { > - cpu_relax(); > - } while (test_bit(bitnum, addr)); > + spin_until_bit_unlock(bitnum, addr); > preempt_disable(); > } > #endif ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed 2018-11-05 22:49 ` Will Deacon 2018-11-06 1:45 ` Gao Xiang @ 2018-11-06 9:06 ` Peter Zijlstra 2018-11-06 10:22 ` Gao Xiang 1 sibling, 1 reply; 19+ messages in thread From: Peter Zijlstra @ 2018-11-06 9:06 UTC (permalink / raw) To: Will Deacon Cc: Gao Xiang, Greg Kroah-Hartman, Philippe Ombredanne, Kate Stewart, Thomas Gleixner, linux-kernel, Miao Xie, Chao Yu On Mon, Nov 05, 2018 at 10:49:21PM +0000, Will Deacon wrote: > diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h > index 3ae021368f48..9de8d3544630 100644 > --- a/include/asm-generic/bitops/lock.h > +++ b/include/asm-generic/bitops/lock.h > @@ -6,6 +6,15 @@ > #include <linux/compiler.h> > #include <asm/barrier.h> > > +static inline void spin_until_bit_unlock(unsigned int nr, > + volatile unsigned long *p) > +{ > + unsigned long mask = BIT_MASK(bitnum); > + > + p += BIT_WORD(nr); > + smp_cond_load_relaxed(p, VAL & mask); > +} > + > /** > * test_and_set_bit_lock - Set a bit and return its old value, for lock > * @nr: Bit to set > diff --git a/include/linux/bit_spinlock.h b/include/linux/bit_spinlock.h > index bbc4730a6505..d711c62e718c 100644 > --- a/include/linux/bit_spinlock.h > +++ b/include/linux/bit_spinlock.h > @@ -26,9 +26,7 @@ static inline void bit_spin_lock(int bitnum, unsigned long *addr) > #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) > while (unlikely(test_and_set_bit_lock(bitnum, addr))) { > preempt_enable(); > - do { > - cpu_relax(); > - } while (test_bit(bitnum, addr)); > + spin_until_bit_unlock(bitnum, addr); > preempt_disable(); > } > #endif Yes, that's much better. Ideally though, we'd get rid of bit spinlocks that have significant enough contention for this to matter. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed 2018-11-06 9:06 ` Peter Zijlstra @ 2018-11-06 10:22 ` Gao Xiang 2018-11-06 11:00 ` Peter Zijlstra 0 siblings, 1 reply; 19+ messages in thread From: Gao Xiang @ 2018-11-06 10:22 UTC (permalink / raw) To: Peter Zijlstra Cc: Will Deacon, Greg Kroah-Hartman, Philippe Ombredanne, Kate Stewart, Thomas Gleixner, linux-kernel, Miao Xie, Chao Yu Hi Peter, On 2018/11/6 17:06, Peter Zijlstra wrote: > On Mon, Nov 05, 2018 at 10:49:21PM +0000, Will Deacon wrote: >> diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h >> index 3ae021368f48..9de8d3544630 100644 >> --- a/include/asm-generic/bitops/lock.h >> +++ b/include/asm-generic/bitops/lock.h >> @@ -6,6 +6,15 @@ >> #include <linux/compiler.h> >> #include <asm/barrier.h> >> >> +static inline void spin_until_bit_unlock(unsigned int nr, >> + volatile unsigned long *p) >> +{ >> + unsigned long mask = BIT_MASK(bitnum); >> + >> + p += BIT_WORD(nr); >> + smp_cond_load_relaxed(p, VAL & mask); >> +} >> + >> /** >> * test_and_set_bit_lock - Set a bit and return its old value, for lock >> * @nr: Bit to set >> diff --git a/include/linux/bit_spinlock.h b/include/linux/bit_spinlock.h >> index bbc4730a6505..d711c62e718c 100644 >> --- a/include/linux/bit_spinlock.h >> +++ b/include/linux/bit_spinlock.h >> @@ -26,9 +26,7 @@ static inline void bit_spin_lock(int bitnum, unsigned long *addr) >> #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) >> while (unlikely(test_and_set_bit_lock(bitnum, addr))) { >> preempt_enable(); >> - do { >> - cpu_relax(); >> - } while (test_bit(bitnum, addr)); >> + spin_until_bit_unlock(bitnum, addr); >> preempt_disable(); >> } >> #endif > > Yes, that's much better. Ideally though, we'd get rid of bit spinlocks > that have significant enough contention for this to matter. OK, I will send v3 to fix like the above. Thanks, Gao Xiang > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed 2018-11-06 10:22 ` Gao Xiang @ 2018-11-06 11:00 ` Peter Zijlstra 2018-11-06 11:36 ` Gao Xiang 0 siblings, 1 reply; 19+ messages in thread From: Peter Zijlstra @ 2018-11-06 11:00 UTC (permalink / raw) To: Gao Xiang Cc: Will Deacon, Greg Kroah-Hartman, Philippe Ombredanne, Kate Stewart, Thomas Gleixner, linux-kernel, Miao Xie, Chao Yu On Tue, Nov 06, 2018 at 06:22:31PM +0800, Gao Xiang wrote: > Hi Peter, > > On 2018/11/6 17:06, Peter Zijlstra wrote: > > On Mon, Nov 05, 2018 at 10:49:21PM +0000, Will Deacon wrote: > >> diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h > >> index 3ae021368f48..9de8d3544630 100644 > >> --- a/include/asm-generic/bitops/lock.h > >> +++ b/include/asm-generic/bitops/lock.h > >> @@ -6,6 +6,15 @@ > >> #include <linux/compiler.h> > >> #include <asm/barrier.h> > >> > >> +static inline void spin_until_bit_unlock(unsigned int nr, > >> + volatile unsigned long *p) > >> +{ > >> + unsigned long mask = BIT_MASK(bitnum); > >> + > >> + p += BIT_WORD(nr); > >> + smp_cond_load_relaxed(p, VAL & mask); > >> +} > >> + > >> /** > >> * test_and_set_bit_lock - Set a bit and return its old value, for lock > >> * @nr: Bit to set > >> diff --git a/include/linux/bit_spinlock.h b/include/linux/bit_spinlock.h > >> index bbc4730a6505..d711c62e718c 100644 > >> --- a/include/linux/bit_spinlock.h > >> +++ b/include/linux/bit_spinlock.h > >> @@ -26,9 +26,7 @@ static inline void bit_spin_lock(int bitnum, unsigned long *addr) > >> #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) > >> while (unlikely(test_and_set_bit_lock(bitnum, addr))) { > >> preempt_enable(); > >> - do { > >> - cpu_relax(); > >> - } while (test_bit(bitnum, addr)); > >> + spin_until_bit_unlock(bitnum, addr); > >> preempt_disable(); > >> } > >> #endif > > > > Yes, that's much better. Ideally though, we'd get rid of bit spinlocks > > that have significant enough contention for this to matter. 
> > OK, I will send v3 to fix like the above. That's not answering the full question though. What bit spinlocks did you find where this matters? And can't we convert them to proper spinlocks instead? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed 2018-11-06 11:00 ` Peter Zijlstra @ 2018-11-06 11:36 ` Gao Xiang 2018-11-06 12:27 ` Peter Zijlstra 0 siblings, 1 reply; 19+ messages in thread From: Gao Xiang @ 2018-11-06 11:36 UTC (permalink / raw) To: Peter Zijlstra Cc: Will Deacon, Greg Kroah-Hartman, Philippe Ombredanne, Kate Stewart, Thomas Gleixner, linux-kernel, Miao Xie, Chao Yu Hi Peter, On 2018/11/6 19:00, Peter Zijlstra wrote: >>> Yes, that's much better. Ideally though, we'd get rid of bit spinlocks >>> that have significant enough contention for this to matter. >> OK, I will send v3 to fix like the above. > That's not answering the full question though. What bit spinlocks did > you find where this matters? And can't we convert them to proper > spinlocks instead? I misunderstood your question, and I get your point now. "What bit spinlocks did you find where this matters?" None... I described the original background to Greg before: I tried to use smp_cond_load_relaxed instead of busy spinning during the development of the erofs file system, and I saw this bit_spinlock implementation by chance... This is not a big modification, but since I raised the question before, I want to follow it through to the end... "And can't we convert them to proper spinlocks instead?" I think bit_spinlock is sometimes preferred since it requires little memory and can be integrated into existing fields (e.g. flags)... It is a reasonable choice when the lock doesn't have too many users at the same time, so I think it depends on the detailed real use scenario... It is just a tool for user code to select case by case... That is my personal opinion... IMO, using a wrapped-up function for this scenario could be better than open-coding it everywhere (e.g. do { cpu_relax(); } while (...)) since it could be optimized even further for the specific architecture... Thanks, Gao Xiang ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed 2018-11-06 11:36 ` Gao Xiang @ 2018-11-06 12:27 ` Peter Zijlstra 2018-11-06 12:33 ` Gao Xiang 0 siblings, 1 reply; 19+ messages in thread From: Peter Zijlstra @ 2018-11-06 12:27 UTC (permalink / raw) To: Gao Xiang Cc: Will Deacon, Greg Kroah-Hartman, Philippe Ombredanne, Kate Stewart, Thomas Gleixner, linux-kernel, Miao Xie, Chao Yu On Tue, Nov 06, 2018 at 07:36:41PM +0800, Gao Xiang wrote: > IMO, to use wrapped up function for the detailed scenario could be better than > open-coded all the time (eg. do cpu_relax(); while(...)) since it could be > optimizated even more for the specific architecture... That's the whole point though; if this actually matters, you're doing it wrong. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed
  2018-11-06 12:27       ` Peter Zijlstra
@ 2018-11-06 12:33         ` Gao Xiang
  2018-11-06 12:38           ` Gao Xiang
  2018-11-06 12:43           ` Peter Zijlstra
  0 siblings, 2 replies; 19+ messages in thread
From: Gao Xiang @ 2018-11-06 12:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Greg Kroah-Hartman, Philippe Ombredanne,
	Kate Stewart, Thomas Gleixner, linux-kernel, Miao Xie, Chao Yu

Hi Peter,

On 2018/11/6 20:27, Peter Zijlstra wrote:
> On Tue, Nov 06, 2018 at 07:36:41PM +0800, Gao Xiang wrote:
>> IMO, using a wrapped-up function for this scenario could be better than
>> open-coding it every time (e.g. do { cpu_relax(); } while (...)), since
>> it can be optimized further for the specific architecture...
> That's the whole point though; if this actually matters, you're doing it
> wrong.

I cannot fully understand your point... sorry about my English...

To be clear: do you mean it is better to fix it as Will suggested
before, or to leave the matter as it is, since the performance of
bit_spinlock itself doesn't matter?

Thanks in advance.

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed
  2018-11-06 12:33         ` Gao Xiang
@ 2018-11-06 12:38           ` Gao Xiang
  0 siblings, 0 replies; 19+ messages in thread
From: Gao Xiang @ 2018-11-06 12:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Greg Kroah-Hartman, Philippe Ombredanne,
	Kate Stewart, Thomas Gleixner, linux-kernel, Miao Xie, Chao Yu

Hi Peter,

OK, I think I understand your point now. If you think the performance of
bit_spinlock doesn't matter, let's keep the current status.

Thanks,
Gao Xiang

On 2018/11/6 20:33, Gao Xiang wrote:
> Hi Peter,
>
> On 2018/11/6 20:27, Peter Zijlstra wrote:
>> On Tue, Nov 06, 2018 at 07:36:41PM +0800, Gao Xiang wrote:
>>> IMO, using a wrapped-up function for this scenario could be better than
>>> open-coding it every time (e.g. do { cpu_relax(); } while (...)), since
>>> it can be optimized further for the specific architecture...
>> That's the whole point though; if this actually matters, you're doing it
>> wrong.
>
> I cannot fully understand your point... sorry about my English...
>
> To be clear: do you mean it is better to fix it as Will suggested
> before, or to leave the matter as it is, since the performance of
> bit_spinlock itself doesn't matter?
>
> Thanks in advance.
>
> Thanks,
> Gao Xiang

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed
  2018-11-06 12:33         ` Gao Xiang
  2018-11-06 12:38           ` Gao Xiang
@ 2018-11-06 12:43           ` Peter Zijlstra
  1 sibling, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2018-11-06 12:43 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Will Deacon, Greg Kroah-Hartman, Philippe Ombredanne,
	Kate Stewart, Thomas Gleixner, linux-kernel, Miao Xie, Chao Yu

On Tue, Nov 06, 2018 at 08:33:56PM +0800, Gao Xiang wrote:
> Hi Peter,
>
> On 2018/11/6 20:27, Peter Zijlstra wrote:
>> On Tue, Nov 06, 2018 at 07:36:41PM +0800, Gao Xiang wrote:
>>> IMO, using a wrapped-up function for this scenario could be better than
>>> open-coding it every time (e.g. do { cpu_relax(); } while (...)), since
>>> it can be optimized further for the specific architecture...
>> That's the whole point though; if this actually matters, you're doing it
>> wrong.
>
> I cannot fully understand your point... sorry about my English...
>
> To be clear: do you mean it is better to fix it as Will suggested
> before, or to leave the matter as it is, since the performance of
> bit_spinlock itself doesn't matter?

Right, bit-spinlocks are terrible when contended. If the contended
behaviour of bit-spinlocks starts to matter, you've lost already.

^ permalink raw reply	[flat|nested] 19+ messages in thread
end of thread, other threads:[~2018-11-06 12:43 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-13  6:47 [RFC PATCH] bit_spinlock: introduce smp_cond_load_relaxed Gao Xiang
2018-10-13  7:04 ` Greg Kroah-Hartman
2018-10-13  7:22   ` Gao Xiang
2018-10-13  7:30     ` Greg Kroah-Hartman
2018-10-13  7:44       ` Gao Xiang
2018-10-13  7:30     ` Gao Xiang
2018-10-30  6:04 ` [PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed Gao Xiang
2018-10-30  5:52   ` Gao Xiang
2018-11-05 17:11     ` Gao Xiang
2018-11-05 22:49       ` Will Deacon
2018-11-06  1:45         ` Gao Xiang
2018-11-06  9:06           ` Peter Zijlstra
2018-11-06 10:22             ` Gao Xiang
2018-11-06 11:00               ` Peter Zijlstra
2018-11-06 11:36                 ` Gao Xiang
2018-11-06 12:27                   ` Peter Zijlstra
2018-11-06 12:33                     ` Gao Xiang
2018-11-06 12:38                       ` Gao Xiang
2018-11-06 12:43                       ` Peter Zijlstra