* [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM
From: Sebastian Andrzej Siewior @ 2019-10-07 21:44 UTC
To: linux-arm-kernel
Cc: Arnd Bergmann, Peter Zijlstra, Russell King, Ingo Molnar, Waiman Long, Will Deacon

I added support for queued RW-locks and spinlocks for ARM. I wanted to
remove the current implementation, but this does not work: the CPU_V6
kernel config has no support for xchg() on 2-byte memory addresses,
which the q-lock slowpath requires. It is possible to create a
multi-platform kernel (with v6+v7+SMP) which then lacks the function.

I tested the q-lock implementation with

	hackbench -g40 -s 500 -l 500

The numbers in the table below represent the average runtime of 10
invocations. I tested with HZ_100 and HZ_250 and the different preemption
levels on an IMX6q board (quad Cortex-A9) and an AM572x board (dual
Cortex-A15).
"Ticket" means the current implementation on v5.4-rc1, "Q-locks" is the
switch to queued RW-locks and spinlocks, and in "Q-locksI" the locking
instructions are additionally inlined.
IMX6q
~~~~~

HZ_100   | PREEMPT_NONE | PREEMPT_VOLUNTARY | PREEMPT
Ticket   | 52.103       | 52.284            | 60.5681
Q-locks  | 54.1804      | 53.267            | 56.1914
Q-locksI | 52.2985      | 49.398            | 56.7441

HZ_250   | PREEMPT_NONE | PREEMPT_VOLUNTARY | PREEMPT
Ticket   | 54.3888      | 52.7896           | 58.4837
Q-locks  | 52.1027      | 52.2302           | 57.26
Q-locksI | 51.6185      | 51.5856           | 55.327

AM572x
~~~~~~

HZ_100   | PREEMPT_NONE | PREEMPT_VOLUNTARY | PREEMPT
Ticket   | 42.3819      | 42.4821           | 43.2671
Q-locks  | 40.9141      | 40.0269           | 42.65
Q-locksI | 40.0763      | 39.9101           | 40.7811

HZ_250   | PREEMPT_NONE | PREEMPT_VOLUNTARY | PREEMPT
Ticket   | 41.6399      | 42.9386           | 44.5865
Q-locks  | 41.4476      | 43.0836           | 43.1937
Q-locksI | 39.6897      | 41.1746           | 43.1962

Based on these numbers, the q-lock based implementation performs a
little better than the current ticket spinlock implementation. On IMX6q
this additionally requires inlining the locks, while inlining makes
hardly any difference on AM572x.

Here are `size' numbers for the different vmlinux binaries:

   text    data    bss      dec  dec KiB  variant
8096124 2604932 198648 10899704 10644.24  5.4-rc1 CONFIG_HZ_100 CONFIG_PREEMPT_NONE
8031639 2605060 198656 10835355 10581.40  qlocks  CONFIG_HZ_100 CONFIG_PREEMPT_NONE
8319233 2605072 198656 11122961 10862.27  qlocksI CONFIG_HZ_100 CONFIG_PREEMPT_NONE
8098548 2604932 198648 10902128 10646.61  5.4-rc1 CONFIG_HZ_100 CONFIG_PREEMPT_VOLUNTARY
8034103 2605060 198656 10837819 10583.81  qlocks  CONFIG_HZ_100 CONFIG_PREEMPT_VOLUNTARY
8321769 2605072 198656 11125497 10864.74  qlocksI CONFIG_HZ_100 CONFIG_PREEMPT_VOLUNTARY
8082969 2605468 198712 10887149 10631.98  5.4-rc1 CONFIG_HZ_100 CONFIG_PREEMPT
8083732 2609692 198720 10892144 10636.86  qlocks  CONFIG_HZ_100 CONFIG_PREEMPT
8725070 2609704 198720 11533494 11263.18  qlocksI CONFIG_HZ_100 CONFIG_PREEMPT
8096784 2605188 198648 10900620 10645.14  5.4-rc1 CONFIG_HZ_250 CONFIG_PREEMPT_NONE
8032307 2605316 198656 10836279 10582.30  qlocks  CONFIG_HZ_250 CONFIG_PREEMPT_NONE
8319901 2605328 198656 11123885 10863.17  qlocksI CONFIG_HZ_250 CONFIG_PREEMPT_NONE
8099208 2605188 198648 10903044 10647.50  5.4-rc1 CONFIG_HZ_250 CONFIG_PREEMPT_VOLUNTARY
8034739 2605316 198656 10838711 10584.68  qlocks  CONFIG_HZ_250 CONFIG_PREEMPT_VOLUNTARY
8322405 2605328 198656 11126389 10865.61  qlocksI CONFIG_HZ_250 CONFIG_PREEMPT_VOLUNTARY
8083645 2605724 198712 10888081 10632.89  5.4-rc1 CONFIG_HZ_250 CONFIG_PREEMPT
8084376 2609948 198720 10893044 10637.74  qlocks  CONFIG_HZ_250 CONFIG_PREEMPT
8725762 2609960 198720 11534442 11264.10  qlocksI CONFIG_HZ_250 CONFIG_PREEMPT

On average the q-locksI variant is approx. 200 KiB larger than the
current implementation. With the preempt configuration the size
increases by approx. 600 KiB, which is probably not worth it.

Sebastian

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
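[Editor's note] For readers unfamiliar with the baseline: the "Ticket" rows above correspond to ARM's ticket spinlock. The algorithm can be sketched with user-space C11 atomics roughly as follows — a toy model only; the real implementation packs both counters into a single 32-bit word and uses ldrex/strex with wfe/sev, none of which is shown here:

```c
#include <stdatomic.h>

/* Toy model of a ticket lock: acquire() takes the next ticket and waits
 * until "owner" reaches it; release() admits the next waiter.  FIFO
 * fairness falls out of the ticket order. */
struct ticket_lock {
	atomic_ushort next;	/* next ticket to hand out */
	atomic_ushort owner;	/* ticket currently holding the lock */
};

static void ticket_lock_acquire(struct ticket_lock *l)
{
	unsigned short me = atomic_fetch_add(&l->next, 1);

	while (atomic_load(&l->owner) != me)
		;	/* spin; the ARM code would wfe() here */
}

static void ticket_lock_release(struct ticket_lock *l)
{
	atomic_fetch_add(&l->owner, 1);	/* ARM follows up with dsb_sev() */
}
```

Queued locks keep the same FIFO property but let each waiter spin on its own MCS node instead of one shared counter, which is what the benchmark compares.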
* [PATCH 1/3] ARM: Use qrwlock implementation
From: Sebastian Andrzej Siewior @ 2019-10-07 21:44 UTC
To: linux-arm-kernel
Cc: Arnd Bergmann, Peter Zijlstra, Russell King, Sebastian Andrzej Siewior, Ingo Molnar, Waiman Long, Will Deacon

Use the generic qrwlock implementation for rwlock. The WFE mechanism is
used as part of the spinlock implementation.

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
 arch/arm/Kconfig                      |  1 +
 arch/arm/include/asm/Kbuild           |  1 +
 arch/arm/include/asm/spinlock.h       |  8 ++++++++
 arch/arm/include/asm/spinlock_types.h | 10 +++++++++-
 4 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 8a50efb559f35..2f704531d883a 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -34,6 +34,7 @@ config ARM
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF
+	select ARCH_USE_QUEUED_RWLOCKS if !CPU_V6
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select BINFMT_FLAT_ARGVP_ENVP_ON_STACK
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 68ca86f85eb73..5327be7572cd2 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -18,6 +18,7 @@ generic-y += preempt.h
 generic-y += seccomp.h
 generic-y += serial.h
 generic-y += trace_clock.h
+generic-y += qrwlock.h
 
 generated-y += mach-types.h
 generated-y += unistd-nr.h
diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index 8f009e788ad40..e23f71b2cd43d 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -128,6 +128,7 @@ static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 }
 #define arch_spin_is_contended	arch_spin_is_contended
 
+#ifdef CONFIG_CPU_V6
 /*
  * RWLOCKS
  *
@@ -270,4 +271,11 @@ static inline int arch_read_trylock(arch_rwlock_t *rw)
 	}
 }
 
+#else
+
+#include <asm/qrwlock.h>
+#define smp_mb__after_spinlock()	smp_mb()
+
+#endif
+
 #endif /* __ASM_SPINLOCK_H */
diff --git a/arch/arm/include/asm/spinlock_types.h b/arch/arm/include/asm/spinlock_types.h
index 5976958647fe1..c942a75250bb2 100644
--- a/arch/arm/include/asm/spinlock_types.h
+++ b/arch/arm/include/asm/spinlock_types.h
@@ -29,6 +29,14 @@ typedef struct {
 	u32 lock;
 } arch_rwlock_t;
 
-#define __ARCH_RW_LOCK_UNLOCKED	{ 0 }
+#ifdef CONFIG_CPU_V6
+
+#define __ARCH_RW_LOCK_UNLOCKED	{ 0 }
+
+#else
+
+#include <asm-generic/qrwlock_types.h>
+
+#endif
 
 #endif
-- 
2.23.0
* [PATCH 2/3] ARM: Use qspinlock implementation
From: Sebastian Andrzej Siewior @ 2019-10-07 21:44 UTC
To: linux-arm-kernel
Cc: Arnd Bergmann, Peter Zijlstra, Russell King, Sebastian Andrzej Siewior, Ingo Molnar, Waiman Long, Will Deacon

Use the generic queued spinlock implementation for spinlock. The WFE
mechanism is used as part of arch_mcs_spin_lock_contended().

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
 arch/arm/Kconfig                      | 1 +
 arch/arm/include/asm/Kbuild           | 1 +
 arch/arm/include/asm/spinlock.h       | 5 +++--
 arch/arm/include/asm/spinlock_types.h | 4 ++--
 4 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 2f704531d883a..7eba89bb45755 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -35,6 +35,7 @@ config ARM
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_USE_QUEUED_RWLOCKS if !CPU_V6
+	select ARCH_USE_QUEUED_SPINLOCKS if !CPU_V6
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select BINFMT_FLAT_ARGVP_ENVP_ON_STACK
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 5327be7572cd2..f98bcfd9612b6 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -18,6 +18,7 @@ generic-y += preempt.h
 generic-y += seccomp.h
 generic-y += serial.h
 generic-y += trace_clock.h
+generic-y += qspinlock.h
 generic-y += qrwlock.h
 
 generated-y += mach-types.h
diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index e23f71b2cd43d..7f3477679eb58 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -45,6 +45,7 @@ static inline void dsb_sev(void)
 	__asm__(SEV);
 }
 
+#ifdef CONFIG_CPU_V6
 /*
  * ARMv6 ticket-based spin-locking.
  *
@@ -128,7 +129,6 @@ static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 }
 #define arch_spin_is_contended	arch_spin_is_contended
 
-#ifdef CONFIG_CPU_V6
 /*
  * RWLOCKS
  *
@@ -274,7 +274,8 @@ static inline int arch_read_trylock(arch_rwlock_t *rw)
 #else
 
 #include <asm/qrwlock.h>
-#define smp_mb__after_spinlock()	smp_mb()
+#include <asm/qspinlock.h>
+#define smp_mb__after_spinlock()	smp_mb()
 
 #endif
diff --git a/arch/arm/include/asm/spinlock_types.h b/arch/arm/include/asm/spinlock_types.h
index c942a75250bb2..7d5200da6a5f8 100644
--- a/arch/arm/include/asm/spinlock_types.h
+++ b/arch/arm/include/asm/spinlock_types.h
@@ -6,6 +6,7 @@
 # error "please don't include this file directly"
 #endif
 
+#ifdef CONFIG_CPU_V6
 #define TICKET_SHIFT	16
 
 typedef struct {
@@ -29,12 +30,11 @@ typedef struct {
 	u32 lock;
 } arch_rwlock_t;
 
-#ifdef CONFIG_CPU_V6
-
 #define __ARCH_RW_LOCK_UNLOCKED	{ 0 }
 
 #else
 
+#include <asm-generic/qspinlock_types.h>
 #include <asm-generic/qrwlock_types.h>
 
 #endif
-- 
2.23.0
* [PATCH 3/3] ARM: Inline locking functions for !PREEMPTION
From: Sebastian Andrzej Siewior @ 2019-10-07 21:44 UTC
To: linux-arm-kernel
Cc: Arnd Bergmann, Peter Zijlstra, Russell King, Sebastian Andrzej Siewior, Ingo Molnar, Waiman Long, Will Deacon

On non-preemptive kernels, each locking function is less than 64 bytes
and it makes sense to inline it. With PREEMPTION the kernel becomes
very big if the locks are inlined.

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
 arch/arm/Kconfig | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 7eba89bb45755..c1ee04c209e6e 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -26,6 +26,32 @@ config ARM
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAVE_CUSTOM_GPIO_H
 	select ARCH_HAS_GCOV_PROFILE_ALL
+	select ARCH_INLINE_READ_LOCK if !PREEMPTION
+	select ARCH_INLINE_READ_LOCK_BH if !PREEMPTION
+	select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_READ_LOCK_IRQSAVE if !PREEMPTION
+	select ARCH_INLINE_READ_UNLOCK if !PREEMPTION
+	select ARCH_INLINE_READ_UNLOCK_BH if !PREEMPTION
+	select ARCH_INLINE_READ_UNLOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_READ_UNLOCK_IRQRESTORE if !PREEMPTION
+	select ARCH_INLINE_WRITE_LOCK if !PREEMPTION
+	select ARCH_INLINE_WRITE_LOCK_BH if !PREEMPTION
+	select ARCH_INLINE_WRITE_LOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_WRITE_LOCK_IRQSAVE if !PREEMPTION
+	select ARCH_INLINE_WRITE_UNLOCK if !PREEMPTION
+	select ARCH_INLINE_WRITE_UNLOCK_BH if !PREEMPTION
+	select ARCH_INLINE_WRITE_UNLOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPTION
+	select ARCH_INLINE_SPIN_TRYLOCK if !PREEMPTION
+	select ARCH_INLINE_SPIN_TRYLOCK_BH if !PREEMPTION
+	select ARCH_INLINE_SPIN_LOCK if !PREEMPTION
+	select ARCH_INLINE_SPIN_LOCK_BH if !PREEMPTION
+	select ARCH_INLINE_SPIN_LOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_SPIN_LOCK_IRQSAVE if !PREEMPTION
+	select ARCH_INLINE_SPIN_UNLOCK if !PREEMPTION
+	select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPTION
+	select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
 	select ARCH_KEEP_MEMBLOCK if HAVE_ARCH_PFN_VALID || KEXEC
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_NO_SG_CHAIN if !ARM_HAS_SG_CHAIN
-- 
2.23.0
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM
From: Arnd Bergmann @ 2019-10-08 11:42 UTC
To: Sebastian Andrzej Siewior
Cc: Peter Zijlstra, Russell King, Ingo Molnar, Waiman Long, Will Deacon, Linux ARM

On Mon, Oct 7, 2019 at 11:45 PM Sebastian Andrzej Siewior
<sebastian@breakpoint.cc> wrote:
>
> I added support for queued-RW and -spinlocks for ARM. I wanted to
> remove the current implementation but this does not work. The CPU_V6
> kernel config does not have support for xchg() on 2 byte memory address.
> This is required by q-lock' slowpath. It is possible to create a
> multi-kernel (with v6+v7+SMP) which then lack the function.
>
> I tested the q-lock implementation with
>         hackbench -g40 -s 500 -l 500
>
> The numbers in the table below represent the average runtime of 10
> invocations. I tested with HZ_100,HZ_250 and the different preemption
> levels on a IMX6q-board (quad Cortex-A9) and an AM572x board (dual
> Cortex-A15).
> "Ticket" means the current implementation on v5.4-rc1, Q-Locks is the
> switch to queued RW and spinlocks and in Q-locksI the locking
> instruction is additionally inlined.

This looks nice, and I don't see anything wrong with the implementation,
but I am slightly worried about switching everything over to a generic
spinlock while keeping the custom ARM version for an exceptionally
rare corner case:

The ARM spinlock is now only used when you build an SMP-enabled
kernel for an ARM1136r0 that is used in OMAP2, i.MX3 and some
of the least common Integrator/Realview variants. I'm not aware
of any binary distros with ARMv6+ kernels, so these would run custom
kernels that are almost always non-SMP as well as no longer getting
kernel upgrades (almost all have been discontinued years ago, the i.MX35
chip itself was the last to get EOLd in 2018).
Raspbian builds an ARMv6K SMP kernel that is not affected by this.

I wonder if we can do something better here and make the
asm-generic/qspinlock.h implementation always degrade into an
equivalent of include/linux/spinlock_up.h when running on uniprocessor
systems, avoiding both the atomic cmpxchg and the slowpath.

That way, an ARMv6+SMP kernel on UP could share the qspinlock
implementation but never actually get into the invalid 16-bit xchg() or
sev()/wfe(). It already shouldn't ever get into the slowpath on a
non-SMP system if I understand it correctly, but avoiding the cmpxchg()
entirely would be an added benefit.

Arnd
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM
From: Waiman Long @ 2019-10-08 13:36 UTC
To: Arnd Bergmann, Sebastian Andrzej Siewior
Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Russell King, Linux ARM

On 10/8/19 7:42 AM, Arnd Bergmann wrote:
> On Mon, Oct 7, 2019 at 11:45 PM Sebastian Andrzej Siewior
> <sebastian@breakpoint.cc> wrote:
>> I added support for queued-RW and -spinlocks for ARM. I wanted to
>> remove the current implementation but this does not work. The CPU_V6
>> kernel config does not have support for xchg() on 2 byte memory address.
>> This is required by q-lock' slowpath. It is possible to create a
>> multi-kernel (with v6+v7+SMP) which then lack the function.
>>
>> I tested the q-lock implementation with
>>         hackbench -g40 -s 500 -l 500
>>
>> The numbers in the table below represent the average runtime of 10
>> invocations. I tested with HZ_100,HZ_250 and the different preemption
>> levels on a IMX6q-board (quad Cortex-A9) and an AM572x board (dual
>> Cortex-A15).
>> "Ticket" means the current implementation on v5.4-rc1, Q-Locks is the
>> switch to queued RW and spinlocks and in Q-locksI the locking
>> instruction is additionally inlined.
> This looks nice, and I don't see anything wrong with the implementation,
> but I am slightly worried about switching everything over to a generic
> spinlock while keeping the custom ARM version for an exceptionally
> rare corner case:
>
> The ARM spinlock is now only used when you build an SMP-enabled
> kernel for an ARM1136r0 that is used in OMAP2, i.MX3 and some
> of the least common Integrator/Realview variants. I'm not aware
> of any binary distros with ARMv6+ kernels, so these would run custom
> kernels that are almost always non-SMP as well as no longer getting
> kernel upgrades (almost all have been discontinued years ago, the i.MX35
> chip itself was the last to get EOLd in 2018).
> Raspbian builds an ARMv6K SMP kernel that is not affected by this.
>
> I wonder if we can do something better here and make the
> asm-generic/qspinlock.h implementation always degrade into an
> equivalent of include/linux/spinlock_up.h when running on uniprocessor
> systems, avoiding both the atomic cmpxchg and the slowpath.

In x86, the lock instruction prefix is patched out when running on a UP
system. This downgrades the atomic cmpxchg to a non-atomic one. We may
do something similar in other architectures.

Cheers,
Longman
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM
From: Arnd Bergmann @ 2019-10-08 14:32 UTC
To: Waiman Long
Cc: Peter Zijlstra, Russell King, Sebastian Andrzej Siewior, Ingo Molnar, Will Deacon, Linux ARM

On Tue, Oct 8, 2019 at 3:36 PM Waiman Long <longman@redhat.com> wrote:
> On 10/8/19 7:42 AM, Arnd Bergmann wrote:
> > On Mon, Oct 7, 2019 at 11:45 PM Sebastian Andrzej Siewior <sebastian@breakpoint.cc> wrote:
> > I wonder if we can do something better here and make the
> > asm-generic/qspinlock.h implementation always degrade into an
> > equivalent of include/linux/spinlock_up.h when running on uniprocessor
> > systems, avoiding both the atomic cmpxchg and the slowpath.

I looked at the qspinlock implementation some more, and I think we can
get away with simply defining a special-case smp_xchg16_relaxed() that
is only used for the odd configuration:

diff --git a/arch/arm/include/asm/cmpxchg.h b/arch/arm/include/asm/cmpxchg.h
index 8b701f8e175c..6bf4964c105c 100644
--- a/arch/arm/include/asm/cmpxchg.h
+++ b/arch/arm/include/asm/cmpxchg.h
@@ -114,6 +114,24 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 	return ret;
 }
 
+#ifdef CONFIG_CPU_V6
+static inline unsigned short smp_xchg16_relaxed(volatile unsigned short *ptr, unsigned short x)
+{
+	unsigned short ret, tmp;
+	asm volatile("@ smp_xchg16_relaxed\n"
+	".arch armv6k\n"
+	"1:	ldrexh	%0, [%3]\n"
+	"	strexh	%1, %2, [%3]\n"
+	"	teq	%1, #0\n"
+	"	bne	1b"
+		: "=&r" (ret), "=&r" (tmp)
+		: "r" (x), "r" (ptr)
+		: "memory", "cc");
+	return ret;
+}
+#define smp_xchg16_relaxed smp_xchg16_relaxed
+#endif
+
 #define xchg_relaxed(ptr, x) ({						\
 	(__typeof__(*(ptr)))__xchg((unsigned long)(x), (ptr),		\
 				   sizeof(*(ptr)));			\
diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index 4c0d009a46f0..4612e27e3330 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -71,6 +71,10 @@
 	__ret;								\
 })
 
+#ifndef smp_xchg16_relaxed
+#define smp_xchg16_relaxed(p, x) xchg_relaxed(p, x)
+#endif
+
 #include <linux/atomic-fallback.h>
 #include <asm-generic/atomic-long.h>
 
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 2473f10c6956..af1473528edf 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -178,8 +178,8 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
 	 * We can use relaxed semantics since the caller ensures that the
 	 * MCS node is properly initialized before updating the tail.
 	 */
-	return (u32)xchg_relaxed(&lock->tail,
-				 tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
+	return (u32)smp_xchg16_relaxed(&lock->tail,
+				 tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
 }
 
 #else /* _Q_PENDING_BITS == 8 */

> In x86, the lock instruction prefix is patched out when running on UP
> system. This downgrades the atomic cmpxchg to non-atomic one. We may do
> something similar in other architectures.

Unfortunately, the atomic macros cannot trivially be made cheaper on
non-SMP systems on load-locked/store-conditional based architectures,
as there may be an interrupt in-between, and disabling interrupts would
likely be more expensive.

However, there might be a way to take a shortcut out of
queued_spin_lock() using asm-goto combined with the ARM
__ALT_SMP_ASM() macro.

Arnd
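[Editor's note] To see why a native halfword exchange matters here: without one, xchg_tail() could only be emulated with a word-wide compare-and-swap loop, which must retry whenever the locked/pending bytes in the same word change concurrently. A rough portable sketch of that fallback follows — illustrative only, not something proposed in the thread:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Exchange the upper 16 bits (the qspinlock "tail") of a 32-bit lock
 * word using only a 32-bit compare-and-swap, leaving the lower 16 bits
 * (locked/pending bytes) untouched.  This is the kind of loop an
 * architecture without ldrexh/strexh would be forced into. */
static uint16_t xchg_tail_emulated(_Atomic uint32_t *lock, uint16_t new_tail)
{
	uint32_t old = atomic_load_explicit(lock, memory_order_relaxed);

	for (;;) {
		uint32_t new = (old & 0xffffu) | ((uint32_t)new_tail << 16);

		/* on failure, "old" is refreshed and we retry */
		if (atomic_compare_exchange_weak_explicit(lock, &old, new,
				memory_order_relaxed, memory_order_relaxed))
			return (uint16_t)(old >> 16);
	}
}
```

The ldrexh/strexh pair in smp_xchg16_relaxed() achieves the same effect in hardware without competing against updates to the other half of the word.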
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM
From: Sebastian Andrzej Siewior @ 2019-10-08 19:47 UTC
To: Arnd Bergmann
Cc: Peter Zijlstra, Russell King, Ingo Molnar, Waiman Long, Will Deacon, Linux ARM

On 2019-10-08 16:32:27 [+0200], Arnd Bergmann wrote:
> On Tue, Oct 8, 2019 at 3:36 PM Waiman Long <longman@redhat.com> wrote:
> > In x86, the lock instruction prefix is patched out when running on UP
> > system. This downgrades the atomic cmpxchg to non-atomic one. We may do
> > something similar in other architectures.
>
> Unfortunately, the atomic macros cannot trivially be made cheaper
> on non-SMP systems based on load-locked/store-conditional
> based architectures, as there may be an interrupt in-between,
> and disabling interrupts would likely be more expensive.
>
> However, there might be a way to take a shortcut out of
> queued_spin_lock() using asm-goto combined with the ARM
> __ALT_SMP_ASM() macro.

The smp_xchg16_relaxed() snippet above looked good. I would buy it :)
Where are you heading with __ALT_SMP_ASM()?

> Arnd

Sebastian
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM
From: Arnd Bergmann @ 2019-10-08 21:47 UTC
To: Sebastian Andrzej Siewior
Cc: Peter Zijlstra, Russell King, Ingo Molnar, Waiman Long, Will Deacon, Linux ARM

On Tue, Oct 8, 2019 at 9:47 PM Sebastian Andrzej Siewior
<sebastian@breakpoint.cc> wrote:
>
> On 2019-10-08 16:32:27 [+0200], Arnd Bergmann wrote:
> > On Tue, Oct 8, 2019 at 3:36 PM Waiman Long <longman@redhat.com> wrote:
> > > In x86, the lock instruction prefix is patched out when running on UP
> > > system. This downgrades the atomic cmpxchg to non-atomic one. We may do
> > > something similar in other architectures.
> >
> > Unfortunately, the atomic macros cannot trivially be made cheaper
> > on non-SMP systems based on load-locked/store-conditional
> > based architectures, as there may be an interrupt in-between,
> > and disabling interrupts would likely be more expensive.
> >
> > However, there might be a way to take a shortcut out of
> > queued_spin_lock() using asm-goto combined with the ARM
> > __ALT_SMP_ASM() macro.
>
> The smp_xchg16_relaxed() snippet above looked good. I would buy it :)
> Where are you heading with __ALT_SMP_ASM()?

I was thinking of something along the lines of this:

diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index 2c595446cd73..1aa321b45f63 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -45,6 +45,19 @@ static inline void dsb_sev(void)
 	__asm__(SEV);
 }
 
+static __always_inline bool smp_enabled(void)
+{
+	if (IS_ENABLED(CONFIG_SMP))
+		asm_volatile_goto(__ALT_SMP_ASM(WASM(b) " %l[smp_on_up]", WASM(nop))
+				  :::: smp_on_up);
+
+	return false;
+
+smp_on_up:
+	return true;
+}
+#define smp_enabled smp_enabled
+
 #include <asm/qrwlock.h>
 #include <asm/qspinlock.h>
 #define smp_mb__after_spinlock()	smp_mb()
diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index fde943d180e0..3c456ad1661b 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -12,6 +12,10 @@
 
 #include <asm-generic/qspinlock_types.h>
 
+#ifndef smp_enabled
+#define smp_enabled (true)
+#endif
+
 /**
  * queued_spin_is_locked - is the spinlock locked?
  * @lock: Pointer to queued spinlock structure
@@ -75,6 +79,11 @@ static __always_inline void queued_spin_lock(struct qspinlock *lock)
 {
 	u32 val = 0;
 
+	if (!smp_enabled()) {
+		atomic_set(&lock->val, _Q_LOCKED_VAL);
+		return;
+	}
+
 	if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL)))
 		return;

The above is likely incorrect, non-idiomatic or inefficient, but this
is a way to avoid both a runtime check and the cmpxchg() in each
spinlock.

Arnd
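[Editor's note] The effect of the proposed shortcut can be modeled in portable C: under a compile-time uniprocessor assumption the fast path degrades to a plain store, since no other CPU can compete for the lock. This is a hypothetical stand-alone sketch — MODEL_SMP stands in for the runtime-patched __ALT_SMP_ASM() test, and the contended slowpath is omitted:

```c
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_SMP 0		/* pretend: an SMP kernel running on a UP machine */
#define Q_LOCKED_VAL 1u

/* Model of queued_spin_lock()'s fast path with the UP shortcut: on UP a
 * plain store suffices, on SMP the acquire cmpxchg is kept.  As with a
 * real spinlock, recursion from interrupt context is out of scope. */
static void model_queued_spin_lock(_Atomic uint32_t *lock)
{
	uint32_t val = 0;

	if (!MODEL_SMP) {
		/* UP: no other CPU can observe or take the lock */
		atomic_store_explicit(lock, Q_LOCKED_VAL, memory_order_relaxed);
		return;
	}

	if (atomic_compare_exchange_strong_explicit(lock, &val, Q_LOCKED_VAL,
			memory_order_acquire, memory_order_relaxed))
		return;

	/* queued_spin_lock_slowpath(lock, val) would go here */
}
```

Sebastian's follow-up below points out that the same shortcut would also have to cover arch_spin_trylock(), which this sketch does not model.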
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM
From: Sebastian Andrzej Siewior @ 2019-10-08 22:02 UTC
To: Arnd Bergmann
Cc: Peter Zijlstra, Russell King, Ingo Molnar, Waiman Long, Will Deacon, Linux ARM

On 2019-10-08 23:47:31 [+0200], Arnd Bergmann wrote:
…
> diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
> index fde943d180e0..3c456ad1661b 100644
> --- a/include/asm-generic/qspinlock.h
> +++ b/include/asm-generic/qspinlock.h
…
> @@ -75,6 +79,11 @@ static __always_inline void queued_spin_lock(struct qspinlock *lock)
>  {
>  	u32 val = 0;
>  
> +	if (!smp_enabled()) {
> +		atomic_set(&lock->val, _Q_LOCKED_VAL);
> +		return;
> +	}
> +
>  	if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL)))
>  		return;
>
> The above is likely incorrect, non-idiomatic or inefficient, but this
> is a way to avoid both a runtime check and the cmpxchg() in each spinlock.

You would have to put this in arch_spin_trylock() but I get the idea.
The current implementation does cmpxchg() in the try-lock case, so by
switching to q-locks we are not getting worse in the UP case.
Therefore I think this is more of an optimisation for those that run
SMP kernels on UP machines.

> Arnd

Sebastian
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM
From: Arnd Bergmann @ 2019-10-09 8:15 UTC
To: Sebastian Andrzej Siewior
Cc: Peter Zijlstra, Russell King, Ingo Molnar, Waiman Long, Will Deacon, Linux ARM

On Wed, Oct 9, 2019 at 12:02 AM Sebastian Andrzej Siewior
<sebastian@breakpoint.cc> wrote:
>
> On 2019-10-08 23:47:31 [+0200], Arnd Bergmann wrote:
> …
> > diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
> > index fde943d180e0..3c456ad1661b 100644
> > --- a/include/asm-generic/qspinlock.h
> > +++ b/include/asm-generic/qspinlock.h
> …
> > @@ -75,6 +79,11 @@ static __always_inline void queued_spin_lock(struct qspinlock *lock)
> >  {
> >  	u32 val = 0;
> >  
> > +	if (!smp_enabled()) {
> > +		atomic_set(&lock->val, _Q_LOCKED_VAL);
> > +		return;
> > +	}
> > +
> >  	if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL)))
> >  		return;
> >
> > The above is likely incorrect, non-idiomatic or inefficient, but this
> > is a way to avoid both a runtime check and the cmpxchg() in each spinlock.
>
> You would have to put this in arch_spin_trylock() but I get the idea.
> The current implementation does cmpxchg() in the try-lock case so by
> switching to q-locks are not getting worse in the UP case.

Ah right, I dismissed the trylock case because there are few calls to
spin_trylock, but I failed to notice the __raw_spin_lock() for the
CONFIG_LOCK_STAT case.

> Therefore I this is more of an optimisation for those that run SMP
> kernels on UP machines.

Yes, that was the point. Originally I wasn't sure if there was a case
in which that configuration could still end up in the spinlock
slowpath, but as that won't happen it's just a way to avoid the atomic
operation on UP. It might not actually make much of a difference
depending on the actual overhead of ldrex/strex while running on a
single core.

Arnd
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM
From: Peter Zijlstra @ 2019-10-09 8:46 UTC
To: Arnd Bergmann
Cc: Russell King, Sebastian Andrzej Siewior, Ingo Molnar, Waiman Long, Will Deacon, Linux ARM

On Tue, Oct 08, 2019 at 04:32:27PM +0200, Arnd Bergmann wrote:
> diff --git a/arch/arm/include/asm/cmpxchg.h b/arch/arm/include/asm/cmpxchg.h
> index 8b701f8e175c..6bf4964c105c 100644
> --- a/arch/arm/include/asm/cmpxchg.h
> +++ b/arch/arm/include/asm/cmpxchg.h
> @@ -114,6 +114,24 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
>  	return ret;
>  }
>  
> +#ifdef CONFIG_CPU_V6
> +static inline unsigned short smp_xchg16_relaxed(volatile unsigned short *ptr, unsigned short x)
> +{
> +	unsigned short ret, tmp;
> +	asm volatile("@ smp_xchg16_relaxed\n"
> +	".arch armv6k\n"
> +	"1:	ldrexh	%0, [%3]\n"
> +	"	strexh	%1, %2, [%3]\n"
> +	"	teq	%1, #0\n"
> +	"	bne	1b"
> +		: "=&r" (ret), "=&r" (tmp)
> +		: "r" (x), "r" (ptr)
> +		: "memory", "cc");
> +	return ret;
> +}
> +#define smp_xchg16_relaxed smp_xchg16_relaxed
> +#endif

Why is this not in __xchg() as a variant for case 2?
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM 2019-10-09 8:46 ` Peter Zijlstra @ 2019-10-09 8:57 ` Arnd Bergmann 2019-10-09 9:31 ` Peter Zijlstra 0 siblings, 1 reply; 22+ messages in thread From: Arnd Bergmann @ 2019-10-09 8:57 UTC (permalink / raw) To: Peter Zijlstra Cc: Russell King, Sebastian Andrzej Siewior, Ingo Molnar, Waiman Long, Will Deacon, Linux ARM On Wed, Oct 9, 2019 at 10:46 AM Peter Zijlstra <peterz@infradead.org> wrote: > > On Tue, Oct 08, 2019 at 04:32:27PM +0200, Arnd Bergmann wrote: > > > diff --git a/arch/arm/include/asm/cmpxchg.h b/arch/arm/include/asm/cmpxchg.h > > > index 8b701f8e175c..6bf4964c105c 100644 > > > --- a/arch/arm/include/asm/cmpxchg.h > > > +++ b/arch/arm/include/asm/cmpxchg.h > > > @@ -114,6 +114,24 @@ static inline unsigned long __xchg(unsigned long > > > x, volatile void *ptr, int size > > > return ret; > > > } > > > > > > +#ifdef CONFIG_CPU_V6 > > > +static inline unsigned short smp_xchg16_relaxed(volatile unsigned > > > short *ptr, unsigned short x) > > > +{ > > > + unsigned short ret, tmp; > > > + asm volatile("@ smp_xchg16_relaxed\n" > > > + ".arch armv6k\n" > > > + "1: ldrexh %0, [%3]\n" > > > + " strexh %1, %2, [%3]\n" > > > + " teq %1, #0\n" > > > + " bne 1b" > > > + : "=&r" (ret), "=&r" (tmp) > > > + : "r" (x), "r" (ptr) > > > + : "memory", "cc"); > > > + return ret; > > > +} > > > +#define smp_xchg16_relaxed smp_xchg16_relaxed > > > +#endif > > > > Why is this not in __xchg() as a variant for case 2 ? ldrexh/strexh are instructions that are only available on SMP-capable architecture revisions (ARMv6K or higher). When building a kernel that runs both on pre-K ARMv6 uniprocessor systems and on later SMP systems, __xchg() can only do 32-bit ldrex/strex. The trick of smp_xchg16_relaxed() is to allow the 16-bit atomics in code that is only ever executed on SMP systems but not on the uniprocessor OMAP2 and i.MX3 chips that would trap. 
Arnd
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM 2019-10-09 8:57 ` Arnd Bergmann @ 2019-10-09 9:31 ` Peter Zijlstra 2019-10-09 10:31 ` Arnd Bergmann 0 siblings, 1 reply; 22+ messages in thread From: Peter Zijlstra @ 2019-10-09 9:31 UTC (permalink / raw) To: Arnd Bergmann Cc: Russell King, Sebastian Andrzej Siewior, Ingo Molnar, Waiman Long, Will Deacon, Linux ARM On Wed, Oct 09, 2019 at 10:57:25AM +0200, Arnd Bergmann wrote: > On Wed, Oct 9, 2019 at 10:46 AM Peter Zijlstra <peterz@infradead.org> wrote: > > > > On Tue, Oct 08, 2019 at 04:32:27PM +0200, Arnd Bergmann wrote: > > > diff --git a/arch/arm/include/asm/cmpxchg.h b/arch/arm/include/asm/cmpxchg.h > > > index 8b701f8e175c..6bf4964c105c 100644 > > > --- a/arch/arm/include/asm/cmpxchg.h > > > +++ b/arch/arm/include/asm/cmpxchg.h > > > @@ -114,6 +114,24 @@ static inline unsigned long __xchg(unsigned long > > > x, volatile void *ptr, int size > > > return ret; > > > } > > > > > > +#ifdef CONFIG_CPU_V6 > > > +static inline unsigned short smp_xchg16_relaxed(volatile unsigned > > > short *ptr, unsigned short x) > > > +{ > > > + unsigned short ret, tmp; > > > + asm volatile("@ smp_xchg16_relaxed\n" > > > + ".arch armv6k\n" > > > + "1: ldrexh %0, [%3]\n" > > > + " strexh %1, %2, [%3]\n" > > > + " teq %1, #0\n" > > > + " bne 1b" > > > + : "=&r" (ret), "=&r" (tmp) > > > + : "r" (x), "r" (ptr) > > > + : "memory", "cc"); > > > + return ret; > > > +} > > > +#define smp_xchg16_relaxed smp_xchg16_relaxed > > > +#endif > > > > Why is this not in __xchg() as a variant for case 2 ? > > ldrexh/strexh are instructions that are only available on SMP-capable > architecture revisions (ARMv6K or higher). When building a kernel > that runs both on pre-K ARMv6 uniprocessor systems and on later > SMP systems, __xchg() can only do 32-bit ldrex/strex. You can do u16 xchg using a u32 ll/sc, see openrisc's xchg_small(). 
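The u32-based u16 exchange Peter refers to can be sketched in portable C11 roughly as follows. This is an illustration of the technique, not the actual openrisc xchg_small() code; the function name is made up, and a little-endian layout is assumed (on big-endian the shift would be inverted). The point is that a 16-bit exchange can be built from a 32-bit compare-and-swap loop on the aligned word containing the halfword:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative: emulate xchg16 with a 32-bit CAS loop on the
 * containing aligned word.  The loop retries whenever any part of
 * the 32-bit word changed underneath us, which mirrors how an
 * ll/sc reservation on the whole word would fail. */
static uint16_t xchg16_via_cas32(volatile uint16_t *ptr, uint16_t newval)
{
	uintptr_t addr = (uintptr_t)ptr;
	/* 32-bit aligned word containing the 16-bit target */
	_Atomic uint32_t *word = (_Atomic uint32_t *)(addr & ~(uintptr_t)3);
	/* bit offset of the halfword inside the word (little-endian) */
	unsigned shift = (unsigned)(addr & 2) * 8;
	uint32_t mask = (uint32_t)0xffff << shift;
	uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
	uint32_t repl;

	do {
		/* keep the other halfword, replace ours */
		repl = (old & ~mask) | ((uint32_t)newval << shift);
	} while (!atomic_compare_exchange_weak_explicit(word, &old, repl,
							memory_order_relaxed,
							memory_order_relaxed));
	/* CAS left the previous word value in 'old' */
	return (uint16_t)((old & mask) >> shift);
}
```

The neighbouring halfword is read and written back unchanged, so a concurrent store to it simply makes the CAS retry rather than corrupting data.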
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM 2019-10-09 9:31 ` Peter Zijlstra @ 2019-10-09 10:31 ` Arnd Bergmann 2019-10-09 10:56 ` Peter Zijlstra 0 siblings, 1 reply; 22+ messages in thread From: Arnd Bergmann @ 2019-10-09 10:31 UTC (permalink / raw) To: Peter Zijlstra Cc: Russell King, Sebastian Andrzej Siewior, Ingo Molnar, Waiman Long, Will Deacon, Linux ARM On Wed, Oct 9, 2019 at 11:31 AM Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, Oct 09, 2019 at 10:57:25AM +0200, Arnd Bergmann wrote: > > On Wed, Oct 9, 2019 at 10:46 AM Peter Zijlstra <peterz@infradead.org> wrote: > > > Why is this not in __xchg() as a variant for case 2 ? > > > > ldrexh/strexh are instructions that are only available on SMP-capable > > architecture revisions (ARMv6K or higher). When building a kernel > > that runs both on pre-K ARMv6 uniprocessor systems and on later > > SMP systems, __xchg() can only do 32-bit ldrex/strex. > > You can do u16 xchg using a u32 ll/sc, see openrisc's xchg_small(). Ah, right. That would be much nicer than my smp_xchg16_relaxed() hack to get the corner case working, as it avoids the ugly special case in qspinlock.h. Would this still have comparable performance characteristics? I assume the 16-bit xchg_relaxed() in qspinlock.c was meant as an optimization for x86 and other cmpxchg-based architectures but doesn't actually help on ll/sc-based architectures that get the reservation on the whole cache line anyway? Arnd
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM 2019-10-09 10:31 ` Arnd Bergmann @ 2019-10-09 10:56 ` Peter Zijlstra 2019-10-09 12:00 ` Arnd Bergmann 0 siblings, 1 reply; 22+ messages in thread From: Peter Zijlstra @ 2019-10-09 10:56 UTC (permalink / raw) To: Arnd Bergmann Cc: Russell King, Sebastian Andrzej Siewior, Ingo Molnar, Waiman Long, Will Deacon, Linux ARM On Wed, Oct 09, 2019 at 12:31:24PM +0200, Arnd Bergmann wrote: > On Wed, Oct 9, 2019 at 11:31 AM Peter Zijlstra <peterz@infradead.org> wrote: > > On Wed, Oct 09, 2019 at 10:57:25AM +0200, Arnd Bergmann wrote: > > > On Wed, Oct 9, 2019 at 10:46 AM Peter Zijlstra <peterz@infradead.org> wrote: > > > > Why is this not in __xchg() as a variant for case 2 ? > > > > > > ldrexh/strexh are instructions that are only available on SMP-capable > > > architecture revisions (ARMv6K or higher). When building a kernel > > > that runs both on pre-K ARMv6 uniprocessor systems and on later > > > SMP systems, __xchg() can only do 32-bit ldrex/strex. > > > > You can do u16 xchg using a u32 ll/sc, see openrisc's xchg_small(). > > Ah, right. That would be much nicer than my smp_xchg16_relaxed() > hack to get the corner case working, as it avoids the ugly special > case in qspinlock.h. > > Would this still have comparable performance characteristics? I suppose so.. > I assume the 16-bit xchg_relaxed() in qspinlock.c was meant as > an optimization for x86 and other cmpxchg based architectures but > doesn't actually help on ll/sc based architectures that get the > reservation on the whole cache line anyway? It does actually help here too, because it allows other operations to be regular load/stores. Look at the #if _Q_PENDING_BITS==8 in qspinlock.c, as opposed to the #else where they're all atomic_*(). 
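The layout Peter is referring to can be sketched with a union; the field names below are illustrative stand-ins for the kernel's struct qspinlock bitfields, and a little-endian layout is assumed. When the pending bit occupies a whole byte, "clear pending, set locked" becomes a single plain 16-bit store instead of an atomic read-modify-write on the whole lock word:

```c
#include <stdint.h>

/* Illustrative sketch of the qspinlock word with _Q_PENDING_BITS == 8
 * (hypothetical names; the kernel keeps 'val' in an atomic_t).
 * Little-endian layout assumed. */
union qspinlock_sketch {
	uint32_t val;                    /* whole lock word */
	struct {
		uint8_t  locked;         /* bits  0-7  */
		uint8_t  pending;        /* bits  8-15 */
		uint16_t tail;           /* bits 16-31 */
	};
	struct {
		uint16_t locked_pending; /* bits  0-15 */
		uint16_t tail_word;      /* bits 16-31 */
	};
};

/* With an 8-bit pending field this is a plain halfword store:
 * no cmpxchg loop or ll/sc reservation needed. */
static void clear_pending_set_locked(union qspinlock_sketch *l)
{
	l->locked_pending = 1;	/* locked = 1, pending = 0 */
}
```

With a narrower pending field (the #else branch), locked and pending share a byte with other state, and the same transition has to be done with atomic_*() operations on the full word.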
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM 2019-10-09 10:56 ` Peter Zijlstra @ 2019-10-09 12:00 ` Arnd Bergmann 2019-10-09 12:06 ` Peter Zijlstra 0 siblings, 1 reply; 22+ messages in thread From: Arnd Bergmann @ 2019-10-09 12:00 UTC (permalink / raw) To: Peter Zijlstra Cc: Russell King, Sebastian Andrzej Siewior, Ingo Molnar, Waiman Long, Will Deacon, Linux ARM On Wed, Oct 9, 2019 at 12:57 PM Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, Oct 09, 2019 at 12:31:24PM +0200, Arnd Bergmann wrote: > > On Wed, Oct 9, 2019 at 11:31 AM Peter Zijlstra <peterz@infradead.org> wrote: > > I assume the 16-bit xchg_relaxed() in qspinlock.c was meant as > > an optimization for x86 and other cmpxchg based architectures but > > doesn't actually help on ll/sc based architectures that get the > > reservation on the whole cache line anyway? > > It does actually help here too, because it allows other operations to be > regular load/stores. > > Look at the #if _Q_PENDING_BITS==8 in qspinlock.c, as opposed to the > #else where they're all atomic_*(). Oh, is that safe with an xchg() implementation that operates on the whole 32 bits when a concurrent thread can do a simple store to one half of it? The ARM architecture reference says "It is UNPREDICTABLE whether the transition from Exclusive Access to Open Access state occurs when the Store or StoreExcl is from another observer.", which sounds to me like the xchg_small() trick would not work with the qspinlock implementation on ARM. [I see that mips, openrisc and xtensa do this, but did not try to find out whether they have ll/sc semantics that make it safe when another thread does a plain store to the reservation] OTOH, I suppose we could just set _Q_PENDING_BITS to 1 regardless of NR_CPUS on any architecture without a 16-bit xchg() primitive. 
Arnd
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM 2019-10-09 12:00 ` Arnd Bergmann @ 2019-10-09 12:06 ` Peter Zijlstra 2019-10-09 12:52 ` Will Deacon 2019-10-09 13:50 ` Arnd Bergmann 0 siblings, 2 replies; 22+ messages in thread From: Peter Zijlstra @ 2019-10-09 12:06 UTC (permalink / raw) To: Arnd Bergmann Cc: Russell King, Sebastian Andrzej Siewior, Ingo Molnar, Waiman Long, Will Deacon, Linux ARM On Wed, Oct 09, 2019 at 02:00:05PM +0200, Arnd Bergmann wrote: > On Wed, Oct 9, 2019 at 12:57 PM Peter Zijlstra <peterz@infradead.org> wrote: > > On Wed, Oct 09, 2019 at 12:31:24PM +0200, Arnd Bergmann wrote: > > > On Wed, Oct 9, 2019 at 11:31 AM Peter Zijlstra <peterz@infradead.org> wrote: > > > I assume the 16-bit xchg_relaxed() in qspinlock.c was meant as > > > an optimization for x86 and other cmpxchg based architectures but > > > doesn't actually help on ll/sc based architectures that get the > > > reservation on the whole cache line anyway? > > > > It does actually help here too, because it allows other operations to be > > regular load/stores. > > > > Look at the #if _Q_PENDING_BITS==8 in qspinlock.c, as opposed to the > > #else where they're all atomic_*(). > > Oh, is that safe with an xchg() implementation that operates on the whole > 32 bit when a concurrent thread can do a simple store to one half of it? It had better be, otherwise LL/SC'd be broken. SC _must_ fail when there is a contending store. > The ARM architecture reference says "It is UNPREDICTABLE whether the > transition from Exclusive Access to Open Access state occurs when the > Store or StoreExcl is from another observer.", which sounds to me > me like the xchg_small() trick would not work with the qspinlock > implementation on ARM. [I see that mips, openrisc and xtensa do this, > but did not try to find out whether they have ll/sc semantics that make > it safe when another thread does a plain store to the reservation] Will? 
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM 2019-10-09 12:06 ` Peter Zijlstra @ 2019-10-09 12:52 ` Will Deacon 2019-10-09 13:50 ` Arnd Bergmann 1 sibling, 0 replies; 22+ messages in thread From: Will Deacon @ 2019-10-09 12:52 UTC (permalink / raw) To: Peter Zijlstra Cc: Arnd Bergmann, Russell King, Sebastian Andrzej Siewior, Ingo Molnar, Waiman Long, Linux ARM On Wed, Oct 09, 2019 at 02:06:39PM +0200, Peter Zijlstra wrote: > On Wed, Oct 09, 2019 at 02:00:05PM +0200, Arnd Bergmann wrote: > > On Wed, Oct 9, 2019 at 12:57 PM Peter Zijlstra <peterz@infradead.org> wrote: > > > On Wed, Oct 09, 2019 at 12:31:24PM +0200, Arnd Bergmann wrote: > > > > On Wed, Oct 9, 2019 at 11:31 AM Peter Zijlstra <peterz@infradead.org> wrote: > > > > I assume the 16-bit xchg_relaxed() in qspinlock.c was meant as > > > > an optimization for x86 and other cmpxchg based architectures but > > > > doesn't actually help on ll/sc based architectures that get the > > > > reservation on the whole cache line anyway? > > > > > > It does actually help here too, because it allows other operations to be > > > regular load/stores. > > > > > > Look at the #if _Q_PENDING_BITS==8 in qspinlock.c, as opposed to the > > > #else where they're all atomic_*(). > > > > Oh, is that safe with an xchg() implementation that operates on the whole > > 32 bit when a concurrent thread can do a simple store to one half of it? > > It had better be, otherwise LL/SC'd be broken. SC _must_ fail when there > is a contending store. > > > The ARM architecture reference says "It is UNPREDICTABLE whether the > > transition from Exclusive Access to Open Access state occurs when the > > Store or StoreExcl is from another observer.", which sounds to me > > me like the xchg_small() trick would not work with the qspinlock > > implementation on ARM. 
[I see that mips, openrisc and xtensa do this, > > but did not try to find out whether they have ll/sc semantics that make > > it safe when another thread does a plain store to the reservation] > > Will? I think this is the documentation being unhelpful -- there is a section about the "local monitor", which this text applies to, but then there's also a section about the "global monitor", which has a state machine diagram in the section called "Clear global monitor event". This diagram shows how the global monitor transitions from "Exclusive access" to "Open access" in response to a store from a different observer to the "marked address". Will
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM 2019-10-09 12:06 ` Peter Zijlstra 2019-10-09 12:52 ` Will Deacon @ 2019-10-09 13:50 ` Arnd Bergmann 2019-10-09 21:42 ` Sebastian Andrzej Siewior 1 sibling, 1 reply; 22+ messages in thread From: Arnd Bergmann @ 2019-10-09 13:50 UTC (permalink / raw) To: Peter Zijlstra Cc: Russell King, Sebastian Andrzej Siewior, Ingo Molnar, Waiman Long, Will Deacon, Linux ARM On Wed, Oct 9, 2019 at 2:06 PM Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, Oct 09, 2019 at 02:00:05PM +0200, Arnd Bergmann wrote: > > On Wed, Oct 9, 2019 at 12:57 PM Peter Zijlstra <peterz@infradead.org> wrote: > > > On Wed, Oct 09, 2019 at 12:31:24PM +0200, Arnd Bergmann wrote: > > > > On Wed, Oct 9, 2019 at 11:31 AM Peter Zijlstra <peterz@infradead.org> wrote: > > > > I assume the 16-bit xchg_relaxed() in qspinlock.c was meant as > > > > an optimization for x86 and other cmpxchg based architectures but > > > > doesn't actually help on ll/sc based architectures that get the > > > > reservation on the whole cache line anyway? > > > > > > It does actually help here too, because it allows other operations to be > > > regular load/stores. > > > > > > Look at the #if _Q_PENDING_BITS==8 in qspinlock.c, as opposed to the > > > #else where they're all atomic_*(). > > > > Oh, is that safe with an xchg() implementation that operates on the whole > > 32 bit when a concurrent thread can do a simple store to one half of it? > > It had better be, otherwise LL/SC'd be broken. SC _must_ fail when there > is a contending store. Ok. I looked a bit at the other implementations that do xchg16() through cmpxchg32(), and it seems it would be easiest to reuse the superh version, which is fully portable by moving arch/sh/include/asm/cmpxchg-xchg.h into include/asm-generic/, the same thing would allow us to change a number of other architectures to use the generic qspinlock instead of their own locks. Sebastian, do you want to try doing it that way? 
Arnd
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM 2019-10-09 13:50 ` Arnd Bergmann @ 2019-10-09 21:42 ` Sebastian Andrzej Siewior 0 siblings, 0 replies; 22+ messages in thread From: Sebastian Andrzej Siewior @ 2019-10-09 21:42 UTC (permalink / raw) To: Arnd Bergmann Cc: Peter Zijlstra, Russell King, Ingo Molnar, Waiman Long, Will Deacon, Linux ARM On 2019-10-09 15:50:24 [+0200], Arnd Bergmann wrote: > Ok. I looked a bit at the other implementations that do xchg16() through > cmpxchg32(), and it seems it would be easiest to reuse the superh version, > which is fully portable by moving arch/sh/include/asm/cmpxchg-xchg.h > into include/asm-generic/, the same thing would allow us to change a > number of other architectures to use the generic qspinlock instead of > their own locks. > > Sebastian, do you want to try doing it that way? sounds good, I'm on it. > Arnd Sebastian
* Re: [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM 2019-10-08 11:42 ` [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM Arnd Bergmann 2019-10-08 13:36 ` Waiman Long @ 2019-10-08 19:32 ` Sebastian Andrzej Siewior 1 sibling, 0 replies; 22+ messages in thread From: Sebastian Andrzej Siewior @ 2019-10-08 19:32 UTC (permalink / raw) To: Arnd Bergmann Cc: Peter Zijlstra, Russell King, Ingo Molnar, Waiman Long, Will Deacon, Linux ARM On 2019-10-08 13:42:43 [+0200], Arnd Bergmann wrote: > This looks nice, and I don't see anything wrong with the implementation, > but I am slightly worried about switching everything over to a generic > spinlock while keeping the custom ARM version for an exceptionally > rare corner case: I do not want ARM to have two spinlock implementations. > The ARM spinlock is now only used when you build an SMP-enabled > kernel for an ARM1136r0 that is used in OMAP2, i.MX3 and some > of the least common Integrator/Realview variants. I'm not aware > of any binary distros with ARMv6+ kernels, so these would run custom > kernels that are almost always non-SMP as well as no longer getting > kernel upgrades (almost all have been discontinued years ago, the i.MX35 > chip itself was the last to get EOLd in 2018). > Raspbian builds an ARMv6K SMP kernel that is not affected by this. I just looked at the Debian configs. The armel configs look UP-only and armhf is V7. I am not sure if there is an SMP-capable CPU that is V6. I've been looking at a wiki [0] and the first SMP CPU seems to be ARMv6K. > I wonder if we can do something better here and make the > asm-generic/qspinlock.h implementation always degrade into an > equivalent of include/linux/spinlock_up.h when running on uniprocessor > systems, avoiding both the atomic cmpxchg and the slowpath. > > That way, an ARMv6+SMP kernel on UP could share the qspinlock > implementation but never actually get into the invalid 16-bit xchg() or > sev()/wfe(). 
> It already shouldn't ever get into the slowpath on a > non-SMP system if I understand it correctly, but avoiding the cmpxchg() > entirely would be an added benefit. The lock should never be contended, so yes. [0] https://en.wikipedia.org/wiki/List_of_ARM_microarchitectures > Arnd Sebastian
Thread overview: 22+ messages 2019-10-07 21:44 [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM Sebastian Andrzej Siewior 2019-10-07 21:44 ` [PATCH 1/3] ARM: Use qrwlock implementation Sebastian Andrzej Siewior 2019-10-07 21:44 ` [PATCH 2/3] ARM: Use qspinlock implementation Sebastian Andrzej Siewior 2019-10-07 21:44 ` [PATCH 3/3] ARM: Inline locking functions for !PREEMPTION Sebastian Andrzej Siewior 2019-10-08 11:42 ` [RFC PATCH 0/3] Queued spinlocks/RW-locks for ARM Arnd Bergmann 2019-10-08 13:36 ` Waiman Long 2019-10-08 14:32 ` Arnd Bergmann 2019-10-08 19:47 ` Sebastian Andrzej Siewior 2019-10-08 21:47 ` Arnd Bergmann 2019-10-08 22:02 ` Sebastian Andrzej Siewior 2019-10-09 8:15 ` Arnd Bergmann 2019-10-09 8:46 ` Peter Zijlstra 2019-10-09 8:57 ` Arnd Bergmann 2019-10-09 9:31 ` Peter Zijlstra 2019-10-09 10:31 ` Arnd Bergmann 2019-10-09 10:56 ` Peter Zijlstra 2019-10-09 12:00 ` Arnd Bergmann 2019-10-09 12:06 ` Peter Zijlstra 2019-10-09 12:52 ` Will Deacon 2019-10-09 13:50 ` Arnd Bergmann 2019-10-09 21:42 ` Sebastian Andrzej Siewior 2019-10-08 19:32 ` Sebastian Andrzej Siewior