* [PATCH 0/2] locking/lockref: Use try_cmpxchg64 in CMPXCHG_LOOP macro
@ 2022-05-25 14:40 Uros Bizjak
  2022-05-25 14:40 ` [PATCH 1/2] " Uros Bizjak
  2022-05-25 14:40 ` [PATCH 2/2] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_32 && X86_CMPXCHG64 Uros Bizjak
  0 siblings, 2 replies; 18+ messages in thread
From: Uros Bizjak @ 2022-05-25 14:40 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Uros Bizjak, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Waiman.Long, paulmck

This patch series introduces try_cmpxchg64 into the CMPXCHG_LOOP macro
to slightly improve the generated code. The x86 CMPXCHG instruction
returns success in the ZF flag, so a compare after CMPXCHG and the
related move instruction can be eliminated.

The second patch (optionally) enables lockless reference count updates
for the X86_32 target when the X86_CMPXCHG64 config flag is set. With
try_cmpxchg64 used in the CMPXCHG_LOOP macro in lib/lockref.c, the
compiler avoids the double-word compare and related move and produces
quite optimal code around CMPXCHG8B for the register-starved X86_32
target.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman.Long@hp.com
Cc: paulmck@linux.vnet.ibm.com
---
Uros Bizjak (2):
  locking/lockref: Use try_cmpxchg64 in CMPXCHG_LOOP macro
  locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_32 &&
    X86_CMPXCHG64

 arch/x86/Kconfig | 2 +-
 lib/lockref.c    | 9 ++++-----
 2 files changed, 5 insertions(+), 6 deletions(-)

-- 
2.35.3



* [PATCH 1/2] locking/lockref: Use try_cmpxchg64 in CMPXCHG_LOOP macro
  2022-05-25 14:40 [PATCH 0/2] locking/lockref: Use try_cmpxchg64 in CMPXCHG_LOOP macro Uros Bizjak
@ 2022-05-25 14:40 ` Uros Bizjak
  2022-05-25 16:47   ` Linus Torvalds
  2022-05-25 14:40 ` [PATCH 2/2] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_32 && X86_CMPXCHG64 Uros Bizjak
  1 sibling, 1 reply; 18+ messages in thread
From: Uros Bizjak @ 2022-05-25 14:40 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Uros Bizjak, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Waiman.Long, paulmck

Use try_cmpxchg64 instead of cmpxchg64 in CMPXCHG_LOOP macro.
x86 CMPXCHG instruction returns success in ZF flag, so this
change saves a compare after cmpxchg (and related move instruction
in front of cmpxchg). The main loop of lockref_get improves from:

  13:	48 89 c1             	mov    %rax,%rcx
  16:	48 c1 f9 20          	sar    $0x20,%rcx
  1a:	83 c1 01             	add    $0x1,%ecx
  1d:	48 89 ce             	mov    %rcx,%rsi
  20:	89 c1                	mov    %eax,%ecx
  22:	48 89 d0             	mov    %rdx,%rax
  25:	48 c1 e6 20          	shl    $0x20,%rsi
  29:	48 09 f1             	or     %rsi,%rcx
  2c:	f0 48 0f b1 4d 00    	lock cmpxchg %rcx,0x0(%rbp)
  32:	48 39 d0             	cmp    %rdx,%rax
  35:	75 17                	jne    4e <lockref_get+0x4e>

to:

  13:	48 89 ca             	mov    %rcx,%rdx
  16:	48 c1 fa 20          	sar    $0x20,%rdx
  1a:	83 c2 01             	add    $0x1,%edx
  1d:	48 89 d6             	mov    %rdx,%rsi
  20:	89 ca                	mov    %ecx,%edx
  22:	48 c1 e6 20          	shl    $0x20,%rsi
  26:	48 09 f2             	or     %rsi,%rdx
  29:	f0 48 0f b1 55 00    	lock cmpxchg %rdx,0x0(%rbp)
  2f:	75 02                	jne    33 <lockref_get+0x33>

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman.Long@hp.com
Cc: paulmck@linux.vnet.ibm.com
---
 lib/lockref.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/lib/lockref.c b/lib/lockref.c
index 5b34bbd3eba8..c6f0b183b937 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -14,12 +14,11 @@
 	BUILD_BUG_ON(sizeof(old) != 8);						\
 	old.lock_count = READ_ONCE(lockref->lock_count);			\
 	while (likely(arch_spin_value_unlocked(old.lock.rlock.raw_lock))) {  	\
-		struct lockref new = old, prev = old;				\
+		struct lockref new = old;					\
 		CODE								\
-		old.lock_count = cmpxchg64_relaxed(&lockref->lock_count,	\
-						   old.lock_count,		\
-						   new.lock_count);		\
-		if (likely(old.lock_count == prev.lock_count)) {		\
+		if (likely(try_cmpxchg64_relaxed(&lockref->lock_count,		\
+						 &old.lock_count,		\
+						 new.lock_count))) {		\
 			SUCCESS;						\
 		}								\
 		if (!--retry)							\
-- 
2.35.3



* [PATCH 2/2] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_32 && X86_CMPXCHG64
  2022-05-25 14:40 [PATCH 0/2] locking/lockref: Use try_cmpxchg64 in CMPXCHG_LOOP macro Uros Bizjak
  2022-05-25 14:40 ` [PATCH 1/2] " Uros Bizjak
@ 2022-05-25 14:40 ` Uros Bizjak
  2022-05-25 16:29   ` Linus Torvalds
  1 sibling, 1 reply; 18+ messages in thread
From: Uros Bizjak @ 2022-05-25 14:40 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Uros Bizjak, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Waiman.Long, paulmck

Commit bc08b449ee14ace4d869adaa1bb35a44ce68d775 enabled lockless
reference count updates using cmpxchg() only for x86_64 and left x86_32
behind due to the inability to detect support for the cmpxchg8b
instruction. Nowadays, we can use CONFIG_X86_CMPXCHG64 for this purpose.
Also, by using try_cmpxchg64() instead of cmpxchg() in the CMPXCHG_LOOP
macro, the compiler actually produces sane code, improving the
lockref_get_or_lock main loop from:

 2a5:	8d 48 01             	lea    0x1(%eax),%ecx
 2a8:	85 c0                	test   %eax,%eax
 2aa:	7e 3c                	jle    2e8 <lockref_get_or_lock+0x78>
 2ac:	8b 44 24 08          	mov    0x8(%esp),%eax
 2b0:	8b 54 24 0c          	mov    0xc(%esp),%edx
 2b4:	8b 74 24 04          	mov    0x4(%esp),%esi
 2b8:	f0 0f c7 0e          	lock cmpxchg8b (%esi)
 2bc:	8b 4c 24 0c          	mov    0xc(%esp),%ecx
 2c0:	89 c3                	mov    %eax,%ebx
 2c2:	89 d0                	mov    %edx,%eax
 2c4:	8b 74 24 08          	mov    0x8(%esp),%esi
 2c8:	31 ca                	xor    %ecx,%edx
 2ca:	31 de                	xor    %ebx,%esi
 2cc:	09 f2                	or     %esi,%edx
 2ce:	75 40                	jne    310 <lockref_get_or_lock+0xa0>

to:

  2d:	8d 4f 01             	lea    0x1(%edi),%ecx
  30:	85 ff                	test   %edi,%edi
  32:	7e 1c                	jle    50 <lockref_get_or_lock+0x50>
  34:	f0 0f c7 0e          	lock cmpxchg8b (%esi)
  38:	75 36                	jne    70 <lockref_get_or_lock+0x70>

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman.Long@hp.com
Cc: paulmck@linux.vnet.ibm.com
---
 arch/x86/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 762a0b6ab8b6..326cfdc4f136 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -27,7 +27,6 @@ config X86_64
 	# Options that are inherently 64-bit kernel only:
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
-	select ARCH_USE_CMPXCHG_LOCKREF
 	select HAVE_ARCH_SOFT_DIRTY
 	select MODULES_USE_ELF_RELA
 	select NEED_DMA_MAP_STATE
@@ -111,6 +110,7 @@ config X86
 	select ARCH_SUPPORTS_LTO_CLANG
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
 	select ARCH_USE_BUILTIN_BSWAP
+	select ARCH_USE_CMPXCHG_LOCKREF	if X86_64 || (X86_32 && X86_CMPXCHG64)
 	select ARCH_USE_MEMTEST
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USE_QUEUED_SPINLOCKS
-- 
2.35.3



* Re: [PATCH 2/2] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_32 && X86_CMPXCHG64
  2022-05-25 14:40 ` [PATCH 2/2] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_32 && X86_CMPXCHG64 Uros Bizjak
@ 2022-05-25 16:29   ` Linus Torvalds
  2022-05-26  8:30     ` David Laight
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2022-05-25 16:29 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List,
	Peter Zijlstra, Thomas Gleixner, Waiman.Long, Paul McKenney

On Wed, May 25, 2022 at 7:40 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> +       select ARCH_USE_CMPXCHG_LOCKREF if X86_64 || (X86_32 && X86_CMPXCHG64)

Ugh. That looks pointlessly complicated. X86_64 already enables
X86_CMPXCHG64 afaik, so you can just say

        select ARCH_USE_CMPXCHG_LOCKREF if X86_CMPXCHG64

which is much clearer: CMPXCHG_LOCKREF needs CMPXCHG64, and the
Kconfig option says exactly that.

That said, this also makes me wonder if we should just make cmpxchg8b
requirement unconditional.

Googling around, it looks like Windows NT stopped booting on CPUs
without cmpxchg8b in version 5.1. That was in 2001.

Here we are, 21 years later, and we still ostensibly try to support
CPUs without it, but I doubt it's worth it.

So maybe time to just say "we require cmpxchg8b".

In fact, I think we have effectively done that already for years, since we have

    config X86_CMPXCHG64
            def_bool y
            depends on X86_PAE || ...

iow, enabling X86_PAE will force-enable CMPXCHG8B due to the wider
page table entries.

And I would assume that all distros basically do that anyway (but I do
not have a 32-bit distro around to check).

It would mean that we would finally drop support for i486 (and
possibly some Pentium clones, but afaik a number of them did actually
support cmpxchg8b even if they didn't report it in cpuid).

Comments?

                 Linus


* Re: [PATCH 1/2] locking/lockref: Use try_cmpxchg64 in CMPXCHG_LOOP macro
  2022-05-25 14:40 ` [PATCH 1/2] " Uros Bizjak
@ 2022-05-25 16:47   ` Linus Torvalds
  2022-05-26  8:54     ` Uros Bizjak
                       ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Linus Torvalds @ 2022-05-25 16:47 UTC (permalink / raw)
  To: Uros Bizjak, Catalin Marinas, Will Deacon, Russell King,
	Michael Ellerman, Thomas Bogendoerfer, Heiko Carstens
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List,
	Peter Zijlstra, Thomas Gleixner, Waiman.Long, Paul McKenney

On Wed, May 25, 2022 at 7:40 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> Use try_cmpxchg64 instead of cmpxchg64 in CMPXCHG_LOOP macro.
> x86 CMPXCHG instruction returns success in ZF flag, so this
> change saves a compare after cmpxchg (and related move instruction
> in front of cmpxchg). The main loop of lockref_get improves from:

Ack on this one regardless of the 32-bit x86 question.

HOWEVER.

I'd like other architectures to pipe up too, because I think right now
x86 is the only one that implements that "arch_try_cmpxchg()" family
of operations natively, and I think the generic fallback for when it
is missing might be kind of nasty.

Maybe it ends up generating ok code, but it's also possible that it
just didn't matter when it was only used in one place in the
scheduler.

The lockref_get() case can be quite hot under some loads, it would be
sad if this made other architectures worse.

Anyway, maybe that try_cmpxchg() fallback is fine, and works out well
on architectures that use load-locked / store-conditional as-is.
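
(For reference, the generic fallback boils down to a plain cmpxchg plus
a compare, writing the observed value back into the caller's "old" on
failure -- roughly the following sketch, with names simplified:)

#define try_cmpxchg_fallback(_ptr, _oldp, _new)				\
({									\
	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r;		\
	___r = cmpxchg((_ptr), ___o, (_new));				\
	/* on failure, update the caller's copy of the old value */	\
	if (unlikely(___r != ___o))					\
		*___op = ___r;						\
	likely(___r == ___o);						\
})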

But just to verify, I'm adding arm/powerpc/s390/mips people to the cc. See

    https://lore.kernel.org/all/20220525144013.6481-2-ubizjak@gmail.com/

for the original email and the x86-64 code example.

                   Linus


* RE: [PATCH 2/2] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_32 && X86_CMPXCHG64
  2022-05-25 16:29   ` Linus Torvalds
@ 2022-05-26  8:30     ` David Laight
  2022-05-26  9:12       ` Uros Bizjak
  2022-05-26 17:23       ` Linus Torvalds
  0 siblings, 2 replies; 18+ messages in thread
From: David Laight @ 2022-05-26  8:30 UTC (permalink / raw)
  To: 'Linus Torvalds', Uros Bizjak
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List,
	Peter Zijlstra, Thomas Gleixner, Waiman.Long, Paul McKenney

From: Linus Torvalds
> Sent: 25 May 2022 17:30
> 
> On Wed, May 25, 2022 at 7:40 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > +       select ARCH_USE_CMPXCHG_LOCKREF if X86_64 || (X86_32 && X86_CMPXCHG64)
> 
> Ugh. That looks pointlessly complicated. X86_64 already enables
> X86_CMPXCHG64 afaik, so you can just say
> 
>         select ARCH_USE_CMPXCHG_LOCKREF if X86_CMPXCHG64
> 
> which is much clearer: CMPXCHG_LOCKREF needs CMPXCHG64, and the
> Kconfig option says exactly that.
> 
> That said, this also makes me wonder if we should just make cmpxchg8b
> requirement unconditional.
> 
> Googling around, it looks like Windows NT stopped booting on CPUs
> without cmpxchg8b in version 5.1. That was in 2001.
> 
> Here we are, 21 years later, and we still ostensibly try to support
> CPUs without it, but I doubt it's worth it.
> 
> So maybe time to just say "we require cmpxchg8b".
> 
> In fact, I think we have effectively done that already for years, since we have
> 
>     config X86_CMPXCHG64
>             def_bool y
>             depends on X86_PAE || ...
> 
> iow, enabling X86_PAE will force-enable CMPXCHG8B due to the wider
> page table entries.
> 
> And I would assume that all distros basically do that anyway (but I do
> not have a 32-bit distro around to check).
> 
> It would mean that we would finally drop support for i486 (and
> possibly some Pentium clones, but afaik a number of them did actually
> support cmpxchg8b even if they didn't report it in cpuid).

Perhaps there could be a non-smp implementation of cmpxchg8b
that just disables interrupts?
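
Something along these lines, just to illustrate the idea (a rough
sketch assuming a UP-only build; hypothetical code, not from the
kernel):

static inline u64 cmpxchg64_up(volatile u64 *ptr, u64 old, u64 new)
{
	unsigned long flags;
	u64 prev;

	/* no other CPUs, so disabling interrupts is enough for atomicity */
	local_irq_save(flags);
	prev = *ptr;
	if (prev == old)
		*ptr = new;
	local_irq_restore(flags);

	return prev;
}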

While I have used a dual 486, I doubt Linux would ever have run
on it. The same is probably true for old dual Pentiums.

I think there are still some 486-class embedded cpu that include
a few of the newer instructions (usually things like rdtsc).
But they won't be smp.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* Re: [PATCH 1/2] locking/lockref: Use try_cmpxchg64 in CMPXCHG_LOOP macro
  2022-05-25 16:47   ` Linus Torvalds
@ 2022-05-26  8:54     ` Uros Bizjak
  2022-05-26 12:14       ` Michael Ellerman
  2022-05-26 16:56     ` Linus Torvalds
  2 siblings, 0 replies; 18+ messages in thread
From: Uros Bizjak @ 2022-05-26  8:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Catalin Marinas, Will Deacon, Russell King, Michael Ellerman,
	Thomas Bogendoerfer, Heiko Carstens, the arch/x86 maintainers,
	Linux Kernel Mailing List, Peter Zijlstra, Thomas Gleixner,
	Paul McKenney

[-- Attachment #1: Type: text/plain, Size: 2593 bytes --]

On Wed, May 25, 2022 at 6:48 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, May 25, 2022 at 7:40 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > Use try_cmpxchg64 instead of cmpxchg64 in CMPXCHG_LOOP macro.
> > x86 CMPXCHG instruction returns success in ZF flag, so this
> > change saves a compare after cmpxchg (and related move instruction
> > in front of cmpxchg). The main loop of lockref_get improves from:
>
> Ack on this one regardless of the 32-bit x86 question.
>
> HOWEVER.
>
> I'd like other architectures to pipe up too, because I think right now
> x86 is the only one that implements that "arch_try_cmpxchg()" family
> of operations natively, and I think the generic fallback for when it
> is missing might be kind of nasty.
>
> Maybe it ends up generating ok code, but it's also possible that it
> just didn't matter when it was only used in one place in the
> scheduler.
>
> The lockref_get() case can be quite hot under some loads, it would be
> sad if this made other architectures worse.
>
> Anyway, maybe that try_cmpxchg() fallback is fine, and works out well
> on architectures that use load-locked / store-conditional as-is.

Please find attached to this message a testcase that analyses the
various CMPXCHG_LOOP variants. It contains the old, the fallback and
the new cmpxchg loop, together with the corresponding lockref_get_*
functions.

The testcase models the x86 cmpxchg8b and can be compiled for 64-bit
as well as 32-bit targets. As can be seen from the experiment, the
try_cmpxchg fallback creates EXACTLY THE SAME code for the 64-bit
target as the unpatched code. For the 32-bit target, one extra dead
reg-reg 32-bit move remains in the generated fallback assembly (a
compiler (gcc-10.3) artefact with double-word 64-bit moves on the
x86_32 target).

From the above experiment, we can conclude that the patched lockref.c
creates the same code with the try_cmpxchg fallback as the original
code. I expect the same will also be observed for other targets.

When the new code involving try_cmpxchg is compiled, impressive size
gains for x86_32 can be seen. The main loop size reduces from 44 bytes
to 30 bytes.

Several transitions from cmpxchg to try_cmpxchg can already be found
in the git repository. The above experiment confirms that the generated
fallback assembly is at least as good as the original unpatched version,
and can be better when the target implements try_cmpxchg natively.
Also, it looks to me that several other hot spots throughout the code
could be improved by switching them from cmpxchg to try_cmpxchg.

Uros.

[-- Attachment #2: lockref-test.c --]
[-- Type: text/x-csrc, Size: 2606 bytes --]

#include <stdint.h>

#define __aligned_u64 u64 __attribute__((aligned(8)))

#define LOCK_PREFIX "lock "

# define likely(x)	__builtin_expect(!!(x), 1)
# define unlikely(x)	__builtin_expect(!!(x), 0)

typedef uint64_t u64;
typedef uint32_t u32;

static inline void cpu_relax(void)
{
	asm volatile("rep; nop" ::: "memory");
}

static inline u64 cmpxchg64(volatile u64 *ptr, u64 old, u64 new)
{
	u64 prev;
	asm volatile(LOCK_PREFIX "cmpxchg8b %1"
		     : "=A" (prev),
		       "+m" (*ptr)
		     : "b" ((u32)new),
		       "c" ((u32)(new >> 32)),
		       "0" (old)
		     : "memory");
	return prev;
}

#define CMPXCHG_LOOP_OLD(CODE, SUCCESS) do {				\
	int retry = 100;						\
	__aligned_u64 old = *lockref;					\
	while (t) {							\
		__aligned_u64 new = old, prev = old;			\
		CODE							\
		old = cmpxchg64(lockref, old, new);			\
		if (likely(old == prev)) {				\
			SUCCESS;					\
		}							\
		if (!--retry)						\
			break;						\
		cpu_relax();						\
	}								\
} while (0)

void lockref_get_old(u64 *lockref, _Bool t)
{
	CMPXCHG_LOOP_OLD(
		new++;
	,
		return;
	);
}

#define try_cmpxchg64_fallback(_ptr, _oldp, _new)	\
({ \
	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
	___r = cmpxchg64((_ptr), ___o, (_new)); \
	if (unlikely(___r != ___o)) \
		*___op = ___r; \
	likely(___r == ___o); \
})

#define CMPXCHG_LOOP_FALLBACK(CODE, SUCCESS) do {				\
	int retry = 100;							\
	__aligned_u64 old = *lockref;						\
	while (t) {								\
		__aligned_u64 new = old;					\
		CODE								\
		if (likely(try_cmpxchg64_fallback(lockref, &old, new))) {	\
			SUCCESS;						\
		}								\
		if (!--retry)							\
			break;							\
		cpu_relax();							\
	}									\
} while (0)

void lockref_get_fallback(u64 *lockref, _Bool t)
{
	CMPXCHG_LOOP_FALLBACK(
		new++;
	,
		return;
	);
}

static inline _Bool try_cmpxchg64(volatile u64 *ptr, u64 *pold, u64 new)
{
	_Bool success;
	u64 old = *pold;
	asm volatile(LOCK_PREFIX "cmpxchg8b %[ptr]"
		     : "=@ccz" (success),
		       [ptr] "+m" (*ptr),
		       "+A" (old)
		     : "b" ((u32)new),
		       "c" ((u32)(new >> 32))
		     : "memory");

	if (unlikely(!success))
		*pold = old;
	return success;
}

#define CMPXCHG_LOOP_NEW(CODE, SUCCESS) do {				\
	int retry = 100;						\
	__aligned_u64 old = *lockref;					\
	while (t) {							\
		__aligned_u64 new = old;				\
		CODE							\
		if (likely(try_cmpxchg64(lockref, &old, new))) {	\
			SUCCESS;					\
		}							\
		if (!--retry)						\
			break;						\
		cpu_relax();						\
	}								\
} while (0)

void lockref_get_new(u64 *lockref, _Bool t)
{
	CMPXCHG_LOOP_NEW(
		new++;
	,
		return;
	);
}


* Re: [PATCH 2/2] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_32 && X86_CMPXCHG64
  2022-05-26  8:30     ` David Laight
@ 2022-05-26  9:12       ` Uros Bizjak
  2022-05-26 17:23       ` Linus Torvalds
  1 sibling, 0 replies; 18+ messages in thread
From: Uros Bizjak @ 2022-05-26  9:12 UTC (permalink / raw)
  To: David Laight
  Cc: Linus Torvalds, the arch/x86 maintainers,
	Linux Kernel Mailing List, Peter Zijlstra, Thomas Gleixner,
	Waiman.Long, Paul McKenney

On Thu, May 26, 2022 at 10:30 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Linus Torvalds
> > Sent: 25 May 2022 17:30
> >
> > On Wed, May 25, 2022 at 7:40 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > +       select ARCH_USE_CMPXCHG_LOCKREF if X86_64 || (X86_32 && X86_CMPXCHG64)
> >
> > Ugh. That looks pointlessly complicated. X86_64 already enables
> > X86_CMPXCHG64 afaik, so you can just say
> >
> >         select ARCH_USE_CMPXCHG_LOCKREF if X86_CMPXCHG64
> >
> > which is much clearer: CMPXCHG_LOCKREF needs CMPXCHG64, and the
> > Kconfig option says exactly that.
> >
> > That said, this also makes me wonder if we should just make cmpxchg8b
> > requirement unconditional.
> >
> > Googling around, it looks like Windows NT stopped booting on CPUs
> > without cmpxchg8b in version 5.1. That was in 2001.
> >
> > Here we are, 21 years later, and we still ostensibly try to support
> > CPUs without it, but I doubt it's worth it.
> >
> > So maybe time to just say "we require cmpxchg8b".
> >
> > In fact, I think we have effectively done that already for years, since we have
> >
> >     config X86_CMPXCHG64
> >             def_bool y
> >             depends on X86_PAE || ...
> >
> > iow, enabling X86_PAE will force-enable CMPXCHG8B due to the wider
> > page table entries.
> >
> > And I would assume that all distros basically do that anyway (but I do
> > not have a 32-bit distro around to check).
> >
> > It would mean that we would finally drop support for i486 (and
> > possibly some Pentium clones, but afaik a number of them did actually
> > support cmpxchg8b even if they didn't report it in cpuid).
>
> Perhaps there could be a non-smp implementation of cmpxchg8b
> that just disables interrupts?
>
> While I have used a dual 486 I doubt Linux would run ever
> have on it. The same is probably true for old dual Pentiums.
>
> I think there are still some 486-class embedded cpu that include
> a few of the newer instructions (usually things like rdtsc).
> But they won't be smp.

The above solution is already implemented when X86_CMPXCHG64 is not
defined. Please see arch/x86/include/asm/cmpxchg_32.h, where in this
case we have:

#define arch_cmpxchg64(ptr, o, n)				\
({								\
	__typeof__(*(ptr)) __ret;				\
	__typeof__(*(ptr)) __old = (o);				\
	__typeof__(*(ptr)) __new = (n);				\
	alternative_io(LOCK_PREFIX_HERE				\
			"call cmpxchg8b_emu",			\
			"lock; cmpxchg8b (%%esi)" ,		\
			X86_FEATURE_CX8,			\
			"=A" (__ret),				\
			"S" ((ptr)), "0" (__old),		\
			"b" ((unsigned int)__new),		\
			"c" ((unsigned int)(__new>>32))		\
			: "memory");				\
	__ret; })

Without CX8, the code will be patched to the call to cmpxchg8b_emu,
defined in arch/x86/lib/cmpxchg8b_emu.S.

I think that the solution with CONFIG_X86_CMPXCHG64 is quite good.
When it is defined, arch_cmpxchg64 emits the real CMPXCHG8B instruction.
Without the config flag, the above definition is compiled in and the
code is patched to either the call or the real instruction at runtime.
Also, the config flag makes a difference only in
arch/x86/include/asm/cmpxchg_32.h, and with the try_cmpxchg64 fallbacks,
one can still use cmpxchg64/try_cmpxchg64 even when the HW doesn't
support CMPXCHG8B natively.

Uros.


* Re: [PATCH 1/2] locking/lockref: Use try_cmpxchg64 in CMPXCHG_LOOP macro
  2022-05-25 16:47   ` Linus Torvalds
@ 2022-05-26 12:14       ` Michael Ellerman
  2022-05-26 12:14       ` Michael Ellerman
  2022-05-26 16:56     ` Linus Torvalds
  2 siblings, 0 replies; 18+ messages in thread
From: Michael Ellerman @ 2022-05-26 12:14 UTC (permalink / raw)
  To: Linus Torvalds, Uros Bizjak, Catalin Marinas, Will Deacon,
	Russell King, Thomas Bogendoerfer, Heiko Carstens
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List,
	Peter Zijlstra, Thomas Gleixner, Waiman.Long, Paul McKenney,
	linuxppc-dev

Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Wed, May 25, 2022 at 7:40 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>>
>> Use try_cmpxchg64 instead of cmpxchg64 in CMPXCHG_LOOP macro.
>> x86 CMPXCHG instruction returns success in ZF flag, so this
>> change saves a compare after cmpxchg (and related move instruction
>> in front of cmpxchg). The main loop of lockref_get improves from:
>
> Ack on this one regardless of the 32-bit x86 question.
>
> HOWEVER.
>
> I'd like other architectures to pipe up too, because I think right now
> x86 is the only one that implements that "arch_try_cmpxchg()" family
> of operations natively, and I think the generic fallback for when it
> is missing might be kind of nasty.
>
> Maybe it ends up generating ok code, but it's also possible that it
> just didn't matter when it was only used in one place in the
> scheduler.

This patch seems to generate slightly *better* code on powerpc.

I see one register-to-register move that gets shifted slightly later, so
that it's skipped on the path that returns directly via the SUCCESS
case.

So LGTM.

> The lockref_get() case can be quite hot under some loads, it would be
> sad if this made other architectures worse.

Do you know of a benchmark that shows it up? I tried a few things but
couldn't get lockref_get() to count for more than 1-2%.

cheers


* Re: [PATCH 1/2] locking/lockref: Use try_cmpxchg64 in CMPXCHG_LOOP macro
  2022-05-26 12:14       ` Michael Ellerman
@ 2022-05-26 12:42         ` Mark Rutland
  -1 siblings, 0 replies; 18+ messages in thread
From: Mark Rutland @ 2022-05-26 12:42 UTC (permalink / raw)
  To: Michael Ellerman, Linus Torvalds
  Cc: Uros Bizjak, Catalin Marinas, Will Deacon, Russell King,
	Thomas Bogendoerfer, Heiko Carstens, the arch/x86 maintainers,
	Linux Kernel Mailing List, Peter Zijlstra, Thomas Gleixner,
	Waiman.Long, Paul McKenney, linuxppc-dev

On Thu, May 26, 2022 at 10:14:59PM +1000, Michael Ellerman wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> > On Wed, May 25, 2022 at 7:40 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >>
> >> Use try_cmpxchg64 instead of cmpxchg64 in CMPXCHG_LOOP macro.
> >> x86 CMPXCHG instruction returns success in ZF flag, so this
> >> change saves a compare after cmpxchg (and related move instruction
> >> in front of cmpxchg). The main loop of lockref_get improves from:
> >
> > Ack on this one regardless of the 32-bit x86 question.
> >
> > HOWEVER.
> >
> > I'd like other architectures to pipe up too, because I think right now
> > x86 is the only one that implements that "arch_try_cmpxchg()" family
> > of operations natively, and I think the generic fallback for when it
> > is missing might be kind of nasty.
> >
> > Maybe it ends up generating ok code, but it's also possible that it
> > just didn't matter when it was only used in one place in the
> > scheduler.
> 
> This patch seems to generate slightly *better* code on powerpc.
> 
> I see one register-to-register move that gets shifted slightly later, so
> that it's skipped on the path that returns directly via the SUCCESS
> case.

FWIW, I see the same on arm64; a register-to-register move gets moved out of
the success path. That changes the register allocation, resulting in one
fewer move, but otherwise the code generation is the same.

Thanks,
Mark.


* Re: [PATCH 1/2] locking/lockref: Use try_cmpxchg64 in CMPXCHG_LOOP macro
  2022-05-26 12:14       ` Michael Ellerman
@ 2022-05-26 16:52         ` Linus Torvalds
  -1 siblings, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2022-05-26 16:52 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Waiman.Long, Thomas Bogendoerfer, Peter Zijlstra,
	Catalin Marinas, Heiko Carstens, the arch/x86 maintainers,
	Uros Bizjak, Russell King, Linux Kernel Mailing List,
	linuxppc-dev, Thomas Gleixner, Paul McKenney, Will Deacon

On Thu, May 26, 2022 at 5:15 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Do you know of a benchmark that shows it up? I tried a few things but
> couldn't get lockref_get() to count for more than 1-2%.

Heh. 1% for a small instruction sequence that is only a handful of
instructions and is used in just a couple of places counts as "very
hot" for me.

I assume you did the natural thing: threaded pathname lookup (with
paths being of the longer variety to not be dominated by system call
etc costs).
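
Something like the following toy program, run against a reasonably deep
path, is usually enough to make the dentry refcounting show up in a
profile (an illustrative sketch only -- the path, thread count and
iteration count are made up):

#include <pthread.h>
#include <sys/stat.h>

#define NTHREADS 8
/* deeper paths spend more time in path walking vs syscall overhead */
static const char *path = "/usr/share/doc/some/deep/path/file";

static void *worker(void *arg)
{
	struct stat st;
	long i;

	(void)arg;
	for (i = 0; i < 10 * 1000 * 1000; i++)
		stat(path, &st);	/* each iteration walks the whole path */
	return NULL;
}

int main(void)
{
	pthread_t t[NTHREADS];
	int i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&t[i], NULL, worker, NULL);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(t[i], NULL);
	return 0;
}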

               Linus


* Re: [PATCH 1/2] locking/lockref: Use try_cmpxchg64 in CMPXCHG_LOOP macro
  2022-05-25 16:47   ` Linus Torvalds
  2022-05-26  8:54     ` Uros Bizjak
  2022-05-26 12:14       ` Michael Ellerman
@ 2022-05-26 16:56     ` Linus Torvalds
  2 siblings, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2022-05-26 16:56 UTC (permalink / raw)
  To: Uros Bizjak, Catalin Marinas, Will Deacon, Russell King,
	Michael Ellerman, Thomas Bogendoerfer, Heiko Carstens
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List,
	Peter Zijlstra, Thomas Gleixner, Waiman.Long, Paul McKenney

On Wed, May 25, 2022 at 9:47 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Ack on this one regardless of the 32-bit x86 question.

.. and with confirmation from Michael and Mark (and the analysis from
Uros on the fallback code), I've applied this to my tree.

NOTE!

I have *not* applied the second patch with the x86-32 change to enable
this on 32-bit with CMPXCHG8B enabled. I think that's independent, and
will leave it up to the x86 maintainers.  Plus I think the patch
should be simplified.

             Linus


* Re: [PATCH 2/2] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_32 && X86_CMPXCHG64
  2022-05-26  8:30     ` David Laight
  2022-05-26  9:12       ` Uros Bizjak
@ 2022-05-26 17:23       ` Linus Torvalds
  1 sibling, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2022-05-26 17:23 UTC (permalink / raw)
  To: David Laight
  Cc: Uros Bizjak, the arch/x86 maintainers, Linux Kernel Mailing List,
	Peter Zijlstra, Thomas Gleixner, Waiman.Long, Paul McKenney

On Thu, May 26, 2022 at 1:30 AM David Laight <David.Laight@aculab.com> wrote:
>
> Perhaps there could be a non-smp implementation of cmpxchg8b
> that just disables interrupts?

As Uros points out, we do have exactly that, but it's not actually
written to be usable for the "trylock" case. Plus it would be
pointless for lockrefs, since the non-cmpxchg case that just falls
back to a spinlock would be faster and simpler (particularly on UP,
where locking goes away).

> While I have used a dual 486 I doubt Linux would run ever
> have on it. The same is probably true for old dual Pentiums.

Yeah, I don't think we ever supported SMP on i486, afaik they all
needed special system glue.

I think the "modern" x86 SMP support with a local APIC was a PPro and
newer thing historically, but clearly there were then later what
amounted to Penitum/MMX class cores (ie old Atom) that did support
SMP.

                      Linus


* Re: [PATCH 1/2] locking/lockref: Use try_cmpxchg64 in CMPXCHG_LOOP macro
  2022-05-26 12:42         ` Mark Rutland
@ 2022-05-27  9:36           ` Heiko Carstens
  -1 siblings, 0 replies; 18+ messages in thread
From: Heiko Carstens @ 2022-05-27  9:36 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Michael Ellerman, Linus Torvalds, Uros Bizjak, Catalin Marinas,
	Will Deacon, Russell King, Thomas Bogendoerfer,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Peter Zijlstra, Thomas Gleixner, Waiman.Long, Paul McKenney,
	linuxppc-dev

On Thu, May 26, 2022 at 01:42:35PM +0100, Mark Rutland wrote:
> On Thu, May 26, 2022 at 10:14:59PM +1000, Michael Ellerman wrote:
> > Linus Torvalds <torvalds@linux-foundation.org> writes:
> > > On Wed, May 25, 2022 at 7:40 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >>
> > >> Use try_cmpxchg64 instead of cmpxchg64 in CMPXCHG_LOOP macro.
> > >> x86 CMPXCHG instruction returns success in ZF flag, so this
> > >> change saves a compare after cmpxchg (and related move instruction
> > >> in front of cmpxchg). The main loop of lockref_get improves from:
> > >
> > > Ack on this one regardless of the 32-bit x86 question.
> > >
> > > HOWEVER.
> > >
> > > I'd like other architectures to pipe up too, because I think right now
> > > x86 is the only one that implements that "arch_try_cmpxchg()" family
> > > of operations natively, and I think the generic fallback for when it
> > > is missing might be kind of nasty.
> > >
> > > Maybe it ends up generating ok code, but it's also possible that it
> > > just didn't matter when it was only used in one place in the
> > > scheduler.
> > 
> > This patch seems to generate slightly *better* code on powerpc.
> > 
> > I see one register-to-register move that gets shifted slightly later, so
> > that it's skipped on the path that returns directly via the SUCCESS
> > case.
> 
> FWIW, I see the same on arm64; a register-to-register move gets moved out of
> the success path. That changes the register allocation, resulting in one
> fewer move, but otherwise the code generation is the same.

Just for the record: s390 code generation changes in the same way as on
powerpc, so it looks good.
