From: Uros Bizjak <ubizjak@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Russell King <linux@armlinux.org.uk>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Heiko Carstens <hca@linux.ibm.com>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCH 1/2] locking/lockref: Use try_cmpxchg64 in CMPXCHG_LOOP macro
Date: Thu, 26 May 2022 10:54:58 +0200
Message-ID: <CAFULd4bc54+_FmJ=f++zzz99mR8r5c11-Y49pz86Yb8G3dyJpA@mail.gmail.com>
In-Reply-To: <CAHk-=wh1XeaxWXG5QziGA4ds918UnW1hO924kusgVB-wGj+9Og@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2593 bytes --]

On Wed, May 25, 2022 at 6:48 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, May 25, 2022 at 7:40 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > Use try_cmpxchg64 instead of cmpxchg64 in CMPXCHG_LOOP macro.
> > x86 CMPXCHG instruction returns success in ZF flag, so this
> > change saves a compare after cmpxchg (and related move instruction
> > in front of cmpxchg). The main loop of lockref_get improves from:
>
> Ack on this one regardless of the 32-bit x86 question.
>
> HOWEVER.
>
> I'd like other architectures to pipe up too, because I think right now
> x86 is the only one that implements that "arch_try_cmpxchg()" family
> of operations natively, and I think the generic fallback for when it
> is missing might be kind of nasty.
>
> Maybe it ends up generating ok code, but it's also possible that it
> just didn't matter when it was only used in one place in the
> scheduler.
>
> The lockref_get() case can be quite hot under some loads, it would be
> sad if this made other architectures worse.
>
> Anyway, maybe that try_cmpxchg() fallback is fine, and works out well
> on architectures that use load-locked / store-conditional as-is.

Attached to this message, please find the testcase that analyses the
various CMPXCHG_LOOP variants. There you will find the old, the
fallback and the new cmpxchg loop, together with the corresponding
lockref_get_* functions.

The testcase models the x86 cmpxchg8b instruction and can be compiled
for 64-bit as well as 32-bit targets. As can be seen from the
experiment, the try_cmpxchg fallback creates EXACTLY THE SAME code for
the 64-bit target as the unpatched code. For the 32-bit target, one
extra dead reg-reg 32-bit move remains in the generated fallback
assembly (a compiler artefact of gcc-10.3 when handling double-word
64-bit moves on the x86_32 target).
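
For reference, the attachment is self-contained and can be compiled
with plain gcc for both targets, e.g.:

  gcc -O2 -S lockref-test.c
  gcc -m32 -O2 -S lockref-test.c

so the generated assembly of the three lockref_get_* variants can be
compared directly.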

From the above experiment, we can conclude that the patched lockref.c
compiles to the same code with the try_cmpxchg fallback as the
original code does. I expect the same to be observed on other targets
as well.

When the new code involving the native try_cmpxchg is compiled,
impressive size gains for x86_32 can be seen: the main loop shrinks
from 44 bytes to 30 bytes.

Several transitions from cmpxchg to try_cmpxchg can already be found
in the git repository. The above experiment confirms that the
generated fallback assembly is at least as good as the original
unpatched version, and can be better when the target implements
try_cmpxchg natively (as x86 does, by consuming the ZF result of
cmpxchg). It also looks to me that several other hot spots throughout
the code could be improved by converting them from cmpxchg to
try_cmpxchg; the conversion itself is mechanical, as sketched below.
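
Schematically, using the helpers from the attached testcase (a sketch,
not taken from any particular call site):

	u64 old, new, prev;

	/* Before: compare the cmpxchg64() return value by hand. */
	old = *lockref;
	for (;;) {
		new = old + 1;
		prev = cmpxchg64(lockref, old, new);
		if (likely(prev == old))
			break;
		old = prev;
	}

	/* After: try_cmpxchg64() refreshes 'old' on failure itself. */
	old = *lockref;
	do {
		new = old + 1;
	} while (unlikely(!try_cmpxchg64(lockref, &old, new)));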

Uros.

[-- Attachment #2: lockref-test.c --]
[-- Type: text/x-csrc, Size: 2606 bytes --]

#include <stdint.h>

typedef uint64_t u64;
typedef uint32_t u32;

#define __aligned_u64 u64 __attribute__((aligned(8)))

#define LOCK_PREFIX "lock "

#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

static inline void cpu_relax(void)
{
	asm volatile("rep; nop" ::: "memory");
}

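/*
 * Legacy interface: cmpxchg64() returns the value found at *ptr, and
 * the caller compares it with 'old' to detect success.  This models
 * x86 cmpxchg8b, which compares EDX:EAX with the 64-bit memory operand
 * and, on a match, stores ECX:EBX there (setting ZF accordingly).
 */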
static inline u64 cmpxchg64(volatile u64 *ptr, u64 old, u64 new)
{
	u64 prev;
	asm volatile(LOCK_PREFIX "cmpxchg8b %1"
		     : "=A" (prev),
		       "+m" (*ptr)
		     : "b" ((u32)new),
		       "c" ((u32)(new >> 32)),
		       "0" (old)
		     : "memory");
	return prev;
}

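/*
 * Old-style loop: compare the returned value with the expected one by
 * hand.  The 't' parameter models the "spinlock not taken" test of the
 * real lockref code.
 */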
#define CMPXCHG_LOOP_OLD(CODE, SUCCESS) do {				\
	int retry = 100;						\
	__aligned_u64 old = *lockref;					\
	while (t) {							\
		__aligned_u64 new = old, prev = old;			\
		CODE							\
		old = cmpxchg64(lockref, old, new);			\
		if (likely(old == prev)) {				\
			SUCCESS;					\
		}							\
		if (!--retry)						\
			break;						\
		cpu_relax();						\
	}								\
} while (0)

void lockref_get_old(u64 *lockref, _Bool t)
{
	CMPXCHG_LOOP_OLD(
		new++;
	,
		return;
	);
}

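/*
 * Generic fallback: emulate try_cmpxchg64() on top of cmpxchg64(),
 * writing the value actually found back through _oldp on failure.
 * This mirrors the kernel's generic try_cmpxchg fallback.
 */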
#define try_cmpxchg64_fallback(_ptr, _oldp, _new)	\
({ \
	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
	___r = cmpxchg64((_ptr), ___o, (_new)); \
	if (unlikely(___r != ___o)) \
		*___op = ___r; \
	likely(___r == ___o); \
})

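/* The same loop, written against the fallback above. */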
#define CMPXCHG_LOOP_FALLBACK(CODE, SUCCESS) do {				\
	int retry = 100;							\
	__aligned_u64 old = *lockref;						\
	while (t) {								\
		__aligned_u64 new = old;					\
		CODE								\
		if (likely(try_cmpxchg64_fallback(lockref, &old, new))) {	\
			SUCCESS;						\
		}								\
		if (!--retry)							\
			break;							\
		cpu_relax();							\
	}									\
} while (0)

void lockref_get_fallback(u64 *lockref, _Bool t)
{
	CMPXCHG_LOOP_FALLBACK(
		new++;
	,
		return;
	);
}

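/*
 * Native variant: consume the ZF result of cmpxchg8b directly via the
 * "=@ccz" flags-output constraint, so no separate compare instruction
 * is needed.
 */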
static inline _Bool try_cmpxchg64(volatile u64 *ptr, u64 *pold, u64 new)
{
	_Bool success;
	u64 old = *pold;
	asm volatile(LOCK_PREFIX "cmpxchg8b %[ptr]"
		     : "=@ccz" (success),
		       [ptr] "+m" (*ptr),
		       "+A" (old)
		     : "b" ((u32)new),
		       "c" ((u32)(new >> 32))
		     : "memory");

	if (unlikely(!success))
		*pold = old;
	return success;
}

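/* The same loop, written against the native try_cmpxchg64(). */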
#define CMPXCHG_LOOP_NEW(CODE, SUCCESS) do {				\
	int retry = 100;						\
	__aligned_u64 old = *lockref;					\
	while (t) {							\
		__aligned_u64 new = old;				\
		CODE							\
		if (likely(try_cmpxchg64(lockref, &old, new))) {	\
			SUCCESS;					\
		}							\
		if (!--retry)						\
			break;						\
		cpu_relax();						\
	}								\
} while (0)

void lockref_get_new(u64 *lockref, _Bool t)
{
	CMPXCHG_LOOP_NEW(
		new++;
	,
		return;
	);
}
