Linux-csky Archive on lore.kernel.org
 help / color / Atom feed
From: Paul Campbell <taniwha@gmail.com>
To: Arnd Bergmann <arnd@arndb.de>, linux-riscv@lists.infradead.org
Cc: Peter Zijlstra <peterz@infradead.org>,
	linux-riscv <linux-riscv@lists.infradead.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-csky@vger.kernel.org,
	linux-arch <linux-arch@vger.kernel.org>,
	Guo Ren <guoren@linux.alibaba.com>, Will Deacon <will@kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Waiman Long <longman@redhat.com>,
	Anup Patel <anup@brainfault.org>,
	Sebastian Andrzej Siewior <sebastian@breakpoint.cc>,
	Guo Ren <guoren@kernel.org>
Subject: Re: [PATCH v4 3/4] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32
Date: Wed, 31 Mar 2021 18:33:07 +1300
Message-ID: <1706037.TLkxdtWsSY@rata> (raw)
In-Reply-To: <CAJF2gTSGLn7katm6YAtkKWJcQRqw36_yqn+aK1pKUSRM5V1zUg@mail.gmail.com>

On Wednesday, 31 March 2021 5:18:56 PM NZDT Guo Ren wrote:
> > > [1]
> > > https://github.com/c-sky/csky-linux/commit/e837aad23148542771794d8a2fcc
> > > 52afd0fcbf88> > 
> > > > It also seems that the current "amoswap" based implementation
> > > > would be reliable independent of RsrvEventual/RsrvNonEventual.
> > > 
> > > Yes, the hardware implementation of AMO could be different from LR/SC.
> > > AMO could use ACE snoop holding to lock the bus in hw coherency
> > > design, but LR/SC uses an exclusive monitor without locking the bus.
> > > 
> > > RISC-V hasn't CAS instructions, and it uses LR/SC for cmpxchg. I don't
> > > think LR/SC would be slower than CAS, and CAS is just good for code
> > > size.
> > 
> > What I meant here is that the current spinlock uses a simple amoswap,
> > which presumably does not suffer from the lack of forward process you
> > described.
> 
> Does that mean we should prevent using LR/SC (if RsrvNonEventual)?

Let me provide another data-point, I'm working on a high-end highly 
speculative implementation with many concurrent instructions in flight - from 
my point of view  both sorts of AMO (LR/SC and swap/add/etc) require me to 
grab a cache line in an exclusive modifiable state (so no difference there).

More importantly both sorts of AMO instructions  (unlike most loads and 
stores) can't be speculated (not even LR because it changes hidden state, I 
found this out the hard way bringing up the kernel).

This means that both LR AND SC individually can't be executed until all 
speculation is resolved (that means that they happen really late in the 
execute path and block the resolution of the speculation of subsequent 
instructions) - equally a single amoswap/add/etc instruction can't happen 
until late in the execute path - so both require the same cache line state, 
but one of these sorts of events is better than two of them.

Which in short means that amoswap/add/etc is better for small things - small 
buzzy lock loops, while LR/SC is better for more complex things with actual 
processing between the LR and SC.

----

Another issue here is to consider is what happens when you hit one of these 
tight spinlocks when the branch target cache is empty and they fail (ie loop 
back and try again) - the default branch prediction, and resulting 
speculation, is (very) likely to be looping back, while hopefully most locks 
are not contended when you hit them and that speculation would be wrong - a 
spinlock like this may not be so good:

	li a0, 1
loop: 
	amoswap	a1, a0, (a2)
	beqz	a1, loop
	..... subsequent code

In my world with no BTC info the pipe fills with dozens of amoswaps, rather 
than  the 'subsequent code'. While (in my world) code like this:

	li a0, 1
loop: 
	amoswap	a1, a0, (a2)
	bnez	a1, 1f
	.... subsequent code

1:	j loop

would actually be better (in my world unconditional jump instructions are 
folded early and never see execution so they're sort of free, though they mess 
with the issue/decode rate). Smart compilers could move the "j loop" out of 
the way, while the double branch on failure is not a big deal since either the 
lock is still held (and you don't care if it's slow) or it's been released in 
which case the cache line has been stolen and the refetch of that cache line 
is going to dominate the next time around the loop

I need to stress here that this is how my architecture works, other's will of 
course be different though I expect that other heavily speculative 
architectures to have similar issues :-)

	Paul Campbell
	Moonbase Otago




  reply index

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-27 18:06 [PATCH v4 0/4] riscv: Add qspinlock/qrwlock guoren
2021-03-27 18:06 ` [PATCH v4 1/4] riscv: cmpxchg.h: Cleanup unused code guoren
2021-03-27 18:06 ` [PATCH v4 2/4] riscv: cmpxchg.h: Merge macros guoren
2021-03-27 21:25   ` Arnd Bergmann
2021-03-28  1:50     ` Guo Ren
2021-03-27 18:06 ` [PATCH v4 3/4] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32 guoren
2021-03-27 18:43   ` Waiman Long
2021-03-28  1:48     ` Guo Ren
2021-03-29  7:50   ` Peter Zijlstra
2021-03-29  9:41     ` Arnd Bergmann
2021-03-29 11:16       ` Peter Zijlstra
2021-03-29 11:29         ` Peter Zijlstra
2021-03-29 12:52           ` Guo Ren
2021-03-29 13:56             ` Arnd Bergmann
2021-03-30  2:26               ` Guo Ren
2021-03-30  5:51                 ` Anup Patel
2021-03-30  6:26                   ` Guo Ren
2021-03-30  7:11                 ` Arnd Bergmann
2021-03-31  4:18                   ` Guo Ren
2021-03-31  5:33                     ` Paul Campbell [this message]
2021-04-05 16:12                       ` Guo Ren
2021-03-31  6:44                     ` Guo Ren
2021-03-31  7:12                       ` Arnd Bergmann
2021-03-29 11:19     ` Guo Ren
2021-03-29 11:26       ` Peter Zijlstra
2021-03-29 12:01         ` Guo Ren
2021-03-29 12:49           ` Peter Zijlstra
2021-03-30  3:13             ` Guo Ren
2021-03-30  4:54               ` Anup Patel
2021-03-30  6:27                 ` Guo Ren
2021-03-30  8:31               ` David Laight
2021-03-30 14:09               ` Waiman Long
2021-03-31 14:47                 ` Guo Ren
2021-04-05 16:45                 ` Guo Ren
2021-03-30 16:08               ` Peter Zijlstra
2021-03-30 22:35                 ` Stafford Horne
2021-03-31  7:23                   ` Arnd Bergmann
2021-03-31 12:31                     ` Stafford Horne
2021-03-31 15:10                       ` Guo Ren
2021-04-06  8:51                         ` Stafford Horne
2021-04-06  3:50                     ` Guo Ren
2021-04-06  8:56                       ` Stafford Horne
2021-04-07  8:42                         ` Arnd Bergmann
2021-04-07 11:36                           ` Peter Zijlstra
2021-04-07 11:57                             ` Arnd Bergmann
2021-04-07 12:02                             ` Peter Zijlstra
2021-04-05 16:40                 ` Guo Ren
2021-03-31 15:22             ` Guo Ren
2021-04-06  7:15               ` Peter Zijlstra
2021-04-07  9:42                 ` Christoph Hellwig
2021-04-07 14:29                   ` Christoph Müllner
2021-04-07 14:34                     ` Christoph Hellwig
2021-04-07 15:51                     ` Peter Zijlstra
2021-04-07 16:44                       ` Peter Zijlstra
2021-04-07 15:52                     ` Peter Zijlstra
2021-04-07 16:54                       ` Peter Zijlstra
2021-04-07 16:00                     ` Peter Zijlstra
2021-04-07 19:50                       ` Christoph Müllner
2021-04-06 17:24               ` Boqun Feng
2021-04-07  9:26                 ` Peter Zijlstra
2021-03-29 12:13         ` Anup Patel
2021-03-29 12:54           ` Peter Zijlstra
2021-03-27 18:06 ` [PATCH v4 4/4] riscv: Convert custom spinlock/rwlock to generic qspinlock/qrwlock guoren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1706037.TLkxdtWsSY@rata \
    --to=taniwha@gmail.com \
    --cc=anup@brainfault.org \
    --cc=arnd@arndb.de \
    --cc=guoren@kernel.org \
    --cc=guoren@linux.alibaba.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-csky@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=longman@redhat.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=sebastian@breakpoint.cc \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-csky Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-csky/0 linux-csky/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-csky linux-csky/ https://lore.kernel.org/linux-csky \
		linux-csky@vger.kernel.org
	public-inbox-index linux-csky

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-csky


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git