From: Linus Torvalds <torvalds@linux-foundation.org>
To: David Laight <David.Laight@aculab.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"peterz@infradead.org" <peterz@infradead.org>,
"longman@redhat.com" <longman@redhat.com>,
"mingo@redhat.com" <mingo@redhat.com>,
"will@kernel.org" <will@kernel.org>,
"boqun.feng@gmail.com" <boqun.feng@gmail.com>,
"xinhui.pan@linux.vnet.ibm.com" <xinhui.pan@linux.vnet.ibm.com>,
"virtualization@lists.linux-foundation.org"
<virtualization@lists.linux-foundation.org>,
Zeng Heng <zengheng4@huawei.com>
Subject: Re: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.
Date: Sat, 30 Dec 2023 12:41:12 -0800
Message-ID: <CAHk-=wjbWTbRKDP=Yb9VWBGjSBEGB3dJ0=--+7-4oA2n1=1FKw@mail.gmail.com>
In-Reply-To: <bddb6b00434d4492abca4725c10f8d5a@AcuMS.aculab.com>
On Fri, 29 Dec 2023 at 12:57, David Laight <David.Laight@aculab.com> wrote:
>
> this_cpu_ptr() is rather more expensive than raw_cpu_read() since
> the latter can use an 'offset from register' (%gs for x86-64).
>
> Add a 'self' field to 'struct optimistic_spin_node' that can be
> read with raw_cpu_read(), initialise on first call.
No, this is horrible.
The problem isn't the "this_cpu_ptr()", it's the rest of the code.
> bool osq_lock(struct optimistic_spin_queue *lock)
> {
> - struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
> + struct optimistic_spin_node *node = raw_cpu_read(osq_node.self);
No. Both of these are crap.
> struct optimistic_spin_node *prev, *next;
> int old;
>
> - if (unlikely(node->cpu == OSQ_UNLOCKED_VAL))
> - node->cpu = encode_cpu(smp_processor_id());
> + if (unlikely(!node)) {
> + int cpu = encode_cpu(smp_processor_id());
> + node = decode_cpu(cpu);
> + node->self = node;
> + node->cpu = cpu;
> + }
The proper fix here is to not do that silly
    node = this_cpu_ptr(&osq_node);
    ...
    node->next = NULL;
dance at all, but to simply do
    this_cpu_write(osq_node.next, NULL);
in the first place. That makes the whole thing just a single store off
the segment descriptor.
Yes, you'll eventually end up doing that
    node = this_cpu_ptr(&osq_node);
thing because it then wants to use that raw pointer to do
    WRITE_ONCE(prev->next, node);
but that's a separate issue and still does not make it worth it to
create a pointless self-pointer.
Btw, if you *really* want to solve that separate issue, then make the
optimistic_spin_node struct not contain the pointers at all, but the
CPU numbers, and then turn those numbers into the pointers the exact
same way it does for the "lock->tail" thing, ie doing that whole
    prev = decode_cpu(old);
dance. That *may* then result in avoiding turning them into pointers
at all in some cases.
Also, I think that you might want to look into making OSQ_UNLOCKED_VAL
be -1 instead, and add something like
    #define IS_OSQ_UNLOCKED(x) ((int)(x) < 0)
and that would then avoid the +1 / -1 games in encoding/decoding the
CPU numbers. Right now they cause silly code to be generated, like this:
    subl $1, %eax  #, cpu_nr
    ...
    cltq
    addq __per_cpu_offset(,%rax,8), %rcx
which seems honestly stupid. The cltq is there for sign-extension,
which is because all these things are "int", and the "subl" will
zero-extend to 64-bit, not sign-extend.
At that point, I think gcc might be able to just generate
    addq __per_cpu_offset-8(,%rax,8), %rcx
but honestly, I think it would be nicer to just have decode_cpu() do
    unsigned int cpu_nr = encoded_cpu_val;
    return per_cpu_ptr(&osq_node, cpu_nr);
and not have the -1/+1 at all.
Hmm?
UNTESTED patch to just do the "this_cpu_write()" parts attached.
Again, note how we do end up doing that this_cpu_ptr conversion later
anyway, but at least it's off the critical path.
Linus
[-- Attachment #2: patch.diff --]
kernel/locking/osq_lock.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 75a6f6133866..c3a166b7900c 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -92,14 +92,14 @@ osq_wait_next(struct optimistic_spin_queue *lock,
 
 bool osq_lock(struct optimistic_spin_queue *lock)
 {
-	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
+	struct optimistic_spin_node *node;
 	struct optimistic_spin_node *prev, *next;
 	int curr = encode_cpu(smp_processor_id());
 	int old;
 
-	node->locked = 0;
-	node->next = NULL;
-	node->cpu = curr;
+	this_cpu_write(osq_node.next, NULL);
+	this_cpu_write(osq_node.locked, 0);
+	this_cpu_write(osq_node.cpu, curr);
 
 	/*
 	 * We need both ACQUIRE (pairs with corresponding RELEASE in
@@ -112,7 +112,9 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 		return true;
 
 	prev = decode_cpu(old);
-	node->prev = prev;
+	this_cpu_write(osq_node.prev, prev);
+
+	node = this_cpu_ptr(&osq_node);
 
 	/*
 	 * osq_lock() unqueue