From: Ingo Molnar <mingo@kernel.org>
To: Uros Bizjak <ubizjak@gmail.com>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
Andy Lutomirski <luto@kernel.org>, Nadav Amit <namit@vmware.com>,
Brian Gerst <brgerst@gmail.com>,
Denys Vlasenko <dvlasenk@redhat.com>,
"H . Peter Anvin" <hpa@zytor.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Borislav Petkov <bp@alien8.de>,
Josh Poimboeuf <jpoimboe@redhat.com>
Subject: Re: [PATCH 4/4] x86/percpu: Use C for percpu read/write accessors
Date: Wed, 4 Oct 2023 18:40:42 +0200
Message-ID: <ZR2VitjPb6Miksim@gmail.com>
In-Reply-To: <ZR2U4DLycLT5xFH6@gmail.com>
* Ingo Molnar <mingo@kernel.org> wrote:
>
> * Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > The percpu code mostly uses inline assembly. Using segment qualifiers
> > allows to use C code instead, which enables the compiler to perform
> > various optimizations (e.g. propagation of memory arguments). Convert
> > percpu read and write accessors to C code, so the memory argument can
> > be propagated to the instruction that uses this argument.
> >
> > Some examples of propagations:
> >
> > a) into sign/zero extensions:
> >
> > 110b54: 65 0f b6 05 00 00 00 movzbl %gs:0x0(%rip),%eax
> > 11ab90: 65 0f b6 15 00 00 00 movzbl %gs:0x0(%rip),%edx
> > 14484a: 65 0f b7 35 00 00 00 movzwl %gs:0x0(%rip),%esi
> > 1a08a9: 65 0f b6 43 78 movzbl %gs:0x78(%rbx),%eax
> > 1a08f9: 65 0f b6 43 78 movzbl %gs:0x78(%rbx),%eax
> >
> > 4ab29a: 65 48 63 15 00 00 00 movslq %gs:0x0(%rip),%rdx
> > 4be128: 65 4c 63 25 00 00 00 movslq %gs:0x0(%rip),%r12
> > 547468: 65 48 63 1f movslq %gs:(%rdi),%rbx
> > 5474e7: 65 48 63 0a movslq %gs:(%rdx),%rcx
> > 54d05d: 65 48 63 0d 00 00 00 movslq %gs:0x0(%rip),%rcx
>
> Could you please also quote a 'before' assembly sequence, at least once
> per group of propagations?
I.e., for any change to x86 code generation, please follow the changelog
format of this commit:
7c097ca50d2b ("x86/percpu: Do not clobber %rsi in percpu_{try_,}cmpxchg{64,128}_op")
...
Move the load of %rsi outside inline asm, so the compiler can
reuse the value. The code in slub.o improves from:
55ac: 49 8b 3c 24 mov (%r12),%rdi
55b0: 48 8d 4a 40 lea 0x40(%rdx),%rcx
55b4: 49 8b 1c 07 mov (%r15,%rax,1),%rbx
55b8: 4c 89 f8 mov %r15,%rax
55bb: 48 8d 37 lea (%rdi),%rsi
55be: e8 00 00 00 00 callq 55c3 <...>
55bf: R_X86_64_PLT32 this_cpu_cmpxchg16b_emu-0x4
55c3: 75 a3 jne 5568 <...>
55c5: ...
0000000000000000 <.altinstr_replacement>:
5: 65 48 0f c7 0f cmpxchg16b %gs:(%rdi)
to:
55ac: 49 8b 34 24 mov (%r12),%rsi
55b0: 48 8d 4a 40 lea 0x40(%rdx),%rcx
55b4: 49 8b 1c 07 mov (%r15,%rax,1),%rbx
55b8: 4c 89 f8 mov %r15,%rax
55bb: e8 00 00 00 00 callq 55c0 <...>
55bc: R_X86_64_PLT32 this_cpu_cmpxchg16b_emu-0x4
55c0: 75 a6 jne 5568 <...>
55c2: ...
Where the alternative replacement instruction now uses %rsi:
0000000000000000 <.altinstr_replacement>:
5: 65 48 0f c7 0e cmpxchg16b %gs:(%rsi)
The instruction (effectively a reg-reg move) at 55bb: in the original
assembly is removed. Also, both the CALL and replacement CMPXCHG16B
are 5 bytes long, removing the need for NOPs in the asm code.
...
Thanks,
Ingo