* [RFC PATCH 0/4] x86/percpu: Use segment qualifiers
@ 2023-10-01 13:14 Uros Bizjak
  2023-10-01 13:14 ` [RFC PATCH 1/4] x86/percpu: Update arch/x86/include/asm/percpu.h to the current tip Uros Bizjak
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Uros Bizjak @ 2023-10-01 13:14 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Uros Bizjak, Andy Lutomirski, Ingo Molnar, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Linus Torvalds,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov, Josh Poimboeuf

This patchset resurrects the work of Richard Henderson [1] and Nadav Amit [2]
to introduce the named address spaces compiler extension [3,4] into the Linux kernel.

On the x86 target, variables may be declared as being relative to the %fs or %gs
segment using one of the following named address space qualifiers:

__seg_fs
__seg_gs

The qualified object is then accessed with the respective segment override prefix.
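
For example (an illustrative snippet, not part of the patchset itself), a read
of a __seg_gs qualified variable compiles to a %gs prefixed move:

__seg_gs int m;
int foo (void) { return m; }

        movl    %gs:m(%rip), %eax       # load of m via the %gs segment
        ret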

The following patchset takes a more cautious approach and converts only moves,
currently implemented as asm, into generic moves to/from the named address space.
The compiler is then able to propagate memory arguments into the instructions that
use these memory references, producing more compact assembly, in addition to
avoiding the use of a register as a temporary to hold a value loaded from memory.
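
In a simplified sketch (illustrative only; the real accessors are introduced in
patch 4 and use the __pcpu_* helpers), the conversion turns an asm-based read
into an ordinary C load through a __seg_gs qualified pointer, which the compiler
is then free to fold into the consuming instruction:

#include <stdint.h>     /* for uintptr_t; the kernel provides its own */

/* before: the asm forces the loaded value through a register */
static inline int old_cpu_read_4(int *p)
{
        int val;

        asm("movl %%gs:%1, %0" : "=r" (val) : "m" (*p));
        return val;
}

/* after: a plain C load via a __seg_gs qualified pointer; the memory
 * operand is now visible to the compiler and can be propagated.  The
 * uintptr_t cast mirrors the patchset's __my_cpu_ptr() helper. */
static inline int new_cpu_read_4(int *p)
{
        return *(__seg_gs int *)(uintptr_t)p;
}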

The patchset enables propagation of hundreds of memory arguments, resulting in
a cumulative code size reduction of 7.94 kB (please note that the kernel is
compiled with -O2, so code size is not an entirely accurate measure; some
parts of the code can now be duplicated for better performance due to -O2,
etc.).

Some examples of propagations:

a) into sign/zero extensions:

 110b54:       65 0f b6 05 00 00 00    movzbl %gs:0x0(%rip),%eax
 11ab90:       65 0f b6 15 00 00 00    movzbl %gs:0x0(%rip),%edx
 14484a:       65 0f b7 35 00 00 00    movzwl %gs:0x0(%rip),%esi
 1a08a9:       65 0f b6 43 78          movzbl %gs:0x78(%rbx),%eax
 1a08f9:       65 0f b6 43 78          movzbl %gs:0x78(%rbx),%eax

 4ab29a:       65 48 63 15 00 00 00    movslq %gs:0x0(%rip),%rdx
 4be128:       65 4c 63 25 00 00 00    movslq %gs:0x0(%rip),%r12
 547468:       65 48 63 1f             movslq %gs:(%rdi),%rbx
 5474e7:       65 48 63 0a             movslq %gs:(%rdx),%rcx
 54d05d:       65 48 63 0d 00 00 00    movslq %gs:0x0(%rip),%rcx

b) into compares:

 b40804:       65 f7 05 00 00 00 00    testl  $0xf0000,%gs:0x0(%rip)
 b487e8:       65 f7 05 00 00 00 00    testl  $0xf0000,%gs:0x0(%rip)
 b6f14c:       65 f6 05 00 00 00 00    testb  $0x1,%gs:0x0(%rip)
 bac1b8:       65 f6 05 00 00 00 00    testb  $0x1,%gs:0x0(%rip)
 df2244:       65 f7 05 00 00 00 00    testl  $0xff00,%gs:0x0(%rip)

 9a7517:       65 80 3d 00 00 00 00    cmpb   $0x0,%gs:0x0(%rip)
 b282ba:       65 44 3b 35 00 00 00    cmp    %gs:0x0(%rip),%r14d
 b48f61:       65 66 83 3d 00 00 00    cmpw   $0x8,%gs:0x0(%rip)
 b493fe:       65 80 38 00             cmpb   $0x0,%gs:(%rax)
 b73867:       65 66 83 3d 00 00 00    cmpw   $0x8,%gs:0x0(%rip)

c) into other insns:

 65ec02:       65 0f 44 15 00 00 00    cmove  %gs:0x0(%rip),%edx
 6c98ac:       65 0f 44 15 00 00 00    cmove  %gs:0x0(%rip),%edx
 9aafaf:       65 0f 44 15 00 00 00    cmove  %gs:0x0(%rip),%edx
 b45868:       65 0f 48 35 00 00 00    cmovs  %gs:0x0(%rip),%esi
 d276f8:       65 0f 44 15 00 00 00    cmove  %gs:0x0(%rip),%edx

The above propagations result in the following code size
improvements for the current mainline kernel (with the default config),
compiled with

gcc (GCC) 12.3.1 20230508 (Red Hat 12.3.1-1)

   text    data     bss     dec     hex filename
25508862        4386540  808388 30703790        1d480ae vmlinux-vanilla.o
25500922        4386532  808388 30695842        1d461a2 vmlinux-new.o

The conversion of other read-modify-write instructions does not bring any
benefit; the compiler has some problems constructing RMW instructions from
generic code and easily misses some opportunities.

There are other possible optimizations, involving arch_raw_cpu_ptr and
aggressive caching of current, that are present in the original patch
series. These can be added as follow-ups at some later time.

The patchset was tested on Fedora 38 with kernel 6.5.5 and gcc 13.2.1.
(In fact, I'm writing this message on the patched kernel.)

[1] https://lore.kernel.org/lkml/1454483253-11246-1-git-send-email-rth@twiddle.net/
[2] https://lore.kernel.org/lkml/20190823224424.15296-1-namit@vmware.com/
[3] https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
[4] https://clang.llvm.org/docs/LanguageExtensions.html#target-specific-extensions

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>

Uros Bizjak (4):
  x86/percpu: Update arch/x86/include/asm/percpu.h to the current tip
  x86/percpu: Detect compiler support for named address spaces
  x86/percpu: Use compiler segment prefix qualifier
  x86/percpu: Use C for percpu read/write accessors

 arch/x86/Kconfig               |   7 +
 arch/x86/include/asm/percpu.h  | 237 ++++++++++++++++++++++++++++-----
 arch/x86/include/asm/preempt.h |   2 +-
 3 files changed, 209 insertions(+), 37 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH 1/4] x86/percpu: Update arch/x86/include/asm/percpu.h to the current tip
  2023-10-01 13:14 [RFC PATCH 0/4] x86/percpu: Use segment qualifiers Uros Bizjak
@ 2023-10-01 13:14 ` Uros Bizjak
  2023-10-01 13:14 ` [RFC PATCH 2/4] x86/percpu: Detect compiler support for named address spaces Uros Bizjak
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Uros Bizjak @ 2023-10-01 13:14 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Uros Bizjak, Andy Lutomirski, Ingo Molnar, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Linus Torvalds,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov, Josh Poimboeuf

This is just a convenience patch that brings the current mainline version
of arch/x86/include/asm/percpu.h up to the version in the current tip tree.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 arch/x86/include/asm/percpu.h | 110 ++++++++++++++++++++++++++++++++--
 1 file changed, 104 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 34734d730463..20624b80f890 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -210,6 +210,25 @@ do {									\
 	(typeof(_var))(unsigned long) pco_old__;			\
 })
 
+#define percpu_try_cmpxchg_op(size, qual, _var, _ovalp, _nval)		\
+({									\
+	bool success;							\
+	__pcpu_type_##size *pco_oval__ = (__pcpu_type_##size *)(_ovalp); \
+	__pcpu_type_##size pco_old__ = *pco_oval__;			\
+	__pcpu_type_##size pco_new__ = __pcpu_cast_##size(_nval);	\
+	asm qual (__pcpu_op2_##size("cmpxchg", "%[nval]",		\
+				    __percpu_arg([var]))		\
+		  CC_SET(z)						\
+		  : CC_OUT(z) (success),				\
+		    [oval] "+a" (pco_old__),				\
+		    [var] "+m" (_var)					\
+		  : [nval] __pcpu_reg_##size(, pco_new__)		\
+		  : "memory");						\
+	if (unlikely(!success))						\
+		*pco_oval__ = pco_old__;				\
+	likely(success);						\
+})
+
 #if defined(CONFIG_X86_32) && !defined(CONFIG_UML)
 #define percpu_cmpxchg64_op(size, qual, _var, _oval, _nval)		\
 ({									\
@@ -223,26 +242,63 @@ do {									\
 	old__.var = _oval;						\
 	new__.var = _nval;						\
 									\
-	asm qual (ALTERNATIVE("leal %P[var], %%esi; call this_cpu_cmpxchg8b_emu", \
+	asm qual (ALTERNATIVE("call this_cpu_cmpxchg8b_emu",		\
 			      "cmpxchg8b " __percpu_arg([var]), X86_FEATURE_CX8) \
 		  : [var] "+m" (_var),					\
 		    "+a" (old__.low),					\
 		    "+d" (old__.high)					\
 		  : "b" (new__.low),					\
-		    "c" (new__.high)					\
-		  : "memory", "esi");					\
+		    "c" (new__.high),					\
+		    "S" (&(_var))					\
+		  : "memory");						\
 									\
 	old__.var;							\
 })
 
 #define raw_cpu_cmpxchg64(pcp, oval, nval)	percpu_cmpxchg64_op(8,         , pcp, oval, nval)
 #define this_cpu_cmpxchg64(pcp, oval, nval)	percpu_cmpxchg64_op(8, volatile, pcp, oval, nval)
+
+#define percpu_try_cmpxchg64_op(size, qual, _var, _ovalp, _nval)	\
+({									\
+	bool success;							\
+	u64 *_oval = (u64 *)(_ovalp);					\
+	union {								\
+		u64 var;						\
+		struct {						\
+			u32 low, high;					\
+		};							\
+	} old__, new__;							\
+									\
+	old__.var = *_oval;						\
+	new__.var = _nval;						\
+									\
+	asm qual (ALTERNATIVE("call this_cpu_cmpxchg8b_emu",		\
+			      "cmpxchg8b " __percpu_arg([var]), X86_FEATURE_CX8) \
+		  CC_SET(z)						\
+		  : CC_OUT(z) (success),				\
+		    [var] "+m" (_var),					\
+		    "+a" (old__.low),					\
+		    "+d" (old__.high)					\
+		  : "b" (new__.low),					\
+		    "c" (new__.high),					\
+		    "S" (&(_var))					\
+		  : "memory");						\
+	if (unlikely(!success))						\
+		*_oval = old__.var;					\
+	likely(success);						\
+})
+
+#define raw_cpu_try_cmpxchg64(pcp, ovalp, nval)		percpu_try_cmpxchg64_op(8,         , pcp, ovalp, nval)
+#define this_cpu_try_cmpxchg64(pcp, ovalp, nval)	percpu_try_cmpxchg64_op(8, volatile, pcp, ovalp, nval)
 #endif
 
 #ifdef CONFIG_X86_64
 #define raw_cpu_cmpxchg64(pcp, oval, nval)	percpu_cmpxchg_op(8,         , pcp, oval, nval);
 #define this_cpu_cmpxchg64(pcp, oval, nval)	percpu_cmpxchg_op(8, volatile, pcp, oval, nval);
 
+#define raw_cpu_try_cmpxchg64(pcp, ovalp, nval)		percpu_try_cmpxchg_op(8,         , pcp, ovalp, nval);
+#define this_cpu_try_cmpxchg64(pcp, ovalp, nval)	percpu_try_cmpxchg_op(8, volatile, pcp, ovalp, nval);
+
 #define percpu_cmpxchg128_op(size, qual, _var, _oval, _nval)		\
 ({									\
 	union {								\
@@ -255,20 +311,54 @@ do {									\
 	old__.var = _oval;						\
 	new__.var = _nval;						\
 									\
-	asm qual (ALTERNATIVE("leaq %P[var], %%rsi; call this_cpu_cmpxchg16b_emu", \
+	asm qual (ALTERNATIVE("call this_cpu_cmpxchg16b_emu",		\
 			      "cmpxchg16b " __percpu_arg([var]), X86_FEATURE_CX16) \
 		  : [var] "+m" (_var),					\
 		    "+a" (old__.low),					\
 		    "+d" (old__.high)					\
 		  : "b" (new__.low),					\
-		    "c" (new__.high)					\
-		  : "memory", "rsi");					\
+		    "c" (new__.high),					\
+		    "S" (&(_var))					\
+		  : "memory");						\
 									\
 	old__.var;							\
 })
 
 #define raw_cpu_cmpxchg128(pcp, oval, nval)	percpu_cmpxchg128_op(16,         , pcp, oval, nval)
 #define this_cpu_cmpxchg128(pcp, oval, nval)	percpu_cmpxchg128_op(16, volatile, pcp, oval, nval)
+
+#define percpu_try_cmpxchg128_op(size, qual, _var, _ovalp, _nval)	\
+({									\
+	bool success;							\
+	u128 *_oval = (u128 *)(_ovalp);					\
+	union {								\
+		u128 var;						\
+		struct {						\
+			u64 low, high;					\
+		};							\
+	} old__, new__;							\
+									\
+	old__.var = *_oval;						\
+	new__.var = _nval;						\
+									\
+	asm qual (ALTERNATIVE("call this_cpu_cmpxchg16b_emu",		\
+			      "cmpxchg16b " __percpu_arg([var]), X86_FEATURE_CX16) \
+		  CC_SET(z)						\
+		  : CC_OUT(z) (success),				\
+		    [var] "+m" (_var),					\
+		    "+a" (old__.low),					\
+		    "+d" (old__.high)					\
+		  : "b" (new__.low),					\
+		    "c" (new__.high),					\
+		    "S" (&(_var))					\
+		  : "memory");						\
+	if (unlikely(!success))						\
+		*_oval = old__.var;					\
+	likely(success);						\
+})
+
+#define raw_cpu_try_cmpxchg128(pcp, ovalp, nval)	percpu_try_cmpxchg128_op(16,         , pcp, ovalp, nval)
+#define this_cpu_try_cmpxchg128(pcp, ovalp, nval)	percpu_try_cmpxchg128_op(16, volatile, pcp, ovalp, nval)
 #endif
 
 /*
@@ -343,6 +433,9 @@ do {									\
 #define raw_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(1, , pcp, oval, nval)
 #define raw_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(2, , pcp, oval, nval)
 #define raw_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(4, , pcp, oval, nval)
+#define raw_cpu_try_cmpxchg_1(pcp, ovalp, nval)	percpu_try_cmpxchg_op(1, , pcp, ovalp, nval)
+#define raw_cpu_try_cmpxchg_2(pcp, ovalp, nval)	percpu_try_cmpxchg_op(2, , pcp, ovalp, nval)
+#define raw_cpu_try_cmpxchg_4(pcp, ovalp, nval)	percpu_try_cmpxchg_op(4, , pcp, ovalp, nval)
 
 #define this_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, volatile, pcp, val)
 #define this_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, volatile, pcp, val)
@@ -350,6 +443,9 @@ do {									\
 #define this_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(1, volatile, pcp, oval, nval)
 #define this_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(2, volatile, pcp, oval, nval)
 #define this_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(4, volatile, pcp, oval, nval)
+#define this_cpu_try_cmpxchg_1(pcp, ovalp, nval)	percpu_try_cmpxchg_op(1, volatile, pcp, ovalp, nval)
+#define this_cpu_try_cmpxchg_2(pcp, ovalp, nval)	percpu_try_cmpxchg_op(2, volatile, pcp, ovalp, nval)
+#define this_cpu_try_cmpxchg_4(pcp, ovalp, nval)	percpu_try_cmpxchg_op(4, volatile, pcp, ovalp, nval)
 
 /*
  * Per cpu atomic 64 bit operations are only available under 64 bit.
@@ -364,6 +460,7 @@ do {									\
 #define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, , pcp, val)
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
 #define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(8, , pcp, oval, nval)
+#define raw_cpu_try_cmpxchg_8(pcp, ovalp, nval)	percpu_try_cmpxchg_op(8, , pcp, ovalp, nval)
 
 #define this_cpu_read_8(pcp)			percpu_from_op(8, volatile, "mov", pcp)
 #define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
@@ -373,6 +470,7 @@ do {									\
 #define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, volatile, pcp, val)
 #define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(8, volatile, pcp, nval)
 #define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(8, volatile, pcp, oval, nval)
+#define this_cpu_try_cmpxchg_8(pcp, ovalp, nval)	percpu_try_cmpxchg_op(8, volatile, pcp, ovalp, nval)
 #endif
 
 static __always_inline bool x86_this_cpu_constant_test_bit(unsigned int nr,
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 2/4] x86/percpu: Detect compiler support for named address spaces
  2023-10-01 13:14 [RFC PATCH 0/4] x86/percpu: Use segment qualifiers Uros Bizjak
  2023-10-01 13:14 ` [RFC PATCH 1/4] x86/percpu: Update arch/x86/include/asm/percpu.h to the current tip Uros Bizjak
@ 2023-10-01 13:14 ` Uros Bizjak
  2023-10-01 13:14 ` [RFC PATCH 3/4] x86/percpu: Use compiler segment prefix qualifier Uros Bizjak
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Uros Bizjak @ 2023-10-01 13:14 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Uros Bizjak, Andy Lutomirski, Ingo Molnar, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Linus Torvalds,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov, Josh Poimboeuf

Detect compiler support for named address spaces by trying
to compile a small testcase that uses the __seg_fs and __seg_gs
keywords. Reflect successful compilation in CC_HAS_NAMED_AS
and set USE_X86_SEG_SUPPORT to signal when segment qualifiers
should be used.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 arch/x86/Kconfig | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 66bfabae8814..f66fa5bf1f78 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2388,6 +2388,13 @@ source "kernel/livepatch/Kconfig"
 
 endmenu
 
+config CC_HAS_NAMED_AS
+	def_bool $(success,echo 'int __seg_fs fs; int __seg_gs gs;' | $(CC) -x c - -c -o /dev/null)
+
+config USE_X86_SEG_SUPPORT
+	def_bool y
+	depends on CC_HAS_NAMED_AS && SMP
+
 config CC_HAS_SLS
 	def_bool $(cc-option,-mharden-sls=all)
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 3/4] x86/percpu: Use compiler segment prefix qualifier
  2023-10-01 13:14 [RFC PATCH 0/4] x86/percpu: Use segment qualifiers Uros Bizjak
  2023-10-01 13:14 ` [RFC PATCH 1/4] x86/percpu: Update arch/x86/include/asm/percpu.h to the current tip Uros Bizjak
  2023-10-01 13:14 ` [RFC PATCH 2/4] x86/percpu: Detect compiler support for named address spaces Uros Bizjak
@ 2023-10-01 13:14 ` Uros Bizjak
  2023-10-01 13:14 ` [RFC PATCH 4/4] x86/percpu: Use C for percpu read/write accessors Uros Bizjak
  2023-10-01 17:07 ` [RFC PATCH 0/4] x86/percpu: Use segment qualifiers Linus Torvalds
  4 siblings, 0 replies; 18+ messages in thread
From: Uros Bizjak @ 2023-10-01 13:14 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Uros Bizjak, Andy Lutomirski, Ingo Molnar, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Linus Torvalds,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov, Josh Poimboeuf

From: Nadav Amit <namit@vmware.com>

Using a segment prefix qualifier is cleaner than using a segment prefix
in the inline assembly, and provides the compiler with more information,
telling it that __seg_gs:[addr] is different from [addr] when it
analyzes data dependencies. It also enables various optimizations that
will be implemented in the next patches.

Use segment prefix qualifiers when they are supported. Unfortunately,
gcc does not provide a way to remove segment qualifiers, which is needed
to use typeof() to create local instances of a per-cpu variable. For
this reason, do not use the segment qualifier in per-cpu variable
declarations; instead, add the qualifier by casting at the point of access.
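
A short illustrative sketch of the problem and of the workaround
(simplified; the actual macros added below use __percpu_seg_override
rather than a hard-coded __seg_gs):

__seg_gs int var;

/* typeof(var) is "__seg_gs int" -- the qualifier cannot be stripped,
 * so typeof() cannot be used to declare a plain local copy of var. */

/* Workaround: keep the per-cpu variable itself unqualified and attach
 * the qualifier only at the point of access, via a cast: */
#define __my_cpu_type(var)	typeof(var) __seg_gs
#define __my_cpu_ptr(ptr)	(__my_cpu_type(*(ptr)) *)(uintptr_t)(ptr)
#define __my_cpu_var(var)	(*__my_cpu_ptr(&(var)))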

Uros: Improve compiler support detection and update the patch
to the current mainline.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 arch/x86/include/asm/percpu.h  | 68 +++++++++++++++++++++++-----------
 arch/x86/include/asm/preempt.h |  2 +-
 2 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 20624b80f890..da451202a1b9 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -28,26 +28,50 @@
 #include <linux/stringify.h>
 
 #ifdef CONFIG_SMP
+
+#ifdef CONFIG_CC_HAS_NAMED_AS
+
+#ifdef CONFIG_X86_64
+#define __percpu_seg_override	__seg_gs
+#else
+#define __percpu_seg_override	__seg_fs
+#endif
+
+#define __percpu_prefix		""
+
+#else /* CONFIG_CC_HAS_NAMED_AS */
+
+#define __percpu_seg_override
 #define __percpu_prefix		"%%"__stringify(__percpu_seg)":"
+
+#endif /* CONFIG_CC_HAS_NAMED_AS */
+
+#define __force_percpu_prefix	"%%"__stringify(__percpu_seg)":"
 #define __my_cpu_offset		this_cpu_read(this_cpu_off)
 
 /*
  * Compared to the generic __my_cpu_offset version, the following
  * saves one instruction and avoids clobbering a temp register.
  */
-#define arch_raw_cpu_ptr(ptr)				\
-({							\
-	unsigned long tcp_ptr__;			\
-	asm ("add " __percpu_arg(1) ", %0"		\
-	     : "=r" (tcp_ptr__)				\
-	     : "m" (this_cpu_off), "0" (ptr));		\
-	(typeof(*(ptr)) __kernel __force *)tcp_ptr__;	\
+#define arch_raw_cpu_ptr(ptr)					\
+({								\
+	unsigned long tcp_ptr__;				\
+	asm ("add " __percpu_arg(1) ", %0"			\
+	     : "=r" (tcp_ptr__)					\
+	     : "m" (__my_cpu_var(this_cpu_off)), "0" (ptr));	\
+	(typeof(*(ptr)) __kernel __force *)tcp_ptr__;		\
 })
-#else
+#else /* CONFIG_SMP */
+#define __percpu_seg_override
 #define __percpu_prefix		""
-#endif
+#define __force_percpu_prefix	""
+#endif /* CONFIG_SMP */
 
+#define __my_cpu_type(var)	typeof(var) __percpu_seg_override
+#define __my_cpu_ptr(ptr)	(__my_cpu_type(*ptr) *)(uintptr_t)(ptr)
+#define __my_cpu_var(var)	(*__my_cpu_ptr(&var))
 #define __percpu_arg(x)		__percpu_prefix "%" #x
+#define __force_percpu_arg(x)	__force_percpu_prefix "%" #x
 
 /*
  * Initialized pointers to per-cpu variables needed for the boot
@@ -107,14 +131,14 @@ do {									\
 		(void)pto_tmp__;					\
 	}								\
 	asm qual(__pcpu_op2_##size(op, "%[val]", __percpu_arg([var]))	\
-	    : [var] "+m" (_var)						\
+	    : [var] "+m" (__my_cpu_var(_var))				\
 	    : [val] __pcpu_reg_imm_##size(pto_val__));			\
 } while (0)
 
 #define percpu_unary_op(size, qual, op, _var)				\
 ({									\
 	asm qual (__pcpu_op1_##size(op, __percpu_arg([var]))		\
-	    : [var] "+m" (_var));					\
+	    : [var] "+m" (__my_cpu_var(_var)));				\
 })
 
 /*
@@ -144,14 +168,14 @@ do {									\
 	__pcpu_type_##size pfo_val__;					\
 	asm qual (__pcpu_op2_##size(op, __percpu_arg([var]), "%[val]")	\
 	    : [val] __pcpu_reg_##size("=", pfo_val__)			\
-	    : [var] "m" (_var));					\
+	    : [var] "m" (__my_cpu_var(_var)));				\
 	(typeof(_var))(unsigned long) pfo_val__;			\
 })
 
 #define percpu_stable_op(size, op, _var)				\
 ({									\
 	__pcpu_type_##size pfo_val__;					\
-	asm(__pcpu_op2_##size(op, __percpu_arg(P[var]), "%[val]")	\
+	asm(__pcpu_op2_##size(op, __force_percpu_arg(P[var]), "%[val]")	\
 	    : [val] __pcpu_reg_##size("=", pfo_val__)			\
 	    : [var] "p" (&(_var)));					\
 	(typeof(_var))(unsigned long) pfo_val__;			\
@@ -166,7 +190,7 @@ do {									\
 	asm qual (__pcpu_op2_##size("xadd", "%[tmp]",			\
 				     __percpu_arg([var]))		\
 		  : [tmp] __pcpu_reg_##size("+", paro_tmp__),		\
-		    [var] "+m" (_var)					\
+		    [var] "+m" (__my_cpu_var(_var))			\
 		  : : "memory");					\
 	(typeof(_var))(unsigned long) (paro_tmp__ + _val);		\
 })
@@ -187,7 +211,7 @@ do {									\
 				    __percpu_arg([var]))		\
 		  "\n\tjnz 1b"						\
 		  : [oval] "=&a" (pxo_old__),				\
-		    [var] "+m" (_var)					\
+		    [var] "+m" (__my_cpu_var(_var))			\
 		  : [nval] __pcpu_reg_##size(, pxo_new__)		\
 		  : "memory");						\
 	(typeof(_var))(unsigned long) pxo_old__;			\
@@ -204,7 +228,7 @@ do {									\
 	asm qual (__pcpu_op2_##size("cmpxchg", "%[nval]",		\
 				    __percpu_arg([var]))		\
 		  : [oval] "+a" (pco_old__),				\
-		    [var] "+m" (_var)					\
+		    [var] "+m" (__my_cpu_var(_var))			\
 		  : [nval] __pcpu_reg_##size(, pco_new__)		\
 		  : "memory");						\
 	(typeof(_var))(unsigned long) pco_old__;			\
@@ -221,7 +245,7 @@ do {									\
 		  CC_SET(z)						\
 		  : CC_OUT(z) (success),				\
 		    [oval] "+a" (pco_old__),				\
-		    [var] "+m" (_var)					\
+		    [var] "+m" (__my_cpu_var(_var))			\
 		  : [nval] __pcpu_reg_##size(, pco_new__)		\
 		  : "memory");						\
 	if (unlikely(!success))						\
@@ -244,7 +268,7 @@ do {									\
 									\
 	asm qual (ALTERNATIVE("call this_cpu_cmpxchg8b_emu",		\
 			      "cmpxchg8b " __percpu_arg([var]), X86_FEATURE_CX8) \
-		  : [var] "+m" (_var),					\
+		  : [var] "+m" (__my_cpu_var(_var)),			\
 		    "+a" (old__.low),					\
 		    "+d" (old__.high)					\
 		  : "b" (new__.low),					\
@@ -276,7 +300,7 @@ do {									\
 			      "cmpxchg8b " __percpu_arg([var]), X86_FEATURE_CX8) \
 		  CC_SET(z)						\
 		  : CC_OUT(z) (success),				\
-		    [var] "+m" (_var),					\
+		    [var] "+m" (__my_cpu_var(_var)),			\
 		    "+a" (old__.low),					\
 		    "+d" (old__.high)					\
 		  : "b" (new__.low),					\
@@ -313,7 +337,7 @@ do {									\
 									\
 	asm qual (ALTERNATIVE("call this_cpu_cmpxchg16b_emu",		\
 			      "cmpxchg16b " __percpu_arg([var]), X86_FEATURE_CX16) \
-		  : [var] "+m" (_var),					\
+		  : [var] "+m" (__my_cpu_var(_var)),			\
 		    "+a" (old__.low),					\
 		    "+d" (old__.high)					\
 		  : "b" (new__.low),					\
@@ -345,7 +369,7 @@ do {									\
 			      "cmpxchg16b " __percpu_arg([var]), X86_FEATURE_CX16) \
 		  CC_SET(z)						\
 		  : CC_OUT(z) (success),				\
-		    [var] "+m" (_var),					\
+		    [var] "+m" (__my_cpu_var(_var)),			\
 		    "+a" (old__.low),					\
 		    "+d" (old__.high)					\
 		  : "b" (new__.low),					\
@@ -494,7 +518,7 @@ static inline bool x86_this_cpu_variable_test_bit(int nr,
 	asm volatile("btl "__percpu_arg(2)",%1"
 			CC_SET(c)
 			: CC_OUT(c) (oldbit)
-			: "m" (*(unsigned long __percpu *)addr), "Ir" (nr));
+			: "m" (*__my_cpu_ptr((unsigned long __percpu *)(addr))), "Ir" (nr));
 
 	return oldbit;
 }
diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index 2d13f25b1bd8..e25b95e7cf82 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -92,7 +92,7 @@ static __always_inline void __preempt_count_sub(int val)
  */
 static __always_inline bool __preempt_count_dec_and_test(void)
 {
-	return GEN_UNARY_RMWcc("decl", pcpu_hot.preempt_count, e,
+	return GEN_UNARY_RMWcc("decl", __my_cpu_var(pcpu_hot.preempt_count), e,
 			       __percpu_arg([var]));
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 4/4] x86/percpu: Use C for percpu read/write accessors
  2023-10-01 13:14 [RFC PATCH 0/4] x86/percpu: Use segment qualifiers Uros Bizjak
                   ` (2 preceding siblings ...)
  2023-10-01 13:14 ` [RFC PATCH 3/4] x86/percpu: Use compiler segment prefix qualifier Uros Bizjak
@ 2023-10-01 13:14 ` Uros Bizjak
  2023-10-01 17:07 ` [RFC PATCH 0/4] x86/percpu: Use segment qualifiers Linus Torvalds
  4 siblings, 0 replies; 18+ messages in thread
From: Uros Bizjak @ 2023-10-01 13:14 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Uros Bizjak, Andy Lutomirski, Ingo Molnar, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Linus Torvalds,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov, Josh Poimboeuf

The percpu code mostly uses inline assembly. Using segment qualifiers
allows C code to be used instead, which enables the compiler to perform
various optimizations (e.g. propagation of memory arguments). Convert
the percpu read and write accessors to C code, so the memory argument
can be propagated to the instruction that uses it.

Some examples of propagations:

a) into sign/zero extensions:

 110b54:       65 0f b6 05 00 00 00    movzbl %gs:0x0(%rip),%eax
 11ab90:       65 0f b6 15 00 00 00    movzbl %gs:0x0(%rip),%edx
 14484a:       65 0f b7 35 00 00 00    movzwl %gs:0x0(%rip),%esi
 1a08a9:       65 0f b6 43 78          movzbl %gs:0x78(%rbx),%eax
 1a08f9:       65 0f b6 43 78          movzbl %gs:0x78(%rbx),%eax

 4ab29a:       65 48 63 15 00 00 00    movslq %gs:0x0(%rip),%rdx
 4be128:       65 4c 63 25 00 00 00    movslq %gs:0x0(%rip),%r12
 547468:       65 48 63 1f             movslq %gs:(%rdi),%rbx
 5474e7:       65 48 63 0a             movslq %gs:(%rdx),%rcx
 54d05d:       65 48 63 0d 00 00 00    movslq %gs:0x0(%rip),%rcx

b) into compares:

 b40804:       65 f7 05 00 00 00 00    testl  $0xf0000,%gs:0x0(%rip)
 b487e8:       65 f7 05 00 00 00 00    testl  $0xf0000,%gs:0x0(%rip)
 b6f14c:       65 f6 05 00 00 00 00    testb  $0x1,%gs:0x0(%rip)
 bac1b8:       65 f6 05 00 00 00 00    testb  $0x1,%gs:0x0(%rip)
 df2244:       65 f7 05 00 00 00 00    testl  $0xff00,%gs:0x0(%rip)

 9a7517:       65 80 3d 00 00 00 00    cmpb   $0x0,%gs:0x0(%rip)
 b282ba:       65 44 3b 35 00 00 00    cmp    %gs:0x0(%rip),%r14d
 b48f61:       65 66 83 3d 00 00 00    cmpw   $0x8,%gs:0x0(%rip)
 b493fe:       65 80 38 00             cmpb   $0x0,%gs:(%rax)
 b73867:       65 66 83 3d 00 00 00    cmpw   $0x8,%gs:0x0(%rip)

c) into other insns:

 65ec02:       65 0f 44 15 00 00 00    cmove  %gs:0x0(%rip),%edx
 6c98ac:       65 0f 44 15 00 00 00    cmove  %gs:0x0(%rip),%edx
 9aafaf:       65 0f 44 15 00 00 00    cmove  %gs:0x0(%rip),%edx
 b45868:       65 0f 48 35 00 00 00    cmovs  %gs:0x0(%rip),%esi
 d276f8:       65 0f 44 15 00 00 00    cmove  %gs:0x0(%rip),%edx

The above propagations result in the following code size
improvements for the current mainline kernel (with the default config),
compiled with

gcc (GCC) 12.3.1 20230508 (Red Hat 12.3.1-1)

   text    data     bss     dec     hex filename
25508862        4386540  808388 30703790        1d480ae vmlinux-vanilla.o
25500922        4386532  808388 30695842        1d461a2 vmlinux-new.o

The conversion of other read-modify-write instructions does not bring any
benefit; the compiler has some problems constructing RMW instructions from
generic code and easily misses some opportunities.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Co-developed-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 arch/x86/include/asm/percpu.h | 65 +++++++++++++++++++++++++++++------
 1 file changed, 54 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index da451202a1b9..60ea7755c0fe 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -400,13 +400,66 @@ do {									\
 #define this_cpu_read_stable_8(pcp)	percpu_stable_op(8, "mov", pcp)
 #define this_cpu_read_stable(pcp)	__pcpu_size_call_return(this_cpu_read_stable_, pcp)
 
+#ifdef CONFIG_USE_X86_SEG_SUPPORT
+
+#define __raw_cpu_read(qual, pcp)					\
+({									\
+	*(qual __my_cpu_type(pcp) *)__my_cpu_ptr(&(pcp));		\
+})
+
+#define __raw_cpu_write(qual, pcp, val)					\
+do {									\
+	*(qual __my_cpu_type(pcp) *)__my_cpu_ptr(&(pcp)) = (val);	\
+} while (0)
+
+#define raw_cpu_read_1(pcp)		__raw_cpu_read(, pcp)
+#define raw_cpu_read_2(pcp)		__raw_cpu_read(, pcp)
+#define raw_cpu_read_4(pcp)		__raw_cpu_read(, pcp)
+#define raw_cpu_write_1(pcp, val)	__raw_cpu_write(, pcp, val)
+#define raw_cpu_write_2(pcp, val)	__raw_cpu_write(, pcp, val)
+#define raw_cpu_write_4(pcp, val)	__raw_cpu_write(, pcp, val)
+
+#define this_cpu_read_1(pcp)		__raw_cpu_read(volatile, pcp)
+#define this_cpu_read_2(pcp)		__raw_cpu_read(volatile, pcp)
+#define this_cpu_read_4(pcp)		__raw_cpu_read(volatile, pcp)
+#define this_cpu_write_1(pcp, val)	__raw_cpu_write(volatile, pcp, val)
+#define this_cpu_write_2(pcp, val)	__raw_cpu_write(volatile, pcp, val)
+#define this_cpu_write_4(pcp, val)	__raw_cpu_write(volatile, pcp, val)
+
+#ifdef CONFIG_X86_64
+#define raw_cpu_read_8(pcp)		__raw_cpu_read(, pcp)
+#define raw_cpu_write_8(pcp, val)	__raw_cpu_write(, pcp, val)
+
+#define this_cpu_read_8(pcp)		__raw_cpu_read(volatile, pcp)
+#define this_cpu_write_8(pcp, val)	__raw_cpu_write(volatile, pcp, val)
+#endif
+
+#else /* CONFIG_USE_X86_SEG_SUPPORT */
+
 #define raw_cpu_read_1(pcp)		percpu_from_op(1, , "mov", pcp)
 #define raw_cpu_read_2(pcp)		percpu_from_op(2, , "mov", pcp)
 #define raw_cpu_read_4(pcp)		percpu_from_op(4, , "mov", pcp)
-
 #define raw_cpu_write_1(pcp, val)	percpu_to_op(1, , "mov", (pcp), val)
 #define raw_cpu_write_2(pcp, val)	percpu_to_op(2, , "mov", (pcp), val)
 #define raw_cpu_write_4(pcp, val)	percpu_to_op(4, , "mov", (pcp), val)
+
+#define this_cpu_read_1(pcp)		percpu_from_op(1, volatile, "mov", pcp)
+#define this_cpu_read_2(pcp)		percpu_from_op(2, volatile, "mov", pcp)
+#define this_cpu_read_4(pcp)		percpu_from_op(4, volatile, "mov", pcp)
+#define this_cpu_write_1(pcp, val)	percpu_to_op(1, volatile, "mov", (pcp), val)
+#define this_cpu_write_2(pcp, val)	percpu_to_op(2, volatile, "mov", (pcp), val)
+#define this_cpu_write_4(pcp, val)	percpu_to_op(4, volatile, "mov", (pcp), val)
+
+#ifdef CONFIG_X86_64
+#define raw_cpu_read_8(pcp)		percpu_from_op(8, , "mov", pcp)
+#define raw_cpu_write_8(pcp, val)	percpu_to_op(8, , "mov", (pcp), val)
+
+#define this_cpu_read_8(pcp)		percpu_from_op(8, volatile, "mov", pcp)
+#define this_cpu_write_8(pcp, val)	percpu_to_op(8, volatile, "mov", (pcp), val)
+#endif
+
+#endif /* CONFIG_USE_X86_SEG_SUPPORT */
+
 #define raw_cpu_add_1(pcp, val)		percpu_add_op(1, , (pcp), val)
 #define raw_cpu_add_2(pcp, val)		percpu_add_op(2, , (pcp), val)
 #define raw_cpu_add_4(pcp, val)		percpu_add_op(4, , (pcp), val)
@@ -432,12 +485,6 @@ do {									\
 #define raw_cpu_xchg_2(pcp, val)	raw_percpu_xchg_op(pcp, val)
 #define raw_cpu_xchg_4(pcp, val)	raw_percpu_xchg_op(pcp, val)
 
-#define this_cpu_read_1(pcp)		percpu_from_op(1, volatile, "mov", pcp)
-#define this_cpu_read_2(pcp)		percpu_from_op(2, volatile, "mov", pcp)
-#define this_cpu_read_4(pcp)		percpu_from_op(4, volatile, "mov", pcp)
-#define this_cpu_write_1(pcp, val)	percpu_to_op(1, volatile, "mov", (pcp), val)
-#define this_cpu_write_2(pcp, val)	percpu_to_op(2, volatile, "mov", (pcp), val)
-#define this_cpu_write_4(pcp, val)	percpu_to_op(4, volatile, "mov", (pcp), val)
 #define this_cpu_add_1(pcp, val)	percpu_add_op(1, volatile, (pcp), val)
 #define this_cpu_add_2(pcp, val)	percpu_add_op(2, volatile, (pcp), val)
 #define this_cpu_add_4(pcp, val)	percpu_add_op(4, volatile, (pcp), val)
@@ -476,8 +523,6 @@ do {									\
  * 32 bit must fall back to generic operations.
  */
 #ifdef CONFIG_X86_64
-#define raw_cpu_read_8(pcp)			percpu_from_op(8, , "mov", pcp)
-#define raw_cpu_write_8(pcp, val)		percpu_to_op(8, , "mov", (pcp), val)
 #define raw_cpu_add_8(pcp, val)			percpu_add_op(8, , (pcp), val)
 #define raw_cpu_and_8(pcp, val)			percpu_to_op(8, , "and", (pcp), val)
 #define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
@@ -486,8 +531,6 @@ do {									\
 #define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(8, , pcp, oval, nval)
 #define raw_cpu_try_cmpxchg_8(pcp, ovalp, nval)	percpu_try_cmpxchg_op(8, , pcp, ovalp, nval)
 
-#define this_cpu_read_8(pcp)			percpu_from_op(8, volatile, "mov", pcp)
-#define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
 #define this_cpu_add_8(pcp, val)		percpu_add_op(8, volatile, (pcp), val)
 #define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
 #define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/4] x86/percpu: Use segment qualifiers
  2023-10-01 13:14 [RFC PATCH 0/4] x86/percpu: Use segment qualifiers Uros Bizjak
                   ` (3 preceding siblings ...)
  2023-10-01 13:14 ` [RFC PATCH 4/4] x86/percpu: Use C for percpu read/write accessors Uros Bizjak
@ 2023-10-01 17:07 ` Linus Torvalds
  2023-10-01 19:53   ` Uros Bizjak
  4 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2023-10-01 17:07 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: x86, linux-kernel, Andy Lutomirski, Ingo Molnar, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Josh Poimboeuf

On Sun, 1 Oct 2023 at 06:16, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> This patchset resurrect the work of Richard Henderson [1] and Nadav Amit [2]
> to introduce named address spaces compiler extension [3,4] into the linux kernel.

So apparently the extension has been around for a while (since gcc-6),
but I don't actually find any single major _user_ of it.

Debian code search finds exactly one case of it outside of the
compilers themselves:

  #  define THREAD_SELF \
    (*(struct pthread *__seg_fs *) offsetof (struct pthread, header.self))

in glibc/sysdeps/x86_64/nptl/tls.h, and even that isn't very widely
used (it seems to be a pthread_mutex implementation helper).

So the coverage testing of this thing seems very weak. Do we have any
other reason to believe that this is likely to actually be reliable
enough to use?

                     Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/4] x86/percpu: Use segment qualifiers
  2023-10-01 17:07 ` [RFC PATCH 0/4] x86/percpu: Use segment qualifiers Linus Torvalds
@ 2023-10-01 19:53   ` Uros Bizjak
  2023-10-01 20:21     ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Uros Bizjak @ 2023-10-01 19:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: x86, linux-kernel, Andy Lutomirski, Ingo Molnar, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Josh Poimboeuf

On Sun, Oct 1, 2023 at 7:07 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sun, 1 Oct 2023 at 06:16, Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > This patchset resurrect the work of Richard Henderson [1] and Nadav Amit [2]
> > to introduce named address spaces compiler extension [3,4] into the linux kernel.
>
> So apparently the extension has been around for a while (since gcc-6),
> but I don't actually find any single major _user_ of it.
>
> Debian code search finds exactly one case of it outside of the
> compilers themselves:
>
>   #  define THREAD_SELF \
>     (*(struct pthread *__seg_fs *) offsetof (struct pthread, header.self))
>
> in glibc/sysdeps/x86_64/nptl/tls.h, and even that isn't very widely
> used (it seems to be a pthread_mutex implementation helper).
>
> So the coverage testing of this thing seems very weak. Do we have any
> other reason to believe that this is likely to actually be reliable
> enough to use?

The clang manual nicely summarises the named address space extension with:

"Note that this is a very very low-level feature that should only be
used if you know what you’re doing (for example in an OS kernel)."

But, at least in GCC, the middle-end code handles many targets:
grepping for MEM_ADDR_SPACE in the config directories returns the avr,
ft32, gcn, h8300, i386, m32c, m68k, mn10300, msp430, pru, riscv, rl78,
rs6000, sh and vax targets. This extension is quite popular with
embedded targets.

Regarding x86 target specific code, the same functionality used for
explicit address spaces is used internally to handle the __thread
qualifier. The testcase:

__thread int m;
int foo (void) { return m; }

compiles the memory read to:

#(insn:TI 10 2 11 2 (set (reg/i:SI 0 ax)
#        (mem/c:SI (const:DI (unspec:DI [
#                        (symbol_ref:DI ("m") [flags 0x2a] <var_decl
0x7f4a5a811bd0 m>)
#                    ] UNSPEC_NTPOFF)) [1 m+0 S4 A32 AS1]))
"thread.c":3:28 81 {*movsi_internal}
#     (nil))
       movl    %fs:m@tpoff, %eax       # 10    [c=5 l=8]  *movsi_internal/0

where AS1 in memory flags represents address space 1 (%fs: prefix).

Also, stack protector internally uses the same target specific code:

#(insn:TI 6 9 10 2 (parallel [
#            (set (mem/v/f/c:DI (plus:DI (reg/f:DI 7 sp)
#                        (const_int 8 [0x8])) [4 D.2009+0 S8 A64])
#                (unspec:DI [
#                        (mem/v/f:DI (const_int 40 [0x28]) [5
MEM[(<address-space-1> long unsigned int *)40B]+0 S8 A64 AS1])
#                    ] UNSPEC_SP_SET))
#            (set (reg:DI 0 ax [92])
#                (const_int 0 [0]))
#            (clobber (reg:CC 17 flags))
#        ]) "pr111023.c":12:1 1265 {stack_protect_set_1_di}
#     (expr_list:REG_UNUSED (reg:CC 17 flags)
#        (expr_list:REG_UNUSED (reg:DI 0 ax [92])
#            (nil))))
       movq    %fs:40, %rax    # 6     [c=0 l=16]  stack_protect_set_1_di

Again, AS1 in memory flags defines address space 1 (%fs prefix).

Compare this to the following testcase that explicitly defines __seg_gs:

__seg_gs int m;
int foo (void) { return m; }

The testcase compiles the read from memory to:

#(insn:TI 10 2 11 2 (set (reg/i:SI 0 ax)
#        (mem/c:SI (symbol_ref:DI ("m") [flags 0x2] <var_decl
0x7f89fe611bd0 m>) [1 m+0 S4 A32 AS2])) "thread.c":3:28 81
{*movsi_internal}
#     (nil))
       movl    %gs:m(%rip), %eax       # 10    [c=5 l=7]  *movsi_internal/0

Please note AS2 in the memory flags; this represents address space 2
(the %gs: prefix). Otherwise, the memory reference is handled by the
compiler like any other memory reference.

As demonstrated above, the compiler handles memory references with an
explicit address space through the same middle-end and target-dependent
code as __thread and stack protector memory references. __thread and
the stack protector are heavily used features of the compiler, so I'm
quite confident that explicit address spaces should work as advertised.
Even *if* there are some issues with aliasing, the kernel is immune to
them due to

KBUILD_CFLAGS += -fno-strict-aliasing

Uros.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/4] x86/percpu: Use segment qualifiers
  2023-10-01 19:53   ` Uros Bizjak
@ 2023-10-01 20:21     ` Linus Torvalds
  2023-10-01 20:30       ` Linus Torvalds
  2023-10-01 21:47       ` Uros Bizjak
  0 siblings, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2023-10-01 20:21 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: x86, linux-kernel, Andy Lutomirski, Ingo Molnar, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Josh Poimboeuf

On Sun, 1 Oct 2023 at 12:53, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> Regarding x86 target specific code, the same functionality used for
> explicit address space is used internally to handle __thread
> qualifier.

Ok, that's interesting, in that __thread is certainly widely used so
it will have seen testing.

> Even *if* there are some issues with aliasing, the kernel
> is immune to them due to
>
> KBUILD_CFLAGS += -fno-strict-aliasing

It's not aliasing I'd worry about. It's correctness.

And indeed, the *very* first thing I tried shows that this is all very
very buggy in gcc.

What did I try? A simple memory copy with a structure assignment.

Try to compile this:

    #include <string.h>
    struct a { long arr[30]; };

    __seg_fs struct a m;
    void foo(struct a *dst) { *dst = m; }

using the kernel compiler options (it's the "don't use sse/avx" ones
that matter):

    gcc -mno-avx -mno-sse -O2 -S t.c

and look at the end result. It's complete and utter sh*t:

        foo:
                xorl    %eax, %eax
                cmpq    $240, %rax
                jnb     .L5
        .L2:
                movzbl  %fs:m(%rax), %edx
                movb    %dl, (%rdi,%rax)
                addq    $1, %rax
                cmpq    $240, %rax
                jb      .L2
        .L5:
                ret

to the point that I can only go "WTF"?

I mean, it's not just that it does the copy one byte at a time. It
literally compares %rax to $240 just after it has cleared it. I look
at that code, and I go "a five-year old with a crayon could have done
better".

In other words, no, we're not using this thing that generates that
kind of garbage.

Somebody needs to open a bugzilla entry for this kind of code generation.

Clang isn't much better, but at least it doesn't generate bad code. It
just crashes with an internal compiler error on the above trivial
test-case:

    fatal error: error in backend: cannot lower memory intrinsic in
address space 257

which at least tells the user that they can't copy memory from that
address space. But once again shows that no, this feature is not ready
for prime-time.

If literally the *first* thing I thought to test was this broken, what
else is broken in this model?

And no, the kernel doesn't currently do the above kinds of things.
That's not the point. The point was "how well is this compiler support
tested". The answer is "not at all".

                   Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/4] x86/percpu: Use segment qualifiers
  2023-10-01 20:21     ` Linus Torvalds
@ 2023-10-01 20:30       ` Linus Torvalds
  2023-10-01 21:47       ` Uros Bizjak
  1 sibling, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2023-10-01 20:30 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: x86, linux-kernel, Andy Lutomirski, Ingo Molnar, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Josh Poimboeuf

On Sun, 1 Oct 2023 at 13:21, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sun, 1 Oct 2023 at 12:53, Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > Regarding x86 target specific code, the same functionality used for
> > explicit address space is used internally to handle __thread
> > qualifier.
>
> Ok, that's interesting, in that __thread is certainly widely used so
> it will have seen testing.

.. but I just checked that the __thread case *does* work with my
stupid test-case, so clearly the "__thread" coverage ends up being
very different from something like __seg_fs.

The difference? For __thread, gcc and clang know how to get the
beginning of the thread area (the equivalent of our kernel
this_cpu_ptr() macro), so now the compiler knows how to turn a
__thread pointer into a "normal" pointer, and can just do memcpy.

But for __seg_fs and __seg_gs, the compiler doesn't know how to do
that, and then ends up just flailing wildly.

If the structure is small enough to be done as individual moves, both
gcc and clang do ok. But anything else is just a complete shit-show.

If they both errored out reliably and legibly, that would be one
thing. But gcc just silently generates garbage, and clang errors out
with 60+ lines of internal compiler backtrace.

             Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/4] x86/percpu: Use segment qualifiers
  2023-10-01 20:21     ` Linus Torvalds
  2023-10-01 20:30       ` Linus Torvalds
@ 2023-10-01 21:47       ` Uros Bizjak
  2023-10-02 12:13         ` Uros Bizjak
  1 sibling, 1 reply; 18+ messages in thread
From: Uros Bizjak @ 2023-10-01 21:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: x86, linux-kernel, Andy Lutomirski, Ingo Molnar, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Josh Poimboeuf

On Sun, Oct 1, 2023 at 10:21 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sun, 1 Oct 2023 at 12:53, Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > Regarding x86 target specific code, the same functionality used for
> > explicit address space is used internally to handle __thread
> > qualifier.
>
> Ok, that's interesting, in that __thread is certainly widely used so
> it will have seen testing.
>
> > Even *if* there are some issues with aliasing, the kernel
> > is immune to them due to
> >
> > KBUILD_CFLAGS += -fno-strict-aliasing
>
> It's not aliasing I'd worry about. It's correctness.
>
> And indeed, the *very* first thing I tried shows that this is all very
> very buggy in gcc.
>
> What did I try? A simple memory copy with a structure assignment.
>
> Try to compile this:
>
>     #include <string.h>
>     struct a { long arr[30]; };
>
>     __seg_fs struct a m;
>     void foo(struct a *dst) { *dst = m; }
>
> using the kernel compiler options (it's the "don't use sse/avx" ones
> that matter):
>
>     gcc -mno-avx -mno-sse -O2 -S t.c
>
> and look at the end result. It's complete and utter sh*t:
>
>         foo:
>                 xorl    %eax, %eax
>                 cmpq    $240, %rax
>                 jnb     .L5
>         .L2:
>                 movzbl  %fs:m(%rax), %edx
>                 movb    %dl, (%rdi,%rax)
>                 addq    $1, %rax
>                 cmpq    $240, %rax
>                 jb      .L2
>         .L5:
>                 ret
>
> to the point that I can only go "WTF"?
>
> I mean, it's not just that it does the copy one byte at a time. It
> literally compares %rax to $240 just after it has cleared it. I look
> at that code, and I go "a five-year old with a crayon could have done
> better".
>
> In other words, no, we're not using this thing that generates that
> kind of garbage.
>
> Somebody needs to open a bugzilla entry for this kind of code generation.

Huh, this testcase triggers a known issue with IVopts. I opened
PR111657 [1], but the issue with IVopts is already reported in PR79649
[2].

> Clang isn't much better, but at least it doesn't generate bad code. It
> just crashes with an internal compiler error on the above trivial
> test-case:
>
>     fatal error: error in backend: cannot lower memory intrinsic in
> address space 257
>
> which at least tells the user that they can't copy memory from that
> address space. But once again shows that no, this feature is not ready
> for prime-time.
>
> If literally the *first* thing I thought to test was this broken, what
> else is broken in this model?
>
> And no, the kernel doesn't currently do the above kinds of things.
> That's not the point. The point was "how well is this compiler support
> tested". The answer is "not at all".

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111657
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79649

Uros.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/4] x86/percpu: Use segment qualifiers
  2023-10-01 21:47       ` Uros Bizjak
@ 2023-10-02 12:13         ` Uros Bizjak
  2023-10-02 13:22           ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Uros Bizjak @ 2023-10-02 12:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: x86, linux-kernel, Andy Lutomirski, Ingo Molnar, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Josh Poimboeuf

On Sun, Oct 1, 2023 at 11:47 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Sun, Oct 1, 2023 at 10:21 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Sun, 1 Oct 2023 at 12:53, Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > Regarding x86 target specific code, the same functionality used for
> > > explicit address space is used internally to handle __thread
> > > qualifier.
> >
> > Ok, that's interesting, in that __thread is certainly widely used so
> > it will have seen testing.
> >
> > > Even *if* there are some issues with aliasing, the kernel
> > > is immune to them due to
> > >
> > > KBUILD_CFLAGS += -fno-strict-aliasing
> >
> > It's not aliasing I'd worry about. It's correctness.
> >
> > And indeed, the *very* first thing I tried shows that this is all very
> > very buggy in gcc.
> >
> > What did I try? A simple memory copy with a structure assignment.
> >
> > Try to compile this:
> >
> >     #include <string.h>
> >     struct a { long arr[30]; };
> >
> >     __seg_fs struct a m;
> >     void foo(struct a *dst) { *dst = m; }
> >
> > using the kernel compiler options (it's the "don't use sse/avx" ones
> > that matter):
> >
> >     gcc -mno-avx -mno-sse -O2 -S t.c
> >
> > and look at the end result. It's complete and utter sh*t:
> >
> >         foo:
> >                 xorl    %eax, %eax
> >                 cmpq    $240, %rax
> >                 jnb     .L5
> >         .L2:
> >                 movzbl  %fs:m(%rax), %edx
> >                 movb    %dl, (%rdi,%rax)
> >                 addq    $1, %rax
> >                 cmpq    $240, %rax
> >                 jb      .L2
> >         .L5:
> >                 ret
> >
> > to the point that I can only go "WTF"?
> >
> > I mean, it's not just that it does the copy one byte at a time. It
> > literally compares %rax to $240 just after it has cleared it. I look
> > at that code, and I go "a five-year old with a crayon could have done
> > better".
> >
> > In other words, no, we're not using this thing that generates that
> > kind of garbage.
> >
> > Somebody needs to open a bugzilla entry for this kind of code generation.
>
> Huh, this testcase triggers known issue with IVopts. I opened
> PR111657, but the issue with IVopts is already reported in PR79649
> [2].

Actually (or luckily), my preliminary analysis was wrong. This is just
a case of a missed optimization, where target-dependent code chose a
non-optimal (but still correct) copy algorithm under very unusual
circumstances. In GCC, a stringop can be implemented using one of
several (8) algorithms:

DEF_ALG (libcall, libcall)
DEF_ALG (rep_prefix_1_byte, rep_byte)
DEF_ALG (rep_prefix_4_byte, rep_4byte)
DEF_ALG (rep_prefix_8_byte, rep_8byte)
DEF_ALG (loop_1_byte, byte_loop)
DEF_ALG (loop, loop)
DEF_ALG (unrolled_loop, unrolled_loop)
DEF_ALG (vector_loop, vector_loop)

but some of them (the rep_prefix ones) cannot be used with non-default
address spaces. Obviously, vector_loop cannot be used without SSE/AVX
instructions, so what remains is a severely limited selection of
algorithms. Target processors (-mtune=...) define their own selection
of algorithms, based on the maximum copied block size. The generic
tuning selects:

static stringop_algs generic_memcpy[2] = {
  {libcall, {{32, loop, false}, {8192, rep_prefix_4_byte, false},
             {-1, libcall, false}}},
  {libcall, {{32, loop, false}, {8192, rep_prefix_8_byte, false},
             {-1, libcall, false}}}};

Now, rep_prefix_8_byte is not available with a non-default address
space, so the algorithm falls back to libcall (the one after the
unavailable algorithm). However, we can't call into libc here (the
library function also handles only the default address space), so the
target-independent part of the compiler emits the most generic one-byte
copy loop.

The "unusual circumstances" here are following:
- rep_prefix instructions are disabled, these are otherwise used by most targets
- vector loops are disabled
- libcall algo still produces correct code, with non-optimal loop instructions.

Very few users look into the produced assembly to spot the difference
between a one-byte loop (admittedly with an unwanted compare) and an
8-byte loop. Since the compiled code works as expected, there were no
bug reports for this; I would say this is a minor issue.

As shown in the bug report [1], the fix is to select the "loop"
fallback algorithm instead of libcall (the patch is effectively a
couple of lines) when a non-default address space is used. The
generated assembly for the structure copy now reads:

       xorl    %eax, %eax
.L2:
       movl    %eax, %edx
       addl    $8, %eax
       movq    %gs:m(%rdx), %rcx
       movq    %rcx, (%rdi,%rdx)
       cmpl    $240, %eax
       jb      .L2
       ret

The same assembly can be generated with the -mstringop-strategy=loop
compiler argument. This argument affects both the __seg_gs and the
__thread copy loops, so (almost) the same code will be generated for
both (because, as claimed in the previous post, the same compiler code
handles named and implicit address spaces).

> > Clang isn't much better, but at least it doesn't generate bad code. It
> > just crashes with an internal compiler error on the above trivial
> > test-case:
> >
> >     fatal error: error in backend: cannot lower memory intrinsic in
> > address space 257
> >
> > which at least tells the user that they can't copy memory from that
> > address space. But once again shows that no, this feature is not ready
> > for prime-time.
> >
> > If literally the *first* thing I thought to test was this broken, what
> > else is broken in this model?
> >
> > And no, the kernel doesn't currently do the above kinds of things.
> > That's not the point. The point was "how well is this compiler support
> > tested". The answer is "not at all".

I don't agree with the above claims. The generated code was the
product of a too limited selection of available copy algorithms under
unusual circumstances, but even in the case of the generic fallback,
the generated code was *correct*. As said in the previous post, and
re-confirmed by the patch in the PR, the same code in GCC handles
implicit (__thread) and named address spaces. At the end of the day,
the problematic code was merely a missed optimization (the class of
bug with the lowest severity in GCC).

> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111657
> [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79649

Uros.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/4] x86/percpu: Use segment qualifiers
  2023-10-02 12:13         ` Uros Bizjak
@ 2023-10-02 13:22           ` Ingo Molnar
  2023-10-03  9:49             ` Uros Bizjak
  0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2023-10-02 13:22 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Linus Torvalds, x86, linux-kernel, Andy Lutomirski, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Josh Poimboeuf


* Uros Bizjak <ubizjak@gmail.com> wrote:

> > > Clang isn't much better, but at least it doesn't generate bad code. It
> > > just crashes with an internal compiler error on the above trivial
> > > test-case:
> > >
> > >     fatal error: error in backend: cannot lower memory intrinsic in
> > > address space 257
> > >
> > > which at least tells the user that they can't copy memory from that
> > > address space. But once again shows that no, this feature is not ready
> > > for prime-time.
> > >
> > > If literally the *first* thing I thought to test was this broken, what
> > > else is broken in this model?
> > >
> > > And no, the kernel doesn't currently do the above kinds of things.
> > > That's not the point. The point was "how well is this compiler support
> > > tested". The answer is "not at all".
> 
> I don't agree with the above claims. The generated code was the product 
> of a too limited selection of available copy algorithms in unusual 
> circumstances, but even in the case of generic fallback code, the 
> generated code was *correct*. As said in the previous post, and 
> re-confirmed by the patch in the PR, the same code in GCC handles 
> implicit (__thread) and named address spaces. At the end of the day, the 
> problematic code was merely a missing-optimization (the bug with the 
> lowest severity in GCC).

Yeah, so the feature generated really crappy code for Linus's
testcase. That's never a good sign for compiler features, full stop.

Do we want the kernel to be at the bleeding edge of an 
unusual compiler feature that is only used internally by the
compiler in a very specific fashion?

Maybe, but Linus's reluctance and caution are justified IMHO,
and at minimum this feature needs some careful evaluation of 
its long-term suitability [*] ...

Thanks,

	Ingo


[*] euphemism for: "I have no idea how to evaluate this risk"... :-/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/4] x86/percpu: Use segment qualifiers
  2023-10-02 13:22           ` Ingo Molnar
@ 2023-10-03  9:49             ` Uros Bizjak
  2023-10-03 13:38               ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Uros Bizjak @ 2023-10-03  9:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, x86, linux-kernel, Andy Lutomirski, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Josh Poimboeuf

On Mon, Oct 2, 2023 at 3:22 PM Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > > > Clang isn't much better, but at least it doesn't generate bad code. It
> > > > just crashes with an internal compiler error on the above trivial
> > > > test-case:
> > > >
> > > >     fatal error: error in backend: cannot lower memory intrinsic in
> > > > address space 257
> > > >
> > > > which at least tells the user that they can't copy memory from that
> > > > address space. But once again shows that no, this feature is not ready
> > > > for prime-time.
> > > >
> > > > If literally the *first* thing I thought to test was this broken, what
> > > > else is broken in this model?
> > > >
> > > > And no, the kernel doesn't currently do the above kinds of things.
> > > > That's not the point. The point was "how well is this compiler support
> > > > tested". The answer is "not at all".
> >
> > I don't agree with the above claims. The generated code was the product
> > of a too limited selection of available copy algorithms in unusual
> > circumstances, but even in the case of generic fallback code, the
> > generated code was *correct*. As said in the previous post, and
> > re-confirmed by the patch in the PR, the same code in GCC handles
> > implicit (__thread) and named address spaces. At the end of the day, the
> > problematic code was merely a missing-optimization (the bug with the
> > lowest severity in GCC).
>
> Yeah, so the feature generated really crappy code for Linus's
> testcase. That's never a good sign for compiler features, full stop.
>
> Do we want the kernel to be at the bleeding edge of an
> unusual compiler feature that is only used internally by the
> compiler in a very specific fashion?

I understand your reservations about the new feature, but please allow
me to bring up some facts about it. The feature is *far* from being an
unusual internal compiler feature in GCC. It was introduced for gcc-6
(mostly for embedded targets), and in 2017 the complete thread-local
storage (e.g. __thread) handling was switched to named address spaces
and released in gcc-8. I don't remember any reported problem with this
feature (and I know what I'm talking about; the switch to named AS in
the compiler was done by me [1]). Since named AS uses the same compiler
code, I'm quite confident that it shouldn't be considered a
bleeding-edge, unusual compiler feature.

The "crappy code" happened due to the limitation in the completely
different part of the compiler. It was stringop algorithm selection, a
feature orthogonal to address spaces (tangential only in the sense
that several algorithms are not available with non-default address
space). Colorful language notwithstanding, the generated code is
nothing more than "missed-optimization" and the gcc bugreport was
qualified as that. With -mstringop-strategy=loop, the testcase
produces expected code, and the stringop selection in the compiler
will be adjusted accordingly also for the very limited selection of
stringop algorithms for named AS. Even when the kernel will never use
these algorithms for its percpu functionality...

> Maybe, but Linus's reluctance and caution is justified IMHO,
> and at minimum this feature needs some careful evaluation of
> long-time suitability [*] ...

I do have a proposal on how to introduce the new feature while
minimising risk as much as possible. I must admit that detecting the
feature for all released compilers that can handle __seg_gs seems
quite risky (I have to curb my enthusiasm somehow ;) ), so perhaps a
staged approach with a currently unreleased compiler is more
appropriate. Using:

+config CC_HAS_NAMED_AS
+       def_bool CC_IS_GCC && GCC_VERSION >= 140000

would enable the new feature only for the currently unreleased,
experimental GCC. This would allow us to qualify the compiler and weed
out any possible problems before the compiler is released. Please note
that the patch is written in such a way that the code reverts to the
existing approach when CC_HAS_NAMED_AS is undefined.

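To illustrate what "reverts to the existing approach" means, here is a
sketch only (not the actual patch; the macro name is made up for the
example), showing how the accessors can key off the new config symbol
while the asm-based accessors remain the default:

#ifdef CONFIG_CC_HAS_NAMED_AS
# define __my_percpu_seg        __seg_gs        /* compiler-visible segment qualifier (64-bit) */
#else
# define __my_percpu_seg                        /* keep the existing asm-based accessors */
#endif
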
I would also like to point out additional possible improvements that
are not in the proposed patch. Using named AS brings the kernel one
step closer to PIE, as noted in the original patch
submission [2].

> [*] euphemism for: "I have no idea how to evaluate this risk"... :-/

[1] https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=cfc72af0fbdf
[2] https://lore.kernel.org/lkml/20190823224424.15296-1-namit@vmware.com/

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/4] x86/percpu: Use segment qualifiers
  2023-10-03  9:49             ` Uros Bizjak
@ 2023-10-03 13:38               ` Ingo Molnar
  2023-10-03 19:31                 ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2023-10-03 13:38 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Linus Torvalds, x86, linux-kernel, Andy Lutomirski, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Josh Poimboeuf


* Uros Bizjak <ubizjak@gmail.com> wrote:

> > Maybe, but Linus's reluctance and caution is justified IMHO, and at 
> > minimum this feature needs some careful evaluation of long-time 
> > suitability [*] ...
> 
> I do have a proposal on how to introduce a new feature while minimising 
> risk as much as possible. I must admit that detecting the feature for all 
> released compilers that can handle __seg_gs seems quite risky (I have to 
> curb my enthusiasm somehow ;) ), so perhaps a staged approach with a 
> currently unreleased compiler is more appropriate. Using:
> 
> +config CC_HAS_NAMED_AS
> +       def_bool CC_IS_GCC && GCC_VERSION >= 140000
> 
> would enable the new feature only for currently unreleased experimental 
> GCC. This would allow to qualify the compiler and weed out any possible 
> problems, before the compiler is released. Please note, that the patch is 
> written in such a way, that the code reverts to the existing approach for 
> undefined CC_HAS_NAMED_AS.

So I don't think it's a good idea to restrict it to the devel GCC version 
only: the intersection of devel-GCC and devel-kernel reduces testing 
coverage to near-zero in practice ...

Instead - if Linus doesn't disagree, that is - how about restricting it to 
the freshest GCC release, i.e. GCC 13.1 and later? We could carry it in a 
separate branch in -tip for a while and not merge it into v6.7 but into 
v6.8 or so.

That would at least give us *some* amount of test coverage, without any 
real upstream risk.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/4] x86/percpu: Use segment qualifiers
  2023-10-03 13:38               ` Ingo Molnar
@ 2023-10-03 19:31                 ` Linus Torvalds
  2023-10-03 19:43                   ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2023-10-03 19:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Uros Bizjak, x86, linux-kernel, Andy Lutomirski, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Josh Poimboeuf

On Tue, 3 Oct 2023 at 06:38, Ingo Molnar <mingo@kernel.org> wrote:
>
> So I don't think it's a good idea to restrict it to the devel GCC version
> only, the cross-section of devel-GCC and devel-kernel reduces testing
> coverage to near-zero in practice ...

In fact, while the clang failure was arguably worse from a code
generation standpoint (as in "it didn't generate any code AT ALL"), it
was actually better from a kernel standpoint: I'd *much* rather have a
compile-time failure than bad code generation when it's a particular
issue that we can avoid by just not doing it.

IOW, *if* this is the only actual issue with named address spaces,
then I'd much rather have a compiler that says "don't do that" over a
compiler that silently generates absolutely horrendous code.

That is not unlike my "I'd rather get a link time error from trying to
do a 64-by-64 divide on x86-32, than have the compiler actually
generate that horrendously expensive operation". There's a reason we
have "do_div64()" to do 64-by-32 divides, because that's usually what
you actually want.

We should not be doing big structure copies from or to the percpu
area, so clang then failing with an admittedly horrendous error
message is not a bad thing.

And again - my worry really isn't this "copy a percpu structure"
issue. It's literally just that I feel this doesn't have a lot of
coverage.

              Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/4] x86/percpu: Use segment qualifiers
  2023-10-03 19:31                 ` Linus Torvalds
@ 2023-10-03 19:43                   ` Ingo Molnar
  2023-10-03 21:43                     ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2023-10-03 19:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Uros Bizjak, x86, linux-kernel, Andy Lutomirski, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Josh Poimboeuf


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, 3 Oct 2023 at 06:38, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > So I don't think it's a good idea to restrict it to the devel GCC version
> > only, the cross-section of devel-GCC and devel-kernel reduces testing
> > coverage to near-zero in practice ...
> 
> In fact, while the clang failure was arguably worse from a code
> generation standpoint (as in "it didn't generate any code AT ALL"), it
> was actually better from a kernel standpoint: I'd *much* rather have a
> compile-time failure than bad code generation when it's a particular
> issue that we can avoid by just not doing it.
> 
> IOW, *if* this is the only actual issue with named address spaces,
> then I'd much rather have a compiler that says "don't do that" over a
> compiler that silently generates absolutely horrendous code.
> 
> That is not unlike my "I'd rather get a link time error from trying to
> do a 64-by-64 divide on x86-32, than have the compiler actually
> generate that horrendously expensive operation". There's a reason we
> have "do_div64()" to do 64-by-32 divides, because that's usually what
> you actually want.
> 
> We should not be doing big structure copies from or to the percpu
> area, so clang then failing with an admittedly horrendous error
> message is not a bad thing.
> 
> And again - my worry really isn't this "copy a percpu structure"
> issue. It's literally just that I feel this doesn't have a lot of
> coverage.

I share all those concerns.

So we could do this: we let it live in -tip for a cycle, in a separate
branch, and observe what happens - it gets picked up by -next on
a daily basis and most x86 developers test it. It won't be merged by other
branches in -tip, it won't be pulled by others or relied on. If it
conflicts with other bits we rebase it cleanly, no questions asked.

While -next test coverage is still limited in many ways, it's also
certainly not zero.

If it's problem-free for a cycle I'll offer it up to you as an RFC pull,
summarizing our experience with it. (Should it ever get to that point.)

That's the best I think we can do - and worst-case we'll turn it off
again and go curse flaky compiler features. Will be easy to turn off
if it's compiler version triggered anyway.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/4] x86/percpu: Use segment qualifiers
  2023-10-03 19:43                   ` Ingo Molnar
@ 2023-10-03 21:43                     ` Linus Torvalds
  2023-10-04  6:01                       ` Uros Bizjak
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2023-10-03 21:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Uros Bizjak, x86, linux-kernel, Andy Lutomirski, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Josh Poimboeuf

On Tue, 3 Oct 2023 at 12:43, Ingo Molnar <mingo@kernel.org> wrote:
>
> So we could do this: we let it live in -tip for a cycle, in a separate
> branch, and observe what happens - it gets picked up by -next on
> a daily basis and most x86 developers test it. It won't be merged by other
> branches in -tip, it won't be pulled by others or relied on. If it
> conflicts with other bits we rebase it cleanly, no questions asked.

Sounds like a plan,

              Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/4] x86/percpu: Use segment qualifiers
  2023-10-03 21:43                     ` Linus Torvalds
@ 2023-10-04  6:01                       ` Uros Bizjak
  0 siblings, 0 replies; 18+ messages in thread
From: Uros Bizjak @ 2023-10-04  6:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, x86, linux-kernel, Andy Lutomirski, Nadav Amit,
	Brian Gerst, Denys Vlasenko, H . Peter Anvin, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Josh Poimboeuf

On Tue, Oct 3, 2023 at 11:44 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, 3 Oct 2023 at 12:43, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > So we could do this: we let it live in -tip for a cycle, in a separate
> > branch, and observe what happens - it gets picked up by -next on
> > a daily basis and most x86 developers test it. It won't be merged by other
> > branches in -tip, it won't be pulled by others or relied on. If it
> > conflicts with other bits we rebase it cleanly, no questions asked.
>
> Sounds like a plan,

Thanks!

Will send a patchset with the config change later today.

Uros.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-10-04  6:02 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-01 13:14 [RFC PATCH 0/4] x86/percpu: Use segment qualifiers Uros Bizjak
2023-10-01 13:14 ` [RFC PATCH 1/4] x86/percpu: Update arch/x86/include/asm/percpu.h to the current tip Uros Bizjak
2023-10-01 13:14 ` [RFC PATCH 2/4] x86/percpu: Detect compiler support for named address spaces Uros Bizjak
2023-10-01 13:14 ` [RFC PATCH 3/4] x86/percpu: Use compiler segment prefix qualifier Uros Bizjak
2023-10-01 13:14 ` [RFC PATCH 4/4] x86/percpu: Use C for percpu read/write accessors Uros Bizjak
2023-10-01 17:07 ` [RFC PATCH 0/4] x86/percpu: Use segment qualifiers Linus Torvalds
2023-10-01 19:53   ` Uros Bizjak
2023-10-01 20:21     ` Linus Torvalds
2023-10-01 20:30       ` Linus Torvalds
2023-10-01 21:47       ` Uros Bizjak
2023-10-02 12:13         ` Uros Bizjak
2023-10-02 13:22           ` Ingo Molnar
2023-10-03  9:49             ` Uros Bizjak
2023-10-03 13:38               ` Ingo Molnar
2023-10-03 19:31                 ` Linus Torvalds
2023-10-03 19:43                   ` Ingo Molnar
2023-10-03 21:43                     ` Linus Torvalds
2023-10-04  6:01                       ` Uros Bizjak
