From: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
To: x86@kernel.org, Ingo Molnar <mingo@redhat.com>,
Borislav Petkov <bp@alien8.de>
Cc: Nick Desaulniers <ndesaulniers@google.com>,
Thomas Gleixner <tglx@linutronix.de>,
linux-kernel@vger.kernel.org, Yury Norov <yury.norov@gmail.com>,
llvm@lists.linux.dev,
Vincent Mailhol <mailhol.vincent@wanadoo.fr>,
Borislav Petkov <bp@suse.de>
Subject: [PATCH v1 1/2] x86/asm/bitops: Replace __fls() by its generic builtin implementation
Date: Sun, 6 Nov 2022 18:51:05 +0900 [thread overview]
Message-ID: <20221106095106.849154-2-mailhol.vincent@wanadoo.fr> (raw)
In-Reply-To: <20221106095106.849154-1-mailhol.vincent@wanadoo.fr>
Below snippet:
#include <linux/bitops.h>
unsigned int foo(unsigned long word)
{
return __fls(word);
}
produces this on GCC 12.1.0:
0000000000000000 <foo>:
0: f3 0f 1e fa endbr64
4: e8 00 00 00 00 call 9 <foo+0x9>
9: 53 push %rbx
a: 48 89 fb mov %rdi,%rbx
d: e8 00 00 00 00 call 12 <foo+0x12>
12: 48 0f bd c3 bsr %rbx,%rax
16: 5b pop %rbx
17: 31 ff xor %edi,%edi
19: e9 00 00 00 00 jmp 1e <foo+0x1e>
and that on clang 14.0.6:
0000000000000000 <foo>:
0: f3 0f 1e fa endbr64
4: e8 00 00 00 00 call 9 <foo+0x9>
9: 53 push %rbx
a: 50 push %rax
b: 48 89 fb mov %rdi,%rbx
e: e8 00 00 00 00 call 13 <foo+0x13>
13: 48 89 1c 24 mov %rbx,(%rsp)
17: 48 0f bd 04 24 bsr (%rsp),%rax
1c: 48 83 c4 08 add $0x8,%rsp
20: 5b pop %rbx
21: c3 ret
The implementation from <asm-generic/bitops/builtin-__fls.h> [1]
produces the exact same code on GCC and below code on clang:
0000000000000000 <foo>:
0: f3 0f 1e fa endbr64
4: e8 00 00 00 00 call 9 <foo+0x9>
9: 53 push %rbx
a: 48 89 fb mov %rdi,%rbx
d: e8 00 00 00 00 call 12 <foo+0x12>
12: 48 0f bd c3 bsr %rbx,%rax
16: 5b pop %rbx
17: c3 ret
The builtin implementation is better for two reasons:
1/ it saves two instructions on clang (a push and a stack pointer
decrement) because of a useless tentative to save rax.
2/ when used on constant expressions, the compiler is only able to
fold the builtin version (c.f. [2]).
For those two reasons, replace the assembly implementation by its
builtin counterpart.
[1] https://elixir.bootlin.com/linux/v6.0/source/include/asm-generic/bitops/builtin-__fls.h
[2] commit 146034fed6ee ("x86/asm/bitops: Use __builtin_ffs() to evaluate constant expressions")
CC: Borislav Petkov <bp@suse.de>
CC: Nick Desaulniers <ndesaulniers@google.com>
CC: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
arch/x86/include/asm/bitops.h | 14 +-------------
1 file changed, 1 insertion(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 2edf68475fec..a31453d7686d 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -285,19 +285,7 @@ static __always_inline unsigned long variable_ffz(unsigned long word)
(unsigned long)__builtin_ctzl(~word) : \
variable_ffz(word))
-/*
- * __fls: find last set bit in word
- * @word: The word to search
- *
- * Undefined if no set bit exists, so code should check against 0 first.
- */
-static __always_inline unsigned long __fls(unsigned long word)
-{
- asm("bsr %1,%0"
- : "=r" (word)
- : "rm" (word));
- return word;
-}
+#include <asm-generic/bitops/builtin-__fls.h>
#undef ADDR
--
2.37.4
next prev parent reply other threads:[~2022-11-06 9:51 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-06 9:51 [PATCH v1 0/2] x86/asm/bitops: optimize fls functions for constant expressions Vincent Mailhol
2022-11-06 9:51 ` Vincent Mailhol [this message]
2022-11-07 9:38 ` [PATCH v1 1/2] x86/asm/bitops: Replace __fls() by its generic builtin implementation Peter Zijlstra
2022-11-07 12:19 ` Vincent MAILHOL
2022-11-10 19:20 ` Nick Desaulniers
2022-11-10 19:06 ` Nick Desaulniers
2022-11-06 9:51 ` [PATCH v1 2/2] x86/asm/bitops: Use __builtin_clz*() to evaluate constant expressions Vincent Mailhol
2022-11-10 19:01 ` Nick Desaulniers
2022-11-11 1:57 ` Vincent MAILHOL
2022-11-11 3:36 ` Yury Norov
2022-11-11 7:02 ` Vincent MAILHOL
2022-11-25 2:33 ` [PATCH v2 0/2] x86/asm/bitops: optimize fls functions for " Vincent Mailhol
2022-11-25 2:33 ` [PATCH v2 1/2] x86/asm/bitops: Replace __fls() by its generic builtin implementation Vincent Mailhol
2022-11-25 2:33 ` [PATCH v2 2/2] x86/asm/bitops: Use __builtin_clz*() to evaluate constant expressions Vincent Mailhol
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20221106095106.849154-2-mailhol.vincent@wanadoo.fr \
--to=mailhol.vincent@wanadoo.fr \
--cc=bp@alien8.de \
--cc=bp@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=llvm@lists.linux.dev \
--cc=mingo@redhat.com \
--cc=ndesaulniers@google.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
--cc=yury.norov@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.