From: Nick Desaulniers <ndesaulniers@google.com>
To: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
Nathan Chancellor <nathan@kernel.org>, Tom Rix <trix@redhat.com>,
linux-kernel@vger.kernel.org, llvm@lists.linux.dev,
David Howells <dhowells@redhat.com>
Subject: Re: [PATCH 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate constant expressions
Date: Tue, 10 May 2022 15:29:43 -0700 [thread overview]
Message-ID: <CAKwvOdnorHJWesiardEnhYACM4NY_PHBHaoJZB1miJjgKukg2Q@mail.gmail.com> (raw)
In-Reply-To: <20220510142550.1686866-2-mailhol.vincent@wanadoo.fr>
On Tue, May 10, 2022 at 7:26 AM Vincent Mailhol
<mailhol.vincent@wanadoo.fr> wrote:
>
> For x86_64, the current ffs() implementation does not produce
> optimized code when called with a constant expression. On the
> contrary, the __builtin_ffs() function of both GCC and clang is able
> to simplify the expression into a single instruction.
>
> * Example *
>
> Let's consider two dummy functions foo() and bar() as below:
>
> | #include <linux/bitops.h>
> | #define CONST 0x01000000
> |
> | unsigned int foo(void)
> | {
> | return ffs(CONST);
> | }
> |
> | unsigned int bar(void)
> | {
> | return __builtin_ffs(CONST);
> | }
>
> GCC would produce below assembly code:
>
> | 0000000000000000 <foo>:
> | 0: b8 ff ff ff ff mov $0xffffffff,%eax
> | 5: 0f bc c7 bsf %edi,%eax
> | 8: 83 c0 01 add $0x1,%eax
> | b: c3 ret
> | c: 0f 1f 40 00 nopl 0x0(%rax)
> |
> | 0000000000000010 <bar>:
> | 10: b8 19 00 00 00 mov $0x19,%eax
> | 15: c3 ret
>
> And clang would produce:
>
> | 0000000000000000 <foo>:
> | 0: 55 push %rbp
> | 1: 48 89 e5 mov %rsp,%rbp
> | 4: b8 ff ff ff ff mov $0xffffffff,%eax
> | 9: 0f bc 05 00 00 00 00 bsf 0x0(%rip),%eax # 10 <foo+0x10>
> | 10: ff c0 inc %eax
> | 12: 5d pop %rbp
> | 13: c3 ret
> | 14: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
> | 1b: 00 00 00
> | 1e: 66 90 xchg %ax,%ax
> |
> | 0000000000000020 <bar>:
> | 20: 55 push %rbp
> | 21: 48 89 e5 mov %rsp,%rbp
> | 24: b8 19 00 00 00 mov $0x19,%eax
> | 29: 5d pop %rbp
> | 2a: c3 ret
Right, we need to allocate registers to move the inputs into the asm
block, and the results back out. Inline asm is analogous to a call
with a custom calling convention, where we don't look into the body of
the inline asm.
Does -fomit-frame-pointer clean make these snippets clearer, or did
you not build with -O2? Consider using those flags if so, since we
generally prefer the ORC unwinder on x86, not the frame pointer
unwinder. If the compilers are forcing a frame pointer when using the
builtins once optimizations are enabled, that's a problem (that we've
seen in the past with the builtins for reading eflags with clang; now
fixed).
>
> For both examples, we clearly see the benefit of using __builtin_ffs()
> instead of the kernel's asm implementation for constant expressions.
>
> However, for non constant expressions, the ffs() asm version of the
> kernel remains better for x86_64 because, contrary to GCC, it doesn't
> emit the CMOV assembly instruction, c.f. [1] (noticeably, clang is
> able optimize out the CMOV call).
>
> This patch uses the __builtin_constant_p() to select between the
> kernel's ffs() and the __builtin_ffs() depending on whether the
> argument is constant or not.
>
>
> As a side benefit, this patch also removes below -Wshadow warning:
>
> | ./arch/x86/include/asm/bitops.h:283:28: warning: declaration of 'ffs' shadows a built-in function [-Wshadow]
> | 283 | static __always_inline int ffs(int x)
Nice! :)
>
> [1] commit ca3d30cc02f7 ("x86_64, asm: Optimise fls(), ffs() and fls64()")
> http://lkml.kernel.org/r/20111213145654.14362.39868.stgit@warthog.procyon.org.uk
+ David, author of ca3d30cc02f7. I was wondering if this applied to
more than just x86, but I see now that some architectures just include
include/asm-generic/bitops/builtin-ffs.h into their
arch/*/include/asm/bitops.h. It's only when we want to beat the
compiler for non-ICE expressions.
Patch LGTM; just minor comments on commit message, naming, and formatting.
>
>
> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> ---
> arch/x86/include/asm/bitops.h | 29 +++++++++++++++++------------
> 1 file changed, 17 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> index a288ecd230ab..535a7a358c14 100644
> --- a/arch/x86/include/asm/bitops.h
> +++ b/arch/x86/include/asm/bitops.h
> @@ -269,18 +269,7 @@ static __always_inline unsigned long __fls(unsigned long word)
> #undef ADDR
>
> #ifdef __KERNEL__
> -/**
> - * ffs - find first set bit in word
> - * @x: the word to search
> - *
> - * This is defined the same way as the libc and compiler builtin ffs
> - * routines, therefore differs in spirit from the other bitops.
> - *
> - * ffs(value) returns 0 if value is 0 or the position of the first
> - * set bit if value is nonzero. The first (least significant) bit
> - * is at position 1.
> - */
> -static __always_inline int ffs(int x)
> +static __always_inline int __ffs_asm(int x)
How about variable_ffs rather than __ffs_asm? Let's try to stick with
the convention used by test_bit?
> {
> int r;
>
> @@ -310,6 +299,22 @@ static __always_inline int ffs(int x)
> return r + 1;
> }
>
> +/**
> + * ffs - find first set bit in word
> + * @x: the word to search
> + *
> + * This is defined the same way as the libc and compiler builtin ffs
> + * routines, therefore differs in spirit from the other bitops.
> + *
> + * ffs(value) returns 0 if value is 0 or the position of the first
> + * set bit if value is nonzero. The first (least significant) bit
> + * is at position 1.
> + */
> +#define ffs(x) \
> + (__builtin_constant_p(x) ? \
> + __builtin_ffs(x) : \
> + __ffs_asm(x))
> +
I think this whole #define can fit on one line? If not, perhaps the
BCP can start on the initial line? Otherwise it looks like the
then/else clauses are indented by 1 tab followed by 1 space. Consider
just using tabs.
> /**
> * fls - find last set bit in word
> * @x: the word to search
> --
> 2.35.1
>
--
Thanks,
~Nick Desaulniers
next prev parent reply other threads:[~2022-05-10 22:30 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-10 14:25 [PATCH 0/2] x86/asm/bitops: optimize ff{s,z} functions for constant expressions Vincent Mailhol
2022-05-10 14:25 ` [PATCH 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate " Vincent Mailhol
2022-05-10 22:29 ` Nick Desaulniers [this message]
2022-05-10 23:54 ` Vincent MAILHOL
2022-05-11 15:38 ` Vincent MAILHOL
2022-05-10 14:25 ` [PATCH 2/2] x86/asm/bitops: __ffs,ffz: use __builtin_ctzl " Vincent Mailhol
2022-05-10 22:14 ` [PATCH 0/2] x86/asm/bitops: optimize ff{s,z} functions for " Nick Desaulniers
2022-05-10 23:24 ` Vincent MAILHOL
2022-05-11 14:46 ` Vincent MAILHOL
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAKwvOdnorHJWesiardEnhYACM4NY_PHBHaoJZB1miJjgKukg2Q@mail.gmail.com \
--to=ndesaulniers@google.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=dhowells@redhat.com \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=llvm@lists.linux.dev \
--cc=mailhol.vincent@wanadoo.fr \
--cc=mingo@redhat.com \
--cc=nathan@kernel.org \
--cc=tglx@linutronix.de \
--cc=trix@redhat.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).