All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nick Desaulniers <ndesaulniers@google.com>
To: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org,  "H . Peter Anvin" <hpa@zytor.com>,
	Nathan Chancellor <nathan@kernel.org>, Tom Rix <trix@redhat.com>,
	 linux-kernel@vger.kernel.org, llvm@lists.linux.dev,
	 David Howells <dhowells@redhat.com>
Subject: Re: [PATCH 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate constant expressions
Date: Tue, 10 May 2022 15:29:43 -0700	[thread overview]
Message-ID: <CAKwvOdnorHJWesiardEnhYACM4NY_PHBHaoJZB1miJjgKukg2Q@mail.gmail.com> (raw)
In-Reply-To: <20220510142550.1686866-2-mailhol.vincent@wanadoo.fr>

On Tue, May 10, 2022 at 7:26 AM Vincent Mailhol
<mailhol.vincent@wanadoo.fr> wrote:
>
> For x86_64, the current ffs() implementation does not produce
> optimized code when called with a constant expression. On the
> contrary, the __builtin_ffs() function of both GCC and clang is able
> to simplify the expression into a single instruction.
>
> * Example *
>
> Let's consider two dummy functions foo() and bar() as below:
>
> | #include <linux/bitops.h>
> | #define CONST 0x01000000
> |
> | unsigned int foo(void)
> | {
> |       return ffs(CONST);
> | }
> |
> | unsigned int bar(void)
> | {
> |       return __builtin_ffs(CONST);
> | }
>
> GCC would produce below assembly code:
>
> | 0000000000000000 <foo>:
> |    0: b8 ff ff ff ff          mov    $0xffffffff,%eax
> |    5: 0f bc c7                bsf    %edi,%eax
> |    8: 83 c0 01                add    $0x1,%eax
> |    b: c3                      ret
> |    c: 0f 1f 40 00             nopl   0x0(%rax)
> |
> | 0000000000000010 <bar>:
> |   10: b8 19 00 00 00          mov    $0x19,%eax
> |   15: c3                      ret
>
> And clang would produce:
>
> | 0000000000000000 <foo>:
> |    0: 55                      push   %rbp
> |    1: 48 89 e5                mov    %rsp,%rbp
> |    4: b8 ff ff ff ff          mov    $0xffffffff,%eax
> |    9: 0f bc 05 00 00 00 00    bsf    0x0(%rip),%eax        # 10 <foo+0x10>
> |   10: ff c0                   inc    %eax
> |   12: 5d                      pop    %rbp
> |   13: c3                      ret
> |   14: 66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
> |   1b: 00 00 00
> |   1e: 66 90                   xchg   %ax,%ax
> |
> | 0000000000000020 <bar>:
> |   20: 55                      push   %rbp
> |   21: 48 89 e5                mov    %rsp,%rbp
> |   24: b8 19 00 00 00          mov    $0x19,%eax
> |   29: 5d                      pop    %rbp
> |   2a: c3                      ret

Right, we need to allocate registers to move the inputs into the asm
block, and the results back out. Inline asm is analogous to a call
with a custom calling convention, where we don't look into the body of
the inline asm.

Does -fomit-frame-pointer clean make these snippets clearer, or did
you not build with -O2?  Consider using those flags if so, since we
generally prefer the ORC unwinder on x86, not the frame pointer
unwinder.  If the compilers are forcing a frame pointer when using the
builtins once optimizations are enabled, that's a problem (that we've
seen in the past with the builtins for reading eflags with clang; now
fixed).

>
> For both examples, we clearly see the benefit of using __builtin_ffs()
> instead of the kernel's asm implementation for constant expressions.
>
> However, for non constant expressions, the ffs() asm version of the
> kernel remains better for x86_64 because, contrary to GCC, it doesn't
> emit the CMOV assembly instruction, c.f. [1] (noticeably, clang is
> able optimize out the CMOV call).
>
> This patch uses the __builtin_constant_p() to select between the
> kernel's ffs() and the __builtin_ffs() depending on whether the
> argument is constant or not.
>
>
> As a side benefit, this patch also removes below -Wshadow warning:
>
> | ./arch/x86/include/asm/bitops.h:283:28: warning: declaration of 'ffs' shadows a built-in function [-Wshadow]
> |   283 | static __always_inline int ffs(int x)

Nice! :)

>
> [1] commit ca3d30cc02f7 ("x86_64, asm: Optimise fls(), ffs() and fls64()")
> http://lkml.kernel.org/r/20111213145654.14362.39868.stgit@warthog.procyon.org.uk

+ David, author of ca3d30cc02f7.  I was wondering if this applied to
more than just x86, but I see now that some architectures just include
include/asm-generic/bitops/builtin-ffs.h into their
arch/*/include/asm/bitops.h. It's only when we want to beat the
compiler for non-ICE expressions.

Patch LGTM; just minor comments on commit message, naming, and formatting.

>
>
> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> ---
>  arch/x86/include/asm/bitops.h | 29 +++++++++++++++++------------
>  1 file changed, 17 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> index a288ecd230ab..535a7a358c14 100644
> --- a/arch/x86/include/asm/bitops.h
> +++ b/arch/x86/include/asm/bitops.h
> @@ -269,18 +269,7 @@ static __always_inline unsigned long __fls(unsigned long word)
>  #undef ADDR
>
>  #ifdef __KERNEL__
> -/**
> - * ffs - find first set bit in word
> - * @x: the word to search
> - *
> - * This is defined the same way as the libc and compiler builtin ffs
> - * routines, therefore differs in spirit from the other bitops.
> - *
> - * ffs(value) returns 0 if value is 0 or the position of the first
> - * set bit if value is nonzero. The first (least significant) bit
> - * is at position 1.
> - */
> -static __always_inline int ffs(int x)
> +static __always_inline int __ffs_asm(int x)

How about variable_ffs rather than __ffs_asm? Let's try to stick with
the convention used by test_bit?

>  {
>         int r;
>
> @@ -310,6 +299,22 @@ static __always_inline int ffs(int x)
>         return r + 1;
>  }
>
> +/**
> + * ffs - find first set bit in word
> + * @x: the word to search
> + *
> + * This is defined the same way as the libc and compiler builtin ffs
> + * routines, therefore differs in spirit from the other bitops.
> + *
> + * ffs(value) returns 0 if value is 0 or the position of the first
> + * set bit if value is nonzero. The first (least significant) bit
> + * is at position 1.
> + */
> +#define ffs(x)                                 \
> +        (__builtin_constant_p(x) ?             \
> +         __builtin_ffs(x) :                    \
> +         __ffs_asm(x))
> +

I think this whole #define can fit on one line? If not, perhaps the
BCP can start on the initial line?  Otherwise it looks like the
then/else clauses are indented by 1 tab followed by 1 space. Consider
just using tabs.

>  /**
>   * fls - find last set bit in word
>   * @x: the word to search
> --
> 2.35.1
>


--
Thanks,
~Nick Desaulniers

  reply	other threads:[~2022-05-10 22:29 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-10 14:25 [PATCH 0/2] x86/asm/bitops: optimize ff{s,z} functions for constant expressions Vincent Mailhol
2022-05-10 14:25 ` [PATCH 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate " Vincent Mailhol
2022-05-10 22:29   ` Nick Desaulniers [this message]
2022-05-10 23:54     ` Vincent MAILHOL
2022-05-11 15:38       ` Vincent MAILHOL
2022-05-10 14:25 ` [PATCH 2/2] x86/asm/bitops: __ffs,ffz: use __builtin_ctzl " Vincent Mailhol
2022-05-10 22:14 ` [PATCH 0/2] x86/asm/bitops: optimize ff{s,z} functions for " Nick Desaulniers
2022-05-10 23:24   ` Vincent MAILHOL
2022-05-11 14:46     ` Vincent MAILHOL

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKwvOdnorHJWesiardEnhYACM4NY_PHBHaoJZB1miJjgKukg2Q@mail.gmail.com \
    --to=ndesaulniers@google.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=dhowells@redhat.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=llvm@lists.linux.dev \
    --cc=mailhol.vincent@wanadoo.fr \
    --cc=mingo@redhat.com \
    --cc=nathan@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=trix@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.