All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
To: Borislav Petkov <bp@alien8.de>
Cc: Nick Desaulniers <ndesaulniers@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	x86@kernel.org, Peter Zijlstra <peterz@infradead.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H . Peter Anvin" <hpa@zytor.com>,
	Nathan Chancellor <nathan@kernel.org>, Tom Rix <trix@redhat.com>,
	linux-kernel@vger.kernel.org, llvm@lists.linux.dev,
	David Howells <dhowells@redhat.com>,
	Jan Beulich <JBeulich@suse.com>,
	Christophe Jaillet <christophe.jaillet@wanadoo.fr>,
	Joe Perches <joe@perches.com>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	Yury Norov <yury.norov@gmail.com>,
	Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Subject: [PATCH v8 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate constant expressions
Date: Wed,  7 Sep 2022 18:09:34 +0900	[thread overview]
Message-ID: <20220907090935.919-2-mailhol.vincent@wanadoo.fr> (raw)
In-Reply-To: <20220907090935.919-1-mailhol.vincent@wanadoo.fr>

For x86_64, the current ffs() implementation does not produce
optimized code when called with a constant expression. On the
contrary, the __builtin_ffs() functions of both GCC and clang are able
to fold the expression into a single instruction.

** Example **

Let's consider two dummy functions foo() and bar() as below:

  #include <linux/bitops.h>
  #define CONST 0x01000000

  unsigned int foo(void)
  {
  	return ffs(CONST);
  }

  unsigned int bar(void)
  {
  	return __builtin_ffs(CONST);
  }

GCC would produce below assembly code:

  0000000000000000 <foo>:
     0:	ba 00 00 00 01       	mov    $0x1000000,%edx
     5:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
     a:	0f bc c2             	bsf    %edx,%eax
     d:	83 c0 01             	add    $0x1,%eax
    10:	c3                   	ret
  <Instructions after ret and before next function were redacted>

  0000000000000020 <bar>:
    20:	b8 19 00 00 00       	mov    $0x19,%eax
    25:	c3                   	ret

And clang would produce:

  0000000000000000 <foo>:
     0:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
     5:	0f bc 05 00 00 00 00 	bsf    0x0(%rip),%eax        # c <foo+0xc>
     c:	83 c0 01             	add    $0x1,%eax
     f:	c3                   	ret

  0000000000000010 <bar>:
    10:	b8 19 00 00 00       	mov    $0x19,%eax
    15:	c3                   	ret

Both examples clearly demonstrate the benefit of using __builtin_ffs()
instead of the kernel's asm implementation for constant expressions.

However, for non constant expressions, the ffs() asm version of the
kernel remains better for x86_64 because, contrary to GCC, it doesn't
emit the CMOV assembly instruction, c.f. [1] (noticeably, clang is
able optimize out the CMOV call).

Use __builtin_constant_p() to select between the kernel's ffs() and
the __builtin_ffs() depending on whether the argument is constant or
not.

As a side benefit, replacing the ffs() function declaration by a macro
also removes below -Wshadow warning:

  ./arch/x86/include/asm/bitops.h:283:28: warning: declaration of 'ffs' shadows a built-in function [-Wshadow]
    283 | static __always_inline int ffs(int x)

** Statistics **

On a allyesconfig, before...:

  $ objdump -d vmlinux.o | grep bsf | wc -l
  1081

...and after:

  $ objdump -d vmlinux.o | grep bsf | wc -l
  792

So, roughly 26.7% of the calls to ffs() were using constant
expressions and could be optimized out.

(tests done on linux v5.18-rc5 x86_64 using GCC 11.2.1)

[1] commit ca3d30cc02f7 ("x86_64, asm: Optimise fls(), ffs() and fls64()")
Link: http://lkml.kernel.org/r/20111213145654.14362.39868.stgit@warthog.procyon.org.uk

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
 arch/x86/include/asm/bitops.h | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 0fe9de58af31..879238e5a6a0 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -292,18 +292,7 @@ static __always_inline unsigned long __fls(unsigned long word)
 #undef ADDR
 
 #ifdef __KERNEL__
-/**
- * ffs - find first set bit in word
- * @x: the word to search
- *
- * This is defined the same way as the libc and compiler builtin ffs
- * routines, therefore differs in spirit from the other bitops.
- *
- * ffs(value) returns 0 if value is 0 or the position of the first
- * set bit if value is nonzero. The first (least significant) bit
- * is at position 1.
- */
-static __always_inline int ffs(int x)
+static __always_inline int variable_ffs(int x)
 {
 	int r;
 
@@ -333,6 +322,19 @@ static __always_inline int ffs(int x)
 	return r + 1;
 }
 
+/**
+ * ffs - find first set bit in word
+ * @x: the word to search
+ *
+ * This is defined the same way as the libc and compiler builtin ffs
+ * routines, therefore differs in spirit from the other bitops.
+ *
+ * ffs(value) returns 0 if value is 0 or the position of the first
+ * set bit if value is nonzero. The first (least significant) bit
+ * is at position 1.
+ */
+#define ffs(x) (__builtin_constant_p(x) ? __builtin_ffs(x) : variable_ffs(x))
+
 /**
  * fls - find last set bit in word
  * @x: the word to search
-- 
2.35.1


  reply	other threads:[~2022-09-07  9:17 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-11 16:03 [PATCH v2 0/2] x86/asm/bitops: optimize ff{s,z} functions for constant expressions Vincent Mailhol
2022-05-11 16:03 ` [PATCH v2 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate " Vincent Mailhol
2022-05-11 20:56   ` Christophe JAILLET
2022-05-11 23:30     ` Vincent MAILHOL
2022-05-11 21:35   ` Nick Desaulniers
2022-05-11 23:48     ` Vincent MAILHOL
2022-05-11 16:03 ` [PATCH v2 2/2] x86/asm/bitops: __ffs,ffz: use __builtin_ctzl " Vincent Mailhol
2022-05-11 22:20   ` Nick Desaulniers
2022-05-11 23:23     ` Vincent MAILHOL
2022-05-12  0:03 ` [PATCH v3 0/2] x86/asm/bitops: optimize ff{s,z} functions for " Vincent Mailhol
2022-05-12  0:03   ` [PATCH v3 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate " Vincent Mailhol
2022-05-12  0:28     ` Nick Desaulniers
2022-05-12  1:18       ` Vincent MAILHOL
2022-05-12  0:03   ` [PATCH v3 2/2] x86/asm/bitops: __ffs,ffz: use __builtin_ctzl " Vincent Mailhol
2022-05-12  0:19     ` Nick Desaulniers
2022-05-12  1:18 ` [PATCH v4 0/2] x86/asm/bitops: optimize ff{s,z} functions for " Vincent Mailhol
2022-05-12  1:18   ` [PATCH v4 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate " Vincent Mailhol
2022-05-12  3:02     ` Joe Perches
2022-05-12  4:29       ` Vincent MAILHOL
2022-05-12  1:18   ` [PATCH v4 2/2] x86/asm/bitops: __ffs,ffz: use __builtin_ctzl " Vincent Mailhol
2022-05-23  9:22   ` [PATCH v4 0/2] x86/asm/bitops: optimize ff{s,z} functions for " Vincent MAILHOL
2022-06-25  7:26 ` [RESEND PATCH " Vincent Mailhol
2022-06-25  7:26   ` [RESEND PATCH v4 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate " Vincent Mailhol
2022-06-25  7:26   ` [RESEND PATCH v4 2/2] x86/asm/bitops: __ffs,ffz: use __builtin_ctzl " Vincent Mailhol
2022-07-23 15:15 ` [RESEND PATCH v4 0/2] x86/asm/bitops: optimize ff{s,z} functions for " Vincent Mailhol
2022-07-23 15:15   ` [RESEND PATCH v4 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate " Vincent Mailhol
2022-08-11 14:59     ` Borislav Petkov
2022-08-12 11:55       ` Vincent MAILHOL
2022-07-23 15:15   ` [RESEND PATCH v4 2/2] x86/asm/bitops: __ffs,ffz: use __builtin_ctzl " Vincent Mailhol
2022-07-29 11:24   ` [RESEND PATCH v4 0/2] x86/asm/bitops: optimize ff{s,z} functions for " Vincent MAILHOL
2022-07-29 12:22     ` Borislav Petkov
2022-07-29 13:50       ` Vincent MAILHOL
2022-08-12 11:44 ` [PATCH v5 " Vincent Mailhol
2022-08-12 11:44   ` [PATCH v5 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate " Vincent Mailhol
2022-08-12 11:44   ` [PATCH v5 2/2] x86/asm/bitops: __ffs,ffz: use __builtin_ctzl " Vincent Mailhol
2022-08-23 16:23     ` Borislav Petkov
2022-08-23 17:12       ` Nick Desaulniers
2022-08-23 17:43         ` Borislav Petkov
2022-08-23 20:31           ` Vincent MAILHOL
2022-08-24  8:43             ` Borislav Petkov
2022-08-24 12:10               ` Vincent MAILHOL
2022-08-24 13:24                 ` Borislav Petkov
2022-08-26 21:32                   ` Vincent MAILHOL
2022-09-07  4:06                     ` Borislav Petkov
2022-09-07  5:35                       ` Vincent MAILHOL
2022-09-07  8:50                         ` Borislav Petkov
2022-08-31  7:57 ` [PATCH v6 0/2] x86/asm/bitops: optimize ff{s,z} functions for " Vincent Mailhol
2022-08-31  7:57   ` [PATCH v6 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate " Vincent Mailhol
2022-08-31  7:57   ` [PATCH v6 2/2] x86/asm/bitops: __ffs,ffz: use __builtin_ctzl " Vincent Mailhol
2022-08-31  8:51   ` [PATCH v6 0/2] x86/asm/bitops: optimize ff{s,z} functions for " Yury Norov
2022-09-01  3:49     ` Yury Norov
2022-09-01 10:30       ` Vincent MAILHOL
2022-09-01 14:19         ` Yury Norov
2022-09-01 17:06           ` Nick Desaulniers
2022-09-02  5:34             ` Borislav Petkov
2022-09-02  0:41           ` Vincent MAILHOL
2022-09-02  1:19     ` Vincent MAILHOL
2022-09-05  0:37 ` [PATCH v7 " Vincent Mailhol
2022-09-05  0:37   ` [PATCH v7 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate " Vincent Mailhol
2022-09-05  0:37   ` [PATCH v7 2/2] x86/asm/bitops: __ffs,ffz: use __builtin_ctzl " Vincent Mailhol
2022-09-06 18:26   ` [PATCH v7 0/2] x86/asm/bitops: optimize ff{s,z} functions for " Nick Desaulniers
2022-09-07  7:04     ` Nick Desaulniers
2022-09-07  7:49       ` Vincent MAILHOL
2022-09-07  9:09 ` [PATCH v8 " Vincent Mailhol
2022-09-07  9:09   ` Vincent Mailhol [this message]
2022-09-07  9:09   ` [PATCH v8 2/2] x86/asm/bitops: __ffs,ffz: use __builtin_ctzl to evaluate " Vincent Mailhol
2022-09-20 20:00 ` [tip: x86/asm] x86/asm/bitops: Use __builtin_ctzl() " tip-bot2 for Vincent Mailhol
2022-09-20 20:00 ` [tip: x86/asm] x86/asm/bitops: Use __builtin_ffs() " tip-bot2 for Vincent Mailhol

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220907090935.919-2-mailhol.vincent@wanadoo.fr \
    --to=mailhol.vincent@wanadoo.fr \
    --cc=JBeulich@suse.com \
    --cc=bp@alien8.de \
    --cc=christophe.jaillet@wanadoo.fr \
    --cc=dave.hansen@linux.intel.com \
    --cc=dhowells@redhat.com \
    --cc=hpa@zytor.com \
    --cc=joe@perches.com \
    --cc=jpoimboe@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=llvm@lists.linux.dev \
    --cc=mingo@redhat.com \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=trix@redhat.com \
    --cc=x86@kernel.org \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.