From: "tip-bot2 for Vincent Mailhol" <tip-bot2@linutronix.de>
To: linux-tip-commits@vger.kernel.org
Cc: Vincent Mailhol <mailhol.vincent@wanadoo.fr>,
	Borislav Petkov <bp@suse.de>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Yury Norov <yury.norov@gmail.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org
Subject: [tip: x86/asm] x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions
Date: Tue, 20 Sep 2022 20:00:18 -0000
Message-ID: <166370401862.401.15713055946708670653.tip-bot2@tip-bot2>
In-Reply-To: <20220511160319.1045812-1-mailhol.vincent@wanadoo.fr>

The following commit has been merged into the x86/asm branch of tip:

Commit-ID:     fdb6649ab7c142e497539a471e573c2593b9c923
Gitweb:        https://git.kernel.org/tip/fdb6649ab7c142e497539a471e573c2593b9c923
Author:        Vincent Mailhol <mailhol.vincent@wanadoo.fr>
AuthorDate:    Wed, 07 Sep 2022 18:09:35 +09:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 20 Sep 2022 15:35:37 +02:00

x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions

If x is not 0, __ffs(x) is equivalent to:
  (unsigned long)__builtin_ctzl(x)
And if x is not ~0UL, ffz(x) is equivalent to:
  (unsigned long)__builtin_ctzl(~x)
Because __builtin_ctzl() returns an int, a cast to (unsigned long) is
necessary to avoid potential warnings on implicit conversions.
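
As a minimal userspace sketch of the two equivalences above (the
demo_* names are hypothetical and are not part of the kernel; GCC or
Clang is assumed for __builtin_ctzl()):

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for __ffs(): index of the lowest set bit.
 * Undefined for x == 0, so callers must check against 0 first. */
static inline unsigned long demo_ffs(unsigned long x)
{
	return (unsigned long)__builtin_ctzl(x);
}

/* Hypothetical stand-in for ffz(): index of the lowest clear bit.
 * Undefined for x == ~0UL, so callers must check against ~0UL first. */
static inline unsigned long demo_ffz(unsigned long x)
{
	return (unsigned long)__builtin_ctzl(~x);
}

/* Small self-check; call it from a test program's main(). */
static inline void demo_self_check(void)
{
	assert(demo_ffs(0x10UL) == 4);	/* lowest set bit is bit 4 */
	assert(demo_ffz(0xffUL) == 8);	/* bits 0-7 set, bit 8 is clear */
}
```

The explicit cast keeps the return type identical to that of the
existing helpers, which return unsigned long.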

Concerning the edge cases, __builtin_ctzl(0) is always undefined,
whereas __ffs(0) and ffz(~0UL) may or may not be defined, depending on
the processor. Regardless, for both functions, developers are asked to
check against 0 or ~0UL first, so replacing __ffs() or ffz() with
__builtin_ctzl() is safe.

For x86_64, the current __ffs() and ffz() implementations do not
produce optimized code when called with a constant expression. By
contrast, __builtin_ctzl() folds into a single instruction.

However, for non-constant expressions, the kernel's asm versions of
__ffs() and ffz() remain slightly better than the code produced by
GCC (which emits a useless instruction to clear eax).

Use __builtin_constant_p() to select between the kernel's
__ffs()/ffz() and the __builtin_ctzl() depending on whether the
argument is constant or not.
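
The selection can be sketched in portable userspace C as follows. This
is only an illustration: the sel_* names are hypothetical, and the
variable path uses a plain loop in place of the kernel's "rep; bsf"
inline asm so that the sketch builds on any architecture with GCC or
Clang.

```c
#include <assert.h>

/* Hypothetical variable path: a portable loop standing in for the
 * kernel's inline asm. Never returns for word == 0, so callers must
 * check against 0 first, as with __ffs(). */
static inline unsigned long sel_variable_ffs(unsigned long word)
{
	unsigned long n = 0;

	while (!(word & 1UL)) {
		word >>= 1;
		n++;
	}
	return n;
}

/* Constant arguments go to the builtin, which the compiler can fold
 * at build time; everything else goes to the variable path. */
#define sel_ffs(word)				\
	(__builtin_constant_p(word) ?		\
	 (unsigned long)__builtin_ctzl(word) :	\
	 sel_variable_ffs(word))

/* Small self-check covering both paths. */
static inline void sel_self_check(void)
{
	volatile unsigned long v = 0x40UL;	/* not a constant expression */

	assert(sel_ffs(0x8UL) == 3);		/* constant path */
	assert(sel_ffs(v) == 6);		/* variable path */
}
```

__builtin_constant_p() does not evaluate its argument, and only one
branch of the conditional is evaluated, so the macro evaluates the
argument at most once at run time.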

** Statistics **

On an allyesconfig build, before...:

  $ objdump -d vmlinux.o | grep tzcnt | wc -l
  3607

...and after:

  $ objdump -d vmlinux.o | grep tzcnt | wc -l
  2600

So, roughly 27.9% ((3607 - 2600) / 3607) of the calls to either
__ffs() or ffz() were using constant expressions and could be
optimized out.

(tests done on linux v5.18-rc5 x86_64 using GCC 11.2.1)

Note: on x86_64, the BSF instruction produces TZCNT when used with the
REP prefix (which explains the use of `grep tzcnt' instead of `grep bsf'
in the above benchmark). cf. [1]

[1] e26a44a2d618 ("x86: Use REP BSF unconditionally")

  [ bp: Massage commit message. ]

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20220511160319.1045812-1-mailhol.vincent@wanadoo.fr
---
 arch/x86/include/asm/bitops.h | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 879238e..2edf684 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -247,17 +247,30 @@ arch_test_bit_acquire(unsigned long nr, const volatile unsigned long *addr)
 					  variable_test_bit(nr, addr);
 }
 
+static __always_inline unsigned long variable__ffs(unsigned long word)
+{
+	asm("rep; bsf %1,%0"
+		: "=r" (word)
+		: "rm" (word));
+	return word;
+}
+
 /**
  * __ffs - find first set bit in word
  * @word: The word to search
  *
  * Undefined if no bit exists, so code should check against 0 first.
  */
-static __always_inline unsigned long __ffs(unsigned long word)
+#define __ffs(word)				\
+	(__builtin_constant_p(word) ?		\
+	 (unsigned long)__builtin_ctzl(word) :	\
+	 variable__ffs(word))
+
+static __always_inline unsigned long variable_ffz(unsigned long word)
 {
 	asm("rep; bsf %1,%0"
 		: "=r" (word)
-		: "rm" (word));
+		: "r" (~word));
 	return word;
 }
 
@@ -267,13 +280,10 @@ static __always_inline unsigned long __ffs(unsigned long word)
  *
  * Undefined if no zero exists, so code should check against ~0UL first.
  */
-static __always_inline unsigned long ffz(unsigned long word)
-{
-	asm("rep; bsf %1,%0"
-		: "=r" (word)
-		: "r" (~word));
-	return word;
-}
+#define ffz(word)				\
+	(__builtin_constant_p(word) ?		\
+	 (unsigned long)__builtin_ctzl(~word) :	\
+	 variable_ffz(word))
 
 /*
  * __fls: find last set bit in word
