linux-m68k.vger.kernel.org archive mirror
* Re: [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions
       [not found]   ` <20231217071250.892867-2-mailhol.vincent@wanadoo.fr>
@ 2024-01-02 10:28     ` Geert Uytterhoeven
  2024-01-07 12:01       ` Vincent MAILHOL
  0 siblings, 1 reply; 22+ messages in thread
From: Geert Uytterhoeven @ 2024-01-02 10:28 UTC (permalink / raw)
  To: Vincent Mailhol
  Cc: Andrew Morton, linux-kernel, Yury Norov, Nick Desaulniers,
	Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap,
	Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain,
	Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k

Hi Vincent,

Thanks for your patch!

On Sun, Dec 17, 2023 at 8:13 AM Vincent Mailhol
<mailhol.vincent@wanadoo.fr> wrote:
> The inline keyword actually does not guarantee that the compiler will
> inline a function. Whenever the goal is to actually inline a
> function, __always_inline should always be preferred instead.
>
> On an allyesconfig, with GCC 13.2.1, it saves roughly 5 KB.
>
>   $ size --format=GNU vmlinux.before vmlinux.after
>         text       data        bss      total filename
>     60449738   70975612    2288988  133714338 vmlinux.before
>     60446534   70972412    2289596  133708542 vmlinux.after

With gcc 9.5.0-1ubuntu1~22.04, the figures are completely different
(i.e. a size increase):

allyesconfig:

      text       data        bss      total filename
  58878600   72415994    2283652  133578246 vmlinux.before
  58882250   72419706    2284004  133585960 vmlinux.after

atari_defconfig:

      text       data        bss      total filename
   4112060    1579862     151680    5843602 vmlinux-v6.7-rc8
   4117008    1579350     151680    5848038
vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining

The next patch offsets that for allyesconfig, but not for atari_defconfig.

> Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of
> test_and_set_bit and friends")

Please don't split lines containing tags.

> Link: https://git.kernel.org/torvalds/c/8dd5032d9c54
>
> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>

Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
       [not found]   ` <20231217071250.892867-3-mailhol.vincent@wanadoo.fr>
@ 2024-01-02 10:49     ` Geert Uytterhoeven
  0 siblings, 0 replies; 22+ messages in thread
From: Geert Uytterhoeven @ 2024-01-02 10:49 UTC (permalink / raw)
  To: Vincent Mailhol
  Cc: Andrew Morton, linux-kernel, Yury Norov, Nick Desaulniers,
	Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap,
	Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain,
	Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k

On Sun, Dec 17, 2023 at 8:13 AM Vincent Mailhol
<mailhol.vincent@wanadoo.fr> wrote:
>
> The compiler is not able to do constant folding on "asm volatile" code.
>
> Evaluate whether or not the function argument is a constant expression
> and if this is the case, return an equivalent builtin expression.
>
> On an allyesconfig, with GCC 13.2.1, it saves roughly 8 KB.
>
>   $ size --format=GNU vmlinux.before vmlinux.after
>         text       data        bss      total filename
>     60446534   70972412    2289596  133708542 vmlinux.before
>     60429746   70978876    2291676  133700298 vmlinux.after

Still a win with gcc 9.5.0-1ubuntu1~22.04:

allyesconfig:

      text       data        bss      total filename
  58882250   72419706    2284004  133585960
vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining
  58863570   72419546    2285860  133568976
vmlinux-v6.7-rc8-2-m68k-bitops-use-__builtin_clz,ctzl,ffs

So an even bigger win...

atari_defconfig:

      text       data        bss      total filename
   4117008    1579350     151680    5848038
vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining
   4116940    1579534     151712    5848186
vmlinux-v6.7-rc8-2-m68k-bitops-use-__builtin_clz,ctzl,ffs

... but a small size increase here.


>
> Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl()
> to evaluate constant expressions")

Please don't split lines containing tags.

> Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1
>
> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>

Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions
  2024-01-02 10:28     ` [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions Geert Uytterhoeven
@ 2024-01-07 12:01       ` Vincent MAILHOL
  0 siblings, 0 replies; 22+ messages in thread
From: Vincent MAILHOL @ 2024-01-07 12:01 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Andrew Morton, linux-kernel, Yury Norov, Nick Desaulniers,
	Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap,
	Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain,
	Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k

On Tue. 2 Jan. 2024 at 19:28, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Vincent,
>
> Thanks for your patch!

Thanks for the review and for running the benchmark.

> On Sun, Dec 17, 2023 at 8:13 AM Vincent Mailhol
> <mailhol.vincent@wanadoo.fr> wrote:
> > The inline keyword actually does not guarantee that the compiler will
> > inline a function. Whenever the goal is to actually inline a
> > function, __always_inline should always be preferred instead.
> >
> > On an allyesconfig, with GCC 13.2.1, it saves roughly 5 KB.
> >
> >   $ size --format=GNU vmlinux.before vmlinux.after
> >         text       data        bss      total filename
> >     60449738   70975612    2288988  133714338 vmlinux.before
> >     60446534   70972412    2289596  133708542 vmlinux.after
>
> With gcc 9.5.0-1ubuntu1~22.04, the figures are completely different
> (i.e. a size increase):

Those results are not normal: there should not be such a big
discrepancy between two versions of the same compiler. I
double-checked everything and found out that I made a mistake when
computing the figures: I am not sure what exactly, but at some point,
the ASLR seeds (or a similar randomization feature) got reset, so the
decrease I witnessed was just a "lucky roll".

After rerunning the benchmark (making sure to keep every seed fixed),
I got similar results to yours:

        text       data        bss      total filename
    60449738   70975356    2288988  133714082
vmlinux_allyesconfig.before_this_series
    60446534   70979068    2289596  133715198
vmlinux_allyesconfig.after_first_patch
    60429746   70979132    2291676  133700554
vmlinux_allyesconfig.final_second_patch

Note that there is still some randomness in the data segment, as shown
in these other benchmarks I ran:

        text       data        bss      total filename
    60449738   70976124    2288988  133714850
vmlinux_allyesconfig.before_this_series
    60446534   70980092    2289596  133716222
vmlinux_allyesconfig.after_first_patch
    60429746   70979388    2291676  133700810
vmlinux_allyesconfig.after_second_patch

        text       data        bss      total filename
    60449738   70975612    2288988  133714338
vmlinux_allyesconfig.before_this_series
    60446534   70980348    2289596  133716478
vmlinux_allyesconfig.after_first_patch
    60429746   70979900    2291676  133701322
vmlinux_allyesconfig.after_second_patch

But the error margin is within 1K.

So, in short, I inlined some functions which I shouldn't have. I am
preparing a v4 in which I will only inline the bit-find functions
(namely: __ffs(), ffs(), ffz(), __fls(), fls() and fls64()). Here are
the new figures:

        text       data        bss      total filename
    60453552   70955485    2288620  133697657
vmlinux_allyesconfig.before_this_series
    60450304   70953085    2289260  133692649
vmlinux_allyesconfig.after_first_patch
    60433536   70952637    2291340  133677513
vmlinux_allyesconfig.after_second_patch

N.B. The new figures were taken after a rebase, so do not try to
compare them with the previous benchmarks. I will send the v4 soon,
after I finish updating the patch comments and double-checking things.

Concerning the other functions in bitops.h, there may be some other
ones worth an __always_inline. But I will narrow the scope of this
series to the bit-find functions only. If a good Samaritan wants to
investigate the other functions, go ahead!

Yours sincerely,
Vincent Mailhol




> allyesconfig:
>
>       text       data        bss      total filename
>   58878600   72415994    2283652  133578246 vmlinux.before
>   58882250   72419706    2284004  133585960 vmlinux.after
>
> atari_defconfig:
>
>       text       data        bss      total filename
>    4112060    1579862     151680    5843602 vmlinux-v6.7-rc8
>    4117008    1579350     151680    5848038
> vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining
>
> The next patch offsets that for allyesconfig, but not for atari_defconfig.
>
> > Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of
> > test_and_set_bit and friends")
>
> Please don't split lines containing tags.
>
> > Link: https://git.kernel.org/torvalds/c/8dd5032d9c54
> >
> > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
>
> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds


* [PATCH v4 0/5] bitops: optimize code and add tests
       [not found] <20221111081316.30373-1-mailhol.vincent@wanadoo.fr>
       [not found] ` <20231217071250.892867-1-mailhol.vincent@wanadoo.fr>
@ 2024-01-28  5:00 ` Vincent Mailhol
  2024-01-28  5:00   ` [PATCH v4 1/5] m68k/bitops: force inlining of all bit-find functions Vincent Mailhol
                     ` (4 more replies)
  1 sibling, 5 replies; 22+ messages in thread
From: Vincent Mailhol @ 2024-01-28  5:00 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel
  Cc: Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook,
	Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven,
	Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-m68k, Vincent Mailhol

This series adds a compile test to make sure that all the bitops
operations (namely __ffs(), ffs(), ffz(), __fls(), fls(), fls64())
correctly fold into constant expressions when their argument is a
constant expression. The other functions from bitops.h are out of
scope.

So far, only the m68k and the hexagon architectures lack such
optimization. To this end, the first two patches optimize the m68k
bitops functions, the third and fourth optimize the hexagon bitops
functions.

The fifth and final patch adds the compile-time tests to assert that
the constant folding occurs and that the result is accurate.
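
In essence, each assertion added in patch 5/5 boils down to the
simplified sketch below. The sketch only covers ffs() and BIT(10); the
real test exercises all six functions and several bit positions through
the test_const_eval() macro, and ffs_must_fold() is just an
illustrative name:

  #include <linux/bitops.h>
  #include <linux/build_bug.h>

  /*
   * If the compiler cannot fold ffs(BIT(10)) into the constant 11, the
   * expression below is not a constant expression and BUILD_BUG_ON()
   * aborts the build.
   */
  static __always_inline void ffs_must_fold(void)
  {
          BUILD_BUG_ON(!__builtin_constant_p(ffs(BIT(10)) == 11));
          BUILD_BUG_ON(ffs(BIT(10)) != 11);
  }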

This is tested on arm, arm64, hexagon, m68k, x86 and x86_64. For the
other architectures, I am putting my trust in the kernel test robot to
send a report if any of them still lacks bitops optimizations. The
kernel test robot did not complain on v3, giving me confidence that
all architectures are now properly optimized.
---

** Changelog **

v3 -> v4:

  - Only apply the __always_inline to the bit-find functions, do not
    touch other functions from bitops.h. I discovered that the
    benchmark done in the v3 was incorrect (refer to the thread for
    details). The scope was thus narrowed down to the bit-find
    functions for which I could demonstrate the gain in the benchmark.

  - Add benchmarks for hexagon (patches 3/5 and 4/5). Contrary to the
    m68k benchmark, which uses an allyesconfig, the hexagon benchmark
    uses a defconfig. The reason is simply that the allyesconfig did
    not work on the first try in my environment (even before applying
    this series), and I did not spend effort troubleshooting it.

  - Add Geert's review tag to patch 2/5. Despite also receiving the
    tag for patch 1/5, I did not apply it there because of new changes
    in that patch.

  - Do not split the lines containing tags.

  Link: https://lore.kernel.org/all/20231217071250.892867-1-mailhol.vincent@wanadoo.fr/

v2 -> v3:

  - Add patches 1/5 and 2/5 to optimize m68k architecture bitops.
    Thanks to the kernel test robot for reporting!

  - Add patches 3/5 and 4/5 to optimize hexagon architecture bitops.
    Thanks to the kernel test robot for reporting!

  - Patch 5/5: mark test_bitops_const_eval() as __always_inline and
    then pass n (the test number) as a parameter. Previously, only
    BIT(10) was tested. Add tests for BIT(0) and BIT(31).

  Link: https://lore.kernel.org/all/20231130102717.1297492-1-mailhol.vincent@wanadoo.fr/

v1 -> v2:

  - Drop the RFC patch. v1 was not ready to be applied on x86 because
    of pending changes in arch/x86/include/asm/bitops.h. This was
    finally fixed by Nick in commit 3dae5c43badf ("x86/asm/bitops: Use
    __builtin_clz{l|ll} to evaluate constant expressions").
    Thanks Nick!

  - Update the commit description.

  - Introduce the test_const_eval() macro to factorize code.

  - No functional change.

  Link: https://lore.kernel.org/all/20221111081316.30373-1-mailhol.vincent@wanadoo.fr/

Vincent Mailhol (5):
  m68k/bitops: force inlining of all bit-find functions
  m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant
    expressions
  hexagon/bitops: force inlining of all bit-find functions
  hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant
    expressions
  lib: test_bitops: add compile-time optimization/evaluations assertions

 arch/hexagon/include/asm/bitops.h | 25 +++++++++++++++++++-----
 arch/m68k/include/asm/bitops.h    | 26 ++++++++++++++++++-------
 lib/Kconfig.debug                 |  4 ++++
 lib/test_bitops.c                 | 32 +++++++++++++++++++++++++++++++
 4 files changed, 75 insertions(+), 12 deletions(-)

-- 
2.43.0



* [PATCH v4 1/5] m68k/bitops: force inlining of all bit-find functions
  2024-01-28  5:00 ` [PATCH v4 0/5] bitops: optimize code and add tests Vincent Mailhol
@ 2024-01-28  5:00   ` Vincent Mailhol
  2024-01-28  5:00   ` [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Vincent Mailhol @ 2024-01-28  5:00 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel
  Cc: Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook,
	Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven,
	Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-m68k, Vincent Mailhol

The inline keyword actually does not guarantee that the compiler will
inline a function. Whenever the goal is to actually inline a
function, __always_inline should always be preferred instead.

__always_inline is also needed for further optimizations which will
come up in a follow-up patch.

Inline all the bit-find functions which have a custom m68k assembly
implementation, namely: __ffs(), ffs(), ffz(), __fls(), fls().

On linux v6.7 allyesconfig with GCC 13.2.1, it has virtually no impact
on the final size, meaning that, overall, those functions were already
inlined on modern GCCs:

  $ size --format=GNU vmlinux.before vmlinux.after
    text       data        bss      total filename
    60457956   70953665    2288644  133700265 vmlinux.before
    60457964   70953697    2288644  133700305 vmlinux.after

Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of test_and_set_bit and friends")
Link: https://git.kernel.org/torvalds/c/8dd5032d9c54

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
 arch/m68k/include/asm/bitops.h | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 14c64a6f1217..a8b23f897f24 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -465,7 +465,7 @@ static inline int find_next_bit(const unsigned long *vaddr, int size,
  * ffz = Find First Zero in word. Undefined if no zero exists,
  * so code should check against ~0UL first..
  */
-static inline unsigned long ffz(unsigned long word)
+static __always_inline unsigned long ffz(unsigned long word)
 {
 	int res;
 
@@ -488,7 +488,7 @@ static inline unsigned long ffz(unsigned long word)
  */
 #if (defined(__mcfisaaplus__) || defined(__mcfisac__)) && \
 	!defined(CONFIG_M68000)
-static inline unsigned long __ffs(unsigned long x)
+static __always_inline unsigned long __ffs(unsigned long x)
 {
 	__asm__ __volatile__ ("bitrev %0; ff1 %0"
 		: "=d" (x)
@@ -496,7 +496,7 @@ static inline unsigned long __ffs(unsigned long x)
 	return x;
 }
 
-static inline int ffs(int x)
+static __always_inline int ffs(int x)
 {
 	if (!x)
 		return 0;
@@ -518,7 +518,7 @@ static inline int ffs(int x)
  *	the libc and compiler builtin ffs routines, therefore
  *	differs in spirit from the above ffz (man ffs).
  */
-static inline int ffs(int x)
+static __always_inline int ffs(int x)
 {
 	int cnt;
 
@@ -528,7 +528,7 @@ static inline int ffs(int x)
 	return 32 - cnt;
 }
 
-static inline unsigned long __ffs(unsigned long x)
+static __always_inline unsigned long __ffs(unsigned long x)
 {
 	return ffs(x) - 1;
 }
@@ -536,7 +536,7 @@ static inline unsigned long __ffs(unsigned long x)
 /*
  *	fls: find last bit set.
  */
-static inline int fls(unsigned int x)
+static __always_inline int fls(unsigned int x)
 {
 	int cnt;
 
@@ -546,7 +546,7 @@ static inline int fls(unsigned int x)
 	return 32 - cnt;
 }
 
-static inline unsigned long __fls(unsigned long x)
+static __always_inline unsigned long __fls(unsigned long x)
 {
 	return fls(x) - 1;
 }
-- 
2.43.0



* [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
  2024-01-28  5:00 ` [PATCH v4 0/5] bitops: optimize code and add tests Vincent Mailhol
  2024-01-28  5:00   ` [PATCH v4 1/5] m68k/bitops: force inlining of all bit-find functions Vincent Mailhol
@ 2024-01-28  5:00   ` Vincent Mailhol
  2024-01-28  5:39     ` Finn Thain
  2024-01-28  5:00   ` [PATCH v4 3/5] hexagon/bitops: force inlining of all bit-find functions Vincent Mailhol
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Vincent Mailhol @ 2024-01-28  5:00 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel
  Cc: Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook,
	Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven,
	Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-m68k, Vincent Mailhol

The compiler is not able to do constant folding on "asm volatile" code.

Evaluate whether or not the function argument is a constant expression
and if this is the case, return an equivalent builtin expression.

On linux 6.7 with an allyesconfig and GCC 13.2.1, it saves roughly 11 KB.

  $ size --format=GNU vmlinux.before vmlinux.after
    text       data        bss      total filename
    60457964   70953697    2288644  133700305 vmlinux.before
    60441196   70957057    2290724  133688977 vmlinux.after

Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions")
Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1

Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
 arch/m68k/include/asm/bitops.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index a8b23f897f24..02ec8a193b96 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -469,6 +469,9 @@ static __always_inline unsigned long ffz(unsigned long word)
 {
 	int res;
 
+	if (__builtin_constant_p(word))
+		return __builtin_ctzl(~word);
+
 	__asm__ __volatile__ ("bfffo %1{#0,#0},%0"
 			      : "=d" (res) : "d" (~word & -~word));
 	return res ^ 31;
@@ -490,6 +493,9 @@ static __always_inline unsigned long ffz(unsigned long word)
 	!defined(CONFIG_M68000)
 static __always_inline unsigned long __ffs(unsigned long x)
 {
+	if (__builtin_constant_p(x))
+		return __builtin_ctzl(x);
+
 	__asm__ __volatile__ ("bitrev %0; ff1 %0"
 		: "=d" (x)
 		: "0" (x));
@@ -522,6 +528,9 @@ static __always_inline int ffs(int x)
 {
 	int cnt;
 
+	if (__builtin_constant_p(x))
+		return __builtin_ffs(x);
+
 	__asm__ ("bfffo %1{#0:#0},%0"
 		: "=d" (cnt)
 		: "dm" (x & -x));
@@ -540,6 +549,9 @@ static __always_inline int fls(unsigned int x)
 {
 	int cnt;
 
+	if (__builtin_constant_p(x))
+		return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0;
+
 	__asm__ ("bfffo %1{#0,#0},%0"
 		: "=d" (cnt)
 		: "dm" (x));
-- 
2.43.0



* [PATCH v4 3/5] hexagon/bitops: force inlining of all bit-find functions
  2024-01-28  5:00 ` [PATCH v4 0/5] bitops: optimize code and add tests Vincent Mailhol
  2024-01-28  5:00   ` [PATCH v4 1/5] m68k/bitops: force inlining of all bit-find functions Vincent Mailhol
  2024-01-28  5:00   ` [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
@ 2024-01-28  5:00   ` Vincent Mailhol
  2024-01-28  5:00   ` [PATCH v4 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
  2024-01-28  5:00   ` [PATCH v4 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol
  4 siblings, 0 replies; 22+ messages in thread
From: Vincent Mailhol @ 2024-01-28  5:00 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel
  Cc: Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook,
	Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven,
	Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-m68k, Vincent Mailhol

The inline keyword actually does not guarantee that the compiler will
inline a function. Whenever the goal is to actually inline a
function, __always_inline should always be preferred instead.

__always_inline is also needed for further optimizations which will
come up in a follow-up patch.

Inline all the bit-find functions which have a custom hexagon assembly
implementation, namely: __ffs(), ffs(), ffz(), __fls(), fls().

On linux v6.7 defconfig with clang 17.0.6, it does not impact the
final size, meaning that, overall, those functions were already
inlined on modern clangs:

  $ size --format=GNU vmlinux.before vmlinux.after
        text       data        bss      total filename
     4827900    1798340     364057    6990297 vmlinux.before
     4827900    1798340     364057    6990297 vmlinux.after

Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of test_and_set_bit and friends")
Link: https://git.kernel.org/torvalds/c/8dd5032d9c54

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
 arch/hexagon/include/asm/bitops.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index 160d8f37fa1a..e856d6dbfe16 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -200,7 +200,7 @@ arch_test_bit_acquire(unsigned long nr, const volatile unsigned long *addr)
  *
  * Undefined if no zero exists, so code should check against ~0UL first.
  */
-static inline long ffz(int x)
+static __always_inline long ffz(int x)
 {
 	int r;
 
@@ -217,7 +217,7 @@ static inline long ffz(int x)
  * This is defined the same way as ffs.
  * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.
  */
-static inline int fls(unsigned int x)
+static __always_inline int fls(unsigned int x)
 {
 	int r;
 
@@ -238,7 +238,7 @@ static inline int fls(unsigned int x)
  * the libc and compiler builtin ffs routines, therefore
  * differs in spirit from the above ffz (man ffs).
  */
-static inline int ffs(int x)
+static __always_inline int ffs(int x)
 {
 	int r;
 
@@ -260,7 +260,7 @@ static inline int ffs(int x)
  * bits_per_long assumed to be 32
  * numbering starts at 0 I think (instead of 1 like ffs)
  */
-static inline unsigned long __ffs(unsigned long word)
+static __always_inline unsigned long __ffs(unsigned long word)
 {
 	int num;
 
@@ -278,7 +278,7 @@ static inline unsigned long __ffs(unsigned long word)
  * Undefined if no set bit exists, so code should check against 0 first.
  * bits_per_long assumed to be 32
  */
-static inline unsigned long __fls(unsigned long word)
+static __always_inline unsigned long __fls(unsigned long word)
 {
 	int num;
 
-- 
2.43.0



* [PATCH v4 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
  2024-01-28  5:00 ` [PATCH v4 0/5] bitops: optimize code and add tests Vincent Mailhol
                     ` (2 preceding siblings ...)
  2024-01-28  5:00   ` [PATCH v4 3/5] hexagon/bitops: force inlining of all bit-find functions Vincent Mailhol
@ 2024-01-28  5:00   ` Vincent Mailhol
  2024-01-28  5:00   ` [PATCH v4 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol
  4 siblings, 0 replies; 22+ messages in thread
From: Vincent Mailhol @ 2024-01-28  5:00 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel
  Cc: Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook,
	Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven,
	Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-m68k, Vincent Mailhol

The compiler is not able to do constant folding on "asm volatile" code.

Evaluate whether or not the function argument is a constant expression
and if this is the case, return an equivalent builtin expression.

On linux 6.7 with a defconfig and clang 17.0.6, it saves roughly
4 KB.

  $ size --format=GNU vmlinux.before vmlinux.after
        text       data        bss      total filename
     4827900    1798340     364057    6990297 vmlinux.before
     4827072    1795060     364057    6986189 vmlinux.after

Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions")
Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
 arch/hexagon/include/asm/bitops.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index e856d6dbfe16..c74a639c84f3 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -204,6 +204,9 @@ static __always_inline long ffz(int x)
 {
 	int r;
 
+	if (__builtin_constant_p(x))
+		return __builtin_ctzl(~x);
+
 	asm("%0 = ct1(%1);\n"
 		: "=&r" (r)
 		: "r" (x));
@@ -221,6 +224,9 @@ static __always_inline int fls(unsigned int x)
 {
 	int r;
 
+	if (__builtin_constant_p(x))
+		return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0;
+
 	asm("{ %0 = cl0(%1);}\n"
 		"%0 = sub(#32,%0);\n"
 		: "=&r" (r)
@@ -242,6 +248,9 @@ static __always_inline int ffs(int x)
 {
 	int r;
 
+	if (__builtin_constant_p(x))
+		return __builtin_ffs(x);
+
 	asm("{ P0 = cmp.eq(%1,#0); %0 = ct0(%1);}\n"
 		"{ if (P0) %0 = #0; if (!P0) %0 = add(%0,#1);}\n"
 		: "=&r" (r)
@@ -264,6 +273,9 @@ static __always_inline unsigned long __ffs(unsigned long word)
 {
 	int num;
 
+	if (__builtin_constant_p(word))
+		return __builtin_ctzl(word);
+
 	asm("%0 = ct0(%1);\n"
 		: "=&r" (num)
 		: "r" (word));
@@ -282,6 +294,9 @@ static __always_inline unsigned long __fls(unsigned long word)
 {
 	int num;
 
+	if (__builtin_constant_p(word))
+		return BITS_PER_LONG - 1 - __builtin_clzl(word);
+
 	asm("%0 = cl0(%1);\n"
 		"%0 = sub(#31,%0);\n"
 		: "=&r" (num)
-- 
2.43.0



* [PATCH v4 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions
  2024-01-28  5:00 ` [PATCH v4 0/5] bitops: optimize code and add tests Vincent Mailhol
                     ` (3 preceding siblings ...)
  2024-01-28  5:00   ` [PATCH v4 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
@ 2024-01-28  5:00   ` Vincent Mailhol
  4 siblings, 0 replies; 22+ messages in thread
From: Vincent Mailhol @ 2024-01-28  5:00 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel
  Cc: Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook,
	Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven,
	Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-m68k, Vincent Mailhol

Add a function in the bitops test suite to assert that the bitops
helper functions correctly fold into constant expressions (or trigger
a build bug otherwise). This should work on all the optimization
levels supported by Kbuild.

The added function does not perform any runtime tests and gets
optimized out to nothing after passing the build assertions.

Suggested-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
 lib/Kconfig.debug |  4 ++++
 lib/test_bitops.c | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4405f81248fb..85f8638b3ae6 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2439,6 +2439,10 @@ config TEST_BITOPS
 	  compilations. It has no dependencies and doesn't run or load unless
 	  explicitly requested by name.  for example: modprobe test_bitops.
 
+	  In addition, check that the compiler is able to fold the bitops
+	  function into a compile-time constant (given that the argument is also
+	  a compile-time constant) and trigger a build bug otherwise.
+
 	  If unsure, say N.
 
 config TEST_VMALLOC
diff --git a/lib/test_bitops.c b/lib/test_bitops.c
index 3b7bcbee84db..99b612515eb6 100644
--- a/lib/test_bitops.c
+++ b/lib/test_bitops.c
@@ -50,6 +50,34 @@ static unsigned long order_comb_long[][2] = {
 };
 #endif
 
+/* Assert that a boolean expression can be folded into a constant and is true. */
+#define test_const_eval(test_expr)				\
+({								\
+	/* Evaluate once so that compiler can fold it. */	\
+	bool __test_expr = test_expr;				\
+								\
+	BUILD_BUG_ON(!__builtin_constant_p(__test_expr));	\
+	BUILD_BUG_ON(!__test_expr);				\
+})
+
+/*
+ * On any supported optimization level (-O2, -Os) and if invoked with
+ * a compile-time constant argument, the compiler must be able to fold
+ * into a constant expression all the bit find functions. Namely:
+ * __ffs(), ffs(), ffz(), __fls(), fls() and fls64(). Otherwise,
+ * trigger a build bug.
+ */
+static __always_inline void test_bitops_const_eval(unsigned int n)
+{
+	test_const_eval(__ffs(BIT(n)) == n);
+	test_const_eval(ffs(BIT(n)) == n + 1);
+	test_const_eval(ffz(~BIT(n)) == n);
+	test_const_eval(__fls(BIT(n)) == n);
+	test_const_eval(fls(BIT(n)) == n + 1);
+	test_const_eval(fls64(BIT_ULL(n)) == n + 1);
+	test_const_eval(fls64(BIT_ULL(n + 32)) == n + 33);
+}
+
 static int __init test_bitops_startup(void)
 {
 	int i, bit_set;
@@ -94,6 +122,10 @@ static int __init test_bitops_startup(void)
 	if (bit_set != BITOPS_LAST)
 		pr_err("ERROR: FOUND SET BIT %d\n", bit_set);
 
+	test_bitops_const_eval(0);
+	test_bitops_const_eval(10);
+	test_bitops_const_eval(31);
+
 	pr_info("Completed bitops test\n");
 
 	return 0;
-- 
2.43.0



* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
  2024-01-28  5:00   ` [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
@ 2024-01-28  5:39     ` Finn Thain
  2024-01-28  6:26       ` Vincent MAILHOL
  0 siblings, 1 reply; 22+ messages in thread
From: Finn Thain @ 2024-01-28  5:39 UTC (permalink / raw)
  To: Vincent Mailhol
  Cc: Andrew Morton, linux-kernel, Yury Norov, Nick Desaulniers,
	Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap,
	Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain,
	Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney,
	linux-m68k


On Sun, 28 Jan 2024, Vincent Mailhol wrote:

> The compiler is not able to do constant folding on "asm volatile" code.
> 
> Evaluate whether or not the function argument is a constant expression
> and if this is the case, return an equivalent builtin expression.
> 
> On linux 6.7 with an allyesconfig and GCC 13.2.1, it saves roughly 11 KB.
> 
>   $ size --format=GNU vmlinux.before vmlinux.after
>     text       data        bss      total filename
>     60457964   70953697    2288644  133700305 vmlinux.before
>     60441196   70957057    2290724  133688977 vmlinux.after
> 
> Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions")
> Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1
> 
> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> ---
>  arch/m68k/include/asm/bitops.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> index a8b23f897f24..02ec8a193b96 100644
> --- a/arch/m68k/include/asm/bitops.h
> +++ b/arch/m68k/include/asm/bitops.h
> @@ -469,6 +469,9 @@ static __always_inline unsigned long ffz(unsigned long word)
>  {
>  	int res;
>  
> +	if (__builtin_constant_p(word))
> +		return __builtin_ctzl(~word);
> +
>  	__asm__ __volatile__ ("bfffo %1{#0,#0},%0"
>  			      : "=d" (res) : "d" (~word & -~word));
>  	return res ^ 31;

If the builtin has the desired behaviour, why do we reimplement it in asm? 
Shouldn't we abandon one or the other to avoid having to prove (and 
maintain) their equivalence?

> @@ -490,6 +493,9 @@ static __always_inline unsigned long ffz(unsigned long word)
>  	!defined(CONFIG_M68000)
>  static __always_inline unsigned long __ffs(unsigned long x)
>  {
> +	if (__builtin_constant_p(x))
> +		return __builtin_ctzl(x);
> +
>  	__asm__ __volatile__ ("bitrev %0; ff1 %0"
>  		: "=d" (x)
>  		: "0" (x));
> @@ -522,6 +528,9 @@ static __always_inline int ffs(int x)
>  {
>  	int cnt;
>  
> +	if (__builtin_constant_p(x))
> +		return __builtin_ffs(x);
> +
>  	__asm__ ("bfffo %1{#0:#0},%0"
>  		: "=d" (cnt)
>  		: "dm" (x & -x));
> @@ -540,6 +549,9 @@ static __always_inline int fls(unsigned int x)
>  {
>  	int cnt;
>  
> +	if (__builtin_constant_p(x))
> +		return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0;
> +
>  	__asm__ ("bfffo %1{#0,#0},%0"
>  		: "=d" (cnt)
>  		: "dm" (x));
> 


* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
  2024-01-28  5:39     ` Finn Thain
@ 2024-01-28  6:26       ` Vincent MAILHOL
  2024-01-28 12:16         ` David Laight
  0 siblings, 1 reply; 22+ messages in thread
From: Vincent MAILHOL @ 2024-01-28  6:26 UTC (permalink / raw)
  To: Finn Thain
  Cc: Andrew Morton, linux-kernel, Yury Norov, Nick Desaulniers,
	Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap,
	Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain,
	Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney,
	linux-m68k

On Sun. 28 Jan. 2024 at 14:39, Finn Thain <fthain@linux-m68k.org> wrote:
> On Sun, 28 Jan 2024, Vincent Mailhol wrote:
>
> > The compiler is not able to do constant folding on "asm volatile" code.
> >
> > Evaluate whether or not the function argument is a constant expression
> > and if this is the case, return an equivalent builtin expression.
> >
> > On linux 6.7 with an allyesconfig and GCC 13.2.1, it saves roughly 11 KB.
> >
> >   $ size --format=GNU vmlinux.before vmlinux.after
> >     text       data        bss      total filename
> >     60457964   70953697    2288644  133700305 vmlinux.before
> >     60441196   70957057    2290724  133688977 vmlinux.after
> >
> > Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions")
> > Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1
> >
> > Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
> > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> > ---
> >  arch/m68k/include/asm/bitops.h | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> > index a8b23f897f24..02ec8a193b96 100644
> > --- a/arch/m68k/include/asm/bitops.h
> > +++ b/arch/m68k/include/asm/bitops.h
> > @@ -469,6 +469,9 @@ static __always_inline unsigned long ffz(unsigned long word)
> >  {
> >       int res;
> >
> > +     if (__builtin_constant_p(word))
> > +             return __builtin_ctzl(~word);
> > +
> >       __asm__ __volatile__ ("bfffo %1{#0,#0},%0"
> >                             : "=d" (res) : "d" (~word & -~word));
> >       return res ^ 31;
>
> If the builtin has the desired behaviour, why do we reimplement it in asm?
> Shouldn't we abandon one or the other to avoid having to prove (and
> maintain) their equivalence?

The asm is meant to produce better results when the argument is not a
constant expression. The commit below is a good illustration of why we
want both the asm and the builtin:

  https://git.kernel.org/torvalds/c/146034fed6ee

I say "is meant" because I did not verify whether this is still true.
Note that there are some cases in which the asm is not better anymore;
for example, see this thread:

  https://lore.kernel.org/lkml/20221106095106.849154-2-mailhol.vincent@wanadoo.fr/

but I did not receive any more answers, so I stopped trying to
investigate the subject.

If you want, you can check the produced assembly of both the asm and
the builtin for both clang and gcc, and if the builtin is always
either better or equivalent, then the asm can be removed. That said, I
am not spending more effort there after being ghosted once (cf. the
above thread).
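
To make this concrete, here is the dispatch pattern introduced by this
patch, shown standalone for the m68k ffs() (reconstructed from the
hunks of patches 1/5 and 2/5, with comments added):

  static __always_inline int ffs(int x)
  {
          int cnt;

          /* Constant argument: let the compiler fold it at build time. */
          if (__builtin_constant_p(x))
                  return __builtin_ffs(x);

          /* Runtime argument: keep the hand-written bfffo sequence. */
          __asm__ ("bfffo %1{#0:#0},%0"
                  : "=d" (cnt)
                  : "dm" (x & -x));
          return 32 - cnt;
  }

A call such as ffs(0x10) then compiles down to the constant 5 with no
bfffo emitted, while a call with a runtime argument still uses the asm.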

> > @@ -490,6 +493,9 @@ static __always_inline unsigned long ffz(unsigned long word)
> >       !defined(CONFIG_M68000)
> >  static __always_inline unsigned long __ffs(unsigned long x)
> >  {
> > +     if (__builtin_constant_p(x))
> > +             return __builtin_ctzl(x);
> > +
> >       __asm__ __volatile__ ("bitrev %0; ff1 %0"
> >               : "=d" (x)
> >               : "0" (x));
> > @@ -522,6 +528,9 @@ static __always_inline int ffs(int x)
> >  {
> >       int cnt;
> >
> > +     if (__builtin_constant_p(x))
> > +             return __builtin_ffs(x);
> > +
> >       __asm__ ("bfffo %1{#0:#0},%0"
> >               : "=d" (cnt)
> >               : "dm" (x & -x));
> > @@ -540,6 +549,9 @@ static __always_inline int fls(unsigned int x)
> >  {
> >       int cnt;
> >
> > +     if (__builtin_constant_p(x))
> > +             return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0;
> > +
> >       __asm__ ("bfffo %1{#0,#0},%0"
> >               : "=d" (cnt)
> >               : "dm" (x));
> >


* RE: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
  2024-01-28  6:26       ` Vincent MAILHOL
@ 2024-01-28 12:16         ` David Laight
  2024-01-28 13:27           ` Vincent MAILHOL
  0 siblings, 1 reply; 22+ messages in thread
From: David Laight @ 2024-01-28 12:16 UTC (permalink / raw)
  To: 'Vincent MAILHOL', Finn Thain
  Cc: Andrew Morton, linux-kernel, Yury Norov, Nick Desaulniers,
	Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap,
	Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain,
	Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney,
	linux-m68k

From: Vincent MAILHOL
> Sent: 28 January 2024 06:27
> 
> On Sun. 28 Jan. 2024 at 14:39, Finn Thain <fthain@linux-m68k.org> wrote:
> > On Sun, 28 Jan 2024, Vincent Mailhol wrote:
> >
> > > The compiler is not able to do constant folding on "asm volatile" code.
> > >
> > > Evaluate whether or not the function argument is a constant expression
> > > and if this is the case, return an equivalent builtin expression.
> > >
...
> > If the builtin has the desired behaviour, why do we reimplement it in asm?
> > Shouldn't we abandon one or the other to avoid having to prove (and
> > maintain) their equivalence?
> 
> The asm is meant to produce better results when the argument is not a
> constant expression. Below commit is a good illustration of why we
> want both the asm and the built:
> 
>   https://git.kernel.org/torvalds/c/146034fed6ee
> 
> I say "is meant", because I did not assert whether this is still true.
> Note that there are some cases in which the asm is not better anymore,
> for example, see this thread:
> 
>   https://lore.kernel.org/lkml/20221106095106.849154-2-mailhol.vincent@wanadoo.fr/
> 
> but I did not receive more answers, so I stopped trying to investigate
> the subject.
> 
> If you want, you can check the produced assembly of both the asm and
> the builtin for both clang and gcc, and if the builtin is always
> either better or equivalent, then the asm can be removed. That said, I
> am not spending more effort there after being ghosted once (c.f. above
> thread).

I don't see any example there of why the __builtin_xxx() versions
shouldn't be used all the time.
(The x86-64 asm blocks contain unrelated call instructions and objdump
wasn't passed -d to show what they were.
One even has the 'return thunk' pessimisation showing.)

I actually suspect the asm versions predate the builtins.

Does (or can) the outer common header use the __builtin functions
if no asm version exists?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
  2024-01-28 12:16         ` David Laight
@ 2024-01-28 13:27           ` Vincent MAILHOL
  2024-01-28 19:01             ` David Laight
  2024-01-28 22:34             ` Finn Thain
  0 siblings, 2 replies; 22+ messages in thread
From: Vincent MAILHOL @ 2024-01-28 13:27 UTC (permalink / raw)
  To: David Laight
  Cc: Finn Thain, Andrew Morton, linux-kernel, Yury Norov,
	Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
	Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
	Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-m68k

On Sun. 28 Jan. 2024 at 21:16, David Laight <David.Laight@aculab.com> wrote:
> From: Vincent MAILHOL
> > Sent: 28 January 2024 06:27
> >
> > On Sun. 28 Jan. 2024 at 14:39, Finn Thain <fthain@linux-m68k.org> wrote:
> > > On Sun, 28 Jan 2024, Vincent Mailhol wrote:
> > >
> > > > The compiler is not able to do constant folding on "asm volatile" code.
> > > >
> > > > Evaluate whether or not the function argument is a constant expression
> > > > and if this is the case, return an equivalent builtin expression.
> > > >
> ...
> > > If the builtin has the desired behaviour, why do we reimplement it in asm?
> > > Shouldn't we abandon one or the other to avoid having to prove (and
> > > maintain) their equivalence?
> >
> > The asm is meant to produce better results when the argument is not a
> > constant expression. Below commit is a good illustration of why we
> > want both the asm and the built:
> >
> >   https://git.kernel.org/torvalds/c/146034fed6ee
> >
> > I say "is meant", because I did not assert whether this is still true.
> > Note that there are some cases in which the asm is not better anymore,
> > for example, see this thread:
> >
> >   https://lore.kernel.org/lkml/20221106095106.849154-2-mailhol.vincent@wanadoo.fr/
> >
> > but I did not receive more answers, so I stopped trying to investigate
> > the subject.
> >
> > If you want, you can check the produced assembly of both the asm and
> > the builtin for both clang and gcc, and if the builtin is always
> > either better or equivalent, then the asm can be removed. That said, I
> > am not spending more effort there after being ghosted once (c.f. above
> > thread).
>
> I don't see any example there of why the __builtin_xxx() versions
> shouldn't be used all the time.
> (The x86-64 asm blocks contain unrelated call instructions and objdump
> wasn't passed -d to show what they were.
> One even has the 'return thunk pessimisation showing.)

Fair. My goal was not to point to the assembly code but to this sentence:

  However, for non constant expressions, the kernel's ffs() asm
  version remains better for x86_64 because, contrary to GCC, it
  doesn't emit the CMOV assembly instruction

I should have been clearer. Sorry for that.

But the fact remains that, on x86, some of the asm produced more
optimized code than the builtin.

> I actually suspect the asm versions predate the builtins.

This seems true. The __builtins were introduced in:

  generic: Implement generic ffs/fls using __builtin_* functions
  https://git.kernel.org/torvalds/c/048fa2df92c3

when the asm implementations already existed on m68k.

> Does (or can) the outer common header use the __builtin functions
> if no asm version exists?

Yes, this would be extremely easy. You just need to

  #include <asm-generic/bitops/builtin-__ffs.h>
  #include <asm-generic/bitops/builtin-ffs.h>
  #include <asm-generic/bitops/builtin-__fls.h>
  #include <asm-generic/bitops/builtin-fls.h>

and remove all the asm implementations. If you give me your green
light, I can do that change. This is trivial. The only thing I am not
ready to do is to compare the produced assembly code and confirm
whether or not it is better to remove asm code.
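
For reference, those generic headers are thin wrappers around the
builtins. From memory (check include/asm-generic/bitops/ in the tree
for the exact code), builtin-__ffs.h is roughly:

  /* Rough shape of include/asm-generic/bitops/builtin-__ffs.h (from
   * memory, not verbatim): the generic fallback is just the builtin,
   * so constant folding comes for free.
   */
  static __always_inline unsigned long __ffs(unsigned long word)
  {
          return __builtin_ctzl(word);
  }

so dropping the m68k asm would mean trading the hand-tuned bfffo and
bitrev sequences for whatever the compiler generates from the builtin.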

Thoughts?

Yours sincerely,
Vincent Mailhol


* RE: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
  2024-01-28 13:27           ` Vincent MAILHOL
@ 2024-01-28 19:01             ` David Laight
  2024-01-28 22:34             ` Finn Thain
  1 sibling, 0 replies; 22+ messages in thread
From: David Laight @ 2024-01-28 19:01 UTC (permalink / raw)
  To: 'Vincent MAILHOL'
  Cc: Finn Thain, Andrew Morton, linux-kernel, Yury Norov,
	Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
	Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
	Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-m68k

From: Vincent MAILHOL 
> Sent: 28 January 2024 13:28
...
> Fair. My goal was not to point to the assembly code but to this sentence:
> 
>   However, for non constant expressions, the kernel's ffs() asm
>   version remains better for x86_64 because, contrary to GCC, it
>   doesn't emit the CMOV assembly instruction

The comment in the code is:
	 * AMD64 says BSFL won't clobber the dest reg if x==0; Intel64 says the
	 * dest reg is undefined if x==0, but their CPU architect says its
	 * value is written to set it to the same as before, except that the
	 * top 32 bits will be cleared.
I guess gcc isn't willing to act on hearsay!

> 
> I should have been more clear. Sorry for that.
> 
> But the fact remains, on x86, some of the asm produced more optimized
> code than the builtin.
> 
> > I actually suspect the asm versions predate the builtins.
> 
> This seems true. The __bultins were introduced in:
> 
>   generic: Implement generic ffs/fls using __builtin_* functions
>   https://git.kernel.org/torvalds/c/048fa2df92c3

I was thinking of compiler versions, not kernel source commits.
You'd need to be doing some serious software archaeology.

> when the asm implementation already existed in m68k.
> 
> > Does (or can) the outer common header use the __builtin functions
> > if no asm version exists?
> 
> Yes, this would be extremely easy. You just need to
> 
>   #include/asm-generic/bitops/builtin-__ffs.h
>   #include/asm-generic/bitops/builtin-ffs.h
>   #include/asm-generic/bitops/builtin-__fls.h
>   #include/asm-generic/bitops/builtin-fls.h
> 
> and remove all the asm implementations. If you give me your green
> light, I can do that change. This is trivial. The only thing I am not
> ready to do is to compare the produced assembly code and confirm
> whether or not it is better to remove asm code.
> 
> Thoughts?

Not for me to say.
But the builtins are likely to generate reasonable code.
So unless the asm can be better (like trusting undocumented
x86 CPU behaviour), using them is probably best.

For the ones you are changing, it may be best to propose using
the builtins all the time.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
  2024-01-28 13:27           ` Vincent MAILHOL
  2024-01-28 19:01             ` David Laight
@ 2024-01-28 22:34             ` Finn Thain
  2024-02-04 13:56               ` Vincent MAILHOL
  1 sibling, 1 reply; 22+ messages in thread
From: Finn Thain @ 2024-01-28 22:34 UTC (permalink / raw)
  To: Vincent MAILHOL
  Cc: David Laight, Andrew Morton, linux-kernel, Yury Norov,
	Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
	Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
	Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-m68k


On Sun, 28 Jan 2024, Vincent MAILHOL wrote:

> > > The asm is meant to produce better results when the argument is not 
> > > a constant expression.

Is that because gcc's implementation has to satisfy requirements that are 
excessively stringent for the kernel's purposes? Or is it a compiler 
deficiency only affecting certain architectures?

> ... The only thing I am not ready to do is to compare the produced 
> assembly code and confirm whether or not it is better to remove asm 
> code.
> 

If you do the comparison and find no change, you get to say so in the 
commit log, and everyone is happy.


* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
  2024-01-28 22:34             ` Finn Thain
@ 2024-02-04 13:56               ` Vincent MAILHOL
  2024-02-04 23:13                 ` Finn Thain
  0 siblings, 1 reply; 22+ messages in thread
From: Vincent MAILHOL @ 2024-02-04 13:56 UTC (permalink / raw)
  To: Finn Thain
  Cc: David Laight, Andrew Morton, linux-kernel, Yury Norov,
	Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
	Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
	Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-m68k

Hi Finn and David,

Sorry for the late feedback, I did not have much time during weekdays.

On Monday. 29 Jan. 2024 at 07:34, Finn Thain <fthain@linux-m68k.org> wrote:
> On Sun, 28 Jan 2024, Vincent MAILHOL wrote:
> > > > The asm is meant to produce better results when the argument is not
> > > > a constant expression.
>
> Is that because gcc's implementation has to satisfy requirements that are
> excessively stringent for the kernel's purposes? Or is it a compiler
> deficiency only affecting certain architectures?

I just guess that the GCC guys followed the Intel datasheet while the
kernel guys like to live a bit more dangerously and rely on some
not-so-well-defined behaviours... But I am really not the person you
should ask.

I just want to optimize the constant folding, and this is the only
purpose of this series. I am absolutely not an asm expert. That's also
why I am reluctant to compare the asm outputs.

> > ... The only thing I am not ready to do is to compare the produced
> > assembly code and confirm whether or not it is better to remove asm
> > code.
> >
>
> If you do the comparison and find no change, you get to say so in the
> commit log, and everyone is happy.

Without getting into details, here is a quick comparison of the gcc-
and clang-generated asm for all the bitops builtins:

  https://godbolt.org/z/Yb8nMKnYf

To the extent of my limited knowledge, it looks rather OK for gcc, but
for clang... it seems that we get the default software implementation.
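
(For anyone who wants to reproduce the comparison outside godbolt: the
input is just each builtin wrapped in a trivial function, compiled at
-O2 for m68k with both compilers. The do_*() wrappers below are only
illustrative names, and, as far as I know, clang needs to be built with
the experimental M68k backend enabled.)

  /* Minimal input for comparing the code generated for the builtins. */
  int do_ctzl(unsigned long x) { return __builtin_ctzl(x); }
  int do_clzl(unsigned long x) { return __builtin_clzl(x); }
  int do_clz(unsigned int x)   { return __builtin_clz(x); }
  int do_ffs(int x)            { return __builtin_ffs(x); }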

So are we fine with the current patch? Or maybe clang support is not
important for m68k? I do not know so tell me :)


Yours sincerely,
Vincent Mailhol


* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
  2024-02-04 13:56               ` Vincent MAILHOL
@ 2024-02-04 23:13                 ` Finn Thain
  2024-02-05  9:17                   ` Vincent MAILHOL
  0 siblings, 1 reply; 22+ messages in thread
From: Finn Thain @ 2024-02-04 23:13 UTC (permalink / raw)
  To: Vincent MAILHOL
  Cc: David Laight, Andrew Morton, linux-kernel, Yury Norov,
	Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
	Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
	Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-m68k

On Sun, 4 Feb 2024, Vincent MAILHOL wrote:

> Sorry for the late feedback, I did not have much time during weekdays.
> 
> On Monday. 29 Jan. 2024 at 07:34, Finn Thain <fthain@linux-m68k.org> wrote:
> > On Sun, 28 Jan 2024, Vincent MAILHOL wrote:
> > > > > The asm is meant to produce better results when the argument is not
> > > > > a constant expression.
> >
> > Is that because gcc's implementation has to satisfy requirements that are
> > excessively stringent for the kernel's purposes? Or is it a compiler
> > deficiency only affecting certain architectures?
> 
> I just guess that the GCC guys followed the Intel datasheet while the
> kernel guys like to live a bit more dangerously and rely on some
> not-so-well-defined behaviours... But I am really not the person you
> should ask.
> 
> I just want to optimize the constant folding, and this is the only
> purpose of this series. I am absolutely not an asm expert. That's also
> why I am reluctant to compare the asm outputs.
> 

How does replacing asm with a builtin prevent constant folding?

> > > ... The only thing I am not ready to do is to compare the produced
> > > assembly code and confirm whether or not it is better to remove asm
> > > code.
> > >
> >
> > If you do the comparison and find no change, you get to say so in the
> > commit log, and everyone is happy.
> 
> Without getting into details, here is a quick comparison of the gcc-
> and clang-generated asm for all the bitops builtins:
> 
>   https://godbolt.org/z/Yb8nMKnYf
> 
> To the extent of my limited knowledge, it looks rather OK for gcc, but
> for clang... it seems that we get the default software implementation.
> 
> So are we fine with the current patch? Or maybe clang support is not
> important for m68k? I do not know so tell me :)
> 

Let's see if I understand.

You are proposing that the kernel source carry an unquantified 
optimization, with inherent complexity and maintenance costs, just for the 
benefit of users who choose a compiler that doesn't work as well as the 
standard compiler. Is that it?

At some point in the future, when clang comes up to scratch with gcc and
the builtin reaches parity with the asm, I wonder if you will then remove
both your optimization and the asm, to eliminate the aforementioned
complexity and maintenance costs. Is there an incentive for that work?


* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
  2024-02-04 23:13                 ` Finn Thain
@ 2024-02-05  9:17                   ` Vincent MAILHOL
  2024-02-05  9:48                     ` Finn Thain
  0 siblings, 1 reply; 22+ messages in thread
From: Vincent MAILHOL @ 2024-02-05  9:17 UTC (permalink / raw)
  To: Finn Thain
  Cc: David Laight, Andrew Morton, linux-kernel, Yury Norov,
	Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
	Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
	Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-m68k

On Mon. 5 Feb. 2024 at 08:13, Finn Thain <fthain@linux-m68k.org> wrote:
> On Sun, 4 Feb 2024, Vincent MAILHOL wrote:
(...)
> Let's see if I understand.
>
> You are proposing that the kernel source carry an unquantified
> optimization, with inherent complexity and maintenance costs, just for the
> benefit of users who choose a compiler that doesn't work as well as the
> standard compiler. Is that it?

My proposal is quantified, cf. my commit message:

  On an allyesconfig, with GCC 13.2.1, it saves roughly 8 KB.

"Saving roughly 8kb" is a quantification.

And clang use in the kernel is part of the documented toolchain requirements:

  https://www.kernel.org/doc/html/latest/process/changes.html#current-minimal-requirements

GCC may be predominant, but, although optional, clang v11+ is
officially supported, and thus my patches should not neglect it.

This is why I am asking whether or not clang support is important for
m68k. If you tell me it is not, then fine, I will remove all the asm
(by the way, the patch is already ready). But if there are even a few
users who care about clang for m68k, then I do not think we should
penalize them, and I would not sign off a change which negatively
impacts some users.

The linux-m68k mailing list should know better than I do whether people
care about clang support. So let me reiterate the question from my
previous message: is clang support important to you?

I would like a formal acknowledgement from the linux-m68k mailing list
before sending a patch that may negatively impact some users. Thank
you.

> At some point in the future, when clang comes up to scratch with gcc and the
> builtin reaches parity with the asm, I wonder if you will then remove both
> your optimization and the asm, to eliminate the aforementioned complexity
> and maintenance costs. Is there an incentive for that work?

I will not follow the evolution of clang's support for the m68k builtins.
The goal of this series is to add a test asserting that all
architectures correctly do the constant folding on the bit-find
operations (fifth patch of this series). It happens that the m68k and
hexagon architectures fail that test, so I am fixing this as a
one-shot. After this series, I have no plan to do further work on the
m68k and hexagon architectures. Thank you for your understanding.
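
The idea of that test is, roughly, this kind of compile-time assertion
(a sketch only, not the actual lib/test_bitops.c code):

  /*
   * If ffs() constant folds, the whole expression is a compile-time
   * constant, __builtin_constant_p() returns 1 and the BUILD_BUG_ON()
   * compiles away; otherwise the build fails.
   */
  BUILD_BUG_ON(!__builtin_constant_p(ffs(42)));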


Yours sincerely,
Vincent Mailhol

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
  2024-02-05  9:17                   ` Vincent MAILHOL
@ 2024-02-05  9:48                     ` Finn Thain
  2024-02-05 10:43                       ` Vincent MAILHOL
  0 siblings, 1 reply; 22+ messages in thread
From: Finn Thain @ 2024-02-05  9:48 UTC (permalink / raw)
  To: Vincent MAILHOL
  Cc: David Laight, Andrew Morton, linux-kernel, Yury Norov,
	Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
	Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
	Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-m68k


On Mon, 5 Feb 2024, Vincent MAILHOL wrote:

> 
> This is why I am asking whether or not clang support is important for 
> m68k. If you tell me it is not, then fine, I will remove all the asm 
> (by the way, the patch is already ready). But if there are even a few 
> users who care about clang for m68k, then I do not think we should 
> penalize them, and I would not sign off a change which negatively 
> impacts some users.
> 

If clang support is important then clang's builtins are important. So why 
not improve those instead? That would resolve the issue in a win-win.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
  2024-02-05  9:48                     ` Finn Thain
@ 2024-02-05 10:43                       ` Vincent MAILHOL
  2024-02-05 15:40                         ` [PATCH] m68k/bitops: always use compiler's builtin for bit finding functions Vincent Mailhol
  2024-02-07  6:31                         ` [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Finn Thain
  0 siblings, 2 replies; 22+ messages in thread
From: Vincent MAILHOL @ 2024-02-05 10:43 UTC (permalink / raw)
  To: Finn Thain
  Cc: David Laight, Andrew Morton, linux-kernel, Yury Norov,
	Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
	Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
	Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-m68k

On Mon. 5 Feb. 2024 at 18:48, Finn Thain <fthain@linux-m68k.org> wrote:
> On Mon, 5 Feb 2024, Vincent MAILHOL wrote:
>
> >
> > This is why I am asking whether or not clang support is important for
> > m68k. If you tell me it is not, then fine, I will remove all the asm
> > (by the way, the patch is already ready). But if there are even a few
> > users who care about clang for m68k, then I do not think we should
> > penalize them, and I would not sign off a change which negatively
> > impacts some users.
> >
>
> If clang support is important then clang's builtins are important. So why
> not improve those instead? That would resolve the issue in a win-win.

I am deeply sorry, but with all due respect, this request is unfair.
I will not fix the compiler.

Let me repeat my question for the third time: are you (or any other
m68k maintainer) ready to acknowledge that we can degrade the
performance for clang m68k users? With that acknowledgement, I will
remove the asm and replace it with the builtins.


Yours sincerely,
Vincent Mailhol

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] m68k/bitops: always use compiler's builtin for bit finding functions
  2024-02-05 10:43                       ` Vincent MAILHOL
@ 2024-02-05 15:40                         ` Vincent Mailhol
  2024-02-07  6:31                         ` [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Finn Thain
  1 sibling, 0 replies; 22+ messages in thread
From: Vincent Mailhol @ 2024-02-05 15:40 UTC (permalink / raw)
  To: fthain
  Cc: mailhol.vincent, David.Laight, akpm, bcain, dianders, elver,
	geert+renesas, geert, keescook, linux-kernel, linux-m68k,
	ndesaulniers, paulmck, pmladek, rdunlap, willy, yury.norov,
	zhaoyang.huang

GCC and clang provide a set of builtin functions which can be used as
replacements for ffs(), __ffs(), fls() and __fls().

Using the builtins comes with a trade-off.

Pros:

  - Simpler code, easier to maintain
  - Guaranteed constant folding

Cons:

  - clang does not produce optimized code. Ref:
    https://godbolt.org/z/Yb8nMKnYf

Despite the cons, use the builtins unconditionally and remove any
existing assembly implementation.

With this done, also remove the find_first_zero_bit(),
find_next_zero_bit(), find_first_bit() and find_next_bit() assembly
implementations. Instead, rely on the generic functions from
linux/find.h, which will themselves rely on the builtins we just set up.
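
For reference, the asm-generic builtin wrappers being switched to reduce
to something like this (a rough sketch of what the generic headers
provide, not a verbatim copy):

  static __always_inline unsigned long __ffs(unsigned long word)
  {
          return __builtin_ctzl(word);
  }

  static __always_inline int fls(unsigned int x)
  {
          return x ? sizeof(x) * 8 - __builtin_clz(x) : 0;
  }

Both gcc and clang fold these for constant arguments; the difference
between the two compilers only shows up in the code generated for
runtime values (see the godbolt link above).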

Not-signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
As written earlier, I do not want to sign off a patch which can
degrade the performance for m68k clang users. But because it is
already written, here it is.

If someone wants to pick this up, go ahead. Just make sure to remove
any reference to me. I hereby grant permission for anyone to reuse
this patch, with or without modifications, under the sole condition
that my name is removed.
---
 arch/m68k/include/asm/bitops.h | 215 +--------------------------------
 1 file changed, 5 insertions(+), 210 deletions(-)

diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 14c64a6f1217..44aac8ced9fc 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -340,218 +340,13 @@ static inline bool xor_unlock_is_negative_byte(unsigned long mask,
 #endif
 }
 
-/*
- *	The true 68020 and more advanced processors support the "bfffo"
- *	instruction for finding bits. ColdFire and simple 68000 parts
- *	(including CPU32) do not support this. They simply use the generic
- *	functions.
- */
-#if defined(CONFIG_CPU_HAS_NO_BITFIELDS)
-#include <asm-generic/bitops/ffz.h>
-#else
-
-static inline int find_first_zero_bit(const unsigned long *vaddr,
-				      unsigned size)
-{
-	const unsigned long *p = vaddr;
-	int res = 32;
-	unsigned int words;
-	unsigned long num;
-
-	if (!size)
-		return 0;
-
-	words = (size + 31) >> 5;
-	while (!(num = ~*p++)) {
-		if (!--words)
-			goto out;
-	}
-
-	__asm__ __volatile__ ("bfffo %1{#0,#0},%0"
-			      : "=d" (res) : "d" (num & -num));
-	res ^= 31;
-out:
-	res += ((long)p - (long)vaddr - 4) * 8;
-	return res < size ? res : size;
-}
-#define find_first_zero_bit find_first_zero_bit
-
-static inline int find_next_zero_bit(const unsigned long *vaddr, int size,
-				     int offset)
-{
-	const unsigned long *p = vaddr + (offset >> 5);
-	int bit = offset & 31UL, res;
-
-	if (offset >= size)
-		return size;
-
-	if (bit) {
-		unsigned long num = ~*p++ & (~0UL << bit);
-		offset -= bit;
-
-		/* Look for zero in first longword */
-		__asm__ __volatile__ ("bfffo %1{#0,#0},%0"
-				      : "=d" (res) : "d" (num & -num));
-		if (res < 32) {
-			offset += res ^ 31;
-			return offset < size ? offset : size;
-		}
-		offset += 32;
-
-		if (offset >= size)
-			return size;
-	}
-	/* No zero yet, search remaining full bytes for a zero */
-	return offset + find_first_zero_bit(p, size - offset);
-}
-#define find_next_zero_bit find_next_zero_bit
-
-static inline int find_first_bit(const unsigned long *vaddr, unsigned size)
-{
-	const unsigned long *p = vaddr;
-	int res = 32;
-	unsigned int words;
-	unsigned long num;
-
-	if (!size)
-		return 0;
-
-	words = (size + 31) >> 5;
-	while (!(num = *p++)) {
-		if (!--words)
-			goto out;
-	}
-
-	__asm__ __volatile__ ("bfffo %1{#0,#0},%0"
-			      : "=d" (res) : "d" (num & -num));
-	res ^= 31;
-out:
-	res += ((long)p - (long)vaddr - 4) * 8;
-	return res < size ? res : size;
-}
-#define find_first_bit find_first_bit
-
-static inline int find_next_bit(const unsigned long *vaddr, int size,
-				int offset)
-{
-	const unsigned long *p = vaddr + (offset >> 5);
-	int bit = offset & 31UL, res;
-
-	if (offset >= size)
-		return size;
-
-	if (bit) {
-		unsigned long num = *p++ & (~0UL << bit);
-		offset -= bit;
-
-		/* Look for one in first longword */
-		__asm__ __volatile__ ("bfffo %1{#0,#0},%0"
-				      : "=d" (res) : "d" (num & -num));
-		if (res < 32) {
-			offset += res ^ 31;
-			return offset < size ? offset : size;
-		}
-		offset += 32;
-
-		if (offset >= size)
-			return size;
-	}
-	/* No one yet, search remaining full bytes for a one */
-	return offset + find_first_bit(p, size - offset);
-}
-#define find_next_bit find_next_bit
-
-/*
- * ffz = Find First Zero in word. Undefined if no zero exists,
- * so code should check against ~0UL first..
- */
-static inline unsigned long ffz(unsigned long word)
-{
-	int res;
-
-	__asm__ __volatile__ ("bfffo %1{#0,#0},%0"
-			      : "=d" (res) : "d" (~word & -~word));
-	return res ^ 31;
-}
-
-#endif
-
 #ifdef __KERNEL__
 
-#if defined(CONFIG_CPU_HAS_NO_BITFIELDS)
-
-/*
- *	The newer ColdFire family members support a "bitrev" instruction
- *	and we can use that to implement a fast ffs. Older Coldfire parts,
- *	and normal 68000 parts don't have anything special, so we use the
- *	generic functions for those.
- */
-#if (defined(__mcfisaaplus__) || defined(__mcfisac__)) && \
-	!defined(CONFIG_M68000)
-static inline unsigned long __ffs(unsigned long x)
-{
-	__asm__ __volatile__ ("bitrev %0; ff1 %0"
-		: "=d" (x)
-		: "0" (x));
-	return x;
-}
-
-static inline int ffs(int x)
-{
-	if (!x)
-		return 0;
-	return __ffs(x) + 1;
-}
-
-#else
-#include <asm-generic/bitops/ffs.h>
-#include <asm-generic/bitops/__ffs.h>
-#endif
-
-#include <asm-generic/bitops/fls.h>
-#include <asm-generic/bitops/__fls.h>
-
-#else
-
-/*
- *	ffs: find first bit set. This is defined the same way as
- *	the libc and compiler builtin ffs routines, therefore
- *	differs in spirit from the above ffz (man ffs).
- */
-static inline int ffs(int x)
-{
-	int cnt;
-
-	__asm__ ("bfffo %1{#0:#0},%0"
-		: "=d" (cnt)
-		: "dm" (x & -x));
-	return 32 - cnt;
-}
-
-static inline unsigned long __ffs(unsigned long x)
-{
-	return ffs(x) - 1;
-}
-
-/*
- *	fls: find last bit set.
- */
-static inline int fls(unsigned int x)
-{
-	int cnt;
-
-	__asm__ ("bfffo %1{#0,#0},%0"
-		: "=d" (cnt)
-		: "dm" (x));
-	return 32 - cnt;
-}
-
-static inline unsigned long __fls(unsigned long x)
-{
-	return fls(x) - 1;
-}
-
-#endif
+#include <asm-generic/bitops/builtin-ffs.h>
+#include <asm-generic/bitops/builtin-__ffs.h>
+#include <asm-generic/bitops/builtin-fls.h>
+#include <asm-generic/bitops/builtin-__fls.h>
+#include <asm-generic/bitops/ffz.h>
 
 /* Simple test-and-set bit locks */
 #define test_and_set_bit_lock	test_and_set_bit
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
  2024-02-05 10:43                       ` Vincent MAILHOL
  2024-02-05 15:40                         ` [PATCH] m68k/bitops: always use compiler's builtin for bit finding functions Vincent Mailhol
@ 2024-02-07  6:31                         ` Finn Thain
  1 sibling, 0 replies; 22+ messages in thread
From: Finn Thain @ 2024-02-07  6:31 UTC (permalink / raw)
  To: Vincent MAILHOL
  Cc: David Laight, Andrew Morton, linux-kernel, Yury Norov,
	Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
	Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
	Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-m68k


On Mon, 5 Feb 2024, Vincent MAILHOL wrote:

> On Mon. 5 Feb. 2024 at 18:48, Finn Thain <fthain@linux-m68k.org> wrote:
> > On Mon, 5 Feb 2024, Vincent MAILHOL wrote:
> >
> > If clang support is important then clang's builtins are important. So 
> > why not improve those instead? That would resolve the issue in a 
> > win-win.
> 
> I am deeply sorry, but with all due respect, this request is unfair. I 
> will not fix the compiler.
> 

Absolutely, it is unfair, just as your proposal was unfair to the
maintainers. That patch would turn a compiler deficiency into a
maintenance burden.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2024-02-07  6:30 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20221111081316.30373-1-mailhol.vincent@wanadoo.fr>
     [not found] ` <20231217071250.892867-1-mailhol.vincent@wanadoo.fr>
     [not found]   ` <20231217071250.892867-2-mailhol.vincent@wanadoo.fr>
2024-01-02 10:28     ` [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions Geert Uytterhoeven
2024-01-07 12:01       ` Vincent MAILHOL
     [not found]   ` <20231217071250.892867-3-mailhol.vincent@wanadoo.fr>
2024-01-02 10:49     ` [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Geert Uytterhoeven
2024-01-28  5:00 ` [PATCH v4 0/5] bitops: optimize code and add tests Vincent Mailhol
2024-01-28  5:00   ` [PATCH v4 1/5] m68k/bitops: force inlining of all bit-find functions Vincent Mailhol
2024-01-28  5:00   ` [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
2024-01-28  5:39     ` Finn Thain
2024-01-28  6:26       ` Vincent MAILHOL
2024-01-28 12:16         ` David Laight
2024-01-28 13:27           ` Vincent MAILHOL
2024-01-28 19:01             ` David Laight
2024-01-28 22:34             ` Finn Thain
2024-02-04 13:56               ` Vincent MAILHOL
2024-02-04 23:13                 ` Finn Thain
2024-02-05  9:17                   ` Vincent MAILHOL
2024-02-05  9:48                     ` Finn Thain
2024-02-05 10:43                       ` Vincent MAILHOL
2024-02-05 15:40                         ` [PATCH] m68k/bitops: always use compiler's builtin for bit finding functions Vincent Mailhol
2024-02-07  6:31                         ` [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Finn Thain
2024-01-28  5:00   ` [PATCH v4 3/5] hexagon/bitops: force inlining of all bit-find functions Vincent Mailhol
2024-01-28  5:00   ` [PATCH v4 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
2024-01-28  5:00   ` [PATCH v4 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).