* [PATCH v3 0/5] bitops: optimize code and add tests
[not found] <20221111081316.30373-1-mailhol.vincent@wanadoo.fr>
@ 2023-12-17 7:12 ` Vincent Mailhol
2023-12-17 7:12 ` [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions Vincent Mailhol
` (4 more replies)
0 siblings, 5 replies; 9+ messages in thread
From: Vincent Mailhol @ 2023-12-17 7:12 UTC (permalink / raw)
To: Andrew Morton, linux-kernel, Yury Norov
Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol
This series makes sure that all the bitops operations (namely __ffs(),
ffs(), ffz(), __fls(), fls(), fls64()) correctly fold into constant
expressions when their argument is also a constant expression.
The first two patches optimize the m68k architecture bitops, the third
and fourth optimize the hexagon architecture bitops, and the fifth and
final patch adds tests to assert that the constant folding occurs and
that the results are accurate.
This is tested on arm, arm64, hexagon, m68k, x86 and x86_64. For the
other architectures, I am putting my trust in the kernel test robot to
send a report if any of them still lacks bitops optimizations.
---
** Changelog **
v2 -> v3:
- Add patches 1/5 and 2/5 to optimize m68k architecture bitops.
Thanks to the kernel test robot for reporting!
- Add patches 3/5 and 4/5 to optimize hexagon architecture bitops.
Thanks to the kernel test robot for reporting!
- Patch 5/5: mark test_bitops_const_eval() as __always_inline and,
with that done, pass n (the test number) as a parameter. Previously,
only BIT(10) was tested. Add tests for BIT(0) and BIT(31).
Link: https://lore.kernel.org/all/20231130102717.1297492-1-mailhol.vincent@wanadoo.fr/
v1 -> v2:
- Drop the RFC patch. v1 was not ready to be applied on x86 because
of pending changes in arch/x86/include/asm/bitops.h. This was
finally fixed by Nick in commit 3dae5c43badf ("x86/asm/bitops: Use
__builtin_clz{l|ll} to evaluate constant expressions").
Thanks Nick!
- Update the commit description.
- Introduce the test_const_eval() macro to factorize code.
- No functional change.
Link: https://lore.kernel.org/all/20221111081316.30373-1-mailhol.vincent@wanadoo.fr/
Vincent Mailhol (5):
m68k/bitops: force inlining of all bitops functions
m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant
expressions
hexagon/bitops: force inlining of all bitops functions
hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant
expressions
lib: test_bitops: add compile-time optimization/evaluations assertions
arch/hexagon/include/asm/bitops.h | 37 ++++++++----
arch/m68k/include/asm/bitops.h | 99 +++++++++++++++++--------------
lib/Kconfig.debug | 4 ++
lib/test_bitops.c | 32 ++++++++++
4 files changed, 118 insertions(+), 54 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions
2023-12-17 7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol
@ 2023-12-17 7:12 ` Vincent Mailhol
2024-01-02 10:28 ` Geert Uytterhoeven
2023-12-17 7:12 ` [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
` (3 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Vincent Mailhol @ 2023-12-17 7:12 UTC (permalink / raw)
To: Andrew Morton, linux-kernel, Yury Norov
Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol
The inline keyword does not actually guarantee that the compiler will
inline a function. Whenever the goal is to actually inline a
function, __always_inline should be preferred instead.
On an allyesconfig, with GCC 13.2.1, it saves roughly 5 KB.
$ size --format=GNU vmlinux.before vmlinux.after
text data bss total filename
60449738 70975612 2288988 133714338 vmlinux.before
60446534 70972412 2289596 133708542 vmlinux.after
Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of
test_and_set_bit and friends")
Link: https://git.kernel.org/torvalds/c/8dd5032d9c54
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
arch/m68k/include/asm/bitops.h | 87 +++++++++++++++++-----------------
1 file changed, 44 insertions(+), 43 deletions(-)
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 14c64a6f1217..ae0457d582b8 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -28,7 +28,7 @@
* So we use the best form possible on a given platform.
*/
-static inline void bset_reg_set_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bset_reg_set_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
@@ -38,7 +38,7 @@ static inline void bset_reg_set_bit(int nr, volatile unsigned long *vaddr)
: "memory");
}
-static inline void bset_mem_set_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bset_mem_set_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
@@ -47,7 +47,7 @@ static inline void bset_mem_set_bit(int nr, volatile unsigned long *vaddr)
: "di" (nr & 7));
}
-static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
{
__asm__ __volatile__ ("bfset %1{%0:#1}"
:
@@ -71,7 +71,7 @@ arch___set_bit(unsigned long nr, volatile unsigned long *addr)
set_bit(nr, addr);
}
-static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
@@ -81,7 +81,7 @@ static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
: "memory");
}
-static inline void bclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
@@ -90,7 +90,7 @@ static inline void bclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
: "di" (nr & 7));
}
-static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
{
__asm__ __volatile__ ("bfclr %1{%0:#1}"
:
@@ -114,7 +114,7 @@ arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
clear_bit(nr, addr);
}
-static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
@@ -124,7 +124,7 @@ static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
: "memory");
}
-static inline void bchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
@@ -133,7 +133,7 @@ static inline void bchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
: "di" (nr & 7));
}
-static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
{
__asm__ __volatile__ ("bfchg %1{%0:#1}"
:
@@ -160,8 +160,8 @@ arch___change_bit(unsigned long nr, volatile unsigned long *addr)
#define arch_test_bit generic_test_bit
#define arch_test_bit_acquire generic_test_bit_acquire
-static inline int bset_reg_test_and_set_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bset_reg_test_and_set_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
char retval;
@@ -173,8 +173,8 @@ static inline int bset_reg_test_and_set_bit(int nr,
return retval;
}
-static inline int bset_mem_test_and_set_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bset_mem_test_and_set_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
char retval;
@@ -185,8 +185,8 @@ static inline int bset_mem_test_and_set_bit(int nr,
return retval;
}
-static inline int bfset_mem_test_and_set_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bfset_mem_test_and_set_bit(int nr, volatile unsigned long *vaddr)
{
char retval;
@@ -213,8 +213,8 @@ arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
return test_and_set_bit(nr, addr);
}
-static inline int bclr_reg_test_and_clear_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bclr_reg_test_and_clear_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
char retval;
@@ -226,8 +226,8 @@ static inline int bclr_reg_test_and_clear_bit(int nr,
return retval;
}
-static inline int bclr_mem_test_and_clear_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bclr_mem_test_and_clear_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
char retval;
@@ -238,8 +238,8 @@ static inline int bclr_mem_test_and_clear_bit(int nr,
return retval;
}
-static inline int bfclr_mem_test_and_clear_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bfclr_mem_test_and_clear_bit(int nr, volatile unsigned long *vaddr)
{
char retval;
@@ -266,8 +266,8 @@ arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
return test_and_clear_bit(nr, addr);
}
-static inline int bchg_reg_test_and_change_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bchg_reg_test_and_change_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
char retval;
@@ -279,8 +279,8 @@ static inline int bchg_reg_test_and_change_bit(int nr,
return retval;
}
-static inline int bchg_mem_test_and_change_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bchg_mem_test_and_change_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
char retval;
@@ -291,8 +291,8 @@ static inline int bchg_mem_test_and_change_bit(int nr,
return retval;
}
-static inline int bfchg_mem_test_and_change_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bfchg_mem_test_and_change_bit(int nr, volatile unsigned long *vaddr)
{
char retval;
@@ -319,8 +319,8 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
return test_and_change_bit(nr, addr);
}
-static inline bool xor_unlock_is_negative_byte(unsigned long mask,
- volatile unsigned long *p)
+static __always_inline bool
+xor_unlock_is_negative_byte(unsigned long mask, volatile unsigned long *p)
{
#ifdef CONFIG_COLDFIRE
__asm__ __volatile__ ("eorl %1, %0"
@@ -350,8 +350,8 @@ static inline bool xor_unlock_is_negative_byte(unsigned long mask,
#include <asm-generic/bitops/ffz.h>
#else
-static inline int find_first_zero_bit(const unsigned long *vaddr,
- unsigned size)
+static __always_inline int
+find_first_zero_bit(const unsigned long *vaddr, unsigned size)
{
const unsigned long *p = vaddr;
int res = 32;
@@ -376,8 +376,8 @@ static inline int find_first_zero_bit(const unsigned long *vaddr,
}
#define find_first_zero_bit find_first_zero_bit
-static inline int find_next_zero_bit(const unsigned long *vaddr, int size,
- int offset)
+static __always_inline int
+find_next_zero_bit(const unsigned long *vaddr, int size, int offset)
{
const unsigned long *p = vaddr + (offset >> 5);
int bit = offset & 31UL, res;
@@ -406,7 +406,8 @@ static inline int find_next_zero_bit(const unsigned long *vaddr, int size,
}
#define find_next_zero_bit find_next_zero_bit
-static inline int find_first_bit(const unsigned long *vaddr, unsigned size)
+static __always_inline int
+find_first_bit(const unsigned long *vaddr, unsigned size)
{
const unsigned long *p = vaddr;
int res = 32;
@@ -431,8 +432,8 @@ static inline int find_first_bit(const unsigned long *vaddr, unsigned size)
}
#define find_first_bit find_first_bit
-static inline int find_next_bit(const unsigned long *vaddr, int size,
- int offset)
+static __always_inline int
+find_next_bit(const unsigned long *vaddr, int size, int offset)
{
const unsigned long *p = vaddr + (offset >> 5);
int bit = offset & 31UL, res;
@@ -465,7 +466,7 @@ static inline int find_next_bit(const unsigned long *vaddr, int size,
* ffz = Find First Zero in word. Undefined if no zero exists,
* so code should check against ~0UL first..
*/
-static inline unsigned long ffz(unsigned long word)
+static __always_inline unsigned long ffz(unsigned long word)
{
int res;
@@ -488,7 +489,7 @@ static inline unsigned long ffz(unsigned long word)
*/
#if (defined(__mcfisaaplus__) || defined(__mcfisac__)) && \
!defined(CONFIG_M68000)
-static inline unsigned long __ffs(unsigned long x)
+static __always_inline unsigned long __ffs(unsigned long x)
{
__asm__ __volatile__ ("bitrev %0; ff1 %0"
: "=d" (x)
@@ -496,7 +497,7 @@ static inline unsigned long __ffs(unsigned long x)
return x;
}
-static inline int ffs(int x)
+static __always_inline int ffs(int x)
{
if (!x)
return 0;
@@ -518,7 +519,7 @@ static inline int ffs(int x)
* the libc and compiler builtin ffs routines, therefore
* differs in spirit from the above ffz (man ffs).
*/
-static inline int ffs(int x)
+static __always_inline int ffs(int x)
{
int cnt;
@@ -528,7 +529,7 @@ static inline int ffs(int x)
return 32 - cnt;
}
-static inline unsigned long __ffs(unsigned long x)
+static __always_inline unsigned long __ffs(unsigned long x)
{
return ffs(x) - 1;
}
@@ -536,7 +537,7 @@ static inline unsigned long __ffs(unsigned long x)
/*
* fls: find last bit set.
*/
-static inline int fls(unsigned int x)
+static __always_inline int fls(unsigned int x)
{
int cnt;
@@ -546,7 +547,7 @@ static inline int fls(unsigned int x)
return 32 - cnt;
}
-static inline unsigned long __fls(unsigned long x)
+static __always_inline unsigned long __fls(unsigned long x)
{
return fls(x) - 1;
}
--
2.25.1
* [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
2023-12-17 7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol
2023-12-17 7:12 ` [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions Vincent Mailhol
@ 2023-12-17 7:12 ` Vincent Mailhol
2024-01-02 10:49 ` Geert Uytterhoeven
2023-12-17 7:12 ` [PATCH v3 3/5] hexagon/bitops: force inlining of all bitops functions Vincent Mailhol
` (2 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Vincent Mailhol @ 2023-12-17 7:12 UTC (permalink / raw)
To: Andrew Morton, linux-kernel, Yury Norov
Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol
The compiler is not able to do constant folding on "asm volatile" code.
Evaluate whether or not the function argument is a constant expression
and if this is the case, return an equivalent builtin expression.
On an allyesconfig, with GCC 13.2.1, it saves roughly 8 KB.
$ size --format=GNU vmlinux.before vmlinux.after
text data bss total filename
60446534 70972412 2289596 133708542 vmlinux.before
60429746 70978876 2291676 133700298 vmlinux.after
Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl()
to evaluate constant expressions")
Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
arch/m68k/include/asm/bitops.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index ae0457d582b8..3f89b9dccc33 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -470,6 +470,9 @@ static __always_inline unsigned long ffz(unsigned long word)
{
int res;
+ if (__builtin_constant_p(word))
+ return __builtin_ctzl(~word);
+
__asm__ __volatile__ ("bfffo %1{#0,#0},%0"
: "=d" (res) : "d" (~word & -~word));
return res ^ 31;
@@ -491,6 +494,9 @@ static __always_inline unsigned long ffz(unsigned long word)
!defined(CONFIG_M68000)
static __always_inline unsigned long __ffs(unsigned long x)
{
+ if (__builtin_constant_p(x))
+ return __builtin_ctzl(x);
+
__asm__ __volatile__ ("bitrev %0; ff1 %0"
: "=d" (x)
: "0" (x));
@@ -523,6 +529,9 @@ static __always_inline int ffs(int x)
{
int cnt;
+ if (__builtin_constant_p(x))
+ return __builtin_ffs(x);
+
__asm__ ("bfffo %1{#0:#0},%0"
: "=d" (cnt)
: "dm" (x & -x));
@@ -541,6 +550,9 @@ static __always_inline int fls(unsigned int x)
{
int cnt;
+ if (__builtin_constant_p(x))
+ return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0;
+
__asm__ ("bfffo %1{#0,#0},%0"
: "=d" (cnt)
: "dm" (x));
--
2.25.1
* [PATCH v3 3/5] hexagon/bitops: force inlining of all bitops functions
2023-12-17 7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol
2023-12-17 7:12 ` [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions Vincent Mailhol
2023-12-17 7:12 ` [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
@ 2023-12-17 7:12 ` Vincent Mailhol
2023-12-17 7:12 ` [PATCH v3 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
2023-12-17 7:12 ` [PATCH v3 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol
4 siblings, 0 replies; 9+ messages in thread
From: Vincent Mailhol @ 2023-12-17 7:12 UTC (permalink / raw)
To: Andrew Morton, linux-kernel, Yury Norov
Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol
The inline keyword does not actually guarantee that the compiler will
inline a function. Whenever the goal is to actually inline a
function, __always_inline should be preferred instead.
Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of
test_and_set_bit and friends")
Link: https://git.kernel.org/torvalds/c/8dd5032d9c54
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
arch/hexagon/include/asm/bitops.h | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index 160d8f37fa1a..950d4acc2edc 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -28,7 +28,7 @@
* @nr: bit number to clear
* @addr: pointer to memory
*/
-static inline int test_and_clear_bit(int nr, volatile void *addr)
+static __always_inline int test_and_clear_bit(int nr, volatile void *addr)
{
int oldval;
@@ -52,7 +52,7 @@ static inline int test_and_clear_bit(int nr, volatile void *addr)
* @nr: bit number to set
* @addr: pointer to memory
*/
-static inline int test_and_set_bit(int nr, volatile void *addr)
+static __always_inline int test_and_set_bit(int nr, volatile void *addr)
{
int oldval;
@@ -78,7 +78,7 @@ static inline int test_and_set_bit(int nr, volatile void *addr)
* @nr: bit number to set
* @addr: pointer to memory
*/
-static inline int test_and_change_bit(int nr, volatile void *addr)
+static __always_inline int test_and_change_bit(int nr, volatile void *addr)
{
int oldval;
@@ -103,17 +103,17 @@ static inline int test_and_change_bit(int nr, volatile void *addr)
* Rewrite later to save a cycle or two.
*/
-static inline void clear_bit(int nr, volatile void *addr)
+static __always_inline void clear_bit(int nr, volatile void *addr)
{
test_and_clear_bit(nr, addr);
}
-static inline void set_bit(int nr, volatile void *addr)
+static __always_inline void set_bit(int nr, volatile void *addr)
{
test_and_set_bit(nr, addr);
}
-static inline void change_bit(int nr, volatile void *addr)
+static __always_inline void change_bit(int nr, volatile void *addr)
{
test_and_change_bit(nr, addr);
}
@@ -200,7 +200,7 @@ arch_test_bit_acquire(unsigned long nr, const volatile unsigned long *addr)
*
* Undefined if no zero exists, so code should check against ~0UL first.
*/
-static inline long ffz(int x)
+static __always_inline long ffz(int x)
{
int r;
@@ -217,7 +217,7 @@ static inline long ffz(int x)
* This is defined the same way as ffs.
* Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.
*/
-static inline int fls(unsigned int x)
+static __always_inline int fls(unsigned int x)
{
int r;
@@ -238,7 +238,7 @@ static inline int fls(unsigned int x)
* the libc and compiler builtin ffs routines, therefore
* differs in spirit from the above ffz (man ffs).
*/
-static inline int ffs(int x)
+static __always_inline int ffs(int x)
{
int r;
@@ -260,7 +260,7 @@ static inline int ffs(int x)
* bits_per_long assumed to be 32
* numbering starts at 0 I think (instead of 1 like ffs)
*/
-static inline unsigned long __ffs(unsigned long word)
+static __always_inline unsigned long __ffs(unsigned long word)
{
int num;
@@ -278,7 +278,7 @@ static inline unsigned long __ffs(unsigned long word)
* Undefined if no set bit exists, so code should check against 0 first.
* bits_per_long assumed to be 32
*/
-static inline unsigned long __fls(unsigned long word)
+static __always_inline unsigned long __fls(unsigned long word)
{
int num;
--
2.25.1
* [PATCH v3 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
2023-12-17 7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol
` (2 preceding siblings ...)
2023-12-17 7:12 ` [PATCH v3 3/5] hexagon/bitops: force inlining of all bitops functions Vincent Mailhol
@ 2023-12-17 7:12 ` Vincent Mailhol
2023-12-17 7:12 ` [PATCH v3 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol
4 siblings, 0 replies; 9+ messages in thread
From: Vincent Mailhol @ 2023-12-17 7:12 UTC (permalink / raw)
To: Andrew Morton, linux-kernel, Yury Norov
Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol
The compiler is not able to do constant folding on "asm volatile" code.
Evaluate whether or not the function argument is a constant expression
and if this is the case, return an equivalent builtin expression.
Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl()
to evaluate constant expressions")
Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
arch/hexagon/include/asm/bitops.h | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index 950d4acc2edc..12c6ad1ea2ed 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -204,6 +204,9 @@ static __always_inline long ffz(int x)
{
int r;
+ if (__builtin_constant_p(x))
+ return __builtin_ctzl(~x);
+
asm("%0 = ct1(%1);\n"
: "=&r" (r)
: "r" (x));
@@ -221,6 +224,9 @@ static __always_inline int fls(unsigned int x)
{
int r;
+ if (__builtin_constant_p(x))
+ return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0;
+
asm("{ %0 = cl0(%1);}\n"
"%0 = sub(#32,%0);\n"
: "=&r" (r)
@@ -242,6 +248,9 @@ static __always_inline int ffs(int x)
{
int r;
+ if (__builtin_constant_p(x))
+ return __builtin_ffs(x);
+
asm("{ P0 = cmp.eq(%1,#0); %0 = ct0(%1);}\n"
"{ if (P0) %0 = #0; if (!P0) %0 = add(%0,#1);}\n"
: "=&r" (r)
@@ -264,6 +273,9 @@ static __always_inline unsigned long __ffs(unsigned long word)
{
int num;
+ if (__builtin_constant_p(word))
+ return __builtin_ctzl(word);
+
asm("%0 = ct0(%1);\n"
: "=&r" (num)
: "r" (word));
@@ -282,6 +294,9 @@ static __always_inline unsigned long __fls(unsigned long word)
{
int num;
+ if (__builtin_constant_p(word))
+ return BITS_PER_LONG - 1 - __builtin_clzl(word);
+
asm("%0 = cl0(%1);\n"
"%0 = sub(#31,%0);\n"
: "=&r" (num)
--
2.25.1
* [PATCH v3 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions
2023-12-17 7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol
` (3 preceding siblings ...)
2023-12-17 7:12 ` [PATCH v3 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
@ 2023-12-17 7:12 ` Vincent Mailhol
4 siblings, 0 replies; 9+ messages in thread
From: Vincent Mailhol @ 2023-12-17 7:12 UTC (permalink / raw)
To: Andrew Morton, linux-kernel, Yury Norov
Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol
Add a function to the bitops test suite to assert that the bitops
helpers correctly fold constant expressions (or trigger a build bug
otherwise). This should work at all the optimization levels supported
by Kbuild.
The added function doesn't perform any runtime tests and gets
optimized out to nothing after passing the build assertions.
Suggested-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
lib/Kconfig.debug | 4 ++++
lib/test_bitops.c | 32 ++++++++++++++++++++++++++++++++
2 files changed, 36 insertions(+)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index cc7d53d9dc01..c97d818dbc30 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2454,6 +2454,10 @@ config TEST_BITOPS
compilations. It has no dependencies and doesn't run or load unless
explicitly requested by name. for example: modprobe test_bitops.
+ In addition, check that the compiler is able to fold the bitops
+ function into a compile-time constant (given that the argument is also
+ a compile-time constant) and trigger a build bug otherwise.
+
If unsure, say N.
config TEST_VMALLOC
diff --git a/lib/test_bitops.c b/lib/test_bitops.c
index 3b7bcbee84db..99b612515eb6 100644
--- a/lib/test_bitops.c
+++ b/lib/test_bitops.c
@@ -50,6 +50,34 @@ static unsigned long order_comb_long[][2] = {
};
#endif
+/* Assert that a boolean expression can be folded in a constant and is true. */
+#define test_const_eval(test_expr) \
+({ \
+ /* Evaluate once so that compiler can fold it. */ \
+ bool __test_expr = test_expr; \
+ \
+ BUILD_BUG_ON(!__builtin_constant_p(__test_expr)); \
+ BUILD_BUG_ON(!__test_expr); \
+})
+
+/*
+ * On any supported optimization level (-O2, -Os) and if invoked with
+ * a compile-time constant argument, the compiler must be able to fold
+ * into a constant expression all the bit find functions. Namely:
+ * __ffs(), ffs(), ffz(), __fls(), fls() and fls64(). Otherwise,
+ * trigger a build bug.
+ */
+static __always_inline void test_bitops_const_eval(unsigned int n)
+{
+ test_const_eval(__ffs(BIT(n)) == n);
+ test_const_eval(ffs(BIT(n)) == n + 1);
+ test_const_eval(ffz(~BIT(n)) == n);
+ test_const_eval(__fls(BIT(n)) == n);
+ test_const_eval(fls(BIT(n)) == n + 1);
+ test_const_eval(fls64(BIT_ULL(n)) == n + 1);
+ test_const_eval(fls64(BIT_ULL(n + 32)) == n + 33);
+}
+
static int __init test_bitops_startup(void)
{
int i, bit_set;
@@ -94,6 +122,10 @@ static int __init test_bitops_startup(void)
if (bit_set != BITOPS_LAST)
pr_err("ERROR: FOUND SET BIT %d\n", bit_set);
+ test_bitops_const_eval(0);
+ test_bitops_const_eval(10);
+ test_bitops_const_eval(31);
+
pr_info("Completed bitops test\n");
return 0;
--
2.25.1
* Re: [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions
2023-12-17 7:12 ` [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions Vincent Mailhol
@ 2024-01-02 10:28 ` Geert Uytterhoeven
2024-01-07 12:01 ` Vincent MAILHOL
0 siblings, 1 reply; 9+ messages in thread
From: Geert Uytterhoeven @ 2024-01-02 10:28 UTC (permalink / raw)
To: Vincent Mailhol
Cc: Andrew Morton, linux-kernel, Yury Norov, Nick Desaulniers,
Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap,
Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain,
Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k
Hi Vincent,
Thanks for your patch!
On Sun, Dec 17, 2023 at 8:13 AM Vincent Mailhol
<mailhol.vincent@wanadoo.fr> wrote:
> The inline keyword actually does not guarantee that the compiler will
> inline a functions. Whenever the goal is to actually inline a
> function, __always_inline should always be preferred instead.
>
> On an allyesconfig, with GCC 13.2.1, it saves roughly 5 KB.
>
> $ size --format=GNU vmlinux.before vmlinux.after
> text data bss total filename
> 60449738 70975612 2288988 133714338 vmlinux.before
> 60446534 70972412 2289596 133708542 vmlinux.after
With gcc 9.5.0-1ubuntu1~22.04, the figures are completely different
(i.e. a size increase):
allyesconfig:
text data bss total filename
58878600 72415994 2283652 133578246 vmlinux.before
58882250 72419706 2284004 133585960 vmlinux.after
atari_defconfig:
text data bss total filename
4112060 1579862 151680 5843602 vmlinux-v6.7-rc8
4117008 1579350 151680 5848038
vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining
The next patch offsets that for allyesconfig, but not for atari_defconfig.
> Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of
> test_and_set_bit and friends")
Please don't split lines containing tags.
> Link: https://git.kernel.org/torvalds/c/8dd5032d9c54
>
> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
* Re: [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
2023-12-17 7:12 ` [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
@ 2024-01-02 10:49 ` Geert Uytterhoeven
0 siblings, 0 replies; 9+ messages in thread
From: Geert Uytterhoeven @ 2024-01-02 10:49 UTC (permalink / raw)
To: Vincent Mailhol
Cc: Andrew Morton, linux-kernel, Yury Norov, Nick Desaulniers,
Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap,
Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain,
Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k
On Sun, Dec 17, 2023 at 8:13 AM Vincent Mailhol
<mailhol.vincent@wanadoo.fr> wrote:
>
> The compiler is not able to do constant folding on "asm volatile" code.
>
> Evaluate whether or not the function argument is a constant expression
> and if this is the case, return an equivalent builtin expression.
>
> On an allyesconfig, with GCC 13.2.1, it saves roughly 8 KB.
>
> $ size --format=GNU vmlinux.before vmlinux.after
> text data bss total filename
> 60446534 70972412 2289596 133708542 vmlinux.before
> 60429746 70978876 2291676 133700298 vmlinux.after
Still a win with gcc 9.5.0-1ubuntu1~22.04:
allyesconfig:
text data bss total filename
58882250 72419706 2284004 133585960
vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining
58863570 72419546 2285860 133568976
vmlinux-v6.7-rc8-2-m68k-bitops-use-__builtin_clz,ctzl,ffs
So an even bigger win...
atari_defconfig:
text data bss total filename
4117008 1579350 151680 5848038
vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining
4116940 1579534 151712 5848186
vmlinux-v6.7-rc8-2-m68k-bitops-use-__builtin_clz,ctzl,ffs
... but a small size increase here.
>
> Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl()
> to evaluate constant expressions")
Please don't split lines containing tags.
> Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1
>
> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
* Re: [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions
2024-01-02 10:28 ` Geert Uytterhoeven
@ 2024-01-07 12:01 ` Vincent MAILHOL
0 siblings, 0 replies; 9+ messages in thread
From: Vincent MAILHOL @ 2024-01-07 12:01 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: Andrew Morton, linux-kernel, Yury Norov, Nick Desaulniers,
Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap,
Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain,
Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k
On Tue, 2 Jan 2024 at 19:28, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Vincent,
>
> Thanks for your patch!
Thanks for the review and for running the benchmark.
> On Sun, Dec 17, 2023 at 8:13 AM Vincent Mailhol
> <mailhol.vincent@wanadoo.fr> wrote:
> > The inline keyword does not actually guarantee that the compiler
> > will inline a function. Whenever the goal is to actually inline a
> > function, __always_inline should always be preferred instead.
> >
> > On an allyesconfig, with GCC 13.2.1, it saves roughly 5 KB.
> >
> > $ size --format=GNU vmlinux.before vmlinux.after
> > text data bss total filename
> > 60449738 70975612 2288988 133714338 vmlinux.before
> > 60446534 70972412 2289596 133708542 vmlinux.after
>
> With gcc 9.5.0-1ubuntu1~22.04, the figures are completely different
> (i.e. a size increase):
Those results are not normal; there should not be such a big
discrepancy between two versions of the same compiler. I double
checked everything and found out that I made a mistake when computing
the figures: I am not sure what exactly, but at some point the ASLR
seeds (or a similar randomization feature) got reset, so the decrease
I witnessed was just a "lucky roll".
After rerunning the benchmark (making sure to keep every seed), I got
results similar to yours:
    text     data     bss     total  filename
60449738 70975356 2288988 133714082  vmlinux_allyesconfig.before_this_series
60446534 70979068 2289596 133715198  vmlinux_allyesconfig.after_first_patch
60429746 70979132 2291676 133700554  vmlinux_allyesconfig.final_second_patch
Note that there is still some randomness in the data segment, as
shown in these other benchmarks I ran:
    text     data     bss     total  filename
60449738 70976124 2288988 133714850  vmlinux_allyesconfig.before_this_series
60446534 70980092 2289596 133716222  vmlinux_allyesconfig.after_first_patch
60429746 70979388 2291676 133700810  vmlinux_allyesconfig.after_second_patch
    text     data     bss     total  filename
60449738 70975612 2288988 133714338  vmlinux_allyesconfig.before_this_series
60446534 70980348 2289596 133716478  vmlinux_allyesconfig.after_first_patch
60429746 70979900 2291676 133701322  vmlinux_allyesconfig.after_second_patch
But the error margin is within 1K.
So, in short, I inlined some functions which I shouldn't have. I am
preparing a v4 in which I will only inline the bit-find functions
(namely: __ffs(), ffs(), ffz(), __fls(), fls() and fls64()). Here are
the new figures:
    text     data     bss     total  filename
60453552 70955485 2288620 133697657  vmlinux_allyesconfig.before_this_series
60450304 70953085 2289260 133692649  vmlinux_allyesconfig.after_first_patch
60433536 70952637 2291340 133677513  vmlinux_allyesconfig.after_second_patch
N.B. The new figures are from after a rebase, so do not try to
compare them with the previous benchmarks. I will send the v4 soon,
after I finish updating the patch comments and double-checking things.
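Whether the folding actually happens for these bit-find helpers can be asserted at compile time, in the spirit of patch 5/5. This is only a sketch using the raw GCC builtins, not the actual lib/test_bitops.c code:

```c
#include <assert.h>

/* If the builtin folds its constant argument, each expression below
 * is an integer constant expression and static_assert accepts it;
 * an implementation that only works at runtime would not compile. */
static_assert(__builtin_ffs(1 << 10) == 11, "ffs of BIT(10) must fold");
static_assert(__builtin_ffs(1) == 1, "ffs of BIT(0) must fold");
static_assert(__builtin_ctzl(1UL << 31) == 31, "ctzl of BIT(31) must fold");
```

Because static_assert only accepts constant expressions, a failure to fold shows up as a build error rather than a runtime test failure.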
Concerning the other functions in bitops.h, there may be other ones
worth an __always_inline, but I will narrow the scope of this series
to the bit-find functions only. If a good samaritan wants to
investigate the other functions, go ahead!
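For reference, the inline vs. __always_inline distinction discussed in this thread boils down to the sketch below. The macro name is hypothetical; the attribute is what the kernel's __always_inline expands to, modulo compiler-version details:

```c
#include <assert.h>

/* 'inline' is only a hint the compiler may ignore (e.g. at -O0 or
 * when its inlining heuristics decide against it).  The attribute
 * below forces the decision regardless of optimization level. */
#define my_always_inline inline __attribute__((__always_inline__))

static my_always_inline int first_set_bit(unsigned int x)
{
	return __builtin_ffs((int)x);
}
```

Forcing inlining is what lets the __builtin_constant_p dispatch see a constant argument at every call site, which is why the two patches of each pair belong together.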
Yours sincerely,
Vincent Mailhol
> allyesconfig:
>
> text data bss total filename
> 58878600 72415994 2283652 133578246 vmlinux.before
> 58882250 72419706 2284004 133585960 vmlinux.after
>
> atari_defconfig:
>
> text data bss total filename
> 4112060 1579862 151680 5843602 vmlinux-v6.7-rc8
> 4117008 1579350 151680 5848038  vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining
>
> The next patch offsets that for allyesconfig, but not for atari_defconfig.
>
> > Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of
> > test_and_set_bit and friends")
>
> Please don't split lines containing tags.
>
> > Link: https://git.kernel.org/torvalds/c/8dd5032d9c54
> >
> > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
>
> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
>
> Gr{oetje,eeting}s,
>
> Geert
>
end of thread, other threads:[~2024-01-07 12:01 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20221111081316.30373-1-mailhol.vincent@wanadoo.fr>
2023-12-17 7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol
2023-12-17 7:12 ` [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions Vincent Mailhol
2024-01-02 10:28 ` Geert Uytterhoeven
2024-01-07 12:01 ` Vincent MAILHOL
2023-12-17 7:12 ` [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
2024-01-02 10:49 ` Geert Uytterhoeven
2023-12-17 7:12 ` [PATCH v3 3/5] hexagon/bitops: force inlining of all bitops functions Vincent Mailhol
2023-12-17 7:12 ` [PATCH v3 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
2023-12-17 7:12 ` [PATCH v3 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol