linux-hexagon.vger.kernel.org archive mirror
* [PATCH v3 0/5] bitops: optimize code and add tests
       [not found] <20221111081316.30373-1-mailhol.vincent@wanadoo.fr>
@ 2023-12-17  7:12 ` Vincent Mailhol
  2023-12-17  7:12   ` [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions Vincent Mailhol
                     ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Vincent Mailhol @ 2023-12-17  7:12 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, Yury Norov
  Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
	Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
	Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol

This series makes sure that all the bitops functions (namely __ffs(),
ffs(), ffz(), __fls(), fls() and fls64()) correctly fold into
constant expressions when their argument is itself a constant
expression.

The first two patches optimize the m68k bitops, the third and fourth
optimize the hexagon bitops, and the fifth and final patch adds tests
to assert that the constant folding occurs and that the results are
accurate.

This series was tested on arm, arm64, hexagon, m68k, x86 and x86_64.
For the remaining architectures, I am trusting the kernel test robot
to send a report if any of them still lacks these bitops
optimizations.
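
For reference, the folding pattern applied throughout the series
boils down to the sketch below (shown here on the m68k fls() as it
stands after patch 2/5; the asm fallback varies per architecture):

  static __always_inline int fls(unsigned int x)
  {
  	int cnt;

  	/* Constant argument: let the compiler fold the builtin. */
  	if (__builtin_constant_p(x))
  		return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0;

  	/* Runtime argument: use the architecture's instruction. */
  	__asm__ ("bfffo %1{#0,#0},%0"
  		: "=d" (cnt)
  		: "dm" (x));
  	return 32 - cnt;
  }
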
---

** Changelog **

v2 -> v3:

  - Add patches 1/5 and 2/5 to optimize m68k architecture bitops.
    Thanks to the kernel test robot for reporting!

  - Add patches 3/5 and 4/5 to optimize hexagon architecture bitops.
    Thanks to the kernel test robot for reporting!

  - Patch 5/5: mark test_bitops_const_eval() as __always_inline and,
    this done, pass n (the tested bit number) as a parameter.
    Previously, only BIT(10) was tested. Add tests for BIT(0) and
    BIT(31).

  Link: https://lore.kernel.org/all/20231130102717.1297492-1-mailhol.vincent@wanadoo.fr/

v1 -> v2:

  - Drop the RFC patch. v1 was not ready to be applied on x86 because
    of pending changes in arch/x86/include/asm/bitops.h. This was
    finally fixed by Nick in commit 3dae5c43badf ("x86/asm/bitops: Use
    __builtin_clz{l|ll} to evaluate constant expressions").
    Thanks Nick!

  - Update the commit description.

  - Introduce the test_const_eval() macro to factorize code.

  - No functional change.

  Link: https://lore.kernel.org/all/20221111081316.30373-1-mailhol.vincent@wanadoo.fr/

Vincent Mailhol (5):
  m68k/bitops: force inlining of all bitops functions
  m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant
    expressions
  hexagon/bitops: force inlining of all bitops functions
  hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant
    expressions
  lib: test_bitops: add compile-time optimization/evaluations assertions

 arch/hexagon/include/asm/bitops.h | 37 ++++++++----
 arch/m68k/include/asm/bitops.h    | 99 +++++++++++++++++--------------
 lib/Kconfig.debug                 |  4 ++
 lib/test_bitops.c                 | 32 ++++++++++
 4 files changed, 118 insertions(+), 54 deletions(-)

-- 
2.25.1



* [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions
  2023-12-17  7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol
@ 2023-12-17  7:12   ` Vincent Mailhol
  2024-01-02 10:28     ` Geert Uytterhoeven
  2023-12-17  7:12   ` [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Vincent Mailhol @ 2023-12-17  7:12 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, Yury Norov
  Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
	Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
	Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol

The inline keyword does not guarantee that the compiler will actually
inline a function. Whenever the goal is to force inlining,
__always_inline should be preferred instead.

On an allyesconfig, with GCC 13.2.1, it saves roughly 5 KB.

  $ size --format=GNU vmlinux.before vmlinux.after
        text       data        bss      total filename
    60449738   70975612    2288988  133714338 vmlinux.before
    60446534   70972412    2289596  133708542 vmlinux.after

Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of
test_and_set_bit and friends")
Link: https://git.kernel.org/torvalds/c/8dd5032d9c54

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
 arch/m68k/include/asm/bitops.h | 87 +++++++++++++++++-----------------
 1 file changed, 44 insertions(+), 43 deletions(-)

diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 14c64a6f1217..ae0457d582b8 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -28,7 +28,7 @@
  *	So we use the best form possible on a given platform.
  */
 
-static inline void bset_reg_set_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bset_reg_set_bit(int nr, volatile unsigned long *vaddr)
 {
 	char *p = (char *)vaddr + (nr ^ 31) / 8;
 
@@ -38,7 +38,7 @@ static inline void bset_reg_set_bit(int nr, volatile unsigned long *vaddr)
 		: "memory");
 }
 
-static inline void bset_mem_set_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bset_mem_set_bit(int nr, volatile unsigned long *vaddr)
 {
 	char *p = (char *)vaddr + (nr ^ 31) / 8;
 
@@ -47,7 +47,7 @@ static inline void bset_mem_set_bit(int nr, volatile unsigned long *vaddr)
 		: "di" (nr & 7));
 }
 
-static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
 {
 	__asm__ __volatile__ ("bfset %1{%0:#1}"
 		:
@@ -71,7 +71,7 @@ arch___set_bit(unsigned long nr, volatile unsigned long *addr)
 	set_bit(nr, addr);
 }
 
-static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
 {
 	char *p = (char *)vaddr + (nr ^ 31) / 8;
 
@@ -81,7 +81,7 @@ static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
 		: "memory");
 }
 
-static inline void bclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
 {
 	char *p = (char *)vaddr + (nr ^ 31) / 8;
 
@@ -90,7 +90,7 @@ static inline void bclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
 		: "di" (nr & 7));
 }
 
-static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
 {
 	__asm__ __volatile__ ("bfclr %1{%0:#1}"
 		:
@@ -114,7 +114,7 @@ arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
 	clear_bit(nr, addr);
 }
 
-static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
 {
 	char *p = (char *)vaddr + (nr ^ 31) / 8;
 
@@ -124,7 +124,7 @@ static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
 		: "memory");
 }
 
-static inline void bchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
 {
 	char *p = (char *)vaddr + (nr ^ 31) / 8;
 
@@ -133,7 +133,7 @@ static inline void bchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
 		: "di" (nr & 7));
 }
 
-static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
 {
 	__asm__ __volatile__ ("bfchg %1{%0:#1}"
 		:
@@ -160,8 +160,8 @@ arch___change_bit(unsigned long nr, volatile unsigned long *addr)
 #define arch_test_bit generic_test_bit
 #define arch_test_bit_acquire generic_test_bit_acquire
 
-static inline int bset_reg_test_and_set_bit(int nr,
-					    volatile unsigned long *vaddr)
+static __always_inline int
+bset_reg_test_and_set_bit(int nr, volatile unsigned long *vaddr)
 {
 	char *p = (char *)vaddr + (nr ^ 31) / 8;
 	char retval;
@@ -173,8 +173,8 @@ static inline int bset_reg_test_and_set_bit(int nr,
 	return retval;
 }
 
-static inline int bset_mem_test_and_set_bit(int nr,
-					    volatile unsigned long *vaddr)
+static __always_inline int
+bset_mem_test_and_set_bit(int nr, volatile unsigned long *vaddr)
 {
 	char *p = (char *)vaddr + (nr ^ 31) / 8;
 	char retval;
@@ -185,8 +185,8 @@ static inline int bset_mem_test_and_set_bit(int nr,
 	return retval;
 }
 
-static inline int bfset_mem_test_and_set_bit(int nr,
-					     volatile unsigned long *vaddr)
+static __always_inline int
+bfset_mem_test_and_set_bit(int nr, volatile unsigned long *vaddr)
 {
 	char retval;
 
@@ -213,8 +213,8 @@ arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
 	return test_and_set_bit(nr, addr);
 }
 
-static inline int bclr_reg_test_and_clear_bit(int nr,
-					      volatile unsigned long *vaddr)
+static __always_inline int
+bclr_reg_test_and_clear_bit(int nr, volatile unsigned long *vaddr)
 {
 	char *p = (char *)vaddr + (nr ^ 31) / 8;
 	char retval;
@@ -226,8 +226,8 @@ static inline int bclr_reg_test_and_clear_bit(int nr,
 	return retval;
 }
 
-static inline int bclr_mem_test_and_clear_bit(int nr,
-					      volatile unsigned long *vaddr)
+static __always_inline int
+bclr_mem_test_and_clear_bit(int nr, volatile unsigned long *vaddr)
 {
 	char *p = (char *)vaddr + (nr ^ 31) / 8;
 	char retval;
@@ -238,8 +238,8 @@ static inline int bclr_mem_test_and_clear_bit(int nr,
 	return retval;
 }
 
-static inline int bfclr_mem_test_and_clear_bit(int nr,
-					       volatile unsigned long *vaddr)
+static __always_inline int
+bfclr_mem_test_and_clear_bit(int nr, volatile unsigned long *vaddr)
 {
 	char retval;
 
@@ -266,8 +266,8 @@ arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 	return test_and_clear_bit(nr, addr);
 }
 
-static inline int bchg_reg_test_and_change_bit(int nr,
-					       volatile unsigned long *vaddr)
+static __always_inline int
+bchg_reg_test_and_change_bit(int nr, volatile unsigned long *vaddr)
 {
 	char *p = (char *)vaddr + (nr ^ 31) / 8;
 	char retval;
@@ -279,8 +279,8 @@ static inline int bchg_reg_test_and_change_bit(int nr,
 	return retval;
 }
 
-static inline int bchg_mem_test_and_change_bit(int nr,
-					       volatile unsigned long *vaddr)
+static __always_inline int
+bchg_mem_test_and_change_bit(int nr, volatile unsigned long *vaddr)
 {
 	char *p = (char *)vaddr + (nr ^ 31) / 8;
 	char retval;
@@ -291,8 +291,8 @@ static inline int bchg_mem_test_and_change_bit(int nr,
 	return retval;
 }
 
-static inline int bfchg_mem_test_and_change_bit(int nr,
-						volatile unsigned long *vaddr)
+static __always_inline int
+bfchg_mem_test_and_change_bit(int nr, volatile unsigned long *vaddr)
 {
 	char retval;
 
@@ -319,8 +319,8 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 	return test_and_change_bit(nr, addr);
 }
 
-static inline bool xor_unlock_is_negative_byte(unsigned long mask,
-		volatile unsigned long *p)
+static __always_inline bool
+xor_unlock_is_negative_byte(unsigned long mask, volatile unsigned long *p)
 {
 #ifdef CONFIG_COLDFIRE
 	__asm__ __volatile__ ("eorl %1, %0"
@@ -350,8 +350,8 @@ static inline bool xor_unlock_is_negative_byte(unsigned long mask,
 #include <asm-generic/bitops/ffz.h>
 #else
 
-static inline int find_first_zero_bit(const unsigned long *vaddr,
-				      unsigned size)
+static __always_inline int
+find_first_zero_bit(const unsigned long *vaddr, unsigned size)
 {
 	const unsigned long *p = vaddr;
 	int res = 32;
@@ -376,8 +376,8 @@ static inline int find_first_zero_bit(const unsigned long *vaddr,
 }
 #define find_first_zero_bit find_first_zero_bit
 
-static inline int find_next_zero_bit(const unsigned long *vaddr, int size,
-				     int offset)
+static __always_inline int
+find_next_zero_bit(const unsigned long *vaddr, int size, int offset)
 {
 	const unsigned long *p = vaddr + (offset >> 5);
 	int bit = offset & 31UL, res;
@@ -406,7 +406,8 @@ static inline int find_next_zero_bit(const unsigned long *vaddr, int size,
 }
 #define find_next_zero_bit find_next_zero_bit
 
-static inline int find_first_bit(const unsigned long *vaddr, unsigned size)
+static __always_inline int
+find_first_bit(const unsigned long *vaddr, unsigned size)
 {
 	const unsigned long *p = vaddr;
 	int res = 32;
@@ -431,8 +432,8 @@ static inline int find_first_bit(const unsigned long *vaddr, unsigned size)
 }
 #define find_first_bit find_first_bit
 
-static inline int find_next_bit(const unsigned long *vaddr, int size,
-				int offset)
+static __always_inline int
+find_next_bit(const unsigned long *vaddr, int size, int offset)
 {
 	const unsigned long *p = vaddr + (offset >> 5);
 	int bit = offset & 31UL, res;
@@ -465,7 +466,7 @@ static inline int find_next_bit(const unsigned long *vaddr, int size,
  * ffz = Find First Zero in word. Undefined if no zero exists,
  * so code should check against ~0UL first..
  */
-static inline unsigned long ffz(unsigned long word)
+static __always_inline unsigned long ffz(unsigned long word)
 {
 	int res;
 
@@ -488,7 +489,7 @@ static inline unsigned long ffz(unsigned long word)
  */
 #if (defined(__mcfisaaplus__) || defined(__mcfisac__)) && \
 	!defined(CONFIG_M68000)
-static inline unsigned long __ffs(unsigned long x)
+static __always_inline unsigned long __ffs(unsigned long x)
 {
 	__asm__ __volatile__ ("bitrev %0; ff1 %0"
 		: "=d" (x)
@@ -496,7 +497,7 @@ static inline unsigned long __ffs(unsigned long x)
 	return x;
 }
 
-static inline int ffs(int x)
+static __always_inline int ffs(int x)
 {
 	if (!x)
 		return 0;
@@ -518,7 +519,7 @@ static inline int ffs(int x)
  *	the libc and compiler builtin ffs routines, therefore
  *	differs in spirit from the above ffz (man ffs).
  */
-static inline int ffs(int x)
+static __always_inline int ffs(int x)
 {
 	int cnt;
 
@@ -528,7 +529,7 @@ static inline int ffs(int x)
 	return 32 - cnt;
 }
 
-static inline unsigned long __ffs(unsigned long x)
+static __always_inline unsigned long __ffs(unsigned long x)
 {
 	return ffs(x) - 1;
 }
@@ -536,7 +537,7 @@ static inline unsigned long __ffs(unsigned long x)
 /*
  *	fls: find last bit set.
  */
-static inline int fls(unsigned int x)
+static __always_inline int fls(unsigned int x)
 {
 	int cnt;
 
@@ -546,7 +547,7 @@ static inline int fls(unsigned int x)
 	return 32 - cnt;
 }
 
-static inline unsigned long __fls(unsigned long x)
+static __always_inline unsigned long __fls(unsigned long x)
 {
 	return fls(x) - 1;
 }
-- 
2.25.1



* [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
  2023-12-17  7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol
  2023-12-17  7:12   ` [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions Vincent Mailhol
@ 2023-12-17  7:12   ` Vincent Mailhol
  2024-01-02 10:49     ` Geert Uytterhoeven
  2023-12-17  7:12   ` [PATCH v3 3/5] hexagon/bitops: force inlining of all bitops functions Vincent Mailhol
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Vincent Mailhol @ 2023-12-17  7:12 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, Yury Norov
  Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
	Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
	Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol

The compiler is not able to do constant folding on "asm volatile" code.

Evaluate whether or not the function argument is a constant expression
and, if it is, return an equivalent builtin expression.

On an allyesconfig, with GCC 13.2.1, it saves roughly 8 KB.

  $ size --format=GNU vmlinux.before vmlinux.after
        text       data        bss      total filename
    60446534   70972412    2289596  133708542 vmlinux.before
    60429746   70978876    2291676  133700298 vmlinux.after

Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl()
to evaluate constant expressions")
Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
 arch/m68k/include/asm/bitops.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index ae0457d582b8..3f89b9dccc33 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -470,6 +470,9 @@ static __always_inline unsigned long ffz(unsigned long word)
 {
 	int res;
 
+	if (__builtin_constant_p(word))
+		return __builtin_ctzl(~word);
+
 	__asm__ __volatile__ ("bfffo %1{#0,#0},%0"
 			      : "=d" (res) : "d" (~word & -~word));
 	return res ^ 31;
@@ -491,6 +494,9 @@ static __always_inline unsigned long ffz(unsigned long word)
 	!defined(CONFIG_M68000)
 static __always_inline unsigned long __ffs(unsigned long x)
 {
+	if (__builtin_constant_p(x))
+		return __builtin_ctzl(x);
+
 	__asm__ __volatile__ ("bitrev %0; ff1 %0"
 		: "=d" (x)
 		: "0" (x));
@@ -523,6 +529,9 @@ static __always_inline int ffs(int x)
 {
 	int cnt;
 
+	if (__builtin_constant_p(x))
+		return __builtin_ffs(x);
+
 	__asm__ ("bfffo %1{#0:#0},%0"
 		: "=d" (cnt)
 		: "dm" (x & -x));
@@ -541,6 +550,9 @@ static __always_inline int fls(unsigned int x)
 {
 	int cnt;
 
+	if (__builtin_constant_p(x))
+		return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0;
+
 	__asm__ ("bfffo %1{#0,#0},%0"
 		: "=d" (cnt)
 		: "dm" (x));
-- 
2.25.1



* [PATCH v3 3/5] hexagon/bitops: force inlining of all bitops functions
  2023-12-17  7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol
  2023-12-17  7:12   ` [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions Vincent Mailhol
  2023-12-17  7:12   ` [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
@ 2023-12-17  7:12   ` Vincent Mailhol
  2023-12-17  7:12   ` [PATCH v3 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
  2023-12-17  7:12   ` [PATCH v3 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol
  4 siblings, 0 replies; 9+ messages in thread
From: Vincent Mailhol @ 2023-12-17  7:12 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, Yury Norov
  Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
	Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
	Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol

The inline keyword does not guarantee that the compiler will actually
inline a function. Whenever the goal is to force inlining,
__always_inline should be preferred instead.

Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of
test_and_set_bit and friends")
Link: https://git.kernel.org/torvalds/c/8dd5032d9c54

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
 arch/hexagon/include/asm/bitops.h | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index 160d8f37fa1a..950d4acc2edc 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -28,7 +28,7 @@
  * @nr:  bit number to clear
  * @addr:  pointer to memory
  */
-static inline int test_and_clear_bit(int nr, volatile void *addr)
+static __always_inline int test_and_clear_bit(int nr, volatile void *addr)
 {
 	int oldval;
 
@@ -52,7 +52,7 @@ static inline int test_and_clear_bit(int nr, volatile void *addr)
  * @nr:  bit number to set
  * @addr:  pointer to memory
  */
-static inline int test_and_set_bit(int nr, volatile void *addr)
+static __always_inline int test_and_set_bit(int nr, volatile void *addr)
 {
 	int oldval;
 
@@ -78,7 +78,7 @@ static inline int test_and_set_bit(int nr, volatile void *addr)
  * @nr:  bit number to set
  * @addr:  pointer to memory
  */
-static inline int test_and_change_bit(int nr, volatile void *addr)
+static __always_inline int test_and_change_bit(int nr, volatile void *addr)
 {
 	int oldval;
 
@@ -103,17 +103,17 @@ static inline int test_and_change_bit(int nr, volatile void *addr)
  * Rewrite later to save a cycle or two.
  */
 
-static inline void clear_bit(int nr, volatile void *addr)
+static __always_inline void clear_bit(int nr, volatile void *addr)
 {
 	test_and_clear_bit(nr, addr);
 }
 
-static inline void set_bit(int nr, volatile void *addr)
+static __always_inline void set_bit(int nr, volatile void *addr)
 {
 	test_and_set_bit(nr, addr);
 }
 
-static inline void change_bit(int nr, volatile void *addr)
+static __always_inline void change_bit(int nr, volatile void *addr)
 {
 	test_and_change_bit(nr, addr);
 }
@@ -200,7 +200,7 @@ arch_test_bit_acquire(unsigned long nr, const volatile unsigned long *addr)
  *
  * Undefined if no zero exists, so code should check against ~0UL first.
  */
-static inline long ffz(int x)
+static __always_inline long ffz(int x)
 {
 	int r;
 
@@ -217,7 +217,7 @@ static inline long ffz(int x)
  * This is defined the same way as ffs.
  * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.
  */
-static inline int fls(unsigned int x)
+static __always_inline int fls(unsigned int x)
 {
 	int r;
 
@@ -238,7 +238,7 @@ static inline int fls(unsigned int x)
  * the libc and compiler builtin ffs routines, therefore
  * differs in spirit from the above ffz (man ffs).
  */
-static inline int ffs(int x)
+static __always_inline int ffs(int x)
 {
 	int r;
 
@@ -260,7 +260,7 @@ static inline int ffs(int x)
  * bits_per_long assumed to be 32
  * numbering starts at 0 I think (instead of 1 like ffs)
  */
-static inline unsigned long __ffs(unsigned long word)
+static __always_inline unsigned long __ffs(unsigned long word)
 {
 	int num;
 
@@ -278,7 +278,7 @@ static inline unsigned long __ffs(unsigned long word)
  * Undefined if no set bit exists, so code should check against 0 first.
  * bits_per_long assumed to be 32
  */
-static inline unsigned long __fls(unsigned long word)
+static __always_inline unsigned long __fls(unsigned long word)
 {
 	int num;
 
-- 
2.25.1



* [PATCH v3 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
  2023-12-17  7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol
                     ` (2 preceding siblings ...)
  2023-12-17  7:12   ` [PATCH v3 3/5] hexagon/bitops: force inlining of all bitops functions Vincent Mailhol
@ 2023-12-17  7:12   ` Vincent Mailhol
  2023-12-17  7:12   ` [PATCH v3 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol
  4 siblings, 0 replies; 9+ messages in thread
From: Vincent Mailhol @ 2023-12-17  7:12 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, Yury Norov
  Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
	Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
	Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol

The compiler is not able to do constant folding on "asm volatile" code.

Evaluate whether or not the function argument is a constant expression
and, if it is, return an equivalent builtin expression.

Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl()
to evaluate constant expressions")
Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
 arch/hexagon/include/asm/bitops.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index 950d4acc2edc..12c6ad1ea2ed 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -204,6 +204,9 @@ static __always_inline long ffz(int x)
 {
 	int r;
 
+	if (__builtin_constant_p(x))
+		return __builtin_ctzl(~x);
+
 	asm("%0 = ct1(%1);\n"
 		: "=&r" (r)
 		: "r" (x));
@@ -221,6 +224,9 @@ static __always_inline int fls(unsigned int x)
 {
 	int r;
 
+	if (__builtin_constant_p(x))
+		return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0;
+
 	asm("{ %0 = cl0(%1);}\n"
 		"%0 = sub(#32,%0);\n"
 		: "=&r" (r)
@@ -242,6 +248,9 @@ static __always_inline int ffs(int x)
 {
 	int r;
 
+	if (__builtin_constant_p(x))
+		return __builtin_ffs(x);
+
 	asm("{ P0 = cmp.eq(%1,#0); %0 = ct0(%1);}\n"
 		"{ if (P0) %0 = #0; if (!P0) %0 = add(%0,#1);}\n"
 		: "=&r" (r)
@@ -264,6 +273,9 @@ static __always_inline unsigned long __ffs(unsigned long word)
 {
 	int num;
 
+	if (__builtin_constant_p(word))
+		return __builtin_ctzl(word);
+
 	asm("%0 = ct0(%1);\n"
 		: "=&r" (num)
 		: "r" (word));
@@ -282,6 +294,9 @@ static __always_inline unsigned long __fls(unsigned long word)
 {
 	int num;
 
+	if (__builtin_constant_p(word))
+		return BITS_PER_LONG - 1 - __builtin_clzl(word);
+
 	asm("%0 = cl0(%1);\n"
 		"%0 = sub(#31,%0);\n"
 		: "=&r" (num)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions
  2023-12-17  7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol
                     ` (3 preceding siblings ...)
  2023-12-17  7:12   ` [PATCH v3 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
@ 2023-12-17  7:12   ` Vincent Mailhol
  4 siblings, 0 replies; 9+ messages in thread
From: Vincent Mailhol @ 2023-12-17  7:12 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, Yury Norov
  Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek,
	Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver,
	Brian Cain, Geert Uytterhoeven, Matthew Wilcox,
	Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol

Add a function to the bitops test suite to assert that the bitops
helpers correctly fold constant expressions (or trigger a build bug
otherwise). This should work on all the optimization levels supported
by Kbuild.

The added function doesn't perform any runtime tests and gets
optimized out to nothing after passing the build assertions.

Suggested-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
 lib/Kconfig.debug |  4 ++++
 lib/test_bitops.c | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index cc7d53d9dc01..c97d818dbc30 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2454,6 +2454,10 @@ config TEST_BITOPS
 	  compilations. It has no dependencies and doesn't run or load unless
 	  explicitly requested by name.  for example: modprobe test_bitops.
 
+	  In addition, check that the compiler is able to fold the bitops
+	  function into a compile-time constant (given that the argument is also
+	  a compile-time constant) and trigger a build bug otherwise.
+
 	  If unsure, say N.
 
 config TEST_VMALLOC
diff --git a/lib/test_bitops.c b/lib/test_bitops.c
index 3b7bcbee84db..99b612515eb6 100644
--- a/lib/test_bitops.c
+++ b/lib/test_bitops.c
@@ -50,6 +50,34 @@ static unsigned long order_comb_long[][2] = {
 };
 #endif
 
+/* Assert that a boolean expression folds into a constant and is true. */
+#define test_const_eval(test_expr)				\
+({								\
+	/* Evaluate once so that compiler can fold it. */	\
+	bool __test_expr = test_expr;				\
+								\
+	BUILD_BUG_ON(!__builtin_constant_p(__test_expr));	\
+	BUILD_BUG_ON(!__test_expr);				\
+})
+
+/*
+ * On any supported optimization level (-O2, -Os), and when invoked
+ * with a compile-time constant argument, the compiler must be able to
+ * fold all the bit find functions into constant expressions, namely:
+ * __ffs(), ffs(), ffz(), __fls(), fls() and fls64(). Otherwise,
+ * trigger a build bug.
+ */
+static __always_inline void test_bitops_const_eval(unsigned int n)
+{
+	test_const_eval(__ffs(BIT(n)) == n);
+	test_const_eval(ffs(BIT(n)) == n + 1);
+	test_const_eval(ffz(~BIT(n)) == n);
+	test_const_eval(__fls(BIT(n)) == n);
+	test_const_eval(fls(BIT(n)) == n + 1);
+	test_const_eval(fls64(BIT_ULL(n)) == n + 1);
+	test_const_eval(fls64(BIT_ULL(n + 32)) == n + 33);
+}
+
 static int __init test_bitops_startup(void)
 {
 	int i, bit_set;
@@ -94,6 +122,10 @@ static int __init test_bitops_startup(void)
 	if (bit_set != BITOPS_LAST)
 		pr_err("ERROR: FOUND SET BIT %d\n", bit_set);
 
+	test_bitops_const_eval(0);
+	test_bitops_const_eval(10);
+	test_bitops_const_eval(31);
+
 	pr_info("Completed bitops test\n");
 
 	return 0;
-- 
2.25.1



* Re: [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions
  2023-12-17  7:12   ` [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions Vincent Mailhol
@ 2024-01-02 10:28     ` Geert Uytterhoeven
  2024-01-07 12:01       ` Vincent MAILHOL
  0 siblings, 1 reply; 9+ messages in thread
From: Geert Uytterhoeven @ 2024-01-02 10:28 UTC (permalink / raw)
  To: Vincent Mailhol
  Cc: Andrew Morton, linux-kernel, Yury Norov, Nick Desaulniers,
	Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap,
	Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain,
	Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k

Hi Vincent,

Thanks for your patch!

On Sun, Dec 17, 2023 at 8:13 AM Vincent Mailhol
<mailhol.vincent@wanadoo.fr> wrote:
> The inline keyword does not guarantee that the compiler will actually
> inline a function. Whenever the goal is to force inlining,
> __always_inline should be preferred instead.
>
> On an allyesconfig, with GCC 13.2.1, it saves roughly 5 KB.
>
>   $ size --format=GNU vmlinux.before vmlinux.after
>         text       data        bss      total filename
>     60449738   70975612    2288988  133714338 vmlinux.before
>     60446534   70972412    2289596  133708542 vmlinux.after

With gcc 9.5.0-1ubuntu1~22.04, the figures are completely different
(i.e. a size increase):

allyesconfig:

      text       data        bss      total filename
  58878600   72415994    2283652  133578246 vmlinux.before
  58882250   72419706    2284004  133585960 vmlinux.after

atari_defconfig:

      text       data        bss      total filename
   4112060    1579862     151680    5843602 vmlinux-v6.7-rc8
   4117008    1579350     151680    5848038
vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining

The next patch offsets that for allyesconfig, but not for atari_defconfig.

> Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of
> test_and_set_bit and friends")

Please don't split lines containing tags.
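
That is, the whole tag should stay on a single line, even if that
makes the line long, e.g.:

  Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of test_and_set_bit and friends")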

> Link: https://git.kernel.org/torvalds/c/8dd5032d9c54
>
> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>

Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
  2023-12-17  7:12   ` [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
@ 2024-01-02 10:49     ` Geert Uytterhoeven
  0 siblings, 0 replies; 9+ messages in thread
From: Geert Uytterhoeven @ 2024-01-02 10:49 UTC (permalink / raw)
  To: Vincent Mailhol
  Cc: Andrew Morton, linux-kernel, Yury Norov, Nick Desaulniers,
	Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap,
	Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain,
	Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k

On Sun, Dec 17, 2023 at 8:13 AM Vincent Mailhol
<mailhol.vincent@wanadoo.fr> wrote:
>
> The compiler is not able to do constant folding on "asm volatile" code.
>
> Evaluate whether or not the function argument is a constant expression
> and, if it is, return an equivalent builtin expression.
>
> On an allyesconfig, with GCC 13.2.1, it saves roughly 8 KB.
>
>   $ size --format=GNU vmlinux.before vmlinux.after
>         text       data        bss      total filename
>     60446534   70972412    2289596  133708542 vmlinux.before
>     60429746   70978876    2291676  133700298 vmlinux.after

Still a win with gcc 9.5.0-1ubuntu1~22.04:

allyesconfig:

      text       data        bss      total filename
  58882250   72419706    2284004  133585960
vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining
  58863570   72419546    2285860  133568976
vmlinux-v6.7-rc8-2-m68k-bitops-use-__builtin_clz,ctzl,ffs

So an even bigger win...

atari_defconfig:

      text       data        bss      total filename
   4117008    1579350     151680    5848038
vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining
   4116940    1579534     151712    5848186
vmlinux-v6.7-rc8-2-m68k-bitops-use-__builtin_clz,ctzl,ffs

... but a small size increase here.


>
> Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl()
> to evaluate constant expressions")

Please don't split lines containing tags.

> Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1
>
> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>

Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions
  2024-01-02 10:28     ` Geert Uytterhoeven
@ 2024-01-07 12:01       ` Vincent MAILHOL
  0 siblings, 0 replies; 9+ messages in thread
From: Vincent MAILHOL @ 2024-01-07 12:01 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Andrew Morton, linux-kernel, Yury Norov, Nick Desaulniers,
	Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap,
	Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain,
	Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k

On Tue. 2 janv. 2024 at 19:28, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Vincent,
>
> Thanks for your patch!

Thanks for the review and for running the benchmark.

> On Sun, Dec 17, 2023 at 8:13 AM Vincent Mailhol
> <mailhol.vincent@wanadoo.fr> wrote:
> > The inline keyword does not guarantee that the compiler will actually
> > inline a function. Whenever the goal is to force inlining,
> > __always_inline should be preferred instead.
> >
> > On an allyesconfig, with GCC 13.2.1, it saves roughly 5 KB.
> >
> >   $ size --format=GNU vmlinux.before vmlinux.after
> >         text       data        bss      total filename
> >     60449738   70975612    2288988  133714338 vmlinux.before
> >     60446534   70972412    2289596  133708542 vmlinux.after
>
> With gcc 9.5.0-1ubuntu1~22.04, the figures are completely different
> (i.e. a size increase):

Those results are not normal: there should not be such a big
discrepancy between two versions of the same compiler. I
double-checked everything and found out that I made a mistake when
computing the figures. I am not sure what exactly went wrong, but at
some point, the ASLR seeds (or some other similar randomization
feature) got reset, and so the decrease I witnessed was just a "lucky
roll".

After rerunning the benchmark (making sure to keep all the seeds), I
got results similar to yours:

        text       data        bss      total filename
    60449738   70975356    2288988  133714082
vmlinux_allyesconfig.before_this_series
    60446534   70979068    2289596  133715198
vmlinux_allyesconfig.after_first_patch
    60429746   70979132    2291676  133700554
vmlinux_allyesconfig.final_second_patch

Note that there is still some kind of randomness in the data segment,
as shown in these other benchmarks I ran:

        text       data        bss      total filename
    60449738   70976124    2288988  133714850
vmlinux_allyesconfig.before_this_series
    60446534   70980092    2289596  133716222
vmlinux_allyesconfig.after_first_patch
    60429746   70979388    2291676  133700810
vmlinux_allyesconfig.after_second_patch

        text       data        bss      total filename
    60449738   70975612    2288988  133714338
vmlinux_allyesconfig.before_this_series
    60446534   70980348    2289596  133716478
vmlinux_allyesconfig.after_first_patch
    60429746   70979900    2291676  133701322
vmlinux_allyesconfig.after_second_patch

But the error margin is within 1K.

So, in short, I inlined some functions which I shouldn't have. I am
preparing a v4 in which I will only inline the bit-find functions
(namely: __ffs(), ffs(), ffz(), __fls(), fls() and fls64()). Here are
the new figures:

        text       data        bss      total filename
    60453552   70955485    2288620  133697657
vmlinux_allyesconfig.before_this_series
    60450304   70953085    2289260  133692649
vmlinux_allyesconfig.after_first_patch
    60433536   70952637    2291340  133677513
vmlinux_allyesconfig.after_second_patch

N.B.: the new figures were taken after a rebase, so do not try to
compare them with the previous benchmarks. I will send the v4 soon,
after I finish updating the patch comments and double-checking things.

Concerning the other functions in bitops.h, some of them may also be
worth an __always_inline. But I will narrow the scope of this series
to only the bit-find functions. If a good samaritan wants to
investigate the other functions, go ahead!

Yours sincerely,
Vincent Mailhol




> allyesconfig:
>
>       text       data        bss      total filename
>   58878600   72415994    2283652  133578246 vmlinux.before
>   58882250   72419706    2284004  133585960 vmlinux.after
>
> atari_defconfig:
>
>       text       data        bss      total filename
>    4112060    1579862     151680    5843602 vmlinux-v6.7-rc8
>    4117008    1579350     151680    5848038
> vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining
>
> The next patch offsets that for allyesconfig, but not for atari_defconfig.
>
> > Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of
> > test_and_set_bit and friends")
>
> Please don't split lines containing tags.
>
> > Link: https://git.kernel.org/torvalds/c/8dd5032d9c54
> >
> > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
>
> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds


Thread overview: 9+ messages
     [not found] <20221111081316.30373-1-mailhol.vincent@wanadoo.fr>
2023-12-17  7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol
2023-12-17  7:12   ` [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions Vincent Mailhol
2024-01-02 10:28     ` Geert Uytterhoeven
2024-01-07 12:01       ` Vincent MAILHOL
2023-12-17  7:12   ` [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
2024-01-02 10:49     ` Geert Uytterhoeven
2023-12-17  7:12   ` [PATCH v3 3/5] hexagon/bitops: force inlining of all bitops functions Vincent Mailhol
2023-12-17  7:12   ` [PATCH v3 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
2023-12-17  7:12   ` [PATCH v3 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol
