All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] powerpc: Replace some bitops functions by builtin/generic ones
@ 2017-04-21 11:18 Christophe Leroy
  2017-04-21 11:18 ` [PATCH 1/4] powerpc: Discard ffs()/__ffs() function and use builtin functions instead Christophe Leroy
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Christophe Leroy @ 2017-04-21 11:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Scott Wood
  Cc: linux-kernel, linuxppc-dev

This patchset replaces the following bitops by builtin/generic ones:
- ffs() / __ffs()
- fls() / __fls() / fls64()
- ffz()
- __ilog2() / __ilog2_u32() / __ilog2_u64()

The current functions are written as inline assembly which prevents
GCC to optimise them in case of constant parameters and obliges GCC
to group the related instructions all together.

With the builtin alternatives, GCC optimises better

Christophe Leroy (4):
  powerpc: Discard ffs()/__ffs() function and use builtin functions
    instead
  powerpc: Use builtin functions for fls()/__fls()/fls64()
  powerpc: Replace ffz() by equivalent generic function
  powerpc: remove __ilog2()s and use generic ones

 arch/powerpc/Kconfig              |  8 ----
 arch/powerpc/include/asm/bitops.h | 87 ++++-----------------------------------
 2 files changed, 7 insertions(+), 88 deletions(-)

-- 
2.12.0

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/4] powerpc: Discard ffs()/__ffs() function and use builtin functions instead
  2017-04-21 11:18 [PATCH 0/4] powerpc: Replace some bitops functions by builtin/generic ones Christophe Leroy
@ 2017-04-21 11:18 ` Christophe Leroy
  2017-06-05 10:21   ` [1/4] " Michael Ellerman
  2017-04-21 11:18 ` [PATCH 2/4] powerpc: Use builtin functions for fls()/__fls()/fls64() Christophe Leroy
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: Christophe Leroy @ 2017-04-21 11:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Scott Wood
  Cc: linux-kernel, linuxppc-dev

With the ffs() function as defined in arch/powerpc/include/asm/bitops.h
GCC will not optimise the code in case of constant parameter, as shown
by the small exemple below.

int ffs_test(void)
{
	return 4 << ffs(31);
}

c0012334 <ffs_test>:
c0012334:       39 20 00 01     li      r9,1
c0012338:       38 60 00 04     li      r3,4
c001233c:       7d 29 00 34     cntlzw  r9,r9
c0012340:       21 29 00 20     subfic  r9,r9,32
c0012344:       7c 63 48 30     slw     r3,r3,r9
c0012348:       4e 80 00 20     blr

With this patch, the same function will compile as follows:

c0012334 <ffs_test>:
c0012334:       38 60 00 08     li      r3,8
c0012338:       4e 80 00 20     blr

The same happens with __ffs()

For non constant calls, the generated code is doing the same,
allthought it is slightly different on 64 bits for ffs():

unsigned long test__ffs(unsigned long x)
{
	return __ffs(x);
}

int testffs(int x)
{
	return ffs(x);
}

On PPC32, before the patch:
0000003c <test__ffs>:
  3c:	7d 23 00 d0 	neg     r9,r3
  40:	7d 23 18 38 	and     r3,r9,r3
  44:	7c 63 00 34 	cntlzw  r3,r3
  48:	20 63 00 1f 	subfic  r3,r3,31
  4c:	4e 80 00 20 	blr

00000050 <testffs>:
  50:	7d 23 00 d0 	neg     r9,r3
  54:	7d 23 18 38 	and     r3,r9,r3
  58:	7c 63 00 34 	cntlzw  r3,r3
  5c:	20 63 00 20 	subfic  r3,r3,32
  60:	4e 80 00 20 	blr

On PPC32, after the patch:
0000002c <test__ffs>:
  2c:	7d 23 00 d0 	neg     r9,r3
  30:	7d 23 18 38 	and     r3,r9,r3
  34:	7c 63 00 34 	cntlzw  r3,r3
  38:	20 63 00 1f 	subfic  r3,r3,31
  3c:	4e 80 00 20 	blr

00000040 <testffs>:
  40:	7d 23 00 d0 	neg     r9,r3
  44:	7d 23 18 38 	and     r3,r9,r3
  48:	7c 63 00 34 	cntlzw  r3,r3
  4c:	20 63 00 20 	subfic  r3,r3,32
  50:	4e 80 00 20 	blr

On PPC64, before the patch:
0000000000000060 <.test__ffs>:
  60:	7c 03 00 d0 	neg     r0,r3
  64:	7c 03 18 38 	and     r3,r0,r3
  68:	7c 63 00 74 	cntlzd  r3,r3
  6c:	20 63 00 3f 	subfic  r3,r3,63
  70:	7c 63 07 b4 	extsw   r3,r3
  74:	4e 80 00 20 	blr

0000000000000080 <.testffs>:
  80:	7c 03 00 d0 	neg     r0,r3
  84:	7c 03 18 38 	and     r3,r0,r3
  88:	7c 63 00 74 	cntlzd  r3,r3
  8c:	20 63 00 40 	subfic  r3,r3,64
  90:	7c 63 07 b4 	extsw   r3,r3
  94:	4e 80 00 20 	blr

On PPC64, after the patch:
0000000000000050 <.test__ffs>:
  50:	7c 03 00 d0 	neg     r0,r3
  54:	7c 03 18 38 	and     r3,r0,r3
  58:	7c 63 00 74 	cntlzd  r3,r3
  5c:	20 63 00 3f 	subfic  r3,r3,63
  60:	4e 80 00 20 	blr

0000000000000070 <.testffs>:
  70:	7c 03 00 d0 	neg     r0,r3
  74:	7c 03 18 38 	and     r3,r0,r3
  78:	7c 63 00 34 	cntlzw  r3,r3
  7c:	20 63 00 20 	subfic  r3,r3,32
  80:	7c 63 07 b4 	extsw   r3,r3
  84:	4e 80 00 20 	blr
(ffs() operates on an int so cntlzw is equivalent to cntlzd)

In addition, when reading the generated vmlinux, we can observe
that with the builtin functions, GCC sometimes efficiently spreads
the instructions within the generated functions while the inline
assembly force them to remain grouped together.

__builtin_ffs() is already used in arch/powerpc/include/asm/page_32.h

Those builtins have been in GCC since at least 3.4.6 (see
https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Other-Builtins.html )

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/bitops.h | 16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/bitops.h b/arch/powerpc/include/asm/bitops.h
index 33a24fdd7958..71b05685f3a7 100644
--- a/arch/powerpc/include/asm/bitops.h
+++ b/arch/powerpc/include/asm/bitops.h
@@ -253,21 +253,9 @@ static __inline__ unsigned long ffz(unsigned long x)
 	return __ilog2(x & -x);
 }
 
-static __inline__ unsigned long __ffs(unsigned long x)
-{
-	return __ilog2(x & -x);
-}
+#include <asm-generic/bitops/builtin-__ffs.h>
 
-/*
- * ffs: find first bit set. This is defined the same way as
- * the libc and compiler builtin ffs routines, therefore
- * differs in spirit from the above ffz (man ffs).
- */
-static __inline__ int ffs(int x)
-{
-	unsigned long i = (unsigned long)x;
-	return __ilog2(i & -i) + 1;
-}
+#include <asm-generic/bitops/builtin-ffs.h>
 
 /*
  * fls: find last (most-significant) bit set.
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/4] powerpc: Use builtin functions for fls()/__fls()/fls64()
  2017-04-21 11:18 [PATCH 0/4] powerpc: Replace some bitops functions by builtin/generic ones Christophe Leroy
  2017-04-21 11:18 ` [PATCH 1/4] powerpc: Discard ffs()/__ffs() function and use builtin functions instead Christophe Leroy
@ 2017-04-21 11:18 ` Christophe Leroy
  2017-04-21 11:18 ` [PATCH 3/4] powerpc: Replace ffz() by equivalent generic function Christophe Leroy
  2017-04-21 11:18 ` [PATCH 4/4] powerpc: remove __ilog2()s and use generic ones Christophe Leroy
  3 siblings, 0 replies; 6+ messages in thread
From: Christophe Leroy @ 2017-04-21 11:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Scott Wood
  Cc: linux-kernel, linuxppc-dev

With the fls() functions as defined in arch/powerpc/include/asm/bitops.h
GCC will not optimise the code in case of constant parameter.

This patch replaces __fls() by the builtin function, and modifies
fls() and fls64() to use builtins instead of inline assembly

For non constant calls, the generated code is doing the same:

int testfls(unsigned int x)
{
	return fls(x);
}

unsigned long test__fls(unsigned long x)
{
	return __fls(x);
}

int testfls64(__u64 x)
{
	return fls64(x);
}

On PPC32, before the patch:
00000064 <testfls>:
  64:	7c 63 00 34 	cntlzw  r3,r3
  68:	20 63 00 20 	subfic  r3,r3,32
  6c:	4e 80 00 20 	blr

00000070 <test__fls>:
  70:	7c 63 00 34 	cntlzw  r3,r3
  74:	20 63 00 1f 	subfic  r3,r3,31
  78:	4e 80 00 20 	blr

0000007c <testfls64>:
  7c:	2c 03 00 00 	cmpwi   r3,0
  80:	40 82 00 10 	bne     90 <testfls64+0x14>
  84:	7c 83 00 34 	cntlzw  r3,r4
  88:	20 63 00 20 	subfic  r3,r3,32
  8c:	4e 80 00 20 	blr
  90:	7c 63 00 34 	cntlzw  r3,r3
  94:	20 63 00 40 	subfic  r3,r3,64
  98:	4e 80 00 20 	blr

On PPC32, after the patch:
00000054 <testfls>:
  54:	7c 63 00 34 	cntlzw  r3,r3
  58:	20 63 00 20 	subfic  r3,r3,32
  5c:	4e 80 00 20 	blr

00000060 <test__fls>:
  60:	7c 63 00 34 	cntlzw  r3,r3
  64:	20 63 00 1f 	subfic  r3,r3,31
  68:	4e 80 00 20 	blr

0000006c <testfls64>:
  6c:	2c 03 00 00 	cmpwi   r3,0
  70:	41 82 00 10 	beq     80 <testfls64+0x14>
  74:	7c 63 00 34 	cntlzw  r3,r3
  78:	20 63 00 40 	subfic  r3,r3,64
  7c:	4e 80 00 20 	blr
  80:	7c 83 00 34 	cntlzw  r3,r4
  84:	20 63 00 40 	subfic  r3,r3,32
  88:	4e 80 00 20 	blr

On PPC64, before the patch:
00000000000000a0 <.testfls>:
  a0:	7c 63 00 34 	cntlzw  r3,r3
  a4:	20 63 00 20 	subfic  r3,r3,32
  a8:	7c 63 07 b4 	extsw   r3,r3
  ac:	4e 80 00 20 	blr

00000000000000b0 <.test__fls>:
  b0:	7c 63 00 74 	cntlzd  r3,r3
  b4:	20 63 00 3f 	subfic  r3,r3,63
  b8:	7c 63 07 b4 	extsw   r3,r3
  bc:	4e 80 00 20 	blr

00000000000000c0 <.testfls64>:
  c0:	7c 63 00 74 	cntlzd  r3,r3
  c4:	20 63 00 40 	subfic  r3,r3,64
  c8:	7c 63 07 b4 	extsw   r3,r3
  cc:	4e 80 00 20 	blr

On PPC64, after the patch:
0000000000000090 <.testfls>:
  90:	7c 63 00 34 	cntlzw  r3,r3
  94:	20 63 00 20 	subfic  r3,r3,32
  98:	7c 63 07 b4 	extsw   r3,r3
  9c:	4e 80 00 20 	blr

00000000000000a0 <.test__fls>:
  a0:	7c 63 00 74 	cntlzd  r3,r3
  a4:	20 63 00 3f 	subfic  r3,r3,63
  a8:	4e 80 00 20 	blr
  ac:	60 00 00 00 	nop

00000000000000b0 <.testfls64>:
  b0:	7c 63 00 74 	cntlzd  r3,r3
  b4:	20 63 00 40 	subfic  r3,r3,64
  b8:	7c 63 07 b4 	extsw   r3,r3
  bc:	4e 80 00 20 	blr

Those builtins have been in GCC since at least 3.4.6 (see
https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Other-Builtins.html )

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/bitops.h | 24 +++---------------------
 1 file changed, 3 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/bitops.h b/arch/powerpc/include/asm/bitops.h
index 71b05685f3a7..af36b404dbe8 100644
--- a/arch/powerpc/include/asm/bitops.h
+++ b/arch/powerpc/include/asm/bitops.h
@@ -263,33 +263,15 @@ static __inline__ unsigned long ffz(unsigned long x)
  */
 static __inline__ int fls(unsigned int x)
 {
-	int lz;
-
-	asm ("cntlzw %0,%1" : "=r" (lz) : "r" (x));
-	return 32 - lz;
+	return 32 - __builtin_clz(x);
 }
 
-static __inline__ unsigned long __fls(unsigned long x)
-{
-	return __ilog2(x);
-}
+#include <asm-generic/bitops/builtin-__fls.h>
 
-/*
- * 64-bit can do this using one cntlzd (count leading zeroes doubleword)
- * instruction; for 32-bit we use the generic version, which does two
- * 32-bit fls calls.
- */
-#ifdef __powerpc64__
 static __inline__ int fls64(__u64 x)
 {
-	int lz;
-
-	asm ("cntlzd %0,%1" : "=r" (lz) : "r" (x));
-	return 64 - lz;
+	return 64 - __builtin_clzll(x);
 }
-#else
-#include <asm-generic/bitops/fls64.h>
-#endif /* __powerpc64__ */
 
 #ifdef CONFIG_PPC64
 unsigned int __arch_hweight8(unsigned int w);
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/4] powerpc: Replace ffz() by equivalent generic function
  2017-04-21 11:18 [PATCH 0/4] powerpc: Replace some bitops functions by builtin/generic ones Christophe Leroy
  2017-04-21 11:18 ` [PATCH 1/4] powerpc: Discard ffs()/__ffs() function and use builtin functions instead Christophe Leroy
  2017-04-21 11:18 ` [PATCH 2/4] powerpc: Use builtin functions for fls()/__fls()/fls64() Christophe Leroy
@ 2017-04-21 11:18 ` Christophe Leroy
  2017-04-21 11:18 ` [PATCH 4/4] powerpc: remove __ilog2()s and use generic ones Christophe Leroy
  3 siblings, 0 replies; 6+ messages in thread
From: Christophe Leroy @ 2017-04-21 11:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Scott Wood
  Cc: linux-kernel, linuxppc-dev

With the ffz() function as defined in arch/powerpc/include/asm/bitops.h
GCC will not optimise the code in case of constant parameter.

This patch replaces ffz() by the generic function.

The generic ffz(x) expects to never be called with ~x == 0
as written in the comment in include/asm-generic/bitops/ffz.h
The only user of ffz() within arch/powerpc/ is
platforms/512x/mpc5121_ads_cpld.c, which checks if x is not 0xff

For non constant calls, the generated code is doing the same:

unsigned long testffz(unsigned long x)
{
	return ffz(x);
}

On PPC32, before the patch:
00000018 <testffz>:
  18:	7c 63 18 f9 	not.    r3,r3
  1c:	40 82 00 0c 	bne     28 <testffz+0x10>
  20:	38 60 00 20 	li      r3,32
  24:	4e 80 00 20 	blr
  28:	7d 23 00 d0 	neg     r9,r3
  2c:	7d 23 18 38 	and     r3,r9,r3
  30:	7c 63 00 34 	cntlzw  r3,r3
  34:	20 63 00 1f 	subfic  r3,r3,31
  38:	4e 80 00 20 	blr

On PPC32, after the patch:
00000018 <testffz>:
  18:	39 23 00 01 	addi    r9,r3,1
  1c:	7d 23 18 78 	andc    r3,r9,r3
  20:	7c 63 00 34 	cntlzw  r3,r3
  24:	20 63 00 1f 	subfic  r3,r3,31
  28:	4e 80 00 20 	blr

On PPC64, before the patch:
0000000000000030 <.testffz>:
  30:	7c 60 18 f9 	not.    r0,r3
  34:	38 60 00 40 	li      r3,64
  38:	4d 82 00 20 	beqlr
  3c:	7c 60 00 d0 	neg     r3,r0
  40:	7c 63 00 38 	and     r3,r3,r0
  44:	7c 63 00 74 	cntlzd  r3,r3
  48:	20 63 00 3f 	subfic  r3,r3,63
  4c:	7c 63 07 b4 	extsw   r3,r3
  50:	4e 80 00 20 	blr

On PPC64, after the patch:
0000000000000030 <.testffz>:
  30:	38 03 00 01 	addi    r0,r3,1
  34:	7c 03 18 78 	andc    r3,r0,r3
  38:	7c 63 00 74 	cntlzd  r3,r3
  3c:	20 63 00 3f 	subfic  r3,r3,63
  40:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/bitops.h | 20 +-------------------
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/bitops.h b/arch/powerpc/include/asm/bitops.h
index af36b404dbe8..d835cd697d6b 100644
--- a/arch/powerpc/include/asm/bitops.h
+++ b/arch/powerpc/include/asm/bitops.h
@@ -233,25 +233,7 @@ int __ilog2_u64(u64 n)
 }
 #endif
 
-/*
- * Determines the bit position of the least significant 0 bit in the
- * specified double word. The returned bit position will be
- * zero-based, starting from the right side (63/31 - 0).
- */
-static __inline__ unsigned long ffz(unsigned long x)
-{
-	/* no zero exists anywhere in the 8 byte area. */
-	if ((x = ~x) == 0)
-		return BITS_PER_LONG;
-
-	/*
-	 * Calculate the bit position of the least significant '1' bit in x
-	 * (since x has been changed this will actually be the least significant
-	 * '0' bit in * the original x).  Note: (x & -x) gives us a mask that
-	 * is the least significant * (RIGHT-most) 1-bit of the value in x.
-	 */
-	return __ilog2(x & -x);
-}
+#include <asm-generic/bitops/ffz.h>
 
 #include <asm-generic/bitops/builtin-__ffs.h>
 
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 4/4] powerpc: remove __ilog2()s and use generic ones
  2017-04-21 11:18 [PATCH 0/4] powerpc: Replace some bitops functions by builtin/generic ones Christophe Leroy
                   ` (2 preceding siblings ...)
  2017-04-21 11:18 ` [PATCH 3/4] powerpc: Replace ffz() by equivalent generic function Christophe Leroy
@ 2017-04-21 11:18 ` Christophe Leroy
  3 siblings, 0 replies; 6+ messages in thread
From: Christophe Leroy @ 2017-04-21 11:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Scott Wood
  Cc: linux-kernel, linuxppc-dev

With the __ilog2() function as defined in
arch/powerpc/include/asm/bitops.h, GCC will not optimise the code
in case of constant parameter.

The generic ilog2() function in include/linux/log2.h is written
to handle the case of the constant parameter.

This patch discards the three __ilog2() functions and
defines __ilog2() as ilog2()

For non constant calls, the generated code is doing the same:
int test__ilog2(unsigned long x)
{
	return __ilog2(x);
}

int test__ilog2_u32(u32 n)
{
	return __ilog2_u32(n);
}

int test__ilog2_u64(u64 n)
{
	return __ilog2_u64(n);
}

On PPC32 before the patch:
00000000 <test__ilog2>:
   0:	7c 63 00 34 	cntlzw  r3,r3
   4:	20 63 00 1f 	subfic  r3,r3,31
   8:	4e 80 00 20 	blr

0000000c <test__ilog2_u32>:
   c:	7c 63 00 34 	cntlzw  r3,r3
  10:	20 63 00 1f 	subfic  r3,r3,31
  14:	4e 80 00 20 	blr

On PPC32 after the patch:
00000000 <test__ilog2>:
   0:	7c 63 00 34 	cntlzw  r3,r3
   4:	20 63 00 1f 	subfic  r3,r3,31
   8:	4e 80 00 20 	blr

0000000c <test__ilog2_u32>:
   c:	7c 63 00 34 	cntlzw  r3,r3
  10:	20 63 00 1f 	subfic  r3,r3,31
  14:	4e 80 00 20 	blr

On PPC64 before the patch:
0000000000000000 <.test__ilog2>:
   0:	7c 63 00 74 	cntlzd  r3,r3
   4:	20 63 00 3f 	subfic  r3,r3,63
   8:	7c 63 07 b4 	extsw   r3,r3
   c:	4e 80 00 20 	blr

0000000000000010 <.test__ilog2_u32>:
  10:	7c 63 00 34 	cntlzw  r3,r3
  14:	20 63 00 1f 	subfic  r3,r3,31
  18:	7c 63 07 b4 	extsw   r3,r3
  1c:	4e 80 00 20 	blr

0000000000000020 <.test__ilog2_u64>:
  20:	7c 63 00 74 	cntlzd  r3,r3
  24:	20 63 00 3f 	subfic  r3,r3,63
  28:	7c 63 07 b4 	extsw   r3,r3
  2c:	4e 80 00 20 	blr

On PPC64 after the patch:
0000000000000000 <.test__ilog2>:
   0:	7c 63 00 74 	cntlzd  r3,r3
   4:	20 63 00 3f 	subfic  r3,r3,63
   8:	7c 63 07 b4 	extsw   r3,r3
   c:	4e 80 00 20 	blr

0000000000000010 <.test__ilog2_u32>:
  10:	7c 63 00 34 	cntlzw  r3,r3
  14:	20 63 00 1f 	subfic  r3,r3,31
  18:	7c 63 07 b4 	extsw   r3,r3
  1c:	4e 80 00 20 	blr

0000000000000020 <.test__ilog2_u64>:
  20:	7c 63 00 74 	cntlzd  r3,r3
  24:	20 63 00 3f 	subfic  r3,r3,63
  28:	7c 63 07 b4 	extsw   r3,r3
  2c:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/Kconfig              |  8 --------
 arch/powerpc/include/asm/bitops.h | 27 +--------------------------
 2 files changed, 1 insertion(+), 34 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2bfd49098c5e..f231b0f190b5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -62,14 +62,6 @@ config GENERIC_LOCKBREAK
 	default y
 	depends on SMP && PREEMPT
 
-config ARCH_HAS_ILOG2_U32
-	bool
-	default y
-
-config ARCH_HAS_ILOG2_U64
-	bool
-	default y if 64BIT
-
 config GENERIC_HWEIGHT
 	bool
 	default y
diff --git a/arch/powerpc/include/asm/bitops.h b/arch/powerpc/include/asm/bitops.h
index d835cd697d6b..b750ffef83c7 100644
--- a/arch/powerpc/include/asm/bitops.h
+++ b/arch/powerpc/include/asm/bitops.h
@@ -206,32 +206,7 @@ static __inline__ void __clear_bit_unlock(int nr, volatile unsigned long *addr)
  * Return the zero-based bit position (LE, not IBM bit numbering) of
  * the most significant 1-bit in a double word.
  */
-static __inline__ __attribute__((const))
-int __ilog2(unsigned long x)
-{
-	int lz;
-
-	asm (PPC_CNTLZL "%0,%1" : "=r" (lz) : "r" (x));
-	return BITS_PER_LONG - 1 - lz;
-}
-
-static inline __attribute__((const))
-int __ilog2_u32(u32 n)
-{
-	int bit;
-	asm ("cntlzw %0,%1" : "=r" (bit) : "r" (n));
-	return 31 - bit;
-}
-
-#ifdef __powerpc64__
-static inline __attribute__((const))
-int __ilog2_u64(u64 n)
-{
-	int bit;
-	asm ("cntlzd %0,%1" : "=r" (bit) : "r" (n));
-	return 63 - bit;
-}
-#endif
+#define __ilog2(x)	ilog2(x)
 
 #include <asm-generic/bitops/ffz.h>
 
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [1/4] powerpc: Discard ffs()/__ffs() function and use builtin functions instead
  2017-04-21 11:18 ` [PATCH 1/4] powerpc: Discard ffs()/__ffs() function and use builtin functions instead Christophe Leroy
@ 2017-06-05 10:21   ` Michael Ellerman
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Ellerman @ 2017-06-05 10:21 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras, Scott Wood
  Cc: linuxppc-dev, linux-kernel

On Fri, 2017-04-21 at 11:18:46 UTC, Christophe Leroy wrote:
> With the ffs() function as defined in arch/powerpc/include/asm/bitops.h
> GCC will not optimise the code in case of constant parameter, as shown
> by the small exemple below.
...
> 
> In addition, when reading the generated vmlinux, we can observe
> that with the builtin functions, GCC sometimes efficiently spreads
> the instructions within the generated functions while the inline
> assembly force them to remain grouped together.
> 
> __builtin_ffs() is already used in arch/powerpc/include/asm/page_32.h
> 
> Those builtins have been in GCC since at least 3.4.6 (see
> https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Other-Builtins.html )
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f83647d642270f6b9d75736817fb5a

cheers

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-06-05 10:22 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-21 11:18 [PATCH 0/4] powerpc: Replace some bitops functions by builtin/generic ones Christophe Leroy
2017-04-21 11:18 ` [PATCH 1/4] powerpc: Discard ffs()/__ffs() function and use builtin functions instead Christophe Leroy
2017-06-05 10:21   ` [1/4] " Michael Ellerman
2017-04-21 11:18 ` [PATCH 2/4] powerpc: Use builtin functions for fls()/__fls()/fls64() Christophe Leroy
2017-04-21 11:18 ` [PATCH 3/4] powerpc: Replace ffz() by equivalent generic function Christophe Leroy
2017-04-21 11:18 ` [PATCH 4/4] powerpc: remove __ilog2()s and use generic ones Christophe Leroy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.