linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] find_bit: Micro-optimise find_next_*_bit
@ 2016-12-23 17:20 Matthew Wilcox
  2016-12-23 20:43 ` Rasmus Villemoes
  2016-12-24  6:43 ` Yury Norov
  0 siblings, 2 replies; 3+ messages in thread
From: Matthew Wilcox @ 2016-12-23 17:20 UTC (permalink / raw)
  To: Yury Norov, Rasmus Villemoes, George Spelvin, Akinobu Mita,
	Thomas Gleixner, Andrew Morton, linux-kernel
  Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

This saves 20 bytes on my x86-64 build, mostly due to alignment
considerations ... I think it actually saves about five bytes of
instructions.  There's really two parts to this commit.  First, the
first half of the test: (!nbits || start >= nbits) is trivially a subset
of the second half, since nbits and start are both unsigned.  Second,
while looking at the disassembly, I noticed that GCC was predicting the
branch taken.  Since this is a failure case, it's clearly the less likely
of the two branches, so add an unlikely() to override GCC's heuristics.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 lib/find_bit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/find_bit.c b/lib/find_bit.c
index 18072ea9c20e..7d4a681d625f 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -33,7 +33,7 @@ static unsigned long _find_next_bit(const unsigned long *addr,
 {
 	unsigned long tmp;
 
-	if (!nbits || start >= nbits)
+	if (unlikely(start >= nbits))
 		return nbits;
 
 	tmp = addr[start / BITS_PER_LONG] ^ invert;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] find_bit: Micro-optimise find_next_*_bit
  2016-12-23 17:20 [PATCH] find_bit: Micro-optimise find_next_*_bit Matthew Wilcox
@ 2016-12-23 20:43 ` Rasmus Villemoes
  2016-12-24  6:43 ` Yury Norov
  1 sibling, 0 replies; 3+ messages in thread
From: Rasmus Villemoes @ 2016-12-23 20:43 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yury Norov, George Spelvin, Akinobu Mita, Thomas Gleixner,
	Andrew Morton, linux-kernel, Matthew Wilcox

On Fri, Dec 23 2016, Matthew Wilcox <mawilcox@linuxonhyperv.com> wrote:

> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> First, the first half of the test: (!nbits || start >= nbits) is
> trivially a subset of the second half, since nbits and start are both unsigned.

Yeah, I filed that as a missed optimization bug with gcc a year ago, but
it seems that even 6.3 still does two tests - clang 3.6 is a bit
smarter. Anyway,

Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] find_bit: Micro-optimise find_next_*_bit
  2016-12-23 17:20 [PATCH] find_bit: Micro-optimise find_next_*_bit Matthew Wilcox
  2016-12-23 20:43 ` Rasmus Villemoes
@ 2016-12-24  6:43 ` Yury Norov
  1 sibling, 0 replies; 3+ messages in thread
From: Yury Norov @ 2016-12-24  6:43 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yury Norov, Rasmus Villemoes, George Spelvin, Akinobu Mita,
	Thomas Gleixner, Andrew Morton, linux-kernel, Matthew Wilcox

Hi Mattew,

On Fri, Dec 23, 2016 at 09:20:03AM -0800, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> This saves 20 bytes on my x86-64 build, mostly due to alignment
> considerations ... I think it actually saves about five bytes of
> instructions.  There's really two parts to this commit.  First, the
> first half of the test: (!nbits || start >= nbits) is trivially a subset
> of the second half, since nbits and start are both unsigned.

Yes... It's obvious... when you point it out.

ARM64 GCC compiler didn't notice it as well as me:

37 0000000000000070 <find_next_bit>:
38   70:   eb1f003f        cmp     x1, xzr
39   74:   fa421020        ccmp    x1, x2, #0x0, ne
40   78:   54000088        b.hi    88 <find_next_bit+0x18>
41   7c:   aa0103e0        mov     x0, x1
42   80:   d65f03c0        ret
43   84:   d503201f        nop
44   88:   a9bf7bfd        stp     x29, x30, [sp,#-16]!
45   8c:   910003fd        mov     x29, sp
46   90:   d2800003        mov     x3, #0x0 // #0
47   94:   97ffffdb        bl      0 <_find_next_bit.part.0>
48   98:   a8c17bfd        ldp     x29, x30, [sp],#16
49   9c:   d65f03c0        ret

> Second,
> while looking at the disassembly, I noticed that GCC was predicting the
> branch taken.  Since this is a failure case, it's clearly the less likely
> of the two branches, so add an unlikely() to override GCC's heuristics.
> 
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> ---
>  lib/find_bit.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/find_bit.c b/lib/find_bit.c
> index 18072ea9c20e..7d4a681d625f 100644
> --- a/lib/find_bit.c
> +++ b/lib/find_bit.c
> @@ -33,7 +33,7 @@ static unsigned long _find_next_bit(const unsigned long *addr,
>  {
>  	unsigned long tmp;
>  
> -	if (!nbits || start >= nbits)
> +	if (unlikely(start >= nbits))
>  		return nbits;
>  
>  	tmp = addr[start / BITS_PER_LONG] ^ invert;
> -- 
> 2.11.0


There's also _find_next_bit_le() with same code. I think it should be
also patched.

Acked-by: Yury Norov <ynorov@caviumnetworks.com>

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-12-24  6:43 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-23 17:20 [PATCH] find_bit: Micro-optimise find_next_*_bit Matthew Wilcox
2016-12-23 20:43 ` Rasmus Villemoes
2016-12-24  6:43 ` Yury Norov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).