linux-kernel.vger.kernel.org archive mirror
* [PATCH] bitmap: bitmap_equal memcmp optimization
       [not found] <1465288661-3273-1-git-send-email-schwidefsky@de.ibm.com>
@ 2016-06-07  8:37 ` Martin Schwidefsky
  2016-06-08 20:42   ` Andrew Morton
  0 siblings, 1 reply; 3+ messages in thread
From: Martin Schwidefsky @ 2016-06-07  8:37 UTC (permalink / raw)
  To: linux-arch, linux-kernel, Benjamin Herrenschmidt, David S. Miller
  Cc: Martin Schwidefsky

The bitmap_equal function has optimized code for small bitmaps with a
constant size of at most BITS_PER_LONG bits. For larger bitmaps the
out-of-line function __bitmap_equal is called.

For a constant number of bits divisible by BITS_PER_LONG the memcmp
function can be used instead. On s390, gcc knows how to optimize this
function: memcmp calls with up to 256 bytes / 2048 bits are translated
into a single instruction.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 include/linux/bitmap.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index e9b0b9a..27bfc0b 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -267,6 +267,10 @@ static inline int bitmap_equal(const unsigned long *src1,
 {
 	if (small_const_nbits(nbits))
 		return ! ((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits));
+#ifdef CONFIG_S390
+	else if (__builtin_constant_p(nbits) && (nbits % BITS_PER_LONG) == 0)
+		return !memcmp(src1, src2, nbits / 8);
+#endif
 	else
 		return __bitmap_equal(src1, src2, nbits);
 }
-- 
2.6.6
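
The (nbits % BITS_PER_LONG) == 0 condition in the patch above matters
because memcmp compares every byte of the backing words, including any
padding bits past nbits in the last word. A minimal userspace sketch of
that difference follows; words_for, raw_words_equal and low_bits_equal
are hypothetical helpers written for illustration, not kernel functions,
and BITS_PER_LONG is re-created from <limits.h>.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define BITS_PER_LONG ((unsigned int)(CHAR_BIT * sizeof(unsigned long)))

/* Hypothetical helper: number of words needed to hold nbits bits */
static unsigned int words_for(unsigned int nbits)
{
	return (nbits + BITS_PER_LONG - 1) / BITS_PER_LONG;
}

/* Byte-wise compare of every word backing the bitmap, padding included */
static bool raw_words_equal(const unsigned long *a, const unsigned long *b,
			    unsigned int nbits)
{
	return !memcmp(a, b, words_for(nbits) * sizeof(unsigned long));
}

/*
 * Compare only the low nbits bits of a single-word bitmap.
 * Assumes 1 <= nbits <= BITS_PER_LONG.
 */
static bool low_bits_equal(unsigned long a, unsigned long b,
			   unsigned int nbits)
{
	unsigned long mask = ~0UL >> (BITS_PER_LONG - nbits);

	return !((a ^ b) & mask);
}
```

For a = 0x0F and b = 0x1F the two helpers disagree at nbits = 4, since
the words differ only in bit 4, which lies outside the bitmap. They
agree once nbits covers the whole word, which is the case the patch
restricts itself to.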


* Re: [PATCH] bitmap: bitmap_equal memcmp optimization
  2016-06-07  8:37 ` [PATCH] bitmap: bitmap_equal memcmp optimization Martin Schwidefsky
@ 2016-06-08 20:42   ` Andrew Morton
  2016-06-09  6:31     ` Martin Schwidefsky
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2016-06-08 20:42 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: linux-arch, linux-kernel, Benjamin Herrenschmidt, David S. Miller

On Tue,  7 Jun 2016 10:37:41 +0200 Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:

> The bitmap_equal function has optimized code for small bitmaps with less
> than BITS_PER_LONG bits. For larger bitmaps the out-of-line function
> __bitmap_equal is called.
> 
> For a constant number of bits divisible by BITS_PER_LONG the memcmp
> function can be used. For s390 gcc knows how to optimize this function,
> memcmp calls with up to 256 bytes / 2048 bits are translated into a
> single instruction.

Patch looks simple enough, although it would benefit from a comment
explaining what's special about s390.

> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -267,6 +267,10 @@ static inline int bitmap_equal(const unsigned long *src1,
>  {
>  	if (small_const_nbits(nbits))
>  		return ! ((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits));
> +#ifdef CONFIG_S390
> +	else if (__builtin_constant_p(nbits) && (nbits % BITS_PER_LONG) == 0)
> +		return !memcmp(src1, src2, nbits / 8);
> +#endif
>  	else
>  		return __bitmap_equal(src1, src2, nbits);
>  }

Those elses are a bit daffy.  This:

--- a/include/linux/bitmap.h~bitmap-bitmap_equal-memcmp-optimization-fix
+++ a/include/linux/bitmap.h
@@ -266,13 +266,12 @@ static inline int bitmap_equal(const uns
 			const unsigned long *src2, unsigned int nbits)
 {
 	if (small_const_nbits(nbits))
-		return ! ((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits));
+		return !((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits));
 #ifdef CONFIG_S390
-	else if (__builtin_constant_p(nbits) && (nbits % BITS_PER_LONG) == 0)
+	if (__builtin_constant_p(nbits) && (nbits % BITS_PER_LONG) == 0)
 		return !memcmp(src1, src2, nbits / 8);
 #endif
-	else
-		return __bitmap_equal(src1, src2, nbits);
+	return __bitmap_equal(src1, src2, nbits);
 }
 
 static inline int bitmap_intersects(const unsigned long *src1,
_
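
For reference, with both changes applied and a comment of the kind
requested above, the function could read roughly as follows. This is a
userspace sketch, not the merged kernel code: the comment wording is an
assumption, small_const_nbits and BITMAP_LAST_WORD_MASK are simplified
stand-ins for the kernel macros, and slow_bitmap_equal is a hypothetical
replacement for the out-of-line __bitmap_equal. It relies on the
gcc/clang builtin __builtin_constant_p.

```c
#include <limits.h>
#include <string.h>

#define BITS_PER_LONG ((unsigned int)(CHAR_BIT * sizeof(unsigned long)))

/* Simplified stand-ins for the kernel macros, for illustration only */
#define small_const_nbits(nbits) \
	(__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG)
#define BITMAP_LAST_WORD_MASK(nbits) \
	(~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))

/* Hypothetical stand-in for the out-of-line __bitmap_equal() */
static int slow_bitmap_equal(const unsigned long *a, const unsigned long *b,
			     unsigned int nbits)
{
	unsigned int k, lim = nbits / BITS_PER_LONG;

	for (k = 0; k < lim; k++)
		if (a[k] != b[k])
			return 0;
	/* Only the bits below nbits count in a partial last word */
	if (nbits % BITS_PER_LONG)
		if ((a[k] ^ b[k]) & BITMAP_LAST_WORD_MASK(nbits))
			return 0;
	return 1;
}

static inline int bitmap_equal(const unsigned long *src1,
			       const unsigned long *src2, unsigned int nbits)
{
	if (small_const_nbits(nbits))
		return !((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits));
#ifdef CONFIG_S390
	/*
	 * On s390 gcc translates memcmp() calls of up to 256 bytes
	 * (2048 bits) into a single instruction, so constant-size
	 * bitmaps made of whole words avoid the out-of-line call.
	 */
	if (__builtin_constant_p(nbits) && (nbits % BITS_PER_LONG) == 0)
		return !memcmp(src1, src2, nbits / 8);
#endif
	return slow_bitmap_equal(src1, src2, nbits);
}
```

Every branch returns the same answer for a given input, so dropping the
elses changes only the layout of the function, not its behaviour.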


* Re: [PATCH] bitmap: bitmap_equal memcmp optimization
  2016-06-08 20:42   ` Andrew Morton
@ 2016-06-09  6:31     ` Martin Schwidefsky
  0 siblings, 0 replies; 3+ messages in thread
From: Martin Schwidefsky @ 2016-06-09  6:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-arch, linux-kernel, Benjamin Herrenschmidt, David S. Miller

On Wed, 8 Jun 2016 13:42:12 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Tue,  7 Jun 2016 10:37:41 +0200 Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
> 
> > The bitmap_equal function has optimized code for small bitmaps with less
> > than BITS_PER_LONG bits. For larger bitmaps the out-of-line function
> > __bitmap_equal is called.
> > 
> > For a constant number of bits divisible by BITS_PER_LONG the memcmp
> > function can be used. For s390 gcc knows how to optimize this function,
> > memcmp calls with up to 256 bytes / 2048 bits are translated into a
> > single instruction.
> 
> Patch looks simple enough, although it would benefit from a comment
> explaining what's special about s390.

The explanation from the git commit could be transferred into a comment.

> > --- a/include/linux/bitmap.h
> > +++ b/include/linux/bitmap.h
> > @@ -267,6 +267,10 @@ static inline int bitmap_equal(const unsigned long *src1,
> >  {
> >  	if (small_const_nbits(nbits))
> >  		return ! ((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits));
> > +#ifdef CONFIG_S390
> > +	else if (__builtin_constant_p(nbits) && (nbits % BITS_PER_LONG) == 0)
> > +		return !memcmp(src1, src2, nbits / 8);
> > +#endif
> >  	else
> >  		return __bitmap_equal(src1, src2, nbits);
> >  }
> 
> Those elses are a bit daffy.  This:
> 
> --- a/include/linux/bitmap.h~bitmap-bitmap_equal-memcmp-optimization-fix
> +++ a/include/linux/bitmap.h
> @@ -266,13 +266,12 @@ static inline int bitmap_equal(const uns
>  			const unsigned long *src2, unsigned int nbits)
>  {
>  	if (small_const_nbits(nbits))
> -		return ! ((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits));
> +		return !((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits));
>  #ifdef CONFIG_S390
> -	else if (__builtin_constant_p(nbits) && (nbits % BITS_PER_LONG) == 0)
> +	if (__builtin_constant_p(nbits) && (nbits % BITS_PER_LONG) == 0)
>  		return !memcmp(src1, src2, nbits / 8);
>  #endif
> -	else
> -		return __bitmap_equal(src1, src2, nbits);
> +	return __bitmap_equal(src1, src2, nbits);
>  }
> 
>  static inline int bitmap_intersects(const unsigned long *src1,
> _
> 

Yeah that looks better. Thanks for picking this up!

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

