* [PATCH 0/3] Bitmap optimisations
From: Matthew Wilcox @ 2017-06-07 14:29 UTC
To: linux-kernel
Cc: Andrew Morton, Martin Schwidefsky, Rasmus Villemoes, Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

These three bitmap patches use more efficient specialisations when the
compiler can figure out that it's safe to do so.

Matthew Wilcox (3):
  bitmap: Optimise bitmap_set and bitmap_clear of a single bit
  Turn bitmap_set and bitmap_clear into memset when possible
  bitmap: Use memcmp optimisation in more situations

 include/linux/bitmap.h | 33 +++++++++++++++++++++++++++------
 lib/bitmap.c           |  8 ++++----
 2 files changed, 31 insertions(+), 10 deletions(-)

-- 
2.11.0
* [PATCH 1/3] bitmap: Optimise bitmap_set and bitmap_clear of a single bit
From: Matthew Wilcox @ 2017-06-07 14:29 UTC
To: linux-kernel
Cc: Andrew Morton, Martin Schwidefsky, Rasmus Villemoes, Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

We have eight users calling bitmap_clear for a single bit and seventeen
calling bitmap_set for a single bit.  Rather than fix all of them to call
__clear_bit or __set_bit, turn bitmap_clear and bitmap_set into inline
functions and make this special case efficient.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/bitmap.h | 23 ++++++++++++++++++++---
 lib/bitmap.c           |  8 ++++----
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 3b77588a9360..4e0f0c8167af 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -112,9 +112,8 @@ extern int __bitmap_intersects(const unsigned long *bitmap1,
 extern int __bitmap_subset(const unsigned long *bitmap1,
 			const unsigned long *bitmap2, unsigned int nbits);
 extern int __bitmap_weight(const unsigned long *bitmap, unsigned int nbits);
-
-extern void bitmap_set(unsigned long *map, unsigned int start, int len);
-extern void bitmap_clear(unsigned long *map, unsigned int start, int len);
+extern void __bitmap_set(unsigned long *map, unsigned int start, int len);
+extern void __bitmap_clear(unsigned long *map, unsigned int start, int len);
 
 extern unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
 						    unsigned long size,
@@ -315,6 +314,24 @@ static __always_inline int bitmap_weight(const unsigned long *src, unsigned int
 	return __bitmap_weight(src, nbits);
 }
 
+static __always_inline void bitmap_set(unsigned long *map, unsigned int start,
+		unsigned int nbits)
+{
+	if (__builtin_constant_p(nbits) && nbits == 1)
+		__set_bit(start, map);
+	else
+		__bitmap_set(map, start, nbits);
+}
+
+static __always_inline void bitmap_clear(unsigned long *map, unsigned int start,
+		unsigned int nbits)
+{
+	if (__builtin_constant_p(nbits) && nbits == 1)
+		__clear_bit(start, map);
+	else
+		__bitmap_clear(map, start, nbits);
+}
+
 static inline void bitmap_shift_right(unsigned long *dst, const unsigned long *src,
 			unsigned int shift, int nbits)
 {
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 0b66f0e5eb6b..6e354a6d0dbb 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -251,7 +251,7 @@ int __bitmap_weight(const unsigned long *bitmap, unsigned int bits)
 }
 EXPORT_SYMBOL(__bitmap_weight);
 
-void bitmap_set(unsigned long *map, unsigned int start, int len)
+void __bitmap_set(unsigned long *map, unsigned int start, int len)
 {
 	unsigned long *p = map + BIT_WORD(start);
 	const unsigned int size = start + len;
@@ -270,9 +270,9 @@ void bitmap_set(unsigned long *map, unsigned int start, int len)
 		*p |= mask_to_set;
 	}
 }
-EXPORT_SYMBOL(bitmap_set);
+EXPORT_SYMBOL(__bitmap_set);
 
-void bitmap_clear(unsigned long *map, unsigned int start, int len)
+void __bitmap_clear(unsigned long *map, unsigned int start, int len)
 {
 	unsigned long *p = map + BIT_WORD(start);
 	const unsigned int size = start + len;
@@ -291,7 +291,7 @@ void bitmap_clear(unsigned long *map, unsigned int start, int len)
 		*p &= ~mask_to_clear;
 	}
 }
-EXPORT_SYMBOL(bitmap_clear);
+EXPORT_SYMBOL(__bitmap_clear);
 
 /**
  * bitmap_find_next_zero_area_off - find a contiguous aligned zero area
-- 
2.11.0
* [PATCH 2/3] Turn bitmap_set and bitmap_clear into memset when possible
From: Matthew Wilcox @ 2017-06-07 14:29 UTC
To: linux-kernel
Cc: Andrew Morton, Martin Schwidefsky, Rasmus Villemoes, Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Several callers have constant 'start' and an 'nbits' that is a multiple
of 8, so we can turn them into calls to memset.  We don't need the
entirety of 'start' and 'nbits' to be constant, we just need to know
whether they're divisible by 8.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/bitmap.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 4e0f0c8167af..0b3e4452b054 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -319,6 +319,9 @@ static __always_inline void bitmap_set(unsigned long *map, unsigned int start,
 {
 	if (__builtin_constant_p(nbits) && nbits == 1)
 		__set_bit(start, map);
+	else if (__builtin_constant_p(start & 7) && IS_ALIGNED(start, 8) &&
+			__builtin_constant_p(nbits & 7) && IS_ALIGNED(nbits, 8))
+		memset(map + start / 8, 0xff, nbits / 8);
 	else
 		__bitmap_set(map, start, nbits);
 }
@@ -328,6 +331,9 @@ static __always_inline void bitmap_clear(unsigned long *map, unsigned int start,
 {
 	if (__builtin_constant_p(nbits) && nbits == 1)
 		__clear_bit(start, map);
+	else if (__builtin_constant_p(start & 7) && IS_ALIGNED(start, 8) &&
+			__builtin_constant_p(nbits & 7) && IS_ALIGNED(nbits, 8))
+		memset(map + start / 8, 0, nbits / 8);
 	else
 		__bitmap_clear(map, start, nbits);
 }
-- 
2.11.0
* Re: [PATCH 2/3] Turn bitmap_set and bitmap_clear into memset when possible
From: Rasmus Villemoes @ 2017-06-07 21:16 UTC
To: Matthew Wilcox
Cc: linux-kernel, Andrew Morton, Martin Schwidefsky, Matthew Wilcox

On Wed, Jun 07 2017, Matthew Wilcox <willy@infradead.org> wrote:

> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> Several callers have constant 'start' and an 'nbits' that is a multiple
> of 8, so we can turn them into calls to memset.  We don't need the
> entirety of 'start' and 'nbits' to be constant, we just need to know
> whether they're divisible by 8.
>
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> ---
>  include/linux/bitmap.h | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index 4e0f0c8167af..0b3e4452b054 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -319,6 +319,9 @@ static __always_inline void bitmap_set(unsigned long *map, unsigned int start,
>  {
>  	if (__builtin_constant_p(nbits) && nbits == 1)
>  		__set_bit(start, map);
> +	else if (__builtin_constant_p(start & 7) && IS_ALIGNED(start, 8) &&
> +			__builtin_constant_p(nbits & 7) && IS_ALIGNED(nbits, 8))
> +		memset(map + start / 8, 0xff, nbits / 8);
>  	else

Isn't the pointer arithmetic wrong here?  I think you need to cast map
to (char *).

>  		__bitmap_set(map, start, nbits);
>  }
> @@ -328,6 +331,9 @@ static __always_inline void bitmap_clear(unsigned long *map, unsigned int start,
>  {
>  	if (__builtin_constant_p(nbits) && nbits == 1)
>  		__clear_bit(start, map);
> +	else if (__builtin_constant_p(start & 7) && IS_ALIGNED(start, 8) &&
> +			__builtin_constant_p(nbits & 7) && IS_ALIGNED(nbits, 8))
> +		memset(map + start / 8, 0, nbits / 8);
>  	else

Ditto.

Do you have an example of how the generated code changes, both in the
case of actual constants and a case where gcc can see that start and
nbits are byte-aligned without knowing their actual values?
* Re: [PATCH 2/3] Turn bitmap_set and bitmap_clear into memset when possible
From: Matthew Wilcox @ 2017-06-27 7:02 UTC
To: Rasmus Villemoes
Cc: linux-kernel, Andrew Morton, Martin Schwidefsky, Matthew Wilcox

On Wed, Jun 07, 2017 at 11:16:37PM +0200, Rasmus Villemoes wrote:
> On Wed, Jun 07 2017, Matthew Wilcox <willy@infradead.org> wrote:
> > @@ -319,6 +319,9 @@ static __always_inline void bitmap_set(unsigned long *map, unsigned int start,
> >  {
> >  	if (__builtin_constant_p(nbits) && nbits == 1)
> >  		__set_bit(start, map);
> > +	else if (__builtin_constant_p(start & 7) && IS_ALIGNED(start, 8) &&
> > +			__builtin_constant_p(nbits & 7) && IS_ALIGNED(nbits, 8))
> > +		memset(map + start / 8, 0xff, nbits / 8);
> >  	else
>
> Isn't the pointer arithmetic wrong here?  I think you need to cast map
> to (char *).

Good catch.  I've added a test case to test_bitmap.c and it catches the
problem.

> Do you have an example of how the generated code changes, both in the
> case of actual constants and a case where gcc can see that start and
> nbits are byte-aligned without knowing their actual values?

Here's an example of start == 0 and nbits being clearly a multiple of 8,
courtesy of btrfs's extent-io-tests.c.

Before:

  f1:	48 8b 5c 24 10       	mov    0x10(%rsp),%rbx
  f6:	4c 8b 3c 24          	mov    (%rsp),%r15
  fa:	31 f6                	xor    %esi,%esi
  fc:	44 8d 34 dd 00 00 00 	lea    0x0(,%rbx,8),%r14d
 103:	00
 104:	4c 8d 2c dd 00 00 00 	lea    0x0(,%rbx,8),%r13
 10b:	00
 10c:	4c 89 ff             	mov    %r15,%rdi
 10f:	44 89 f2             	mov    %r14d,%edx
 112:	e8 00 00 00 00       	callq  117 <__test_eb_bitmaps+0x77>
			113: R_X86_64_PC32	bitmap_set-0x4
 117:	31 d2                	xor    %edx,%edx
 119:	31 f6                	xor    %esi,%esi
 11b:	4c 89 e9             	mov    %r13,%rcx
 11e:	4c 89 e7             	mov    %r12,%rdi
 121:	e8 00 00 00 00       	callq  126 <__test_eb_bitmaps+0x86>
			122: R_X86_64_PC32	extent_buffer_bitmap_set-0x4

After:

  f1:	4c 8b 7c 24 10       	mov    0x10(%rsp),%r15
  f6:	be ff 00 00 00       	mov    $0xff,%esi
  fb:	48 89 df             	mov    %rbx,%rdi
  fe:	4d 89 fe             	mov    %r15,%r14
 101:	4e 8d 2c fd 00 00 00 	lea    0x0(,%r15,8),%r13
 108:	00
 109:	41 81 e6 ff ff ff 1f 	and    $0x1fffffff,%r14d
 110:	4c 89 f2             	mov    %r14,%rdx
 113:	e8 00 00 00 00       	callq  118 <__test_eb_bitmaps+0x78>
			114: R_X86_64_PC32	memset-0x4
 118:	48 8b 2c 24          	mov    (%rsp),%rbp
 11c:	31 d2                	xor    %edx,%edx
 11e:	31 f6                	xor    %esi,%esi
 120:	4c 89 e9             	mov    %r13,%rcx
 123:	48 89 ef             	mov    %rbp,%rdi
 126:	e8 00 00 00 00       	callq  12b <__test_eb_bitmaps+0x8b>
			127: R_X86_64_PC32	extent_buffer_bitmap_set-0x4

It's actually slightly less efficient in the caller (although obviously
memset() is going to execute faster than bitmap_set()).  Partly because
x86 made some odd choices about the behaviour of an 8-bit move instruction
(it leaves the upper 24 bits intact, rather than zeroing them, so gcc
has to use a 32-bit move instruction to put 0xff into the second argument
to memset()), and partly because gcc is seeing us do:

	unsigned long len;

	len = len * 8 / 8;

so it's (very reasonably) transforming that into

	len &= 0x1fff'ffff'ffff'ffff;

That's partly a quirk of how this particular function is written.
* Re: [PATCH 2/3] Turn bitmap_set and bitmap_clear into memset when possible
From: Rasmus Villemoes @ 2017-06-27 7:58 UTC
To: Matthew Wilcox
Cc: linux-kernel, Andrew Morton, Martin Schwidefsky, Matthew Wilcox

> It's actually slightly less efficient in the caller (although obviously
> memset() is going to execute faster than bitmap_set()).  Partly because
> x86 made some odd choices about the behaviour of an 8-bit move instruction
> (it leaves the upper 24 bits intact, rather than zeroing them, so gcc
> has to use a 32-bit move instruction to put 0xff into the second argument
> to memset()),

Heh, I thought gcc knew and made full use of the semantics of memset, so
that only the low byte matters.  I suppose there might be architectures
where passing -1 is slightly cheaper (at least in code size) than 255...

[quick checking] indeed, on x86_64, there's no change in the generated
code, but on 32 bit, gcc ends up doing

 6a ff                	push   $0xffffffff

instead of

 68 ff 00 00 00       	push   $0xff
* [PATCH 3/3] bitmap: Use memcmp optimisation in more situations
From: Matthew Wilcox @ 2017-06-07 14:29 UTC
To: linux-kernel
Cc: Andrew Morton, Martin Schwidefsky, Rasmus Villemoes, Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Commit 7dd968163f ("bitmap: bitmap_equal memcmp optimization") was
rather more restrictive than necessary; we can use memcmp() to implement
bitmap_equal() as long as the number of bits can be proved to be a
multiple of 8.  And architectures other than s390 may be able to make
good use of this optimisation.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/bitmap.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 0b3e4452b054..26244e0098f0 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -266,10 +266,8 @@ static inline int bitmap_equal(const unsigned long *src1,
 {
 	if (small_const_nbits(nbits))
 		return !((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits));
-#ifdef CONFIG_S390
-	if (__builtin_constant_p(nbits) && (nbits % BITS_PER_LONG) == 0)
+	if (__builtin_constant_p(nbits & 7) && IS_ALIGNED(nbits, 8))
 		return !memcmp(src1, src2, nbits / 8);
-#endif
 	return __bitmap_equal(src1, src2, nbits);
 }
-- 
2.11.0
* Re: [PATCH 3/3] bitmap: Use memcmp optimisation in more situations
From: Andy Shevchenko @ 2017-06-08 1:48 UTC
To: Matthew Wilcox
Cc: linux-kernel, Andrew Morton, Martin Schwidefsky, Rasmus Villemoes, Matthew Wilcox

On Wed, Jun 7, 2017 at 5:29 PM, Matthew Wilcox <willy@infradead.org> wrote:

> Commit 7dd968163f ("bitmap: bitmap_equal memcmp optimization") was
> rather more restrictive than necessary; we can use memcmp() to implement
> bitmap_equal() as long as the number of bits can be proved to be a
> multiple of 8.  And architectures other than s390 may be able to make
> good use of this optimisation.

> -	if (__builtin_constant_p(nbits) && (nbits % BITS_PER_LONG) == 0)
> +	if (__builtin_constant_p(nbits & 7) && IS_ALIGNED(nbits, 8))
> 		return !memcmp(src1, src2, nbits / 8);

I'm not sure this is a fully correct change.  What exactly does the
' & 7' part do?  It looks to me like you could just drop it.

-- 
With Best Regards,
Andy Shevchenko
* Re: [PATCH 3/3] bitmap: Use memcmp optimisation in more situations
From: Matthew Wilcox @ 2017-06-08 2:55 UTC
To: Andy Shevchenko
Cc: linux-kernel, Andrew Morton, Martin Schwidefsky, Rasmus Villemoes, Matthew Wilcox

On Thu, Jun 08, 2017 at 04:48:04AM +0300, Andy Shevchenko wrote:
> > Commit 7dd968163f ("bitmap: bitmap_equal memcmp optimization") was
> > rather more restrictive than necessary; we can use memcmp() to implement
> > bitmap_equal() as long as the number of bits can be proved to be a
> > multiple of 8.  And architectures other than s390 may be able to make
> > good use of this optimisation.
>
> > -	if (__builtin_constant_p(nbits) && (nbits % BITS_PER_LONG) == 0)
> > +	if (__builtin_constant_p(nbits & 7) && IS_ALIGNED(nbits, 8))
> > 		return !memcmp(src1, src2, nbits / 8);
>
> I'm not sure this is a fully correct change.
> What exactly ' & 7' part does?
> For me looks like you may just drop it.

We only need to know if the bottom 3 bits are 0 to apply this
optimisation.  For example, if we have a user which does this:

	nbits = 8;
	if (argle)
		nbits += 8;
	if (bitmap_equal(ptr1, ptr2, nbits))
		blah();

then we can use memcmp() because gcc can deduce that the bottom 3 bits
are never set (try it! it works!).  We don't need nbits as a whole to
be const.
* Re: [PATCH 3/3] bitmap: Use memcmp optimisation in more situations
From: Andy Shevchenko @ 2017-06-08 12:31 UTC
To: Matthew Wilcox
Cc: linux-kernel, Andrew Morton, Martin Schwidefsky, Rasmus Villemoes, Matthew Wilcox

On Thu, Jun 8, 2017 at 5:55 AM, Matthew Wilcox <willy@infradead.org> wrote:
> On Thu, Jun 08, 2017 at 04:48:04AM +0300, Andy Shevchenko wrote:
> > > Commit 7dd968163f ("bitmap: bitmap_equal memcmp optimization") was
> > > rather more restrictive than necessary; we can use memcmp() to implement
> > > bitmap_equal() as long as the number of bits can be proved to be a
> > > multiple of 8.  And architectures other than s390 may be able to make
> > > good use of this optimisation.
> >
> > > -	if (__builtin_constant_p(nbits) && (nbits % BITS_PER_LONG) == 0)
> > > +	if (__builtin_constant_p(nbits & 7) && IS_ALIGNED(nbits, 8))
> > > 		return !memcmp(src1, src2, nbits / 8);
> >
> > I'm not sure this is a fully correct change.
> > What exactly ' & 7' part does?
> > For me looks like you may just drop it.
>
> We only need to know if the bottom 3 bits are 0 to apply this
> optimisation.  For example, if we have a user which does this:
>
> 	nbits = 8;
> 	if (argle)
> 		nbits += 8;
> 	if (bitmap_equal(ptr1, ptr2, nbits))
> 		blah();
>
> then we can use memcmp() because gcc can deduce that the bottom 3 bits
> are never set (try it! it works!).  We don't need nbits as a whole to
> be const.

What I'm talking about is that, in my opinion, the two below are
equivalent:

	__builtin_constant_p(nbits)
	__builtin_constant_p(nbits & 7)

Thus, again, what does the & 7 do there?

-- 
With Best Regards,
Andy Shevchenko
* Re: [PATCH 3/3] bitmap: Use memcmp optimisation in more situations
From: Rasmus Villemoes @ 2017-06-08 13:43 UTC
To: Andy Shevchenko
Cc: Matthew Wilcox, linux-kernel, Andrew Morton, Martin Schwidefsky, Matthew Wilcox

On 8 June 2017 at 14:31, Andy Shevchenko <andy.shevchenko@gmail.com> wrote:
> On Thu, Jun 8, 2017 at 5:55 AM, Matthew Wilcox <willy@infradead.org> wrote:
> > We only need to know if the bottom 3 bits are 0 to apply this
> > optimisation.  For example, if we have a user which does this:
> >
> > 	nbits = 8;
> > 	if (argle)
> > 		nbits += 8;
> > 	if (bitmap_equal(ptr1, ptr2, nbits))
> > 		blah();
> >
> > then we can use memcmp() because gcc can deduce that the bottom 3 bits
> > are never set (try it! it works!).  We don't need nbits as a whole to
> > be const.
>
> What I'm talking about is that by my opinion the both below are equivalent.
> __builtin_constant_p(nbits)
> __builtin_constant_p(nbits & 7)

They are not.  Read Matthew's example again.  Assuming that argle is
something non-constant (maybe an argument to the function), the value
of nbits at the time of the bitmap_equal call is _not_ a
compile-time constant.  However, if the compiler is smart (which at
least some versions of gcc are), the compiler may deduce that nbits is
either 8 or 16; there really are no other options.  Hence it _is_
statically known that nbits is divisible by 8, so the expression
nbits & 7 _is_ compile-time constant (0), so gcc can change the
bitmap_equal call to a memcmp call.

(It may then either pass a run-time value of nbits >> 3 and emit a
single memcmp call, or it may decide to unroll the two options,
creating two memcmp calls with 1 and 2 as compile-time arguments;
these may or may not then in turn be "inlined" to code doing roughly
*(u8 *)p1 == *(u8 *)p2 and similarly for u16 casts.)
* Re: [PATCH 3/3] bitmap: Use memcmp optimisation in more situations
From: Andy Shevchenko @ 2017-06-08 14:47 UTC
To: Rasmus Villemoes
Cc: Matthew Wilcox, linux-kernel, Andrew Morton, Martin Schwidefsky, Matthew Wilcox

On Thu, Jun 8, 2017 at 4:43 PM, Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:
> On 8 June 2017 at 14:31, Andy Shevchenko <andy.shevchenko@gmail.com> wrote:
> > On Thu, Jun 8, 2017 at 5:55 AM, Matthew Wilcox <willy@infradead.org> wrote:
> > > We only need to know if the bottom 3 bits are 0 to apply this
> > > optimisation.  For example, if we have a user which does this:
> > >
> > > 	nbits = 8;
> > > 	if (argle)
> > > 		nbits += 8;
> > > 	if (bitmap_equal(ptr1, ptr2, nbits))
> > > 		blah();
> > >
> > > then we can use memcmp() because gcc can deduce that the bottom 3 bits
> > > are never set (try it! it works!).  We don't need nbits as a whole to
> > > be const.
> >
> > What I'm talking about is that by my opinion the both below are equivalent.
> > __builtin_constant_p(nbits)
> > __builtin_constant_p(nbits & 7)
>
> They are not.  Read Matthew's example again.  Assuming that argle is
> something non-constant (maybe an argument to the function), the value
> of nbits at the time of the bitmap_equal call is _not_ a
> compile-time constant.  However, if the compiler is smart (which at
> least some versions of gcc are), the compiler may deduce that nbits is
> either 8 or 16; there really are no other options.  Hence it _is_
> statically known that nbits is divisible by 8, so the expression
> nbits & 7 _is_ compile-time constant (0), so gcc can change the
> bitmap_equal call to a memcmp call.

Yeah, thanks for the detailed explanation.  So, basically, what we
check is that:
 1. the 3 LSBs _are_ constant, *and*
 2. they are equal to 0.

> (It may then either pass a run-time value of nbits >> 3 and emit a
> single memcmp call, or it may decide to unroll the two options,
> creating two memcmp calls with 1 and 2 as compile-time arguments;
> these may or may not then in turn be "inlined" to code doing roughly
> *(u8 *)p1 == *(u8 *)p2 and similarly for u16 casts.)

-- 
With Best Regards,
Andy Shevchenko
* Re: [PATCH 0/3] Bitmap optimisations
From: Rasmus Villemoes @ 2017-06-07 21:37 UTC
To: Matthew Wilcox
Cc: linux-kernel, Andrew Morton, Martin Schwidefsky, Matthew Wilcox

On Wed, Jun 07 2017, Matthew Wilcox <willy@infradead.org> wrote:

> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> These three bitmap patches use more efficient specialisations when the
> compiler can figure out that it's safe to do so.

With the pointer arithmetic fixed,

Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>