linux-media.vger.kernel.org archive mirror
* [PATCH] lib/memweight.c: optimize by inlining bitmap_weight()
@ 2019-08-21  7:42 Denis Efremov
  2019-08-22  1:25 ` Andrew Morton
  2019-08-24 10:01 ` [PATCH v2] lib/memweight.c: open codes bitmap_weight() Denis Efremov
  0 siblings, 2 replies; 9+ messages in thread
From: Denis Efremov @ 2019-08-21  7:42 UTC (permalink / raw)
  To: akpm
  Cc: Denis Efremov, Akinobu Mita, Jan Kara, Matthew Wilcox,
	linux-kernel, dm-devel, linux-fsdevel, linux-media,
	Erdem Tumurov, Vladimir Shelekhov

This patch inlines the bitmap_weight() call. It thus removes the BUG_ON
and the 'longs to bits -> bits to longs' conversion by calling
hweight_long() directly.

./scripts/bloat-o-meter lib/memweight.o.old lib/memweight.o.new
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-10 (-10)
Function                                     old     new   delta
memweight                                    162     152     -10

Co-developed-by: Erdem Tumurov <erdemus@gmail.com>
Co-developed-by: Vladimir Shelekhov <vshel@iis.nsk.su>
Signed-off-by: Erdem Tumurov <erdemus@gmail.com>
Signed-off-by: Vladimir Shelekhov <vshel@iis.nsk.su>
Signed-off-by: Denis Efremov <efremov@ispras.ru>
---
 lib/memweight.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/memweight.c b/lib/memweight.c
index 94dd72ccaa7f..f050b2b4c5e2 100644
--- a/lib/memweight.c
+++ b/lib/memweight.c
@@ -20,11 +20,13 @@ size_t memweight(const void *ptr, size_t bytes)
 
 	longs = bytes / sizeof(long);
 	if (longs) {
-		BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
-		ret += bitmap_weight((unsigned long *)bitmap,
-				longs * BITS_PER_LONG);
+		const unsigned long *bitmap_long =
+			(const unsigned long *)bitmap;
+
 		bytes -= longs * sizeof(long);
-		bitmap += longs * sizeof(long);
+		for (; longs > 0; longs--, bitmap_long++)
+			ret += hweight_long(*bitmap_long);
+		bitmap = (const unsigned char *)bitmap_long;
 	}
 	/*
 	 * The reason that this last loop is distinct from the preceding
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] lib/memweight.c: optimize by inlining bitmap_weight()
  2019-08-21  7:42 [PATCH] lib/memweight.c: optimize by inlining bitmap_weight() Denis Efremov
@ 2019-08-22  1:25 ` Andrew Morton
  2019-08-22  7:30   ` Denis Efremov
  2019-08-24 10:01 ` [PATCH v2] lib/memweight.c: open codes bitmap_weight() Denis Efremov
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2019-08-22  1:25 UTC (permalink / raw)
  To: Denis Efremov
  Cc: Akinobu Mita, Jan Kara, Matthew Wilcox, linux-kernel, dm-devel,
	linux-fsdevel, linux-media, Erdem Tumurov, Vladimir Shelekhov

On Wed, 21 Aug 2019 10:42:00 +0300 Denis Efremov <efremov@ispras.ru> wrote:

> This patch inlines the bitmap_weight() call.

It is better to say the patch "open codes" the bitmap_weight() call.

> It thus removes the BUG_ON

Why is that OK to do?

I expect all the code size improvements are from doing this?

> and the 'longs to bits -> bits to longs' conversion by calling
> hweight_long() directly.
> 
> ./scripts/bloat-o-meter lib/memweight.o.old lib/memweight.o.new
> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-10 (-10)
> Function                                     old     new   delta
> memweight                                    162     152     -10
> 



* Re: [PATCH] lib/memweight.c: optimize by inlining bitmap_weight()
  2019-08-22  1:25 ` Andrew Morton
@ 2019-08-22  7:30   ` Denis Efremov
  0 siblings, 0 replies; 9+ messages in thread
From: Denis Efremov @ 2019-08-22  7:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Akinobu Mita, Jan Kara, Matthew Wilcox, linux-kernel, dm-devel,
	linux-fsdevel, linux-media, Erdem Tumurov, Vladimir Shelekhov



On 22.08.2019 04:25, Andrew Morton wrote:
> On Wed, 21 Aug 2019 10:42:00 +0300 Denis Efremov <efremov@ispras.ru> wrote:
> 
>> This patch inlines the bitmap_weight() call.
> 
> It is better to say the patch "open codes" the bitmap_weight() call.
> 
>> It thus removes the BUG_ON
> 
> Why is that OK to do?

BUG_ON was necessary here to check that bitmap_weight will return a correct value,
i.e. that the computed weight will fit in the int return type:
static __always_inline int bitmap_weight(const unsigned long *src, unsigned int nbits);

BUG_ON was added in the memweight v2
https://lore.kernel.org/lkml/20120523092113.GG10452@quack.suse.cz/
Jan Kara wrote:
>> +
>> +	for (longs = bytes / sizeof(long); longs > 0; ) {
>> +		size_t bits = min_t(size_t, INT_MAX & ~(BITS_PER_LONG - 1),
> +					longs * BITS_PER_LONG);
>  I find it highly unlikely that someone would have such a large bitmap
> (256 MB or more on 32-bit). Also the condition as you wrote it can just
> overflow so it won't have the desired effect. Just do
>	BUG_ON(longs >= ULONG_MAX / BITS_PER_LONG);
> and remove the loop completely. If someone comes with such a huge bitmap,
> the code can be modified easily (after really closely inspecting whether
> such a huge bitmap is really well justified).
>> +
>> +		w += bitmap_weight(bitmap.ptr, bits);
>> +		bytes -= bits / BITS_PER_BYTE;
>> +		bitmap.address += bits / BITS_PER_BYTE;
>> +		longs -= bits / BITS_PER_LONG;

Akinobu Mita wrote:
> The bits argument of bitmap_weight() is int type. So this should be
>
>        BUG_ON(longs >= INT_MAX / BITS_PER_LONG);

We don't need this check, since we removed the bitmap_weight call and
control the computation directly with size_t everywhere.

We could add BUG_ON(bytes >= SIZE_MAX / BITS_PER_BYTE);
at the very beginning of the function to check that the array is not
very big (>2000PiB), but it seems excessive.
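For anyone who wants to poke at this outside the kernel: the patched
function can be sketched in userspace, with the GCC/Clang
__builtin_popcount*() builtins standing in for hweight8()/hweight_long()
(the name memweight_sketch and the harness are mine, not kernel code):

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace sketch of the patched memweight(); the variable names
 * (bitmap, bitmap_long, longs, ret) mirror lib/memweight.c. */
size_t memweight_sketch(const void *ptr, size_t bytes)
{
	size_t ret = 0;
	size_t longs;
	const unsigned char *bitmap = ptr;

	/* byte loop until the pointer is long-aligned, as in the kernel */
	for (; bytes > 0 && ((uintptr_t)bitmap) % sizeof(long);
	     bytes--, bitmap++)
		ret += (size_t)__builtin_popcount(*bitmap);

	longs = bytes / sizeof(long);
	if (longs) {
		const unsigned long *bitmap_long =
			(const unsigned long *)bitmap;

		bytes -= longs * sizeof(long);
		for (; longs > 0; longs--, bitmap_long++)
			ret += (size_t)__builtin_popcountl(*bitmap_long);
		bitmap = (const unsigned char *)bitmap_long;
	}
	/* trailing bytes that do not fill a whole long */
	for (; bytes > 0; bytes--, bitmap++)
		ret += (size_t)__builtin_popcount(*bitmap);
	return ret;
}
```

The head loop keeps the sketch correct for unaligned input, matching
what lib/memweight.c does before the word loop.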

> 
> I expect all the code size improvements are from doing this?

Yes, but I thought it was good to show that the total size does not
increase because of the manual "inlining".

> 
>> and the 'longs to bits -> bits to longs' conversion by calling
>> hweight_long() directly.
>>
>> ./scripts/bloat-o-meter lib/memweight.o.old lib/memweight.o.new
>> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-10 (-10)
>> Function                                     old     new   delta
>> memweight                                    162     152     -10
>>
> 

Regards,
Denis


* [PATCH v2] lib/memweight.c: open codes bitmap_weight()
  2019-08-21  7:42 [PATCH] lib/memweight.c: optimize by inlining bitmap_weight() Denis Efremov
  2019-08-22  1:25 ` Andrew Morton
@ 2019-08-24 10:01 ` Denis Efremov
  2019-08-25  6:11   ` Matthew Wilcox
  1 sibling, 1 reply; 9+ messages in thread
From: Denis Efremov @ 2019-08-24 10:01 UTC (permalink / raw)
  To: akpm
  Cc: Denis Efremov, Akinobu Mita, Jan Kara, linux-kernel,
	Matthew Wilcox, dm-devel, linux-fsdevel, linux-media,
	Erdem Tumurov, Vladimir Shelekhov

This patch open codes the bitmap_weight() call. The direct
invocation of hweight_long() makes it possible to remove the BUG_ON and
the excessive "longs to bits, bits to longs" conversion.

BUG_ON was required to check that bitmap_weight() will return
a correct value, i.e. the computed weight will fit the int type
of the return value. With this patch memweight() controls the
computation directly with size_t type everywhere. Thus, the BUG_ON
becomes unnecessary.

Total size reduced:
./scripts/bloat-o-meter lib/memweight.o.old lib/memweight.o.new
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-10 (-10)
Function                                     old     new   delta
memweight                                    162     152     -10

Co-developed-by: Erdem Tumurov <erdemus@gmail.com>
Signed-off-by: Erdem Tumurov <erdemus@gmail.com>
Co-developed-by: Vladimir Shelekhov <vshel@iis.nsk.su>
Signed-off-by: Vladimir Shelekhov <vshel@iis.nsk.su>
Signed-off-by: Denis Efremov <efremov@ispras.ru>
---
 lib/memweight.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/memweight.c b/lib/memweight.c
index 94dd72ccaa7f..f050b2b4c5e2 100644
--- a/lib/memweight.c
+++ b/lib/memweight.c
@@ -20,11 +20,13 @@ size_t memweight(const void *ptr, size_t bytes)
 
 	longs = bytes / sizeof(long);
 	if (longs) {
-		BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
-		ret += bitmap_weight((unsigned long *)bitmap,
-				longs * BITS_PER_LONG);
+		const unsigned long *bitmap_long =
+			(const unsigned long *)bitmap;
+
 		bytes -= longs * sizeof(long);
-		bitmap += longs * sizeof(long);
+		for (; longs > 0; longs--, bitmap_long++)
+			ret += hweight_long(*bitmap_long);
+		bitmap = (const unsigned char *)bitmap_long;
 	}
 	/*
 	 * The reason that this last loop is distinct from the preceding
-- 
2.21.0



* Re: [PATCH v2] lib/memweight.c: open codes bitmap_weight()
  2019-08-24 10:01 ` [PATCH v2] lib/memweight.c: open codes bitmap_weight() Denis Efremov
@ 2019-08-25  6:11   ` Matthew Wilcox
  2019-08-25 11:39     ` Denis Efremov
  0 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2019-08-25  6:11 UTC (permalink / raw)
  To: Denis Efremov
  Cc: akpm, Akinobu Mita, Jan Kara, linux-kernel, Matthew Wilcox,
	dm-devel, linux-fsdevel, linux-media, Erdem Tumurov,
	Vladimir Shelekhov

On Sat, Aug 24, 2019 at 01:01:02PM +0300, Denis Efremov wrote:
> This patch open codes the bitmap_weight() call. The direct
> invocation of hweight_long() makes it possible to remove the BUG_ON and
> the excessive "longs to bits, bits to longs" conversion.

Honestly, that's not the problem with this function.  Take a look
at https://danluu.com/assembly-intrinsics/ for a _benchmarked_
set of problems with popcnt.

> BUG_ON was required to check that bitmap_weight() will return
> a correct value, i.e. the computed weight will fit the int type
> of the return value.

What?  No.  Look at the _arguments_ of bitmap_weight():

static __always_inline int bitmap_weight(const unsigned long *src, unsigned int nbits)

> With this patch memweight() controls the
> computation directly with size_t type everywhere. Thus, the BUG_ON
> becomes unnecessary.

Why are you bothering?  How are you allocating half a gigabyte of memory?
Why are you calling memweight() on half a gigabyte of memory?

>  	if (longs) {
> -		BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
> -		ret += bitmap_weight((unsigned long *)bitmap,
> -				longs * BITS_PER_LONG);
> +		const unsigned long *bitmap_long =
> +			(const unsigned long *)bitmap;
> +
>  		bytes -= longs * sizeof(long);
> -		bitmap += longs * sizeof(long);
> +		for (; longs > 0; longs--, bitmap_long++)
> +			ret += hweight_long(*bitmap_long);
> +		bitmap = (const unsigned char *)bitmap_long;
>  	}

If you really must change anything, I'd rather see this turned into a
loop:

	while (longs) {
		unsigned int nbits;

		if (longs >= INT_MAX / BITS_PER_LONG)
			nbits = (unsigned int)INT_MAX + 1;
		else
			nbits = longs * BITS_PER_LONG;

		ret += bitmap_weight((unsigned long *)bitmap, nbits);
		bytes -= nbits / 8;
		bitmap += nbits / 8;
		longs -= nbits / BITS_PER_LONG;
	}

then we only have to use Dan Luu's optimisation in bitmap_weight()
and not in memweight() as well.

Also, why does the trailer do this:

        for (; bytes > 0; bytes--, bitmap++)
                ret += hweight8(*bitmap);

instead of calling hweight_long on *bitmap & mask?


* Re: [PATCH v2] lib/memweight.c: open codes bitmap_weight()
  2019-08-25  6:11   ` Matthew Wilcox
@ 2019-08-25 11:39     ` Denis Efremov
  2019-08-26 18:39       ` Matthew Wilcox
  0 siblings, 1 reply; 9+ messages in thread
From: Denis Efremov @ 2019-08-25 11:39 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: akpm, Akinobu Mita, Jan Kara, linux-kernel, Matthew Wilcox,
	dm-devel, linux-fsdevel, linux-media, Erdem Tumurov,
	Vladimir Shelekhov



On 25.08.2019 09:11, Matthew Wilcox wrote:
> On Sat, Aug 24, 2019 at 01:01:02PM +0300, Denis Efremov wrote:
>> This patch open codes the bitmap_weight() call. The direct
>> invocation of hweight_long() makes it possible to remove the BUG_ON and
>> the excessive "longs to bits, bits to longs" conversion.
> 
> Honestly, that's not the problem with this function.  Take a look
> at https://danluu.com/assembly-intrinsics/ for a _benchmarked_
> set of problems with popcnt.
> 
>> BUG_ON was required to check that bitmap_weight() will return
>> a correct value, i.e. the computed weight will fit the int type
>> of the return value.
> 
> What?  No.  Look at the _arguments_ of bitmap_weight():
> 
> static __always_inline int bitmap_weight(const unsigned long *src, unsigned int nbits)

I'm not sure why it is INT_MAX then. If we only care about the arguments,
I would expect something like:
 
BUG_ON(longs >= UINT_MAX / BITS_PER_LONG);

> 
>> With this patch memweight() controls the
>> computation directly with size_t type everywhere. Thus, the BUG_ON
>> becomes unnecessary.
> 
> Why are you bothering?  How are you allocating half a gigabyte of memory?
> Why are you calling memweight() on half a gigabyte of memory?
> 

No, we don't use such big arrays. However, it's possible to remove the BUG_ON
and make the code more straightforward. Why do we need to "artificially" limit
this function to arrays of a particular size if we can omit this restriction
relatively simply?

> 
> If you really must change anything, I'd rather see this turned into a
> loop:
> 
> 	while (longs) {
> 		unsigned int nbits;
> 
> 		if (longs >= INT_MAX / BITS_PER_LONG)
> 			nbits = (unsigned int)INT_MAX + 1;
> 		else
> 			nbits = longs * BITS_PER_LONG;
> 
> 		ret += bitmap_weight((unsigned long *)bitmap, nbits);
> 		bytes -= nbits / 8;
> 		bitmap += nbits / 8;
> 		longs -= nbits / BITS_PER_LONG;
> 	}
> 
> then we only have to use Dan Luu's optimisation in bitmap_weight()
> and not in memweight() as well.

I don't know what the implementation of this optimization will look like in its
final shape, because of different hardware/compiler issues. It looks like there
are a number of different ways to do it: https://arxiv.org/pdf/1611.07612.pdf,
http://0x80.pl/articles/sse-popcount.html.

However, if it is based on the popcnt instruction, I would expect that
hweight_long will also contain this intrinsic. Since version 4.9.2
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011#c13) GCC knows of the
false dependency in popcnt and generates code to handle it
(e.g. an xor, https://godbolt.org/z/Q7AW_d). Thus, I would expect that it's
possible to use the popcnt intrinsic in hweight_long in a way that would be
natively optimized in all loops like "for (...) { res += hweight_long() }",
without requiring manual unrolling as in the builtin_popcnt_unrolled_errata_manual
example of Dan Luu's article.

> 
> Also, why does the trailer do this:
> 
>         for (; bytes > 0; bytes--, bitmap++)
>                 ret += hweight8(*bitmap);
> 
> instead of calling hweight_long on *bitmap & mask?
> 

Do you mean something like this?

        longs = bytes;
        bytes = do_div(longs, sizeof(long));
        bitmap_long = (const unsigned long *)bitmap;
        if (longs) {
                for (; longs > 0; longs--, bitmap_long++)
                        ret += hweight_long(*bitmap_long);
        }
        if (bytes) {
                ret += hweight_long(*bitmap_long &
                                   ((1UL << (bytes * BITS_PER_BYTE)) - 1));
        }

The *bitmap_long dereference will lead to a buffer overflow here.

Thanks,
Denis


* Re: [PATCH v2] lib/memweight.c: open codes bitmap_weight()
  2019-08-25 11:39     ` Denis Efremov
@ 2019-08-26 18:39       ` Matthew Wilcox
  2019-09-13 11:48         ` Denis Efremov
  0 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2019-08-26 18:39 UTC (permalink / raw)
  To: Denis Efremov
  Cc: akpm, Akinobu Mita, Jan Kara, linux-kernel, Matthew Wilcox,
	dm-devel, linux-fsdevel, linux-media, Erdem Tumurov,
	Vladimir Shelekhov

On Sun, Aug 25, 2019 at 02:39:47PM +0300, Denis Efremov wrote:
> On 25.08.2019 09:11, Matthew Wilcox wrote:
> > On Sat, Aug 24, 2019 at 01:01:02PM +0300, Denis Efremov wrote:
> >> This patch open codes the bitmap_weight() call. The direct
> >> invocation of hweight_long() makes it possible to remove the BUG_ON and
> >> the excessive "longs to bits, bits to longs" conversion.
> > 
> > Honestly, that's not the problem with this function.  Take a look
> > at https://danluu.com/assembly-intrinsics/ for a _benchmarked_
> > set of problems with popcnt.
> > 
> >> BUG_ON was required to check that bitmap_weight() will return
> >> a correct value, i.e. the computed weight will fit the int type
> >> of the return value.
> > 
> > What?  No.  Look at the _arguments_ of bitmap_weight():
> > 
> > static __always_inline int bitmap_weight(const unsigned long *src, unsigned int nbits)
> 
> I'm not sure why it is INT_MAX then. If we only care about the arguments,
> I would expect something like:
>  
> BUG_ON(longs >= UINT_MAX / BITS_PER_LONG);

People aren't always terribly consistent with INT_MAX vs UINT_MAX.
Also, bitmap_weight() should arguably return an unsigned int (it can't
legitimately return a negative value).

> >> With this patch memweight() controls the
> >> computation directly with size_t type everywhere. Thus, the BUG_ON
> >> becomes unnecessary.
> > 
> > Why are you bothering?  How are you allocating half a gigabyte of memory?
> > Why are you calling memweight() on half a gigabyte of memory?
> > 
> 
> No, we don't use such big arrays. However, it's possible to remove the BUG_ON
> and make the code more straightforward. Why do we need to "artificially" limit
> this function to arrays of a particular size if we can omit this restriction
> relatively simply?

You're not making a great case for changing the implementation of
memweight() here ...

> I don't know what the implementation of this optimization will look like in its
> final shape, because of different hardware/compiler issues. It looks like there
> are a number of different ways to do it: https://arxiv.org/pdf/1611.07612.pdf,
> http://0x80.pl/articles/sse-popcount.html.

The problem with using XMM registers is that they have to be saved/restored.
Not to mention the thermal issues caused by heavy usage of AVX instructions.

> However, if it is based on the popcnt instruction, I would expect that
> hweight_long will also contain this intrinsic. Since version 4.9.2
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011#c13) GCC knows of the
> false dependency in popcnt and generates code to handle it

Ah!  Glad to see GCC knows about this problem and has worked around it.

> (e.g. an xor, https://godbolt.org/z/Q7AW_d). Thus, I would expect that it's
> possible to use the popcnt intrinsic in hweight_long in a way that would be
> natively optimized in all loops like "for (...) { res += hweight_long() }",
> without requiring manual unrolling as in the builtin_popcnt_unrolled_errata_manual
> example of Dan Luu's article.

That might be expecting rather more from our compiler than is reasonable ...

> > 
> > Also, why does the trailer do this:
> > 
> >         for (; bytes > 0; bytes--, bitmap++)
> >                 ret += hweight8(*bitmap);
> > 
> > instead of calling hweight_long on *bitmap & mask?
> > 
> 
> Do you mean something like this?
> 
>         longs = bytes;
>         bytes = do_div(longs, sizeof(long));
>         bitmap_long = (const unsigned long *)bitmap;
>         if (longs) {
>                 for (; longs > 0; longs--, bitmap_long++)
>                         ret += hweight_long(*bitmap_long);
>         }
>         if (bytes) {
>                 ret += hweight_long(*bitmap_long &
>                                    ((1UL << (bytes * BITS_PER_BYTE)) - 1));
>         }
> 
> The *bitmap_long dereference will lead to a buffer overflow here.

No it won't.  The CPU will access more bytes than the `bytes' argument
would seem to imply -- but it's going to have fetched that entire
cacheline anyway.  It might confuse a very strict bounds checking library,
but usually those just check you're not accessing outside your object,
which is going to be a multiple of 'sizeof(long)' anyway.

If we do something like this, we'll need to use an 'inverse' of that mask
on big-endian machines, i.e. something more like:

	if (bytes) {
		unsigned long mask;
		if (_BIG_ENDIAN)
			mask = ~0UL >> (bytes * 8);
		else
			mask = ~0UL << (bytes * 8);
		ret += hweight_long(*bitmap_long & ~mask);
	}

Also we need a memweight() test to be sure we didn't get that wrong.
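To make that concrete, here is a userspace sketch of such a test,
assuming a little-endian host. The helper names are invented; memcpy()
into a zero-initialized word keeps the check itself in bounds (sidestepping
the out-of-bounds read discussed above), and bytes is assumed to be
1..sizeof(long)-1:

```c
#include <stddef.h>
#include <string.h>

/* The existing byte-at-a-time tail loop, as a reference. */
unsigned int tail_weight_bytes(const unsigned char *p, size_t bytes)
{
	unsigned int ret = 0;

	for (; bytes > 0; bytes--, p++)
		ret += (unsigned int)__builtin_popcount(*p);
	return ret;
}

/* Masked-word variant: count only the bits of the first `bytes` bytes
 * of the final partial word.  Little-endian form of the mask. */
unsigned int tail_weight_masked(const unsigned char *p, size_t bytes)
{
	unsigned long word = 0;
	unsigned long mask = ~0UL << (bytes * 8);	/* LE only */

	memcpy(&word, p, bytes);	/* stays within the buffer */
	return (unsigned int)__builtin_popcountl(word & ~mask);
}
```

On a big-endian host the mask would have to be inverted as sketched
above; this check only validates the little-endian form.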


* Re: [PATCH v2] lib/memweight.c: open codes bitmap_weight()
  2019-08-26 18:39       ` Matthew Wilcox
@ 2019-09-13 11:48         ` Denis Efremov
  2019-09-13 13:41           ` efremov
  0 siblings, 1 reply; 9+ messages in thread
From: Denis Efremov @ 2019-09-13 11:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: akpm, Akinobu Mita, Jan Kara, linux-kernel, Matthew Wilcox,
	dm-devel, linux-fsdevel, linux-media, Erdem Tumurov,
	Vladimir Shelekhov

Hi,

Sorry for reviving this conversation, but it looks to me like
this function could be reduced to a single bitmap_weight call:

static inline size_t memweight(const void *ptr, size_t bytes)
{
        BUG_ON(bytes >= UINT_MAX / BITS_PER_BYTE);
        return bitmap_weight(ptr, bytes * BITS_PER_BYTE);
}

Compared to the current implementation,
https://elixir.bootlin.com/linux/latest/source/lib/memweight.c#L11
this is a significant simplification.

__bitmap_weight already counts the last bits with hweight_long, as we
discussed earlier.

int __bitmap_weight(const unsigned long *bitmap, unsigned int bits)
{
	...
	if (bits % BITS_PER_LONG)
		w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
	...
}

and the __arch_hweight* functions use the popcnt instruction.

I've briefly tested the equivalence of the two implementations on x86_64 with
fuzzing here: https://gist.github.com/evdenis/95a8b9b8041e09368b31c3a9510491a5

What do you think about making this function static inline and moving it
to include/linux/string.h? I could prepare a patch for it and add some tests
for memweight and bitmap_weight. Or maybe I'm missing something again?

Best regards,
Denis
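A userspace comparison of the proposed reduction against a plain per-byte
count might look like the sketch below (names and harness invented, builtins
standing in for hweight_long()). Note the source must be long-aligned whole
words, since a bitmap_weight()-style count reads whole longs; that is
exactly the pointer-alignment caveat raised in the follow-up.

```c
#include <stddef.h>

/* Per-byte reference count, like a trivial memweight(). */
size_t byte_weight(const void *ptr, size_t bytes)
{
	const unsigned char *p = ptr;
	size_t ret = 0;

	while (bytes--)
		ret += (size_t)__builtin_popcount(*p++);
	return ret;
}

/* bitmap_weight()-style count over nbits bits; src must point to
 * whole unsigned longs (aligned and padded to a long boundary). */
size_t bitmap_weight_sketch(const unsigned long *src, size_t nbits)
{
	const size_t bpl = sizeof(unsigned long) * 8;
	size_t k, lim = nbits / bpl, w = 0;

	for (k = 0; k < lim; k++)
		w += (size_t)__builtin_popcountl(src[k]);
	if (nbits % bpl)	/* like BITMAP_LAST_WORD_MASK() */
		w += (size_t)__builtin_popcountl(src[k] &
				(~0UL >> (bpl - nbits % bpl)));
	return w;
}
```

For byte counts that are a multiple of sizeof(long), the two agree
regardless of endianness; the fuzzing gist linked above tested this
on x86_64 only.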


* Re: [PATCH v2] lib/memweight.c: open codes bitmap_weight()
  2019-09-13 11:48         ` Denis Efremov
@ 2019-09-13 13:41           ` efremov
  0 siblings, 0 replies; 9+ messages in thread
From: efremov @ 2019-09-13 13:41 UTC (permalink / raw)
  To: Denis Efremov
  Cc: Matthew Wilcox, akpm, Akinobu Mita, Jan Kara, linux-kernel,
	Matthew Wilcox, dm-devel, linux-fsdevel, linux-media,
	Erdem Tumurov, Vladimir Shelekhov

Sorry, no question: pointer alignment, of course.

Denis Efremov писал 2019-09-13 14:48:
> Hi,
> 
> Sorry for reviving this conversation, but it looks to me like
> this function could be reduced to a single bitmap_weight call:
> 
> static inline size_t memweight(const void *ptr, size_t bytes)
> {
>         BUG_ON(bytes >= UINT_MAX / BITS_PER_BYTE);
>         return bitmap_weight(ptr, bytes * BITS_PER_BYTE);
> }
> 
> Compared to the current implementation,
> https://elixir.bootlin.com/linux/latest/source/lib/memweight.c#L11
> this is a significant simplification.
> 
> __bitmap_weight already counts the last bits with hweight_long, as we
> discussed earlier.
> 
> int __bitmap_weight(const unsigned long *bitmap, unsigned int bits)
> {
> 	...
> 	if (bits % BITS_PER_LONG)
> 		w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
> 	...
> }
> 
> and the __arch_hweight* functions use the popcnt instruction.
> 
> I've briefly tested the equivalence of the two implementations on x86_64 with
> fuzzing here:
> https://gist.github.com/evdenis/95a8b9b8041e09368b31c3a9510491a5
> 
> What do you think about making this function static inline and moving it
> to include/linux/string.h? I could prepare a patch for it and add some
> tests for memweight and bitmap_weight. Or maybe I'm missing something
> again?
> 
> Best regards,
> Denis


end of thread, other threads:[~2019-09-13 13:41 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-21  7:42 [PATCH] lib/memweight.c: optimize by inlining bitmap_weight() Denis Efremov
2019-08-22  1:25 ` Andrew Morton
2019-08-22  7:30   ` Denis Efremov
2019-08-24 10:01 ` [PATCH v2] lib/memweight.c: open codes bitmap_weight() Denis Efremov
2019-08-25  6:11   ` Matthew Wilcox
2019-08-25 11:39     ` Denis Efremov
2019-08-26 18:39       ` Matthew Wilcox
2019-09-13 11:48         ` Denis Efremov
2019-09-13 13:41           ` efremov
