* [PATCH v3 0/3] lib: find_*_bit reimplementation
@ 2015-02-08 14:10 Yury Norov
  2015-02-08 14:10 ` [PATCH v3 1/3] " Yury Norov
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-08 14:10 UTC (permalink / raw)
  To: klimov.linux, davem, akpm, hannes, dborkman, laijs,
	takahiro.akashi, valentinrothberg, linux, msalter, chris, tgraf
  Cc: Yury Norov, linux-kernel

This patchset reworks the find_bit function family to achieve better
performance and to decrease text size. All the rework is done in
patch 1; patches 2 and 3 only move and rename code.

It was boot-tested on x86_64 and MIPS (big-endian) machines.
Performance tests were run in userspace with code like this:

	/* addr[] is filled from /dev/urandom */
	unsigned long ret = 0;

	start = clock();
	while (ret < nbits)
		ret = find_next_bit(addr, nbits, ret + 1);

	end = clock();
	printf("%ld\t", (unsigned long)(end - start));

On an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz the results are as follows
(for find_next_bit, nbits is 8M; for find_first_bit, 80K):

	find_next_bit:		find_first_bit:
	new	current		new	current
	26932	43151		14777	14925
	26947	43182		14521	15423
	26507	43824		15053	14705
	27329	43759		14473	14777
	26895	43367		14847	15023
	26990	43693		15103	15163
	26775	43299		15067	15232
	27282	42752		14544	15121
	27504	43088		14644	14858
	26761	43856		14699	15193
	26692	43075		14781	14681
	27137	42969		14451	15061
	...			...

The find_next_bit performance gain is 35-40%; for find_first_bit there
is no measurable difference.
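
The fragment above can be fleshed out into a self-contained userspace
harness along these lines. This is only a sketch: find_next_bit is
re-implemented from the patch 1 algorithm with GCC builtins standing in
for the kernel helpers, and the first-word masking is done
unconditionally (equivalent, because HIGH_BITS_MASK(0) is an all-ones
mask):

	#include <stdio.h>
	#include <stdlib.h>
	#include <time.h>

	#define BITS_PER_LONG		64
	#define HIGH_BITS_MASK(nr)	(~0UL << (nr))
	#define min(a, b)		((a) < (b) ? (a) : (b))

	/* __ffs: index of the least significant set bit; word must be non-zero. */
	static unsigned long __ffs(unsigned long word)
	{
		return (unsigned long)__builtin_ctzl(word);
	}

	/* Userspace copy of the new find_next_bit (set-bit search only). */
	static unsigned long find_next_bit(const unsigned long *addr,
					   unsigned long nbits, unsigned long start)
	{
		unsigned long tmp;

		if (start >= nbits)
			return nbits;

		tmp = addr[start / BITS_PER_LONG];
		tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);	/* mask the 1st word */
		start -= start % BITS_PER_LONG;			/* round down */

		while (!tmp) {
			start += BITS_PER_LONG;
			if (start >= nbits)
				return nbits;
			tmp = addr[start / BITS_PER_LONG];
		}

		return min(start + __ffs(tmp), nbits);
	}

	int main(void)
	{
		const unsigned long nbits = 8UL << 20;	/* 8M bits, as above */
		unsigned long *addr = malloc(nbits / 8);
		unsigned long ret = 0;
		clock_t start, end;
		FILE *f = fopen("/dev/urandom", "r");

		if (!addr || !f || fread(addr, 1, nbits / 8, f) != nbits / 8)
			return 1;
		fclose(f);

		start = clock();
		while (ret < nbits)
			ret = find_next_bit(addr, nbits, ret + 1);
		end = clock();

		printf("%ld\n", (long)(end - start));
		free(addr);
		return 0;
	}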

On ARM machines there is an arch-specific implementation of find_bit.
To disable it and use the generic one, apply the following patch:
	
	---
	 arch/arm/include/asm/bitops.h   | 19 -------------------
	 arch/arm/kernel/armksyms.c      | 11 -----------
	 arch/arm/lib/Makefile           |  2 +-
	 include/asm-generic/bitops/le.h |  1 +
	 4 files changed, 2 insertions(+), 31 deletions(-)

	diff --git a/arch/arm/include/asm/bitops.h b/arch/arm/include/asm/bitops.h
	index 5638099..e0611d1 100644
	--- a/arch/arm/include/asm/bitops.h
	+++ b/arch/arm/include/asm/bitops.h
	@@ -192,25 +192,6 @@ extern int _find_next_bit_be(const unsigned long *p, int size, int offset);
	 #define test_and_clear_bit(nr,p)	ATOMIC_BITOP(test_and_clear_bit,nr,p)
	 #define test_and_change_bit(nr,p)	ATOMIC_BITOP(test_and_change_bit,nr,p)
	 
	-#ifndef __ARMEB__
	-/*
	- * These are the little endian, atomic definitions.
	- */
	-#define find_first_zero_bit(p,sz)	_find_first_zero_bit_le(p,sz)
	-#define find_next_zero_bit(p,sz,off)	_find_next_zero_bit_le(p,sz,off)
	-#define find_first_bit(p,sz)		_find_first_bit_le(p,sz)
	-#define find_next_bit(p,sz,off)		_find_next_bit_le(p,sz,off)
	-
	-#else
	-/*
	- * These are the big endian, atomic definitions.
	- */
	-#define find_first_zero_bit(p,sz)	_find_first_zero_bit_be(p,sz)
	-#define find_next_zero_bit(p,sz,off)	_find_next_zero_bit_be(p,sz,off)
	-#define find_first_bit(p,sz)		_find_first_bit_be(p,sz)
	-#define find_next_bit(p,sz,off)		_find_next_bit_be(p,sz,off)
	-
	-#endif
	 
	 #if __LINUX_ARM_ARCH__ < 5
	 
	diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
	index a88671c..22e8748 100644
	--- a/arch/arm/kernel/armksyms.c
	+++ b/arch/arm/kernel/armksyms.c
	@@ -146,17 +146,6 @@ EXPORT_SYMBOL(_clear_bit);
	 EXPORT_SYMBOL(_test_and_clear_bit);
	 EXPORT_SYMBOL(_change_bit);
	 EXPORT_SYMBOL(_test_and_change_bit);
	-EXPORT_SYMBOL(_find_first_zero_bit_le);
	-EXPORT_SYMBOL(_find_next_zero_bit_le);
	-EXPORT_SYMBOL(_find_first_bit_le);
	-EXPORT_SYMBOL(_find_next_bit_le);
	-
	-#ifdef __ARMEB__
	-EXPORT_SYMBOL(_find_first_zero_bit_be);
	-EXPORT_SYMBOL(_find_next_zero_bit_be);
	-EXPORT_SYMBOL(_find_first_bit_be);
	-EXPORT_SYMBOL(_find_next_bit_be);
	-#endif
	 
	 #ifdef CONFIG_FUNCTION_TRACER
	 #ifdef CONFIG_OLD_MCOUNT
	diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
	index 0573faa..de369aa 100644
	--- a/arch/arm/lib/Makefile
	+++ b/arch/arm/lib/Makefile
	@@ -6,7 +6,7 @@
	 
	 lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
			   csumpartialcopy.o csumpartialcopyuser.o clearbit.o \
	-		   delay.o delay-loop.o findbit.o memchr.o memcpy.o   \
	+		   delay.o delay-loop.o memchr.o memcpy.o	      \
			   memmove.o memset.o memzero.o setbit.o              \
			   strchr.o strrchr.o                                 \
			   testchangebit.o testclearbit.o testsetbit.o        \
	diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
	index 6173154..9a8798f 100644
	--- a/include/asm-generic/bitops/le.h
	+++ b/include/asm-generic/bitops/le.h
	@@ -2,6 +2,7 @@
	 #define _ASM_GENERIC_BITOPS_LE_H_
	 
	 #include <asm/types.h>
	+#include <asm-generic/bitops/find.h>
	 #include <asm/byteorder.h>
	 
	 #if defined(__LITTLE_ENDIAN)
	-- 
	1.9.1
	
v3:
* conditional bit inverting in _find_next_bit replaced with XORing by a
  mask (patch 1);
* size/offset checks moved from the wrappers into _find_next_bit (patch 1);
* #include <linux/bitops.h> restored (patch 1);
* the lib/find_next_bit.c file description changed so that it no longer
  refers to the file name, which lets the rename in patch 3 produce a
  clean diff (patch 2);
* patch 3 generated with the '-M' option, so the rename is detected and
  the file content is dropped from the diff;
* this cover letter added.

Yury Norov (3):
  lib: find_*_bit reimplementation
  lib: move find_last_bit to lib/find_next_bit.c
  lib: rename lib/find_next_bit.c to lib/find_bit.c

 lib/Makefile        |   2 +-
 lib/find_bit.c      | 197 ++++++++++++++++++++++++++++++++++++
 lib/find_last_bit.c |  49 ---------
 lib/find_next_bit.c | 285 ----------------------------------------------------
 4 files changed, 198 insertions(+), 335 deletions(-)
 create mode 100644 lib/find_bit.c
 delete mode 100644 lib/find_last_bit.c
 delete mode 100644 lib/find_next_bit.c

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Alexey Klimov <klimov.linux@gmail.com> (odroid-xu3 ARMv7 SoC).
-- 
2.1.0



* [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-08 14:10 [PATCH v3 0/3] lib: find_*_bit reimplementation Yury Norov
@ 2015-02-08 14:10 ` Yury Norov
  2015-02-08 18:48   ` George Spelvin
  2015-02-09  8:32   ` George Spelvin
  2015-02-08 14:10 ` [PATCH v3 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-08 14:10 UTC (permalink / raw)
  To: klimov.linux, davem, akpm, hannes, dborkman, laijs,
	takahiro.akashi, valentinrothberg, linux, msalter, chris, tgraf
  Cc: Yury Norov, linux-kernel

The new implementation takes less space in the source file (see the
diffstat) and in the object file: for me, the text size drops from 710
to 453 bytes. It also shows better performance.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/find_last_bit.c |  30 ++----
 lib/find_next_bit.c | 275 ++++++++++++++++------------------------------------
 2 files changed, 92 insertions(+), 213 deletions(-)

diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
index 91ca09f..106050f 100644
--- a/lib/find_last_bit.c
+++ b/lib/find_last_bit.c
@@ -4,6 +4,9 @@
  * Written by Rusty Russell <rusty@rustcorp.com.au>
  * (Inspired by David Howell's find_next_bit implementation)
  *
+ * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
+ * size and improve performance, 2015.
+ *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
  * as published by the Free Software Foundation; either version
@@ -12,36 +15,19 @@
 
 #include <linux/bitops.h>
 #include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>
 
 #ifndef find_last_bit
 
 unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
 {
-	unsigned long words;
-	unsigned long tmp;
-
-	/* Start at final word. */
-	words = size / BITS_PER_LONG;
-
-	/* Partial final word? */
-	if (size & (BITS_PER_LONG-1)) {
-		tmp = (addr[words] & (~0UL >> (BITS_PER_LONG
-					 - (size & (BITS_PER_LONG-1)))));
-		if (tmp)
-			goto found;
-	}
+	unsigned long idx = DIV_ROUND_UP(size, BITS_PER_LONG);
 
-	while (words) {
-		tmp = addr[--words];
-		if (tmp) {
-found:
-			return words * BITS_PER_LONG + __fls(tmp);
-		}
+	while (idx--) {
+		if (addr[idx])
+			return min(idx * BITS_PER_LONG + __fls(addr[idx]), size);
 	}
 
-	/* Not found */
 	return size;
 }
 EXPORT_SYMBOL(find_last_bit);
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 0cbfc0b..71aa497 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -3,6 +3,9 @@
  * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
+ * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
+ * size and improve performance, 2015.
+ *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
  * as published by the Free Software Foundation; either version
@@ -11,10 +14,48 @@
 
 #include <linux/bitops.h>
 #include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>
+
+#define HIGH_BITS_MASK(nr)		(ULONG_MAX << (nr))
+
+#if !defined(find_next_bit) || !defined(find_next_zero_bit)
+static unsigned long _find_next_bit(const unsigned long *addr,
+		unsigned long nbits, unsigned long start, unsigned long mask)
+{
+	unsigned long tmp;
+
+	if (start >= nbits)
+		return nbits;
+
+	tmp = addr[start / BITS_PER_LONG] ^ mask;
 
-#define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
+	/* Handle 1st word. */
+	if (!IS_ALIGNED(start, BITS_PER_LONG)) {
+		tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);
+		start = round_down(start, BITS_PER_LONG);
+	}
+
+	while (!tmp) {
+		start += BITS_PER_LONG;
+		if (start >= nbits)
+			return nbits;
+
+		/*
+		 * This is equivalent to:
+		 *
+		 * tmp = find_set ? addr[start / BITS_PER_LONG]
+		 *	: ~addr[start / BITS_PER_LONG];
+		 *
+		 * but saves a branch condition.
+		 *
+		 * Thanks to George Spelvin <linux@horizon.com> for the idea.
+		 */
+		tmp = addr[start / BITS_PER_LONG] ^ mask;
+	}
+
+	return min(start + __ffs(tmp), nbits);
+}
+#endif
 
 #ifndef find_next_bit
 /*
@@ -23,86 +64,16 @@
 unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
 			    unsigned long offset)
 {
-	const unsigned long *p = addr + BITOP_WORD(offset);
-	unsigned long result = offset & ~(BITS_PER_LONG-1);
-	unsigned long tmp;
-
-	if (offset >= size)
-		return size;
-	size -= result;
-	offset %= BITS_PER_LONG;
-	if (offset) {
-		tmp = *(p++);
-		tmp &= (~0UL << offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-	while (size & ~(BITS_PER_LONG-1)) {
-		if ((tmp = *(p++)))
-			goto found_middle;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
-	}
-	if (!size)
-		return result;
-	tmp = *p;
-
-found_first:
-	tmp &= (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size;	/* Nope. */
-found_middle:
-	return result + __ffs(tmp);
+	return _find_next_bit(addr, size, offset, 0UL);
 }
 EXPORT_SYMBOL(find_next_bit);
 #endif
 
 #ifndef find_next_zero_bit
-/*
- * This implementation of find_{first,next}_zero_bit was stolen from
- * Linus' asm-alpha/bitops.h.
- */
 unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
 				 unsigned long offset)
 {
-	const unsigned long *p = addr + BITOP_WORD(offset);
-	unsigned long result = offset & ~(BITS_PER_LONG-1);
-	unsigned long tmp;
-
-	if (offset >= size)
-		return size;
-	size -= result;
-	offset %= BITS_PER_LONG;
-	if (offset) {
-		tmp = *(p++);
-		tmp |= ~0UL >> (BITS_PER_LONG - offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (~tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-	while (size & ~(BITS_PER_LONG-1)) {
-		if (~(tmp = *(p++)))
-			goto found_middle;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
-	}
-	if (!size)
-		return result;
-	tmp = *p;
-
-found_first:
-	tmp |= ~0UL << size;
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size;	/* Nope. */
-found_middle:
-	return result + ffz(tmp);
+	return _find_next_bit(addr, size, offset, ~0UL);
 }
 EXPORT_SYMBOL(find_next_zero_bit);
 #endif
@@ -113,24 +84,14 @@ EXPORT_SYMBOL(find_next_zero_bit);
  */
 unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
 {
-	const unsigned long *p = addr;
-	unsigned long result = 0;
-	unsigned long tmp;
+	unsigned long idx;
 
-	while (size & ~(BITS_PER_LONG-1)) {
-		if ((tmp = *(p++)))
-			goto found;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+		if (addr[idx])
+			return min(idx * BITS_PER_LONG + __ffs(addr[idx]), size);
 	}
-	if (!size)
-		return result;
 
-	tmp = (*p) & (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size;	/* Nope. */
-found:
-	return result + __ffs(tmp);
+	return size;
 }
 EXPORT_SYMBOL(find_first_bit);
 #endif
@@ -141,24 +102,14 @@ EXPORT_SYMBOL(find_first_bit);
  */
 unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 {
-	const unsigned long *p = addr;
-	unsigned long result = 0;
-	unsigned long tmp;
+	unsigned long idx;
 
-	while (size & ~(BITS_PER_LONG-1)) {
-		if (~(tmp = *(p++)))
-			goto found;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+		if (addr[idx] != ULONG_MAX)
+			return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
 	}
-	if (!size)
-		return result;
 
-	tmp = (*p) | (~0UL << size);
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size;	/* Nope. */
-found:
-	return result + ffz(tmp);
+	return size;
 }
 EXPORT_SYMBOL(find_first_zero_bit);
 #endif
@@ -166,18 +117,6 @@ EXPORT_SYMBOL(find_first_zero_bit);
 #ifdef __BIG_ENDIAN
 
 /* include/linux/byteorder does not support "unsigned long" type */
-static inline unsigned long ext2_swabp(const unsigned long * x)
-{
-#if BITS_PER_LONG == 64
-	return (unsigned long) __swab64p((u64 *) x);
-#elif BITS_PER_LONG == 32
-	return (unsigned long) __swab32p((u32 *) x);
-#else
-#error BITS_PER_LONG not defined
-#endif
-}
-
-/* include/linux/byteorder doesn't support "unsigned long" type */
 static inline unsigned long ext2_swab(const unsigned long y)
 {
 #if BITS_PER_LONG == 64
@@ -189,48 +128,40 @@ static inline unsigned long ext2_swab(const unsigned long y)
 #endif
 }
 
-#ifndef find_next_zero_bit_le
-unsigned long find_next_zero_bit_le(const void *addr, unsigned
-		long size, unsigned long offset)
+#if !defined(find_next_bit_le) || !defined(find_next_zero_bit_le)
+static unsigned long _find_next_bit_le(const unsigned long *addr,
+		unsigned long nbits, unsigned long start, unsigned long mask)
 {
-	const unsigned long *p = addr;
-	unsigned long result = offset & ~(BITS_PER_LONG - 1);
 	unsigned long tmp;
 
-	if (offset >= size)
-		return size;
-	p += BITOP_WORD(offset);
-	size -= result;
-	offset &= (BITS_PER_LONG - 1UL);
-	if (offset) {
-		tmp = ext2_swabp(p++);
-		tmp |= (~0UL >> (BITS_PER_LONG - offset));
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (~tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
+	if (start >= nbits)
+		return nbits;
+
+	tmp = addr[start / BITS_PER_LONG] ^ mask;
+
+	/* Handle 1st word. */
+	if (!IS_ALIGNED(start, BITS_PER_LONG)) {
+		tmp &= ext2_swab(HIGH_BITS_MASK(start % BITS_PER_LONG));
+		start = round_down(start, BITS_PER_LONG);
 	}
 
-	while (size & ~(BITS_PER_LONG - 1)) {
-		if (~(tmp = *(p++)))
-			goto found_middle_swap;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	while (!tmp) {
+		start += BITS_PER_LONG;
+		if (start >= nbits)
+			return nbits;
+
+		tmp = addr[start / BITS_PER_LONG] ^ mask;
 	}
-	if (!size)
-		return result;
-	tmp = ext2_swabp(p);
-found_first:
-	tmp |= ~0UL << size;
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size; /* Nope. Skip ffz */
-found_middle:
-	return result + ffz(tmp);
 
-found_middle_swap:
-	return result + ffz(ext2_swab(tmp));
+	return min(start + __ffs(ext2_swab(tmp)), nbits);
+}
+#endif
+
+#ifndef find_next_zero_bit_le
+unsigned long find_next_zero_bit_le(const void *addr, unsigned
+		long size, unsigned long offset)
+{
+	return _find_next_bit_le(addr, size, offset, ~0UL);
 }
 EXPORT_SYMBOL(find_next_zero_bit_le);
 #endif
@@ -239,45 +170,7 @@ EXPORT_SYMBOL(find_next_zero_bit_le);
 unsigned long find_next_bit_le(const void *addr, unsigned
 		long size, unsigned long offset)
 {
-	const unsigned long *p = addr;
-	unsigned long result = offset & ~(BITS_PER_LONG - 1);
-	unsigned long tmp;
-
-	if (offset >= size)
-		return size;
-	p += BITOP_WORD(offset);
-	size -= result;
-	offset &= (BITS_PER_LONG - 1UL);
-	if (offset) {
-		tmp = ext2_swabp(p++);
-		tmp &= (~0UL << offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-
-	while (size & ~(BITS_PER_LONG - 1)) {
-		tmp = *(p++);
-		if (tmp)
-			goto found_middle_swap;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
-	}
-	if (!size)
-		return result;
-	tmp = ext2_swabp(p);
-found_first:
-	tmp &= (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size; /* Nope. */
-found_middle:
-	return result + __ffs(tmp);
-
-found_middle_swap:
-	return result + __ffs(ext2_swab(tmp));
+	return _find_next_bit_le(addr, size, offset, 0UL);
 }
 EXPORT_SYMBOL(find_next_bit_le);
 #endif
-- 
2.1.0



* [PATCH v3 2/3] lib: move find_last_bit to lib/find_next_bit.c
  2015-02-08 14:10 [PATCH v3 0/3] lib: find_*_bit reimplementation Yury Norov
  2015-02-08 14:10 ` [PATCH v3 1/3] " Yury Norov
@ 2015-02-08 14:10 ` Yury Norov
  2015-02-08 14:10 ` [PATCH v3 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-08 14:10 UTC (permalink / raw)
  To: klimov.linux, davem, akpm, hannes, dborkman, laijs,
	takahiro.akashi, valentinrothberg, linux, msalter, chris, tgraf
  Cc: Yury Norov, linux-kernel

Currently the whole 'find_*_bit' family is located in lib/find_next_bit.c,
except for 'find_last_bit', which is in lib/find_last_bit.c. There seems
to be no major benefit in keeping it separate.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/Makefile        |  2 +-
 lib/find_last_bit.c | 35 -----------------------------------
 lib/find_next_bit.c | 21 ++++++++++++++++++++-
 3 files changed, 21 insertions(+), 37 deletions(-)
 delete mode 100644 lib/find_last_bit.c

diff --git a/lib/Makefile b/lib/Makefile
index 3c3b30b..13990aa 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y	+= lockref.o
 obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 	 bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \
-	 bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \
+	 bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
deleted file mode 100644
index 106050f..0000000
--- a/lib/find_last_bit.c
+++ /dev/null
@@ -1,35 +0,0 @@
-/* find_last_bit.c: fallback find next bit implementation
- *
- * Copyright (C) 2008 IBM Corporation
- * Written by Rusty Russell <rusty@rustcorp.com.au>
- * (Inspired by David Howell's find_next_bit implementation)
- *
- * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
- * size and improve performance, 2015.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include <linux/bitops.h>
-#include <linux/export.h>
-#include <linux/kernel.h>
-
-#ifndef find_last_bit
-
-unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
-{
-	unsigned long idx = DIV_ROUND_UP(size, BITS_PER_LONG);
-
-	while (idx--) {
-		if (addr[idx])
-			return min(idx * BITS_PER_LONG + __fls(addr[idx]), size);
-	}
-
-	return size;
-}
-EXPORT_SYMBOL(find_last_bit);
-
-#endif
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 71aa497..5ec0ab9 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -1,8 +1,12 @@
-/* find_next_bit.c: fallback find next bit implementation
+/* bit search implementation
  *
  * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
+ * Copyright (C) 2008 IBM Corporation
+ * 'find_last_bit' is written by Rusty Russell <rusty@rustcorp.com.au>
+ * (Inspired by David Howell's find_next_bit implementation)
+ *
  * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
  * size and improve performance, 2015.
  *
@@ -114,6 +118,21 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 EXPORT_SYMBOL(find_first_zero_bit);
 #endif
 
+#ifndef find_last_bit
+unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
+{
+	unsigned long idx = DIV_ROUND_UP(size, BITS_PER_LONG);
+
+	while (idx--) {
+		if (addr[idx])
+			return min(idx * BITS_PER_LONG + __fls(addr[idx]), size);
+	}
+
+	return size;
+}
+EXPORT_SYMBOL(find_last_bit);
+#endif
+
 #ifdef __BIG_ENDIAN
 
 /* include/linux/byteorder does not support "unsigned long" type */
-- 
2.1.0



* [PATCH v3 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c
  2015-02-08 14:10 [PATCH v3 0/3] lib: find_*_bit reimplementation Yury Norov
  2015-02-08 14:10 ` [PATCH v3 1/3] " Yury Norov
  2015-02-08 14:10 ` [PATCH v3 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
@ 2015-02-08 14:10 ` Yury Norov
  2015-02-17  2:35 ` [PATCH v4 0/3] lib: find_*_bit reimplementation Yury Norov
  2015-02-22 17:24 ` [PATCH v5 0/3] lib: find_*_bit reimplementation Yury Norov
  4 siblings, 0 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-08 14:10 UTC (permalink / raw)
  To: klimov.linux, davem, akpm, hannes, dborkman, laijs,
	takahiro.akashi, valentinrothberg, linux, msalter, chris, tgraf
  Cc: Yury Norov, linux-kernel

This file contains the implementations of all find_*_bit{,_le} functions,
so giving it a more generic name seems reasonable.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/Makefile                        | 2 +-
 lib/{find_next_bit.c => find_bit.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename lib/{find_next_bit.c => find_bit.c} (100%)

diff --git a/lib/Makefile b/lib/Makefile
index 13990aa..1cc93f4 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y	+= lockref.o
 obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 	 bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \
-	 bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
+	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_next_bit.c b/lib/find_bit.c
similarity index 100%
rename from lib/find_next_bit.c
rename to lib/find_bit.c
-- 
2.1.0



* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-08 14:10 ` [PATCH v3 1/3] " Yury Norov
@ 2015-02-08 18:48   ` George Spelvin
  2015-02-09  8:32   ` George Spelvin
  1 sibling, 0 replies; 27+ messages in thread
From: George Spelvin @ 2015-02-08 18:48 UTC (permalink / raw)
  To: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs, linux,
	msalter, takahiro.akashi, tgraf, valentinrothberg, yury.norov
  Cc: linux-kernel

This basically has my Reviewed-by: (I'll send it in a few hours,
when I have time to do a real final copy-editing pass), but a few minor
notes:

>+		/*
>+		 * This is equivalent to:
>+		 *
>+		 * tmp = find_set ? addr[start / BITS_PER_LONG]
>+		 *	: ~addr[start / BITS_PER_LONG];
>+		 *
>+		 * but saves a branch condition.
>+		 *
>+		 * Thanks to George Spelvin <linux@horizon.com> for the idea.
>+		 */
>+		tmp = addr[start / BITS_PER_LONG] ^ mask;

1. There's no need for detailed credit for such a trivial and obvious
   thing.  If you want to comment it, describe the use of the parameter
   in the function header, e.g.

+/*
+ * "mask" is either 0 or ~0UL and XORed into each fetched word, to select between
+ * searching for one bits and zero bits.  If this function is inlined, GCC is
+ * smart enough to propagate the constant.
+ */

2. The variable might be more meaningfully named "xor" or "invert"; "mask"
   suggests something that is &-ed with a value.


> unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
> 				 unsigned long offset)
> {
>+	return _find_next_bit(addr, size, offset, ~0UL);	<---
> }

> unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
> {
>+	unsigned long idx;
> 
>+	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
>+		if (addr[idx] != ULONG_MAX)			<---
>+			return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
> 	}
> 
>+	return size;
> }

3. Using two names (ULONG_MAX and ~0UL) for the same thing is a bit odd;
   you might want to be consistent.

I'll ack it either way; none of these are significant technical issues.


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-08 14:10 ` [PATCH v3 1/3] " Yury Norov
  2015-02-08 18:48   ` George Spelvin
@ 2015-02-09  8:32   ` George Spelvin
  2015-02-09 11:53     ` Rasmus Villemoes
  1 sibling, 1 reply; 27+ messages in thread
From: George Spelvin @ 2015-02-09  8:32 UTC (permalink / raw)
  To: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs, linux,
	msalter, takahiro.akashi, tgraf, valentinrothberg, yury.norov
  Cc: linux-kernel

A few more comments on the code.  Two are minor, but one seems
like a bug, so for now, it's

Nacked-by: George Spelvin <linux@horizon.com> 

Specifically, it seems like find_last_bit used to ignore trailing
garbage in the bitmap, but now will stop searching if the last word
contains some set bits not within size.


The minor one is that I don't think the first-word masking needs to
be conditional.  The general code works fine if the start is aligned
(HIGH_BITS_MASK just generates an all-ones mask), is quite quick, and
saves a test & conditional branch.

A second minor one is that the comment in include/linux/bitops.h has
an obvious typo, and should probably be fixed, too.

Here's a diff on top of yours with all my suggested changes, both
these two and the ones from my previous e-mail.

diff --git a/find_last_bit.c b/find_last_bit.c
index e67e970..106050f 100644
--- a/find_last_bit.c
+++ b/find_last_bit.c
@@ -13,6 +13,7 @@
  * 2 of the License, or (at your option) any later version.
  */
 
+#include <linux/bitops.h>
 #include <linux/export.h>
 #include <linux/kernel.h>
 
diff --git a/find_next_bit.c b/find_next_bit.c
index 71aa497..9d01d4a 100644
--- a/find_next_bit.c
+++ b/find_next_bit.c
@@ -16,41 +16,34 @@
 #include <linux/export.h>
 #include <linux/kernel.h>
 
-#define HIGH_BITS_MASK(nr)		(ULONG_MAX << (nr))
+#define HIGH_BITS_MASK(nr)		(~0UL << (nr))
 
 #if !defined(find_next_bit) || !defined(find_next_zero_bit)
+
+/*
+ * This is a common helper function for find_next_bit and
+ * find_next_zero_bit.  The difference is the "invert" argument, which
+ * is XORed with each fetched word before searching it for one bits.
+ */
 static unsigned long _find_next_bit(const unsigned long *addr,
-		unsigned long nbits, unsigned long start, unsigned long mask)
+		unsigned long nbits, unsigned long start, unsigned long invert)
 {
 	unsigned long tmp;
 
 	if (start >= nbits)
 		return nbits;
 
-	tmp = addr[start / BITS_PER_LONG] ^ mask;
+	tmp = addr[start / BITS_PER_LONG] ^ invert;
 
 	/* Handle 1st word. */
-	if (!IS_ALIGNED(start, BITS_PER_LONG)) {
-		tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);
-		start = round_down(start, BITS_PER_LONG);
-	}
+	tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);
+	start = round_down(start, BITS_PER_LONG);
 
 	while (!tmp) {
 		start += BITS_PER_LONG;
 		if (start >= nbits)
 			return nbits;
-
-		/*
-		 * This is equivalent to:
-		 *
-		 * tmp = find_set ? addr[start / BITS_PER_LONG]
-		 *	: ~addr[start / BITS_PER_LONG];
-		 *
-		 * but saves a branch condition.
-		 *
-		 * Thanks to George Spelvin <linux@horizon.com> for the idea.
-		 */
-		tmp = addr[start / BITS_PER_LONG] ^ mask;
+		tmp = addr[start / BITS_PER_LONG] ^ invert;
 	}
 
 	return min(start + __ffs(tmp), nbits);
@@ -105,7 +98,7 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 	unsigned long idx;
 
 	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
-		if (addr[idx] != ULONG_MAX)
+		if (~addr[idx])
 			return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
 	}
 
@@ -130,14 +123,14 @@ static inline unsigned long ext2_swab(const unsigned long y)
 
 #if !defined(find_next_bit_le) || !defined(find_next_zero_bit_le)
 static unsigned long _find_next_bit_le(const unsigned long *addr,
-		unsigned long nbits, unsigned long start, unsigned long mask)
+		unsigned long nbits, unsigned long start, unsigned long invert)
 {
 	unsigned long tmp;
 
 	if (start >= nbits)
 		return nbits;
 
-	tmp = addr[start / BITS_PER_LONG] ^ mask;
+	tmp = addr[start / BITS_PER_LONG] ^ invert;
 
 	/* Handle 1st word. */
 	if (!IS_ALIGNED(start, BITS_PER_LONG)) {
@@ -150,7 +143,7 @@ static unsigned long _find_next_bit_le(const unsigned long *addr,
 		if (start >= nbits)
 			return nbits;
 
-		tmp = addr[start / BITS_PER_LONG] ^ mask;
+		tmp = addr[start / BITS_PER_LONG] ^ invert;
 	}
 
 	return min(start + __ffs(ext2_swab(tmp)), nbits);
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 5d858e02..4a5e5934 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -218,9 +218,9 @@ static inline unsigned long __ffs64(u64 word)
 /**
  * find_last_bit - find the last set bit in a memory region
  * @addr: The address to start the search at
- * @size: The maximum size to search
+ * @size: The number of bits to search
  *
- * Returns the bit number of the first set bit, or size.
+ * Returns the bit number of the last set bit, or size if no bits are set.
  */
 extern unsigned long find_last_bit(const unsigned long *addr,
 				   unsigned long size);




Now for the serious issue: I see a functional change to find_last_bit
with no justifying comment.

diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
index 91ca09f..106050f 100644
--- a/lib/find_last_bit.c
+++ b/lib/find_last_bit.c
@@ -19,29 +21,13 @@
 
 unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
 {
-	unsigned long words;
-	unsigned long tmp;
-
-	/* Start at final word. */
-	words = size / BITS_PER_LONG;
-
-	/* Partial final word? */
-	if (size & (BITS_PER_LONG-1)) {
-		tmp = (addr[words] & (~0UL >> (BITS_PER_LONG
-					 - (size & (BITS_PER_LONG-1)))));
-		if (tmp)
-			goto found;
-	}
+	unsigned long idx = DIV_ROUND_UP(size, BITS_PER_LONG);
 
-	while (words) {
-		tmp = addr[--words];
-		if (tmp) {
-found:
-			return words * BITS_PER_LONG + __fls(tmp);
-		}
+	while (idx--) {
+		if (addr[idx])
+			return min(idx * BITS_PER_LONG + __fls(addr[idx]), size);
 	}
 
-	/* Not found */
 	return size;
 }
 EXPORT_SYMBOL(find_last_bit);


Previously, the last word was masked, so bits beyond "size" were ignored.
With the revised code, something like find_last_bit(array, 96) will return 96
if array[1] >> 32 is non-zero, even if array[1] & 0xffffffff is zero.

Looking through the callers, I haven't found a case where this matters yet
so perhaps it's a safe optimization, but this really needs to be more
clearly documented if intentional.
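
To make the failure mode concrete, here is a minimal userspace sketch
(64-bit longs assumed, kernel helpers stubbed with GCC builtins) in
which a set bit inside the range is hidden by trailing garbage:

	#include <stdio.h>

	#define BITS_PER_LONG		64
	#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
	#define min(a, b)		((a) < (b) ? (a) : (b))

	/* __fls: index of the most significant set bit; word must be non-zero. */
	static unsigned long __fls(unsigned long word)
	{
		return BITS_PER_LONG - 1 - (unsigned long)__builtin_clzl(word);
	}

	/* The revised find_last_bit, with no last-word masking. */
	static unsigned long find_last_bit(const unsigned long *addr,
					   unsigned long size)
	{
		unsigned long idx = DIV_ROUND_UP(size, BITS_PER_LONG);

		while (idx--) {
			if (addr[idx])
				return min(idx * BITS_PER_LONG + __fls(addr[idx]), size);
		}

		return size;
	}

	int main(void)
	{
		/* Bit 5 lies inside the 96-bit range; bit 104 is trailing garbage. */
		unsigned long array[2] = { 1UL << 5, 1UL << 40 };

		/*
		 * The old code masks the last word, skips it, and finds bit 5.
		 * The revised code sees a non-zero last word and stops there:
		 * min(64 + 40, 96) == 96, i.e. "no bits set".
		 */
		printf("%lu\n", find_last_bit(array, 96));
		return 0;
	}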

If no change was desired, I'd think a good way to do this would be:

 unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
 {
	size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
 	unsigned long tmp = addr[--idx];

	tmp &= (2UL << (size % BITS_PER_LONG)) - 1;	/* Mask last word */

	while (!tmp) {
		if (!idx)
			return size;
 		tmp = addr[--idx];
	}
	return idx * BITS_PER_LONG + __fls(tmp);
}

Which is not that different from the old code, but cleaned up.

(Feel free to add my Signed-off-by if you think you need it.)


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-09  8:32   ` George Spelvin
@ 2015-02-09 11:53     ` Rasmus Villemoes
  2015-02-09 16:45       ` George Spelvin
  2015-02-11 23:05       ` Yury
  0 siblings, 2 replies; 27+ messages in thread
From: Rasmus Villemoes @ 2015-02-09 11:53 UTC (permalink / raw)
  To: George Spelvin
  Cc: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs,
	msalter, takahiro.akashi, tgraf, valentinrothberg, yury.norov,
	linux-kernel

[Yury, please do remember to Cc everyone who has previously
participated]

On Mon, Feb 09 2015, "George Spelvin" <linux@horizon.com> wrote:

> A few more comments on the code.  Two are minor, but one seems
> like a bug, so for now, it's
>
> Nacked-by: George Spelvin <linux@horizon.com> 
>
> Specifically, it seems like find_last_bit used to ignore trailing
> garbage in the bitmap, but now will stop searching if the last word
> contains some set bits not within size.

True, though see below.

> The minor one is that I don't think the first-word masking needs to
> be conditional.  The general code works fine if the start is aligned
> (HIGH_BITS_MASK just generates an all-ones mask), is quite quick, and
> saves a test & conditional branch.
>

I also noted that during the first review, but when I tried to compile
it gcc actually generated slightly worse code, so I decided not to
comment on it. I don't have a strong preference either way, though.

>
> Previously, the last word was masked, so bits beyond "size" were ignored.
> With the revised code, something like find_last_bit(array, 96) will return 96
> if array[1] >> 32 is non-zero, even if array[1] & 0xffffffff is zero.
>
> Looking through the callers, I haven't found a case where this matters yet
> so perhaps it's a safe optimization, but this really needs to be more
> clearly documented if intentional.
>
> If no change was desired, I'd think a good way to do this would be:
>
>  unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
>  {
> 	size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
>  	unsigned long tmp = addr[--idx];
>
> 	tmp &= (2UL << (size % BITS_PER_LONG)) - 1;	/* Mask last word */
>
> 	while (!tmp) {
> 		if (!idx)
> 			return size;
>  		tmp = addr[--idx];
> 	}
> 	return idx * BITS_PER_LONG + __fls(tmp);
> }

How should that work? If size is for example 1, the mask evaluates to 3UL,
while what is needed is 1UL. If size is aligned, the mask becomes 1UL,
which is also not right.

Also, I think it is best to handle size==0 appropriately, meaning that
one cannot dereference addr in any way (and certainly not addr[-1]).

So how about

unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
	size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
	unsigned long mask = LAST_WORD_MASK(size);

	while (idx--) {
		unsigned long val = addr[idx] & mask;
		if (val)
			return idx * BITS_PER_LONG + __fls(val);
		mask = ~0ul;
	}
	return size;
}

Rasmus


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-09 11:53     ` Rasmus Villemoes
@ 2015-02-09 16:45       ` George Spelvin
  2015-02-11 22:14         ` Rasmus Villemoes
  2015-02-11 23:05       ` Yury
  1 sibling, 1 reply; 27+ messages in thread
From: George Spelvin @ 2015-02-09 16:45 UTC (permalink / raw)
  To: linux, linux
  Cc: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs,
	linux-kernel, msalter, takahiro.akashi, tgraf, valentinrothberg,
	yury.norov

Sorry, I screwed up the bit-twiddling while messing with various options.
I was trying to get size == 32 to work; that should have been:

> 	tmp &= (2UL << ((size-1) % BITS_PER_LONG)) - 1;	/* Mask last word */

And you're right that LAST_WORD_MASK is a good wrapper.

Various working solutions include:
#define LAST_WORD_MASK(bits) 	((2UL << (bits-1) % BITS_PER_LONG) - 1)
#define LAST_WORD_MASK(bits) 	~(~0UL << bits % BITS_PER_LONG)
#define LAST_WORD_MASK(bits) 	(~0UL >> -bits % BITS_PER_LONG)

I'm not sure which generates the nicest code.  It's 4 instructions
each way, with the last being 1 byte smaller:

0000000000000000 <lwm1>:
   0:   8d 4f ff                lea    -0x1(%rdi),%ecx
   3:   b8 02 00 00 00          mov    $0x2,%eax
   8:   48 d3 e0                shl    %cl,%rax
   b:   48 83 e8 01             sub    $0x1,%rax
   f:   c3                      retq   

0000000000000010 <lwm2>:
  10:   48 c7 c0 ff ff ff ff    mov    $0xffffffffffffffff,%rax
  17:   89 f9                   mov    %edi,%ecx
  19:   48 d3 e0                shl    %cl,%rax
  1c:   48 f7 d0                not    %rax
  1f:   c3                      retq   

0000000000000020 <lwm3>:
  20:   89 f9                   mov    %edi,%ecx
  22:   f7 d9                   neg    %ecx
  24:   48 c7 c0 ff ff ff ff    mov    $0xffffffffffffffff,%rax
  2b:   48 d3 e8                shr    %cl,%rax
  2e:   c3                      retq   


> Also, I think it is best to handle size==0 appropriately, meaning that
> one cannot dereference addr in any way (and certainly not addr[-1]).

Ah, okay; I figured that was a safe case to omit.  But your solution is nicer
than mine overall.

It may be that omitting the mask *is* safe, but it's a lot of wading through
callers to prove it.


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-09 16:45       ` George Spelvin
@ 2015-02-11 22:14         ` Rasmus Villemoes
  0 siblings, 0 replies; 27+ messages in thread
From: Rasmus Villemoes @ 2015-02-11 22:14 UTC (permalink / raw)
  To: George Spelvin
  Cc: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs,
	linux-kernel, msalter, takahiro.akashi, tgraf, valentinrothberg,
	yury.norov

[for some reason google decided to put this in my spam folder, hrmpf]

On Mon, Feb 09 2015, "George Spelvin" <linux@horizon.com> wrote:

> Sorry, I screwed up the bit-twiddling while messing with various options.
> I was trying to get size == 32 to work; that should have been:
>
>> 	tmp &= (2UL << ((size-1) % BITS_PER_LONG)) - 1;	/* Mask last word */
>
> And you're right that LAST_WORD_MASK is a good wrapper.
>

Well, it's not my invention, I just misremembered the
name. linux/bitmap.h already exposes BITMAP_LAST_WORD_MASK.

> Various working solutions include:
> #define LAST_WORD_MASK(bits) 	((2UL << (bits-1) % BITS_PER_LONG) - 1)
> #define LAST_WORD_MASK(bits) 	~(~0UL << bits % BITS_PER_LONG)
> #define LAST_WORD_MASK(bits) 	(~0UL >> -bits % BITS_PER_LONG)

Incidentally, I had a patch lying around replacing BITMAP_LAST_WORD_MASK
with something like the last of these (it currently uses a ?:). But to allow
bits to have a signed type, it is safer to spell it

#define BITMAP_LAST_WORD_MASK(bits) (~0UL >> ((-(bits)) & (BITS_PER_LONG-1)))

[also adding lots of parentheses so I don't have to worry about precedence].
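
A quick sanity check of that spelling (a throwaway userspace snippet,
assuming 64-bit longs) confirms it handles both aligned and unaligned
sizes:

	#include <stdio.h>

	#define BITS_PER_LONG	64
	#define BITMAP_LAST_WORD_MASK(bits) \
		(~0UL >> ((-(bits)) & (BITS_PER_LONG - 1)))

	int main(void)
	{
		printf("%#lx\n", BITMAP_LAST_WORD_MASK(1));	/* 0x1 */
		printf("%#lx\n", BITMAP_LAST_WORD_MASK(32));	/* 0xffffffff */
		printf("%#lx\n", BITMAP_LAST_WORD_MASK(64));	/* ~0UL: an aligned size keeps the whole word */
		return 0;
	}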

> I'm not sure which generates the nicest code.  It's 4 instructions
> each way, with the last being 1 byte smaller:

I think one would have to look at the effects on real code; when just
compiling a function that does nothing but this, gcc has to use specific
registers for input and output.

>> Also, I think it is best to handle size==0 appropriately, meaning that
>> one cannot dereference addr in any way (and certainly not addr[-1]).
>
> Ah, okay; l I figured that was a safe case to omit.  But your solution is nicer
> than mine overall.
>
> It may be that omitting the mask *is* safe, but it's a lot of wading through
> callers to prove it.

I think generic library code like this should provide both safety
checks, and only if some true performance bottleneck is found can one
start looking at implementing __shortcuts which have further constraints
on the caller.

Rasmus


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-09 11:53     ` Rasmus Villemoes
  2015-02-09 16:45       ` George Spelvin
@ 2015-02-11 23:05       ` Yury
  2015-02-12  8:15         ` George Spelvin
  1 sibling, 1 reply; 27+ messages in thread
From: Yury @ 2015-02-11 23:05 UTC (permalink / raw)
  To: Rasmus Villemoes, George Spelvin
  Cc: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs,
	msalter, takahiro.akashi, tgraf, valentinrothberg, linux-kernel


On 09.02.2015 14:53, Rasmus Villemoes wrote:
> [Yury, please do remember to Cc everyone who has previously
> participated]
>
> On Mon, Feb 09 2015, "George Spelvin" <linux@horizon.com> wrote:
>
>> A few more comments on the code.  Two are minor, but one seems
>> like a bug, so for now, it's
>>
>> Nacked-by: George Spelvin <linux@horizon.com> 
>>
>> Specifically, it seems like find_last_bit used to ignore trailing
>> garbage in the bitmap, but now will stop searching if the last word
>> contains some set bits not within size.
> True, though see below.
>
>> The minor one is that I don't think the first-word masking needs to
>> be conditional.  The general code works fine if the start is aligned
>> (HIGH_BITS_MASK just generates an all-ones mask), is quite quick, and
>> saves a test & conditional branch.
>>
> I also noted that during the first review, but when I tried to compile
> it gcc actually generated slightly worse code, so I decided not to
> comment on it. I don't have a strong preference either way, though.
>
>> Previously, the last word was masked, so bits beyond "size" were ignored.
>> With the revised code, something like find_last_bit(array, 96) will return 96
>> if array[1] >> 32 is non-zero, even if array[1] & 0xffffffff is zero.
>>
>> Looking through the callers, I haven't found a case where this matters yet
>> so perhaps it's a safe optimization, but this really needs to be more
>> clearly documented if intentional.
>>
>> If no change was desired, I'd think a good way to do this would be:
>>
>>  unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
>>  {
>> 	size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
>>  	unsigned long tmp = addr[--idx];
>>
>> 	tmp &= (2UL << (size % BITS_PER_LONG)) - 1;	/* Mask last word */
>>
>> 	while (!tmp) {
>> 		if (!idx)
>> 			return size;
>>  		tmp = addr[--idx];
>> 	}
>> 	return idx * BITS_PER_LONG + __fls(tmp);
>> }
> How should that work? If size is for example 1, the mask evaluates to 3UL,
> while what is needed is 1UL. If size is aligned, the mask becomes 1UL,
> which is also not right.
>
> Also, I think it is best to handle size==0 appropriately, meaning that
> one cannot dereference addr in any way (and certainly not addr[-1]).
>
> So how about
>
> unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
> {
> 	size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
> 	unsigned long mask = LAST_WORD_MASK(size);
>
> 	while (idx--) {
> 		unsigned long val = addr[idx] & mask;
> 		if (val)
> 			return idx * BITS_PER_LONG + __fls(val);
> 		mask = ~0ul;
> 	}
> 	return size;
> }
>
> Rasmus
Rasmus, your version ANDs with the mask and resets the mask on each
iteration of the main loop. I think we can avoid that. What do you think
of the following?

unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
        size_t idx;
        unsigned long tmp;

        if (!size)
                return 0;

        idx = DIV_ROUND_UP(size, BITS_PER_LONG) - 1;
        tmp = addr[idx] & LAST_WORD_MASK(size);

        while (!tmp) {
                if (!idx--)
                        return size;
                tmp = addr[idx];
        }
        return idx * BITS_PER_LONG + __fls(tmp);
}

Yury


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-11 23:05       ` Yury
@ 2015-02-12  8:15         ` George Spelvin
  2015-02-12  9:58           ` Rasmus Villemoes
  0 siblings, 1 reply; 27+ messages in thread
From: George Spelvin @ 2015-02-12  8:15 UTC (permalink / raw)
  To: linux, linux, yury.norov
  Cc: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs,
	linux-kernel, msalter, takahiro.akashi, tgraf, valentinrothberg

> Rasmus, your version ANDs with the mask and resets the mask on each
> iteration of the main loop. I think we can avoid that. What do you think
> of the following?

Yes, that's basically what I proposed (modulo checking for zero size and
my buggy LAST_WORD_MASK).

But two unconditional instructions in the loop are awfully minor; it's
loads and conditional branches that cost.

The reset of the mask can be done in parallel with other operations; it's
only the AND that actually takes a cycle.

I can definitely see the argument that, for code that's not used often
enough to stay resident in the L1 cache, any speedup has to win by at
least one L2 cache access to be worth taking another cache line.

For Ivy bridge, those numbers are 32 KB and 12 cycles.


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-12  8:15         ` George Spelvin
@ 2015-02-12  9:58           ` Rasmus Villemoes
  2015-02-12 23:46             ` George Spelvin
  0 siblings, 1 reply; 27+ messages in thread
From: Rasmus Villemoes @ 2015-02-12  9:58 UTC (permalink / raw)
  To: George Spelvin
  Cc: yury.norov, akpm, chris, davem, dborkman, hannes, klimov.linux,
	laijs, linux-kernel, msalter, takahiro.akashi, tgraf,
	valentinrothberg

On Thu, Feb 12 2015, "George Spelvin" <linux@horizon.com> wrote:

>> Rasmus, your version ANDs with the mask and resets the mask on each
>> iteration of the main loop. I think we can avoid that. What do you think
>> of the following?
>
> Yes, that's basically what I proposed (modulo checking for zero size and
> my buggy LAST_WORD_MASK).
>
> But two unconditional instructions in the loop are awfully minor; it's
> loads and conditional branches that cost.
>
> The reset of the mask can be done in parallel with other operations; it's
> only the AND that actually takes a cycle.
>
> I can definitely see the argument that, for code that's not used often
> enough to stay resident in the L1 cache, any speedup has to win by at
> least one L2 cache access to be worth taking another cache line.

That, and if the compiler was smart enough, the AND should actually be
free (at least on x86), since it can replace a test instruction. [1]

In any case, my code compiles to fewer bytes (though not an entire cache
line), and I think it is somewhat clearer - I'm obviously biased on the
latter point.

Rasmus


[1] Unfortunately, the compiler doesn't seem to be smart enough :-( When I
compile my code with gcc 5.0, I get

0000000000000000 <find_last_bit_rv>:
   0:   53                      push   %rbx
   1:   89 f1                   mov    %esi,%ecx
   3:   48 8d 5e 3f             lea    0x3f(%rsi),%rbx
   7:   f7 d9                   neg    %ecx
   9:   48 c7 c2 ff ff ff ff    mov    $0xffffffffffffffff,%rdx
  10:   48 83 e3 c0             and    $0xffffffffffffffc0,%rbx
  14:   48 d3 ea                shr    %cl,%rdx
  17:   eb 1a                   jmp    33 <find_last_bit_rv+0x33>
  19:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  20:   48 89 d1                mov    %rdx,%rcx
  23:   48 23 0c df             and    (%rdi,%rbx,8),%rcx
  27:   48 c7 c2 ff ff ff ff    mov    $0xffffffffffffffff,%rdx
  2e:   48 85 c9                test   %rcx,%rcx
  31:   75 15                   jne    48 <find_last_bit_rv+0x48>
  33:   48 83 eb 01             sub    $0x1,%rbx
  37:   48 83 fb ff             cmp    $0xffffffffffffffff,%rbx
  3b:   75 e3                   jne    20 <find_last_bit_rv+0x20>
  3d:   48 89 f0                mov    %rsi,%rax
  40:   5b                      pop    %rbx
  41:   c3                      retq   
  42:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  48:   48 89 cf                mov    %rcx,%rdi
  4b:   48 c1 e3 06             shl    $0x6,%rbx
  4f:   e8 00 00 00 00          callq  54 <find_last_bit_rv+0x54>
  54:   48 98                   cltq   
  56:   48 01 d8                add    %rbx,%rax
  59:   5b                      pop    %rbx
  5a:   c3                      retq   
  5b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

the main loop is 20--3b. The test instruction at 2e seems to be
redundant. The same at 37: the sub instruction already sets plenty of
flags that could be used, so explicitly comparing %rbx to -1 seems
redundant.


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-12  9:58           ` Rasmus Villemoes
@ 2015-02-12 23:46             ` George Spelvin
  2015-02-13 10:13               ` Rasmus Villemoes
  0 siblings, 1 reply; 27+ messages in thread
From: George Spelvin @ 2015-02-12 23:46 UTC (permalink / raw)
  To: linux, linux
  Cc: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs,
	linux-kernel, msalter, takahiro.akashi, tgraf, valentinrothberg,
	yury.norov

> That, and if the compiler was smart enough, the AND should actually be
> free (at least on x86), since it can replace a test instruction. [1]

Ah, right, x86 loads don't set the flags, so you need a TEST
instruction anyway.  So the AND is free!

Of course, not all the world's an x86.  But load/store architectures
generally need a test instruction as well.  It's things like MIPS and
Alpha, that don't have condition codes, but only conditional branches
on register values, that will need an extra cycle.

> In any case, my code compiles to fewer bytes (though not an entire cache
> line), and I think it is somewhat clearer - I'm obviously biased on the
> latter point.

Myself, I think the code that shows that only the first word gets masked
by control flow is clearer, but since the complexity is completely
localized to this function, I'll take smaller and faster.

> [1] Unfortunately, the compiler doesn't seem to be smart enough :-( When I
> compile my code with gcc 5.0, I get
> 
> 0000000000000000 <find_last_bit_rv>:
>    0:   53                      push   %rbx
>    1:   89 f1                   mov    %esi,%ecx
>    3:   48 8d 5e 3f             lea    0x3f(%rsi),%rbx
>    7:   f7 d9                   neg    %ecx
>    9:   48 c7 c2 ff ff ff ff    mov    $0xffffffffffffffff,%rdx
>   10:   48 83 e3 c0             and    $0xffffffffffffffc0,%rbx
>   14:   48 d3 ea                shr    %cl,%rdx
>   17:   eb 1a                   jmp    33 <find_last_bit_rv+0x33>
>   19:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
> 
>   20:   48 89 d1                mov    %rdx,%rcx
>   23:   48 23 0c df             and    (%rdi,%rbx,8),%rcx
>   27:   48 c7 c2 ff ff ff ff    mov    $0xffffffffffffffff,%rdx
>   2e:   48 85 c9                test   %rcx,%rcx
>   31:   75 15                   jne    48 <find_last_bit_rv+0x48>
>   33:   48 83 eb 01             sub    $0x1,%rbx
>   37:   48 83 fb ff             cmp    $0xffffffffffffffff,%rbx
>   3b:   75 e3                   jne    20 <find_last_bit_rv+0x20>
> 
>   3d:   48 89 f0                mov    %rsi,%rax
>   40:   5b                      pop    %rbx
>   41:   c3                      retq   
>   42:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
>   48:   48 89 cf                mov    %rcx,%rdi
>   4b:   48 c1 e3 06             shl    $0x6,%rbx
>   4f:   e8 00 00 00 00          callq  54 <find_last_bit_rv+0x54>
>   54:   48 98                   cltq   
>   56:   48 01 d8                add    %rbx,%rax
>   59:   5b                      pop    %rbx
>   5a:   c3                      retq   
>   5b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

> the main loop is 20--3b. The test instruction at 2e seems to be
> redundant. The same at 37: the sub instruction already sets plenty of
> flags that could be used, so explicitly comparing %rbx to -1 seems
> redundant.

Er... I think you hand-edited that code; it's wrong.  The loop assumes that
%rbx is in units of words, but the prologue sets it up in units of bits.

The mov to %rcx is also redundant, since it could be eliminated with some
minor rescheduling.

The code generation I *want* for that function is:

# addr in %rdi, size in %rsi
	movl	%esi, %ecx
	leaq	0x3f(%rsi), %rax
	negl	%ecx
	movq	$-1, %rdx
        shrq	$6, %rax
	shrq	%cl, %rdx
	jmp	2f
1:
	movq	$-1, %rdx
2:
	subq	$1, %rax
	jc	3f
	andq	(%rdi,%rax,8), %rdx
	je	1b

	bsrq	%rdx, %rdx
        salq    $6, %rax
	addq	%rdx, %rax
        ret
3:
	movq	%rsi, %rax
	retq

I wonder if the compiler can be convinced by a bit of source tweaking?

unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
	size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
	unsigned long val = LAST_WORD_MASK(size);

	while (idx--) {
		val &= addr[idx];
		if (val)
			return idx * BITS_PER_LONG + __fls(val);
		val = ~0ul;
	}
	return size;
}

Darn, no difference...


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-12 23:46             ` George Spelvin
@ 2015-02-13 10:13               ` Rasmus Villemoes
  0 siblings, 0 replies; 27+ messages in thread
From: Rasmus Villemoes @ 2015-02-13 10:13 UTC (permalink / raw)
  To: George Spelvin
  Cc: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs,
	linux-kernel, msalter, takahiro.akashi, tgraf, valentinrothberg,
	yury.norov

On Fri, Feb 13 2015, "George Spelvin" <linux@horizon.com> wrote:

>> the main loop is 20--3b. The test instruction at 2e seems to be
>> redundant. The same at 37: the sub instruction already sets plenty of
>> flags that could be used, so explicitly comparing %rbx to -1 seems
>> redundant.
>
> Er... I think you hand-edited that code; it's wrong.  The loop assumes that
> %rbx is in units of words, but the prologue sets it up in units of bits.

No, but I messed up the source by hand :-) My DIV_ROUND_UP macro was
bogus. Well spotted. Fixing that I still see the redundant cmp and
test, though.

> The mov to %rcx is also redundant, since it could be eliminated with
> some minor rescheduling.
>
> The code generation I *want* for that function is:
>
> # addr in %rdi, size in %rsi
> 	movl	%esi, %ecx
> 	leaq	0x3f(%rsi), %rax
> 	negl	%ecx
> 	movq	$-1, %rdx
>         shrq	$6, %rax
> 	shrq	%cl, %rdx
> 	jmp	2f
> 1:
> 	movq	$-1, %rdx
> 2:
> 	subq	$1, %rax
> 	jc	3f
> 	andq	(%rdi,%rax,8), %rdx
> 	jeq	1b
>
> 	bsrq	%rdx, %rdx
>         salq    $6, %rax
> 	addq	%rdx, %rax
>         ret
> 3:
> 	movq	%rsi, %rax
> 	retq

Nice. But I don't think find_last_bit is important enough to warrant
arch-specific versions.

So, where are we with this? Have we reached some kind of consensus?

Rasmus


* [PATCH v4 0/3] lib: find_*_bit reimplementation
  2015-02-08 14:10 [PATCH v3 0/3] lib: find_*_bit reimplementation Yury Norov
                   ` (2 preceding siblings ...)
  2015-02-08 14:10 ` [PATCH v3 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
@ 2015-02-17  2:35 ` Yury Norov
  2015-02-17  2:35   ` [PATCH v4 1/3] " Yury Norov
                     ` (2 more replies)
  2015-02-22 17:24 ` [PATCH v5 0/3] lib: find_*_bit reimplementation Yury Norov
  4 siblings, 3 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-17  2:35 UTC (permalink / raw)
  To: linux, klimov.linux, linux
  Cc: akpm, davem, dborkman, hannes, laijs, msalter, takahiro.akashi,
	tgraf, valentinrothberg, yury.norov, linux-kernel, chris

This patchset reworks the find_bit function family to achieve better
performance and to decrease text size. All the rework is done in
patch 1; patches 2 and 3 only move and rename code.

It was boot-tested on x86_64 and MIPS (big-endian) machines.
Performance tests were run in userspace with code like this:

	/* addr[] is filled from /dev/urandom */
	unsigned long ret = 0;

	start = clock();
	while (ret < nbits)
		ret = find_next_bit(addr, nbits, ret + 1);

	end = clock();
	printf("%ld\t", (unsigned long)(end - start));

On an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz the measurements are as
follows (for find_next_bit, nbits is 8M; for find_first_bit, 80K):

	find_next_bit:		find_first_bit:
	new	current		new	current
	26932	43151		14777	14925
	26947	43182		14521	15423
	26507	43824		15053	14705
	27329	43759		14473	14777
	26895	43367		14847	15023
	26990	43693		15103	15163
	26775	43299		15067	15232
	27282	42752		14544	15121
	27504	43088		14644	14858
	26761	43856		14699	15193
	26692	43075		14781	14681
	27137	42969		14451	15061
	...			...

The find_next_bit performance gain is 35-40%;
find_first_bit shows no measurable difference.

On ARM machines, there is an arch-specific implementation of find_bit.
To disable it and use the generic one, apply the following patch:
	
	---
	 arch/arm/include/asm/bitops.h   | 19 -------------------
	 arch/arm/kernel/armksyms.c      | 11 -----------
	 arch/arm/lib/Makefile           |  2 +-
	 include/asm-generic/bitops/le.h |  1 +
	 4 files changed, 2 insertions(+), 31 deletions(-)

	diff --git a/arch/arm/include/asm/bitops.h b/arch/arm/include/asm/bitops.h
	index 5638099..e0611d1 100644
	--- a/arch/arm/include/asm/bitops.h
	+++ b/arch/arm/include/asm/bitops.h
	@@ -192,25 +192,6 @@ extern int _find_next_bit_be(const unsigned long *p, int size, int offset);
	 #define test_and_clear_bit(nr,p)	ATOMIC_BITOP(test_and_clear_bit,nr,p)
	 #define test_and_change_bit(nr,p)	ATOMIC_BITOP(test_and_change_bit,nr,p)
	 
	-#ifndef __ARMEB__
	-/*
	- * These are the little endian, atomic definitions.
	- */
	-#define find_first_zero_bit(p,sz)	_find_first_zero_bit_le(p,sz)
	-#define find_next_zero_bit(p,sz,off)	_find_next_zero_bit_le(p,sz,off)
	-#define find_first_bit(p,sz)		_find_first_bit_le(p,sz)
	-#define find_next_bit(p,sz,off)		_find_next_bit_le(p,sz,off)
	-
	-#else
	-/*
	- * These are the big endian, atomic definitions.
	- */
	-#define find_first_zero_bit(p,sz)	_find_first_zero_bit_be(p,sz)
	-#define find_next_zero_bit(p,sz,off)	_find_next_zero_bit_be(p,sz,off)
	-#define find_first_bit(p,sz)		_find_first_bit_be(p,sz)
	-#define find_next_bit(p,sz,off)		_find_next_bit_be(p,sz,off)
	-
	-#endif
	 
	 #if __LINUX_ARM_ARCH__ < 5
	 
	diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
	index a88671c..22e8748 100644
	--- a/arch/arm/kernel/armksyms.c
	+++ b/arch/arm/kernel/armksyms.c
	@@ -146,17 +146,6 @@ EXPORT_SYMBOL(_clear_bit);
	 EXPORT_SYMBOL(_test_and_clear_bit);
	 EXPORT_SYMBOL(_change_bit);
	 EXPORT_SYMBOL(_test_and_change_bit);
	-EXPORT_SYMBOL(_find_first_zero_bit_le);
	-EXPORT_SYMBOL(_find_next_zero_bit_le);
	-EXPORT_SYMBOL(_find_first_bit_le);
	-EXPORT_SYMBOL(_find_next_bit_le);
	-
	-#ifdef __ARMEB__
	-EXPORT_SYMBOL(_find_first_zero_bit_be);
	-EXPORT_SYMBOL(_find_next_zero_bit_be);
	-EXPORT_SYMBOL(_find_first_bit_be);
	-EXPORT_SYMBOL(_find_next_bit_be);
	-#endif
	 
	 #ifdef CONFIG_FUNCTION_TRACER
	 #ifdef CONFIG_OLD_MCOUNT
	diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
	index 0573faa..de369aa 100644
	--- a/arch/arm/lib/Makefile
	+++ b/arch/arm/lib/Makefile
	@@ -6,7 +6,7 @@
	 
	 lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
			   csumpartialcopy.o csumpartialcopyuser.o clearbit.o \
	-		   delay.o delay-loop.o findbit.o memchr.o memcpy.o   \
	+		   delay.o delay-loop.o memchr.o memcpy.o	      \
			   memmove.o memset.o memzero.o setbit.o              \
			   strchr.o strrchr.o                                 \
			   testchangebit.o testclearbit.o testsetbit.o        \
	diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
	index 6173154..9a8798f 100644
	--- a/include/asm-generic/bitops/le.h
	+++ b/include/asm-generic/bitops/le.h
	@@ -2,6 +2,7 @@
	 #define _ASM_GENERIC_BITOPS_LE_H_
	 
	 #include <asm/types.h>
	+#include <asm-generic/bitops/find.h>
	 #include <asm/byteorder.h>
	 
	 #if defined(__LITTLE_ENDIAN)
	-- 
	1.9.1

Thanks a lot to George Spelvin and Rasmus Villemoes for hints and
helpful discussions.
	
v4:
* find_last_bit was found buggy and replaced with George Spelvin's
  version, which correctly handles garbage at the end of a
  word-unaligned bitmap;
* find_last_bit description fixed in 'include/linux/bitops.h';
* _find_next_bit{,_le}: first-word handling made unconditional;

v3:
* conditional bit inverting in _find_next_bit replaced with XORing by
  a mask (patch 1);
* size/offset checks moved from the wrappers to _find_next_bit (patch 1);
* #include <linux/bitops.h> restored (patch 1);
* lib/find_next_bit file description changed to avoid referring to the
  file name, so that the rename produces a clean diff (patch 2);
* patch 3 is generated with the '-M' option to detect the rename and
  drop the file contents;
* this cover letter added.

Yury Norov (3):
  lib: find_*_bit reimplementation
  lib: move find_last_bit to lib/find_next_bit.c
  lib: rename lib/find_next_bit.c to lib/find_bit.c

 include/linux/bitops.h |   4 +-
 lib/Makefile           |   2 +-
 lib/find_bit.c         | 195 +++++++++++++++++++++++++++++++++
 lib/find_last_bit.c    |  49 ---------
 lib/find_next_bit.c    | 285 -------------------------------------------------
 5 files changed, 198 insertions(+), 337 deletions(-)
 create mode 100644 lib/find_bit.c
 delete mode 100644 lib/find_last_bit.c
 delete mode 100644 lib/find_next_bit.c

-- 
2.1.0

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v4 1/3] lib: find_*_bit reimplementation
  2015-02-17  2:35 ` [PATCH v4 0/3] lib: find_*_bit reimplementation Yury Norov
@ 2015-02-17  2:35   ` Yury Norov
  2015-02-18 17:57     ` Rasmus Villemoes
  2015-02-17  2:35   ` [PATCH v4 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
  2015-02-17  2:35   ` [PATCH v4 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
  2 siblings, 1 reply; 27+ messages in thread
From: Yury Norov @ 2015-02-17  2:35 UTC (permalink / raw)
  To: linux, klimov.linux, linux
  Cc: akpm, davem, dborkman, hannes, laijs, msalter, takahiro.akashi,
	tgraf, valentinrothberg, yury.norov, linux-kernel, chris

The new implementation takes less space in the sources
(see diffstat) and in the object file. For me it's 710 vs 453
bytes of text. It also shows better performance.

The find_last_bit description is fixed due to an obvious typo.

This patch introduces two macros, {LOW,HIGH}_BITS_MASK, which do
almost the same as BITMAP_{FIRST,LAST}_WORD_MASK in
include/linux/bitmap.h. The 'LAST' macro is potentially less
efficient, though, because it issues a conditional branch that can
be omitted. If it is one day replaced by a more efficient
implementation, {LOW,HIGH}_BITS_MASK can be removed.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/bitops.h |   4 +-
 lib/find_last_bit.c    |  37 +++----
 lib/find_next_bit.c    | 269 ++++++++++++++-----------------------------------
 3 files changed, 94 insertions(+), 216 deletions(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 5d858e0..297f5bd 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -218,9 +218,9 @@ static inline unsigned long __ffs64(u64 word)
 /**
  * find_last_bit - find the last set bit in a memory region
  * @addr: The address to start the search at
- * @size: The maximum size to search
+ * @size: The number of bits to search
  *
- * Returns the bit number of the first set bit, or size.
+ * Returns the bit number of the last set bit, or size.
  */
 extern unsigned long find_last_bit(const unsigned long *addr,
 				   unsigned long size);
diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
index 91ca09f..edbb281 100644
--- a/lib/find_last_bit.c
+++ b/lib/find_last_bit.c
@@ -4,6 +4,9 @@
  * Written by Rusty Russell <rusty@rustcorp.com.au>
  * (Inspired by David Howell's find_next_bit implementation)
  *
+ * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
+ * size and improve performance, 2015.
+ *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
  * as published by the Free Software Foundation; either version
@@ -12,36 +15,26 @@
 
 #include <linux/bitops.h>
 #include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>
+
+#define LOW_BITS_MASK(nr) (~0UL >> -(nr))
 
 #ifndef find_last_bit
 
 unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
 {
-	unsigned long words;
-	unsigned long tmp;
+	if (size) {
+		unsigned long val = LOW_BITS_MASK(size % BITS_PER_LONG);
+		unsigned long idx = (size-1) / BITS_PER_LONG;
 
-	/* Start at final word. */
-	words = size / BITS_PER_LONG;
+		do {
+			val &= addr[idx];
+			if (val)
+				return idx * BITS_PER_LONG + __fls(val);
 
-	/* Partial final word? */
-	if (size & (BITS_PER_LONG-1)) {
-		tmp = (addr[words] & (~0UL >> (BITS_PER_LONG
-					 - (size & (BITS_PER_LONG-1)))));
-		if (tmp)
-			goto found;
+			val = ~0ul;
+		} while (idx--);
 	}
-
-	while (words) {
-		tmp = addr[--words];
-		if (tmp) {
-found:
-			return words * BITS_PER_LONG + __fls(tmp);
-		}
-	}
-
-	/* Not found */
 	return size;
 }
 EXPORT_SYMBOL(find_last_bit);
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 0cbfc0b..1f5f108 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -3,6 +3,9 @@
  * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
+ * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
+ * size and improve performance, 2015.
+ *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
  * as published by the Free Software Foundation; either version
@@ -11,98 +14,60 @@
 
 #include <linux/bitops.h>
 #include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>
 
-#define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
+#define HIGH_BITS_MASK(nr)		(~0UL << (nr))
+
+#if !defined(find_next_bit) || !defined(find_next_zero_bit)
 
-#ifndef find_next_bit
 /*
- * Find the next set bit in a memory region.
+ * This is a common helper function for find_next_bit and
+ * find_next_zero_bit.  The difference is the "invert" argument, which
+ * is XORed with each fetched word before searching it for one bits.
  */
-unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
-			    unsigned long offset)
+static unsigned long _find_next_bit(const unsigned long *addr,
+		unsigned long nbits, unsigned long start, unsigned long invert)
 {
-	const unsigned long *p = addr + BITOP_WORD(offset);
-	unsigned long result = offset & ~(BITS_PER_LONG-1);
 	unsigned long tmp;
 
-	if (offset >= size)
-		return size;
-	size -= result;
-	offset %= BITS_PER_LONG;
-	if (offset) {
-		tmp = *(p++);
-		tmp &= (~0UL << offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-	while (size & ~(BITS_PER_LONG-1)) {
-		if ((tmp = *(p++)))
-			goto found_middle;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	if (!nbits || start >= nbits)
+		return nbits;
+
+	tmp = addr[start / BITS_PER_LONG] ^ invert;
+
+	/* Handle 1st word. */
+	tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);
+	start = round_down(start, BITS_PER_LONG);
+
+	while (!tmp) {
+		start += BITS_PER_LONG;
+		if (start >= nbits)
+			return nbits;
+
+		tmp = addr[start / BITS_PER_LONG] ^ invert;
 	}
-	if (!size)
-		return result;
-	tmp = *p;
 
-found_first:
-	tmp &= (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size;	/* Nope. */
-found_middle:
-	return result + __ffs(tmp);
+	return min(start + __ffs(tmp), nbits);
 }
-EXPORT_SYMBOL(find_next_bit);
 #endif
 
-#ifndef find_next_zero_bit
+#ifndef find_next_bit
 /*
- * This implementation of find_{first,next}_zero_bit was stolen from
- * Linus' asm-alpha/bitops.h.
+ * Find the next set bit in a memory region.
  */
+unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
+			    unsigned long offset)
+{
+	return _find_next_bit(addr, size, offset, 0UL);
+}
+EXPORT_SYMBOL(find_next_bit);
+#endif
+
+#ifndef find_next_zero_bit
 unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
 				 unsigned long offset)
 {
-	const unsigned long *p = addr + BITOP_WORD(offset);
-	unsigned long result = offset & ~(BITS_PER_LONG-1);
-	unsigned long tmp;
-
-	if (offset >= size)
-		return size;
-	size -= result;
-	offset %= BITS_PER_LONG;
-	if (offset) {
-		tmp = *(p++);
-		tmp |= ~0UL >> (BITS_PER_LONG - offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (~tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-	while (size & ~(BITS_PER_LONG-1)) {
-		if (~(tmp = *(p++)))
-			goto found_middle;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
-	}
-	if (!size)
-		return result;
-	tmp = *p;
-
-found_first:
-	tmp |= ~0UL << size;
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size;	/* Nope. */
-found_middle:
-	return result + ffz(tmp);
+	return _find_next_bit(addr, size, offset, ~0UL);
 }
 EXPORT_SYMBOL(find_next_zero_bit);
 #endif
@@ -113,24 +78,14 @@ EXPORT_SYMBOL(find_next_zero_bit);
  */
 unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
 {
-	const unsigned long *p = addr;
-	unsigned long result = 0;
-	unsigned long tmp;
+	unsigned long idx;
 
-	while (size & ~(BITS_PER_LONG-1)) {
-		if ((tmp = *(p++)))
-			goto found;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+		if (addr[idx])
+			return min(idx * BITS_PER_LONG + __ffs(addr[idx]), size);
 	}
-	if (!size)
-		return result;
 
-	tmp = (*p) & (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size;	/* Nope. */
-found:
-	return result + __ffs(tmp);
+	return size;
 }
 EXPORT_SYMBOL(find_first_bit);
 #endif
@@ -141,24 +96,14 @@ EXPORT_SYMBOL(find_first_bit);
  */
 unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 {
-	const unsigned long *p = addr;
-	unsigned long result = 0;
-	unsigned long tmp;
+	unsigned long idx;
 
-	while (size & ~(BITS_PER_LONG-1)) {
-		if (~(tmp = *(p++)))
-			goto found;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+		if (addr[idx] != ~0UL)
+			return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
 	}
-	if (!size)
-		return result;
 
-	tmp = (*p) | (~0UL << size);
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size;	/* Nope. */
-found:
-	return result + ffz(tmp);
+	return size;
 }
 EXPORT_SYMBOL(find_first_zero_bit);
 #endif
@@ -166,18 +111,6 @@ EXPORT_SYMBOL(find_first_zero_bit);
 #ifdef __BIG_ENDIAN
 
 /* include/linux/byteorder does not support "unsigned long" type */
-static inline unsigned long ext2_swabp(const unsigned long * x)
-{
-#if BITS_PER_LONG == 64
-	return (unsigned long) __swab64p((u64 *) x);
-#elif BITS_PER_LONG == 32
-	return (unsigned long) __swab32p((u32 *) x);
-#else
-#error BITS_PER_LONG not defined
-#endif
-}
-
-/* include/linux/byteorder doesn't support "unsigned long" type */
 static inline unsigned long ext2_swab(const unsigned long y)
 {
 #if BITS_PER_LONG == 64
@@ -189,48 +122,38 @@ static inline unsigned long ext2_swab(const unsigned long y)
 #endif
 }
 
-#ifndef find_next_zero_bit_le
-unsigned long find_next_zero_bit_le(const void *addr, unsigned
-		long size, unsigned long offset)
+#if !defined(find_next_bit_le) || !defined(find_next_zero_bit_le)
+static unsigned long _find_next_bit_le(const unsigned long *addr,
+		unsigned long nbits, unsigned long start, unsigned long invert)
 {
-	const unsigned long *p = addr;
-	unsigned long result = offset & ~(BITS_PER_LONG - 1);
 	unsigned long tmp;
 
-	if (offset >= size)
-		return size;
-	p += BITOP_WORD(offset);
-	size -= result;
-	offset &= (BITS_PER_LONG - 1UL);
-	if (offset) {
-		tmp = ext2_swabp(p++);
-		tmp |= (~0UL >> (BITS_PER_LONG - offset));
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (~tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
+	if (!nbits || start >= nbits)
+		return nbits;
+
+	tmp = addr[start / BITS_PER_LONG] ^ invert;
+
+	/* Handle 1st word. */
+	tmp &= ext2_swab(HIGH_BITS_MASK(start % BITS_PER_LONG));
+	start = round_down(start, BITS_PER_LONG);
 
-	while (size & ~(BITS_PER_LONG - 1)) {
-		if (~(tmp = *(p++)))
-			goto found_middle_swap;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	while (!tmp) {
+		start += BITS_PER_LONG;
+		if (start >= nbits)
+			return nbits;
+
+		tmp = addr[start / BITS_PER_LONG] ^ invert;
 	}
-	if (!size)
-		return result;
-	tmp = ext2_swabp(p);
-found_first:
-	tmp |= ~0UL << size;
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size; /* Nope. Skip ffz */
-found_middle:
-	return result + ffz(tmp);
 
-found_middle_swap:
-	return result + ffz(ext2_swab(tmp));
+	return min(start + __ffs(ext2_swab(tmp)), nbits);
+}
+#endif
+
+#ifndef find_next_zero_bit_le
+unsigned long find_next_zero_bit_le(const void *addr, unsigned
+		long size, unsigned long offset)
+{
+	return _find_next_bit_le(addr, size, offset, ~0UL);
 }
 EXPORT_SYMBOL(find_next_zero_bit_le);
 #endif
@@ -239,45 +162,7 @@ EXPORT_SYMBOL(find_next_zero_bit_le);
 unsigned long find_next_bit_le(const void *addr, unsigned
 		long size, unsigned long offset)
 {
-	const unsigned long *p = addr;
-	unsigned long result = offset & ~(BITS_PER_LONG - 1);
-	unsigned long tmp;
-
-	if (offset >= size)
-		return size;
-	p += BITOP_WORD(offset);
-	size -= result;
-	offset &= (BITS_PER_LONG - 1UL);
-	if (offset) {
-		tmp = ext2_swabp(p++);
-		tmp &= (~0UL << offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-
-	while (size & ~(BITS_PER_LONG - 1)) {
-		tmp = *(p++);
-		if (tmp)
-			goto found_middle_swap;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
-	}
-	if (!size)
-		return result;
-	tmp = ext2_swabp(p);
-found_first:
-	tmp &= (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size; /* Nope. */
-found_middle:
-	return result + __ffs(tmp);
-
-found_middle_swap:
-	return result + __ffs(ext2_swab(tmp));
+	return _find_next_bit_le(addr, size, offset, 0UL);
 }
 EXPORT_SYMBOL(find_next_bit_le);
 #endif
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 2/3] lib: move find_last_bit to lib/find_next_bit.c
  2015-02-17  2:35 ` [PATCH v4 0/3] lib: find_*_bit reimplementation Yury Norov
  2015-02-17  2:35   ` [PATCH v4 1/3] " Yury Norov
@ 2015-02-17  2:35   ` Yury Norov
  2015-02-17  2:35   ` [PATCH v4 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
  2 siblings, 0 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-17  2:35 UTC (permalink / raw)
  To: linux, klimov.linux, linux
  Cc: akpm, davem, dborkman, hannes, laijs, msalter, takahiro.akashi,
	tgraf, valentinrothberg, yury.norov, linux-kernel, chris

Currently the whole 'find_*_bit' family is located in lib/find_next_bit.c,
except for 'find_last_bit', which is in lib/find_last_bit.c. There seems
to be no major benefit in keeping it separate.
---
 lib/Makefile        |  2 +-
 lib/find_last_bit.c | 42 ------------------------------------------
 lib/find_next_bit.c | 27 ++++++++++++++++++++++++++-
 3 files changed, 27 insertions(+), 44 deletions(-)
 delete mode 100644 lib/find_last_bit.c

diff --git a/lib/Makefile b/lib/Makefile
index 87eb3bf..9dafcd5 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y	+= lockref.o
 obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 	 bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o clz_ctz.o \
-	 bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \
+	 bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
deleted file mode 100644
index edbb281..0000000
--- a/lib/find_last_bit.c
+++ /dev/null
@@ -1,42 +0,0 @@
-/* find_last_bit.c: fallback find next bit implementation
- *
- * Copyright (C) 2008 IBM Corporation
- * Written by Rusty Russell <rusty@rustcorp.com.au>
- * (Inspired by David Howell's find_next_bit implementation)
- *
- * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
- * size and improve performance, 2015.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include <linux/bitops.h>
-#include <linux/export.h>
-#include <linux/kernel.h>
-
-#define LOW_BITS_MASK(nr) (~0UL >> -(nr))
-
-#ifndef find_last_bit
-
-unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
-{
-	if (size) {
-		unsigned long val = LOW_BITS_MASK(size % BITS_PER_LONG);
-		unsigned long idx = (size-1) / BITS_PER_LONG;
-
-		do {
-			val &= addr[idx];
-			if (val)
-				return idx * BITS_PER_LONG + __fls(val);
-
-			val = ~0ul;
-		} while (idx--);
-	}
-	return size;
-}
-EXPORT_SYMBOL(find_last_bit);
-
-#endif
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 1f5f108..627d0ec 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -1,8 +1,12 @@
-/* find_next_bit.c: fallback find next bit implementation
+/* bit search implementation
  *
  * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
+ * Copyright (C) 2008 IBM Corporation
+ * 'find_last_bit' is written by Rusty Russell <rusty@rustcorp.com.au>
+ * (Inspired by David Howell's find_next_bit implementation)
+ *
  * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
  * size and improve performance, 2015.
  *
@@ -17,6 +21,7 @@
 #include <linux/kernel.h>
 
 #define HIGH_BITS_MASK(nr)		(~0UL << (nr))
+#define LOW_BITS_MASK(nr)		(~0UL >> -(nr))
 
 #if !defined(find_next_bit) || !defined(find_next_zero_bit)
 
@@ -108,6 +113,26 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 EXPORT_SYMBOL(find_first_zero_bit);
 #endif
 
+#ifndef find_last_bit
+unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
+{
+	if (size) {
+		unsigned long val = LOW_BITS_MASK(size % BITS_PER_LONG);
+		unsigned long idx = (size-1) / BITS_PER_LONG;
+
+		do {
+			val &= addr[idx];
+			if (val)
+				return idx * BITS_PER_LONG + __fls(val);
+
+			val = ~0ul;
+		} while (idx--);
+	}
+	return size;
+}
+EXPORT_SYMBOL(find_last_bit);
+#endif
+
 #ifdef __BIG_ENDIAN
 
 /* include/linux/byteorder does not support "unsigned long" type */
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c
  2015-02-17  2:35 ` [PATCH v4 0/3] lib: find_*_bit reimplementation Yury Norov
  2015-02-17  2:35   ` [PATCH v4 1/3] " Yury Norov
  2015-02-17  2:35   ` [PATCH v4 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
@ 2015-02-17  2:35   ` Yury Norov
  2 siblings, 0 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-17  2:35 UTC (permalink / raw)
  To: linux, klimov.linux, linux
  Cc: akpm, davem, dborkman, hannes, laijs, msalter, takahiro.akashi,
	tgraf, valentinrothberg, yury.norov, linux-kernel, chris

This file contains the implementation of the whole find_*_bit{,_le}
family, so giving it a more generic name looks reasonable.
---
 lib/Makefile                        | 2 +-
 lib/{find_next_bit.c => find_bit.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename lib/{find_next_bit.c => find_bit.c} (100%)

diff --git a/lib/Makefile b/lib/Makefile
index 9dafcd5..d857965 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y	+= lockref.o
 obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 	 bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o clz_ctz.o \
-	 bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
+	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_next_bit.c b/lib/find_bit.c
similarity index 100%
rename from lib/find_next_bit.c
rename to lib/find_bit.c
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/3] lib: find_*_bit reimplementation
  2015-02-17  2:35   ` [PATCH v4 1/3] " Yury Norov
@ 2015-02-18 17:57     ` Rasmus Villemoes
  0 siblings, 0 replies; 27+ messages in thread
From: Rasmus Villemoes @ 2015-02-18 17:57 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux, klimov.linux, akpm, davem, dborkman, hannes, laijs,
	msalter, takahiro.akashi, tgraf, valentinrothberg, linux-kernel,
	chris

On Tue, Feb 17 2015, Yury Norov <yury.norov@gmail.com> wrote:

> The new implementation takes less space in the sources
> (see diffstat) and in the object file. For me it's 710 vs 453
> bytes of text. It also shows better performance.
>
> The find_last_bit description is fixed due to an obvious typo.
>
> This patch introduces two macros, {LOW,HIGH}_BITS_MASK, which do
> almost the same as BITMAP_{FIRST,LAST}_WORD_MASK in
> include/linux/bitmap.h. The 'LAST' macro is potentially less
> efficient, though, because it issues a conditional branch that can
> be omitted. If it is one day replaced by a more efficient
> implementation, {LOW,HIGH}_BITS_MASK can be removed.
>

I think it's better to use the existing macros and then improve them
instead of duplicating the functionality. I'll submit a patch for that
later tonight (that should then make it to 3.21 [or whatever 3.19+2 will
be called] together with this series). More on this issue below.

>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  include/linux/bitops.h |   4 +-
>  lib/find_last_bit.c    |  37 +++----
>  lib/find_next_bit.c    | 269 ++++++++++++++-----------------------------------
>  3 files changed, 94 insertions(+), 216 deletions(-)
>
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 5d858e0..297f5bd 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -218,9 +218,9 @@ static inline unsigned long __ffs64(u64 word)
>  /**
>   * find_last_bit - find the last set bit in a memory region
>   * @addr: The address to start the search at
> - * @size: The maximum size to search
> + * @size: The number of bits to search
>   *
> - * Returns the bit number of the first set bit, or size.
> + * Returns the bit number of the last set bit, or size.
>   */
>  extern unsigned long find_last_bit(const unsigned long *addr,
>  				   unsigned long size);
> diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
> index 91ca09f..edbb281 100644
> --- a/lib/find_last_bit.c
> +++ b/lib/find_last_bit.c
> @@ -4,6 +4,9 @@
>   * Written by Rusty Russell <rusty@rustcorp.com.au>
>   * (Inspired by David Howell's find_next_bit implementation)
>   *
> + * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
> + * size and improve performance, 2015.
> + *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU General Public License
>   * as published by the Free Software Foundation; either version
> @@ -12,36 +15,26 @@
>  
>  #include <linux/bitops.h>
>  #include <linux/export.h>
> -#include <asm/types.h>
> -#include <asm/byteorder.h>
> +#include <linux/kernel.h>
> +
> +#define LOW_BITS_MASK(nr) (~0UL >> -(nr))

This is technically wrong, and may not even work on architectures that
are not as forgiving as x86: Whatever type and value nr has, -(nr) is
almost guaranteed not to be a number between 0 and BITS_PER_LONG-1. And
even on x86, gcc doesn't generate as good code as it could:

 163:   49 c7 c0 ff ff ff ff    mov    $0xffffffffffffffff,%r8
 16a:   83 e1 3f                and    $0x3f,%ecx
 16d:   f7 d9                   neg    %ecx
 16f:   48 c1 ea 06             shr    $0x6,%rdx
 173:   49 d3 e8                shr    %cl,%r8

It doesn't realize that pre-masking %ecx with 0x3f is redundant when we
then negate it and use it as a shift amount.

So a better definition of the macro is

#define BITMAP_LAST_WORD_MASK(nr) (~0UL >> (-(nr) & (BITS_PER_LONG-1)))

and then callers shouldn't do the modulo. On x86, gcc knows that the &
is redundant. I use & instead of % so that nr may also have a signed
type (otherwise we're again in UB land, since -(nr) % BITS_PER_LONG is
then, by the broken C standard, a negative number).
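
A quick userspace check of the improved definition (a sketch; with
nr = 0 the plain "~0UL >> -(nr)" variant shifts by an out-of-range
amount, which -fsanitize=undefined will flag, while the masked variant
below stays defined for any nr):

	#include <stdio.h>

	#define BITS_PER_LONG 64	/* assumption: LP64 */
	#define BITMAP_LAST_WORD_MASK(nr) \
		(~0UL >> (-(nr) & (BITS_PER_LONG - 1)))

	int main(void)
	{
		int nr;	/* deliberately signed, as discussed above */

		/* nr == 0 and nr == 64 both yield the full-word mask. */
		for (nr = 0; nr <= BITS_PER_LONG; nr++)
			printf("%2d: %016lx\n", nr, BITMAP_LAST_WORD_MASK(nr));
		return 0;
	}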


>  #include <linux/bitops.h>
>  #include <linux/export.h>
> -#include <asm/types.h>
> -#include <asm/byteorder.h>
> +#include <linux/kernel.h>
>  
> -#define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
> +#define HIGH_BITS_MASK(nr)		(~0UL << (nr))
> +
> +#if !defined(find_next_bit) || !defined(find_next_zero_bit)
>  
> -#ifndef find_next_bit
>  /*
> - * Find the next set bit in a memory region.
> + * This is a common helper function for find_next_bit and
> + * find_next_zero_bit.  The difference is the "invert" argument, which
> + * is XORed with each fetched word before searching it for one bits.
>   */
> -unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
> -			    unsigned long offset)
> +static unsigned long _find_next_bit(const unsigned long *addr,
> +		unsigned long nbits, unsigned long start, unsigned long invert)
>  {
> -	const unsigned long *p = addr + BITOP_WORD(offset);
> -	unsigned long result = offset & ~(BITS_PER_LONG-1);
>  	unsigned long tmp;
>  
> -	if (offset >= size)
> -		return size;
> -	size -= result;
> -	offset %= BITS_PER_LONG;
> -	if (offset) {
> -		tmp = *(p++);
> -		tmp &= (~0UL << offset);
> -		if (size < BITS_PER_LONG)
> -			goto found_first;
> -		if (tmp)
> -			goto found_middle;
> -		size -= BITS_PER_LONG;
> -		result += BITS_PER_LONG;
> -	}
> -	while (size & ~(BITS_PER_LONG-1)) {
> -		if ((tmp = *(p++)))
> -			goto found_middle;
> -		result += BITS_PER_LONG;
> -		size -= BITS_PER_LONG;
> +	if (!nbits || start >= nbits)
> +		return nbits;
> +
> +	tmp = addr[start / BITS_PER_LONG] ^ invert;
> +
> +	/* Handle 1st word. */
> +	tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);

And of course here, I'd then suggest using BITMAP_FIRST_WORD_MASK(start)
(that even matches the comment :-)), omitting the definition of
HIGH_BITS_MASK.
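
As an aside, the "invert" argument quoted above works because
word ^ 0UL leaves the word unchanged, while word ^ ~0UL flips it, so
the zero bits of the original become the set bits that __ffs() finds.
A userspace sketch, with __builtin_ctzl standing in for the kernel's
arch-specific __ffs():

	#include <stdio.h>

	int main(void)
	{
		unsigned long word = 0xf0UL;

		/* find_next_bit path: search the word as-is. */
		printf("first set bit:  %d\n", __builtin_ctzl(word ^ 0UL));
		/* find_next_zero_bit path: flip, then look for set bits. */
		printf("first zero bit: %d\n", __builtin_ctzl(word ^ ~0UL));
		return 0;	/* prints 4, then 0 */
	}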

> @@ -113,24 +78,14 @@ EXPORT_SYMBOL(find_next_zero_bit);
>   */
>  unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
>  {
> -	const unsigned long *p = addr;
> -	unsigned long result = 0;
> -	unsigned long tmp;
> +	unsigned long idx;
>  
> -	while (size & ~(BITS_PER_LONG-1)) {
> -		if ((tmp = *(p++)))
> -			goto found;
> -		result += BITS_PER_LONG;
> -		size -= BITS_PER_LONG;
> +	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
> +		if (addr[idx])
> +			return min(idx * BITS_PER_LONG + __ffs(addr[idx]), size);
>  	}
> -	if (!size)
> -		return result;
>  
> -	tmp = (*p) & (~0UL >> (BITS_PER_LONG - size));
> -	if (tmp == 0UL)		/* Are any bits set? */
> -		return result + size;	/* Nope. */
> -found:
> -	return result + __ffs(tmp);
> +	return size;
>  }
>  EXPORT_SYMBOL(find_first_bit);
>  #endif
> @@ -141,24 +96,14 @@ EXPORT_SYMBOL(find_first_bit);
>   */
>  unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
>  {
> -	const unsigned long *p = addr;
> -	unsigned long result = 0;
> -	unsigned long tmp;
> +	unsigned long idx;
>  
> -	while (size & ~(BITS_PER_LONG-1)) {
> -		if (~(tmp = *(p++)))
> -			goto found;
> -		result += BITS_PER_LONG;
> -		size -= BITS_PER_LONG;
> +	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
> +		if (addr[idx] != ~0UL)
> +			return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
>  	}

Since I'm afraid the above means I have to ask you to send a v5, I might
as well also comment on this: I think it would make the code much more
obviously parallel to find_first_bit if the test were "if (~addr[idx])"
and ffz were then replaced by __ffs(~addr[idx]). Many architectures
implement ffz(x) as __ffs(~x) anyway, so it shouldn't be any less
efficient. But it's no big deal, so if you feel the current version is
better, just leave it.
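
Concretely, the suggested variant would read something like this
(a sketch of the change, not tested):

	unsigned long find_first_zero_bit(const unsigned long *addr,
					  unsigned long size)
	{
		unsigned long idx;

		for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
			/* ~addr[idx] != 0 iff some bit is clear. */
			if (~addr[idx])
				return min(idx * BITS_PER_LONG +
					   __ffs(~addr[idx]), size);
		}

		return size;
	}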

Rasmus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v5 0/3] lib: find_*_bit reimplementation
  2015-02-08 14:10 [PATCH v3 0/3] lib: find_*_bit reimplementation Yury Norov
                   ` (3 preceding siblings ...)
  2015-02-17  2:35 ` [PATCH v4 0/3] lib: find_*_bit reimplementation Yury Norov
@ 2015-02-22 17:24 ` Yury Norov
  2015-02-22 17:24   ` [PATCH v5 1/3] " Yury Norov
                     ` (3 more replies)
  4 siblings, 4 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-22 17:24 UTC (permalink / raw)
  To: linux, klimov.linux, linux
  Cc: akpm, davem, dborkman, hannes, laijs, msalter, takahiro.akashi,
	tgraf, valentinrothberg, yury.norov, linux-kernel, chris

This patchset reworks the find_bit function family to achieve better
performance and decrease the size of the text section. All rework is
done in patch 1; patches 2 and 3 are about code moving and renaming.

It was boot-tested on x86_64 and MIPS (big-endian) machines.
Performance tests were run in userspace with code like this:

	/* addr[] is filled from /dev/urandom */
	start = clock();
	while (ret < nbits)
		ret = find_next_bit(addr, nbits, ret + 1);

	end = clock();
	printf("%ld\t", (unsigned long) end - start);

On an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, the measurements are as
follows (for find_next_bit, nbits is 8M; for find_first_bit, 80K):

	find_next_bit:		find_first_bit:
	new	current		new	current
	26932	43151		14777	14925
	26947	43182		14521	15423
	26507	43824		15053	14705
	27329	43759		14473	14777
	26895	43367		14847	15023
	26990	43693		15103	15163
	26775	43299		15067	15232
	27282	42752		14544	15121
	27504	43088		14644	14858
	26761	43856		14699	15193
	26692	43075		14781	14681
	27137	42969		14451	15061
	...			...

The find_next_bit performance gain is 35-40%;
find_first_bit shows no measurable difference.

On ARM machines, there is an arch-specific implementation of find_bit.
To disable it and use the generic one, apply the following patch:
	
	---
	 arch/arm/include/asm/bitops.h   | 19 -------------------
	 arch/arm/kernel/armksyms.c      | 11 -----------
	 arch/arm/lib/Makefile           |  2 +-
	 include/asm-generic/bitops/le.h |  1 +
	 4 files changed, 2 insertions(+), 31 deletions(-)

	diff --git a/arch/arm/include/asm/bitops.h b/arch/arm/include/asm/bitops.h
	index 5638099..e0611d1 100644
	--- a/arch/arm/include/asm/bitops.h
	+++ b/arch/arm/include/asm/bitops.h
	@@ -192,25 +192,6 @@ extern int _find_next_bit_be(const unsigned long *p, int size, int offset);
	 #define test_and_clear_bit(nr,p)	ATOMIC_BITOP(test_and_clear_bit,nr,p)
	 #define test_and_change_bit(nr,p)	ATOMIC_BITOP(test_and_change_bit,nr,p)
	 
	-#ifndef __ARMEB__
	-/*
	- * These are the little endian, atomic definitions.
	- */
	-#define find_first_zero_bit(p,sz)	_find_first_zero_bit_le(p,sz)
	-#define find_next_zero_bit(p,sz,off)	_find_next_zero_bit_le(p,sz,off)
	-#define find_first_bit(p,sz)		_find_first_bit_le(p,sz)
	-#define find_next_bit(p,sz,off)		_find_next_bit_le(p,sz,off)
	-
	-#else
	-/*
	- * These are the big endian, atomic definitions.
	- */
	-#define find_first_zero_bit(p,sz)	_find_first_zero_bit_be(p,sz)
	-#define find_next_zero_bit(p,sz,off)	_find_next_zero_bit_be(p,sz,off)
	-#define find_first_bit(p,sz)		_find_first_bit_be(p,sz)
	-#define find_next_bit(p,sz,off)		_find_next_bit_be(p,sz,off)
	-
	-#endif
	 
	 #if __LINUX_ARM_ARCH__ < 5
	 
	diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
	index a88671c..22e8748 100644
	--- a/arch/arm/kernel/armksyms.c
	+++ b/arch/arm/kernel/armksyms.c
	@@ -146,17 +146,6 @@ EXPORT_SYMBOL(_clear_bit);
	 EXPORT_SYMBOL(_test_and_clear_bit);
	 EXPORT_SYMBOL(_change_bit);
	 EXPORT_SYMBOL(_test_and_change_bit);
	-EXPORT_SYMBOL(_find_first_zero_bit_le);
	-EXPORT_SYMBOL(_find_next_zero_bit_le);
	-EXPORT_SYMBOL(_find_first_bit_le);
	-EXPORT_SYMBOL(_find_next_bit_le);
	-
	-#ifdef __ARMEB__
	-EXPORT_SYMBOL(_find_first_zero_bit_be);
	-EXPORT_SYMBOL(_find_next_zero_bit_be);
	-EXPORT_SYMBOL(_find_first_bit_be);
	-EXPORT_SYMBOL(_find_next_bit_be);
	-#endif
	 
	 #ifdef CONFIG_FUNCTION_TRACER
	 #ifdef CONFIG_OLD_MCOUNT
	diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
	index 0573faa..de369aa 100644
	--- a/arch/arm/lib/Makefile
	+++ b/arch/arm/lib/Makefile
	@@ -6,7 +6,7 @@
	 
	 lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
			   csumpartialcopy.o csumpartialcopyuser.o clearbit.o \
	-		   delay.o delay-loop.o findbit.o memchr.o memcpy.o   \
	+		   delay.o delay-loop.o memchr.o memcpy.o	      \
			   memmove.o memset.o memzero.o setbit.o              \
			   strchr.o strrchr.o                                 \
			   testchangebit.o testclearbit.o testsetbit.o        \
	diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
	index 6173154..9a8798f 100644
	--- a/include/asm-generic/bitops/le.h
	+++ b/include/asm-generic/bitops/le.h
	@@ -2,6 +2,7 @@
	 #define _ASM_GENERIC_BITOPS_LE_H_
	 
	 #include <asm/types.h>
	+#include <asm-generic/bitops/find.h>
	 #include <asm/byteorder.h>
	 
	 #if defined(__LITTLE_ENDIAN)
	-- 
	1.9.1

Thanks a lot to George Spelvin and Rasmus Villemoes for hints and
helpful discussions.
	
v5:
* the {LOW,HIGH}_BITS_MASK macros are removed; BITMAP_{FIRST,LAST}_WORD_MASK
  is used instead.

v4:
* find_last_bit was found buggy and replaced with George Spelvin's
  version, which correctly handles garbage at the end of a
  word-unaligned bitmap;
* find_last_bit description fixed in 'include/linux/bitops.h';
* _find_next_bit{,_le}: first-word handling made unconditional;

v3:
* conditional bit inverting in _find_next_bit replaced with XORing by
  a mask (patch 1);
* size/offset checks moved from the wrappers to _find_next_bit (patch 1);
* #include <linux/bitops.h> restored (patch 1);
* lib/find_next_bit file description changed to avoid referring to the
  file name, so that the rename produces a clean diff (patch 2);
* patch 3 is generated with the '-M' option to detect the rename and
  drop the file contents;
* this cover letter added.

Yury Norov (3):
  lib: find_*_bit reimplementation
  lib: move find_last_bit to lib/find_next_bit.c
  lib: rename lib/find_next_bit.c to lib/find_bit.c

 include/linux/bitops.h |   4 +-
 lib/Makefile           |   2 +-
 lib/find_bit.c         | 195 +++++++++++++++++++++++++++++++++
 lib/find_last_bit.c    |  49 ---------
 lib/find_next_bit.c    | 285 -------------------------------------------------
 5 files changed, 198 insertions(+), 337 deletions(-)
 create mode 100644 lib/find_bit.c
 delete mode 100644 lib/find_last_bit.c
 delete mode 100644 lib/find_next_bit.c

-- 
2.1.0

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v5 1/3] lib: find_*_bit reimplementation
  2015-02-22 17:24 ` [PATCH v5 0/3] lib: find_*_bit reimplementation Yury Norov
@ 2015-02-22 17:24   ` Yury Norov
  2015-02-23 21:50     ` Rasmus Villemoes
  2015-02-22 17:24   ` [PATCH v5 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Yury Norov @ 2015-02-22 17:24 UTC (permalink / raw)
  To: linux, klimov.linux, linux
  Cc: akpm, davem, dborkman, hannes, laijs, msalter, takahiro.akashi,
	tgraf, valentinrothberg, yury.norov, linux-kernel, chris

The new implementation takes less space in the source file (see
diffstat) and in the object file. For me it's 710 vs 453 bytes of
text. It also shows better performance.

The find_last_bit description is fixed due to an obvious typo.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/bitops.h |   4 +-
 lib/find_last_bit.c    |  35 +++----
 lib/find_next_bit.c    | 267 ++++++++++++++-----------------------------------
 3 files changed, 90 insertions(+), 216 deletions(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 5d858e0..297f5bd 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -218,9 +218,9 @@ static inline unsigned long __ffs64(u64 word)
 /**
  * find_last_bit - find the last set bit in a memory region
  * @addr: The address to start the search at
- * @size: The maximum size to search
+ * @size: The number of bits to search
  *
- * Returns the bit number of the first set bit, or size.
+ * Returns the bit number of the last set bit, or size.
  */
 extern unsigned long find_last_bit(const unsigned long *addr,
 				   unsigned long size);
diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
index 91ca09f..f7b594e 100644
--- a/lib/find_last_bit.c
+++ b/lib/find_last_bit.c
@@ -4,6 +4,9 @@
  * Written by Rusty Russell <rusty@rustcorp.com.au>
  * (Inspired by David Howell's find_next_bit implementation)
  *
+ * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
+ * size and improve performance, 2015.
+ *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
  * as published by the Free Software Foundation; either version
@@ -12,36 +15,24 @@
 
 #include <linux/bitops.h>
 #include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>
 
 #ifndef find_last_bit
 
 unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
 {
-	unsigned long words;
-	unsigned long tmp;
-
-	/* Start at final word. */
-	words = size / BITS_PER_LONG;
+	if (size) {
+		unsigned long val = BITMAP_LAST_WORD_MASK(size);
+		unsigned long idx = (size-1) / BITS_PER_LONG;
 
-	/* Partial final word? */
-	if (size & (BITS_PER_LONG-1)) {
-		tmp = (addr[words] & (~0UL >> (BITS_PER_LONG
-					 - (size & (BITS_PER_LONG-1)))));
-		if (tmp)
-			goto found;
-	}
+		do {
+			val &= addr[idx];
+			if (val)
+				return idx * BITS_PER_LONG + __fls(val);
 
-	while (words) {
-		tmp = addr[--words];
-		if (tmp) {
-found:
-			return words * BITS_PER_LONG + __fls(tmp);
-		}
+			val = ~0ul;
+		} while (idx--);
 	}
-
-	/* Not found */
 	return size;
 }
 EXPORT_SYMBOL(find_last_bit);
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 0cbfc0b..cbea5ef 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -3,6 +3,9 @@
  * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
+ * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
+ * size and improve performance, 2015.
+ *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
  * as published by the Free Software Foundation; either version
@@ -11,98 +14,58 @@
 
 #include <linux/bitops.h>
 #include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>
 
-#define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
+#if !defined(find_next_bit) || !defined(find_next_zero_bit)
 
-#ifndef find_next_bit
 /*
- * Find the next set bit in a memory region.
+ * This is a common helper function for find_next_bit and
+ * find_next_zero_bit.  The difference is the "invert" argument, which
+ * is XORed with each fetched word before searching it for one bits.
  */
-unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
-			    unsigned long offset)
+static unsigned long _find_next_bit(const unsigned long *addr,
+		unsigned long nbits, unsigned long start, unsigned long invert)
 {
-	const unsigned long *p = addr + BITOP_WORD(offset);
-	unsigned long result = offset & ~(BITS_PER_LONG-1);
 	unsigned long tmp;
 
-	if (offset >= size)
-		return size;
-	size -= result;
-	offset %= BITS_PER_LONG;
-	if (offset) {
-		tmp = *(p++);
-		tmp &= (~0UL << offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-	while (size & ~(BITS_PER_LONG-1)) {
-		if ((tmp = *(p++)))
-			goto found_middle;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	if (!nbits || start >= nbits)
+		return nbits;
+
+	tmp = addr[start / BITS_PER_LONG] ^ invert;
+
+	/* Handle 1st word. */
+	tmp &= BITMAP_FIRST_WORD_MASK(start);
+	start = round_down(start, BITS_PER_LONG);
+
+	while (!tmp) {
+		start += BITS_PER_LONG;
+		if (start >= nbits)
+			return nbits;
+
+		tmp = addr[start / BITS_PER_LONG] ^ invert;
 	}
-	if (!size)
-		return result;
-	tmp = *p;
 
-found_first:
-	tmp &= (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size;	/* Nope. */
-found_middle:
-	return result + __ffs(tmp);
+	return min(start + __ffs(tmp), nbits);
 }
-EXPORT_SYMBOL(find_next_bit);
 #endif
 
-#ifndef find_next_zero_bit
+#ifndef find_next_bit
 /*
- * This implementation of find_{first,next}_zero_bit was stolen from
- * Linus' asm-alpha/bitops.h.
+ * Find the next set bit in a memory region.
  */
+unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
+			    unsigned long offset)
+{
+	return _find_next_bit(addr, size, offset, 0UL);
+}
+EXPORT_SYMBOL(find_next_bit);
+#endif
+
+#ifndef find_next_zero_bit
 unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
 				 unsigned long offset)
 {
-	const unsigned long *p = addr + BITOP_WORD(offset);
-	unsigned long result = offset & ~(BITS_PER_LONG-1);
-	unsigned long tmp;
-
-	if (offset >= size)
-		return size;
-	size -= result;
-	offset %= BITS_PER_LONG;
-	if (offset) {
-		tmp = *(p++);
-		tmp |= ~0UL >> (BITS_PER_LONG - offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (~tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-	while (size & ~(BITS_PER_LONG-1)) {
-		if (~(tmp = *(p++)))
-			goto found_middle;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
-	}
-	if (!size)
-		return result;
-	tmp = *p;
-
-found_first:
-	tmp |= ~0UL << size;
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size;	/* Nope. */
-found_middle:
-	return result + ffz(tmp);
+	return _find_next_bit(addr, size, offset, ~0UL);
 }
 EXPORT_SYMBOL(find_next_zero_bit);
 #endif
@@ -113,24 +76,14 @@ EXPORT_SYMBOL(find_next_zero_bit);
  */
 unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
 {
-	const unsigned long *p = addr;
-	unsigned long result = 0;
-	unsigned long tmp;
+	unsigned long idx;
 
-	while (size & ~(BITS_PER_LONG-1)) {
-		if ((tmp = *(p++)))
-			goto found;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+		if (addr[idx])
+			return min(idx * BITS_PER_LONG + __ffs(addr[idx]), size);
 	}
-	if (!size)
-		return result;
 
-	tmp = (*p) & (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size;	/* Nope. */
-found:
-	return result + __ffs(tmp);
+	return size;
 }
 EXPORT_SYMBOL(find_first_bit);
 #endif
@@ -141,24 +94,14 @@ EXPORT_SYMBOL(find_first_bit);
  */
 unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 {
-	const unsigned long *p = addr;
-	unsigned long result = 0;
-	unsigned long tmp;
+	unsigned long idx;
 
-	while (size & ~(BITS_PER_LONG-1)) {
-		if (~(tmp = *(p++)))
-			goto found;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+		if (addr[idx] != ~0UL)
+			return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
 	}
-	if (!size)
-		return result;
 
-	tmp = (*p) | (~0UL << size);
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size;	/* Nope. */
-found:
-	return result + ffz(tmp);
+	return size;
 }
 EXPORT_SYMBOL(find_first_zero_bit);
 #endif
@@ -166,18 +109,6 @@ EXPORT_SYMBOL(find_first_zero_bit);
 #ifdef __BIG_ENDIAN
 
 /* include/linux/byteorder does not support "unsigned long" type */
-static inline unsigned long ext2_swabp(const unsigned long * x)
-{
-#if BITS_PER_LONG == 64
-	return (unsigned long) __swab64p((u64 *) x);
-#elif BITS_PER_LONG == 32
-	return (unsigned long) __swab32p((u32 *) x);
-#else
-#error BITS_PER_LONG not defined
-#endif
-}
-
-/* include/linux/byteorder doesn't support "unsigned long" type */
 static inline unsigned long ext2_swab(const unsigned long y)
 {
 #if BITS_PER_LONG == 64
@@ -189,48 +120,38 @@ static inline unsigned long ext2_swab(const unsigned long y)
 #endif
 }
 
-#ifndef find_next_zero_bit_le
-unsigned long find_next_zero_bit_le(const void *addr, unsigned
-		long size, unsigned long offset)
+#if !defined(find_next_bit_le) || !defined(find_next_zero_bit_le)
+static unsigned long _find_next_bit_le(const unsigned long *addr,
+		unsigned long nbits, unsigned long start, unsigned long invert)
 {
-	const unsigned long *p = addr;
-	unsigned long result = offset & ~(BITS_PER_LONG - 1);
 	unsigned long tmp;
 
-	if (offset >= size)
-		return size;
-	p += BITOP_WORD(offset);
-	size -= result;
-	offset &= (BITS_PER_LONG - 1UL);
-	if (offset) {
-		tmp = ext2_swabp(p++);
-		tmp |= (~0UL >> (BITS_PER_LONG - offset));
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (~tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
+	if (!nbits || start >= nbits)
+		return nbits;
+
+	tmp = addr[start / BITS_PER_LONG] ^ invert;
+
+	/* Handle 1st word. */
+	tmp &= ext2_swab(BITMAP_FIRST_WORD_MASK(start));
+	start = round_down(start, BITS_PER_LONG);
 
-	while (size & ~(BITS_PER_LONG - 1)) {
-		if (~(tmp = *(p++)))
-			goto found_middle_swap;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	while (!tmp) {
+		start += BITS_PER_LONG;
+		if (start >= nbits)
+			return nbits;
+
+		tmp = addr[start / BITS_PER_LONG] ^ invert;
 	}
-	if (!size)
-		return result;
-	tmp = ext2_swabp(p);
-found_first:
-	tmp |= ~0UL << size;
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size; /* Nope. Skip ffz */
-found_middle:
-	return result + ffz(tmp);
 
-found_middle_swap:
-	return result + ffz(ext2_swab(tmp));
+	return min(start + __ffs(ext2_swab(tmp)), nbits);
+}
+#endif
+
+#ifndef find_next_zero_bit_le
+unsigned long find_next_zero_bit_le(const void *addr, unsigned
+		long size, unsigned long offset)
+{
+	return _find_next_bit_le(addr, size, offset, ~0UL);
 }
 EXPORT_SYMBOL(find_next_zero_bit_le);
 #endif
@@ -239,45 +160,7 @@ EXPORT_SYMBOL(find_next_zero_bit_le);
 unsigned long find_next_bit_le(const void *addr, unsigned
 		long size, unsigned long offset)
 {
-	const unsigned long *p = addr;
-	unsigned long result = offset & ~(BITS_PER_LONG - 1);
-	unsigned long tmp;
-
-	if (offset >= size)
-		return size;
-	p += BITOP_WORD(offset);
-	size -= result;
-	offset &= (BITS_PER_LONG - 1UL);
-	if (offset) {
-		tmp = ext2_swabp(p++);
-		tmp &= (~0UL << offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-
-	while (size & ~(BITS_PER_LONG - 1)) {
-		tmp = *(p++);
-		if (tmp)
-			goto found_middle_swap;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
-	}
-	if (!size)
-		return result;
-	tmp = ext2_swabp(p);
-found_first:
-	tmp &= (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size; /* Nope. */
-found_middle:
-	return result + __ffs(tmp);
-
-found_middle_swap:
-	return result + __ffs(ext2_swab(tmp));
+	return _find_next_bit_le(addr, size, offset, 0UL);
 }
 EXPORT_SYMBOL(find_next_bit_le);
 #endif
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 2/3] lib: move find_last_bit to lib/find_next_bit.c
  2015-02-22 17:24 ` [PATCH v5 0/3] lib: find_*_bit reimplementation Yury Norov
  2015-02-22 17:24   ` [PATCH v5 1/3] " Yury Norov
@ 2015-02-22 17:24   ` Yury Norov
  2015-02-22 17:24   ` [PATCH v5 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
  2015-02-24  0:40   ` [PATCH v5 0/3] lib: find_*_bit reimplementation Andrew Morton
  3 siblings, 0 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-22 17:24 UTC (permalink / raw)
  To: linux, klimov.linux, linux
  Cc: akpm, davem, dborkman, hannes, laijs, msalter, takahiro.akashi,
	tgraf, valentinrothberg, yury.norov, linux-kernel, chris

Currently the whole 'find_*_bit' family is located in lib/find_next_bit.c,
except for 'find_last_bit', which is in lib/find_last_bit.c. There seems
to be no major benefit in keeping it separate.
---
 lib/Makefile        |  2 +-
 lib/find_last_bit.c | 40 ----------------------------------------
 lib/find_next_bit.c | 26 +++++++++++++++++++++++++-
 3 files changed, 26 insertions(+), 42 deletions(-)
 delete mode 100644 lib/find_last_bit.c

diff --git a/lib/Makefile b/lib/Makefile
index 87eb3bf..9dafcd5 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y	+= lockref.o
 obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 	 bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o clz_ctz.o \
-	 bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \
+	 bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
deleted file mode 100644
index f7b594e..0000000
--- a/lib/find_last_bit.c
+++ /dev/null
@@ -1,40 +0,0 @@
-/* find_last_bit.c: fallback find next bit implementation
- *
- * Copyright (C) 2008 IBM Corporation
- * Written by Rusty Russell <rusty@rustcorp.com.au>
- * (Inspired by David Howell's find_next_bit implementation)
- *
- * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
- * size and improve performance, 2015.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include <linux/bitops.h>
-#include <linux/export.h>
-#include <linux/kernel.h>
-
-#ifndef find_last_bit
-
-unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
-{
-	if (size) {
-		unsigned long val = BITMAP_LAST_WORD_MASK(size);
-		unsigned long idx = (size-1) / BITS_PER_LONG;
-
-		do {
-			val &= addr[idx];
-			if (val)
-				return idx * BITS_PER_LONG + __fls(val);
-
-			val = ~0ul;
-		} while (idx--);
-	}
-	return size;
-}
-EXPORT_SYMBOL(find_last_bit);
-
-#endif
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index cbea5ef..9ae4d40 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -1,8 +1,12 @@
-/* find_next_bit.c: fallback find next bit implementation
+/* bit search implementation
  *
  * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
+ * Copyright (C) 2008 IBM Corporation
+ * 'find_last_bit' is written by Rusty Russell <rusty@rustcorp.com.au>
+ * (Inspired by David Howell's find_next_bit implementation)
+ *
  * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
  * size and improve performance, 2015.
  *
@@ -106,6 +110,26 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 EXPORT_SYMBOL(find_first_zero_bit);
 #endif
 
+#ifndef find_last_bit
+unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
+{
+	if (size) {
+		unsigned long val = BITMAP_LAST_WORD_MASK(size);
+		unsigned long idx = (size-1) / BITS_PER_LONG;
+
+		do {
+			val &= addr[idx];
+			if (val)
+				return idx * BITS_PER_LONG + __fls(val);
+
+			val = ~0ul;
+		} while (idx--);
+	}
+	return size;
+}
+EXPORT_SYMBOL(find_last_bit);
+#endif
+
 #ifdef __BIG_ENDIAN
 
 /* include/linux/byteorder does not support "unsigned long" type */
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c
  2015-02-22 17:24 ` [PATCH v5 0/3] lib: find_*_bit reimplementation Yury Norov
  2015-02-22 17:24   ` [PATCH v5 1/3] " Yury Norov
  2015-02-22 17:24   ` [PATCH v5 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
@ 2015-02-22 17:24   ` Yury Norov
  2015-02-24  0:40   ` [PATCH v5 0/3] lib: find_*_bit reimplementation Andrew Morton
  3 siblings, 0 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-22 17:24 UTC (permalink / raw)
  To: linux, klimov.linux, linux
  Cc: akpm, davem, dborkman, hannes, laijs, msalter, takahiro.akashi,
	tgraf, valentinrothberg, yury.norov, linux-kernel, chris

This file contains the implementation of the whole find_*_bit{,_le}
family, so giving it a more generic name looks reasonable.
---
 lib/Makefile                        | 2 +-
 lib/{find_next_bit.c => find_bit.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename lib/{find_next_bit.c => find_bit.c} (100%)

diff --git a/lib/Makefile b/lib/Makefile
index 9dafcd5..d857965 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y	+= lockref.o
 obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 	 bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o clz_ctz.o \
-	 bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
+	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_next_bit.c b/lib/find_bit.c
similarity index 100%
rename from lib/find_next_bit.c
rename to lib/find_bit.c
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 1/3] lib: find_*_bit reimplementation
  2015-02-22 17:24   ` [PATCH v5 1/3] " Yury Norov
@ 2015-02-23 21:50     ` Rasmus Villemoes
  2015-02-24  0:29       ` George Spelvin
  0 siblings, 1 reply; 27+ messages in thread
From: Rasmus Villemoes @ 2015-02-23 21:50 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux, klimov.linux, akpm, davem, dborkman, hannes, laijs,
	msalter, takahiro.akashi, tgraf, valentinrothberg, linux-kernel,
	chris

On Sun, Feb 22 2015, Yury Norov <yury.norov@gmail.com> wrote:

> The new implementation takes less space in the source file (see diffstat)
> and in the object file; for me it's 710 vs 453 bytes of text.
> It also shows better performance.
>
> find_last_bit description fixed due to obvious typo.
>

Hm, sorry, I should probably have reminded you to include linux/bitmap.h:

lib/find_last_bit.c: In function ‘find_last_bit’:
lib/find_last_bit.c:25:3: error: implicit declaration of function ‘BITMAP_LAST_WORD_MASK’ [-Werror=implicit-function-declaration]
lib/find_next_bit.c: In function ‘_find_next_bit’:
lib/find_next_bit.c:37:2: error: implicit declaration of function ‘BITMAP_FIRST_WORD_MASK’ [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
cc1: some warnings being treated as errors
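
(Both macros live in include/linux/bitmap.h, so the fix is presumably just
adding, near the top of lib/find_last_bit.c and lib/find_next_bit.c:

	#include <linux/bitmap.h>	/* BITMAP_{FIRST,LAST}_WORD_MASK */

after which the generic helpers build standalone.)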

With that fixed:

Reviewed-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>

That also applies to 2/3 and 3/3.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 1/3] lib: find_*_bit reimplementation
  2015-02-23 21:50     ` Rasmus Villemoes
@ 2015-02-24  0:29       ` George Spelvin
  0 siblings, 0 replies; 27+ messages in thread
From: George Spelvin @ 2015-02-24  0:29 UTC (permalink / raw)
  To: linux, yury.norov
  Cc: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs,
	linux-kernel, linux, msalter, takahiro.akashi, tgraf,
	valentinrothberg

Also add my

Reviewed-by: George Spelvin <linux@horizon.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 0/3] lib: find_*_bit reimplementation
  2015-02-22 17:24 ` [PATCH v5 0/3] lib: find_*_bit reimplementation Yury Norov
                     ` (2 preceding siblings ...)
  2015-02-22 17:24   ` [PATCH v5 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
@ 2015-02-24  0:40   ` Andrew Morton
  2015-03-08 18:17     ` Yury Norov
  3 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2015-02-24  0:40 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux, klimov.linux, linux, davem, dborkman, hannes, laijs,
	msalter, takahiro.akashi, tgraf, valentinrothberg, linux-kernel,
	chris

On Sun, 22 Feb 2015 20:24:14 +0300 Yury Norov <yury.norov@gmail.com> wrote:

> This patchset does rework find_bit functions family to achieve better
> performance, and decrease size of text. All rework is done in patch 1.
> Patches 2 and 3 are about code moving and renaming.
> 
> It was boot-tested on x86_64 and MIPS (big-endian) machines.
> Performance tests were ran on userspace with code like this:
> 
> 	/* addr[] is filled from /dev/urandom */
> 	start = clock();
> 	while (ret < nbits)
> 		ret = find_next_bit(addr, nbits, ret + 1);
> 
> 	end = clock();
> 	printf("%ld\t", (unsigned long) end - start);
> 
> On an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, the measurements are as follows:
> (for find_next_bit, nbits is 8M, for find_first_bit - 80K)
> 
> 	find_next_bit:		find_first_bit:
> 	new	current		new	current
> 	26932	43151		14777	14925
> 	26947	43182		14521	15423
> 	26507	43824		15053	14705
> 	27329	43759		14473	14777
> 	26895	43367		14847	15023
> 	26990	43693		15103	15163
> 	26775	43299		15067	15232
> 	27282	42752		14544	15121
> 	27504	43088		14644	14858
> 	26761	43856		14699	15193
> 	26692	43075		14781	14681
> 	27137	42969		14451	15061
> 	...			...
> 
> find_next_bit performance gain is 35-40%;
> find_first_bit - no measurable difference.
> 
> On ARM machine, there is arch-specific implementation for find_bit.
> To disable it, and use generic one, please apply next patch:

I avoid putting patches into changelogs because in some situations
patch(1) tries to apply the embedded patch when you apply the real
one.  Maybe you can share the userspace test harness with someone who
has access to an arm machine?

Patches 2 and 3 are missing your signed-off-by:.  I added it to my
copies of those patches.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 0/3] lib: find_*_bit reimplementation
  2015-02-24  0:40   ` [PATCH v5 0/3] lib: find_*_bit reimplementation Andrew Morton
@ 2015-03-08 18:17     ` Yury Norov
  0 siblings, 0 replies; 27+ messages in thread
From: Yury Norov @ 2015-03-08 18:17 UTC (permalink / raw)
  To: linux, klimov.linux, linux, akpm
  Cc: davem, dborkman, hannes, laijs, msalter, takahiro.akashi, tgraf,
	valentinrothberg, yury.norov, linux-kernel, chris

It looks like I screwed up the performance measurements.

First, I measured the performance of 'find_next_zero_bit' while writing about
'find_next_bit'. I really thought it didn't matter; I'm sorry.

Second, to run the test in userspace I took ffs and ffz from the kernel, and it
looks like all of the performance difference is caused by them.  A simple
measurement of the current 'find_next_bit' and 'find_next_zero_bit' shows the
same 35% difference, although their code is virtually identical.  The trick is
that the new implementation doesn't use ffz, which is slow for me (because I
got the generic implementation), so 'find_next_zero_bit' only looks faster,
while it's not.  In the kernel, ffs and ffz are both fast.  I'm sorry again.
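
(For reference, the generic ffz in include/asm-generic/bitops/ffz.h is just
the complement trick:

	#define ffz(x)  __ffs(~(x))

so the slowness presumably came from pairing it with a plain-C __ffs; in the
kernel most architectures override __ffs with a single instruction, e.g. bsf
on x86.)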

When I ran this test in kernel mode, I found the measurements weird. I tested
on a Core i7-3770@3.40GHz and a Core i7-2630QM@2.00GHz. The first machine
showed noisy but roughly equal results (I cannot attach them now); the second
one is more stable, and it looks like performance degrades by about 20%.

	find_next_bit	find_next_zero_bit
	old	new	old	new
	25	30	24	29
	25	30	24	29
	25	30	25	29
	26	29	24	29
	25	30	24	29
	26	30	24	29
	26	30	24	30
	25	30	25	29
	26	30	25	29
	25	30	25	30

The module I used for testing is below.

To Alexey:
Could you be so kind as to run this test on your Odroid? It would also be
interesting to see whether there is a real difference between the generic and
the platform-specific implementations.

---
 lib/Kconfig.debug   |   8 +++
 lib/Makefile        |   1 +
 lib/test_find_bit.c | 189 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 198 insertions(+)
 create mode 100644 lib/test_find_bit.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c5cefb3..5d78dbb 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1669,6 +1669,14 @@ config DMA_API_DEBUG
 
 	  If unsure, say N.
 
+config TEST_FIND_BIT
+	tristate "Test find_bit family performance"
+	default n
+	help
+	  This builds the "test_find_bit" module, a find_bit performance test.
+
+	  If unsure, say N.
+
 config TEST_LKM
 	tristate "Test module loading with 'hello world' module"
 	default n
diff --git a/lib/Makefile b/lib/Makefile
index d857965..9186da4 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -34,6 +34,7 @@ obj-$(CONFIG_TEST_HEXDUMP) += test-hexdump.o
 obj-y += kstrtox.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
 obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
+obj-$(CONFIG_TEST_FIND_BIT) += test_find_bit.o
 obj-$(CONFIG_TEST_KASAN) += test_kasan.o
 obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o
 obj-$(CONFIG_TEST_LKM) += test_module.o
diff --git a/lib/test_find_bit.c b/lib/test_find_bit.c
new file mode 100644
index 0000000..0c04022
--- /dev/null
+++ b/lib/test_find_bit.c
@@ -0,0 +1,189 @@
+/*
+ * This module compares old vs. new 'find_next_bit' and 'find_next_zero_bit' performance.
+ */
+
+#include <linux/bitops.h>
+#include <linux/init.h>
+#include <linux/random.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+
+#define BITOP_WORD(nr)         ((nr) / BITS_PER_LONG)
+
+unsigned long find_next_bit_old(const unsigned long *addr, unsigned long size,
+			    unsigned long offset)
+{
+	const unsigned long *p = addr + BITOP_WORD(offset);
+	unsigned long result = offset & ~(BITS_PER_LONG-1);
+	unsigned long tmp;
+
+	if (offset >= size)
+		return size;
+	size -= result;
+	offset %= BITS_PER_LONG;
+	if (offset) {
+		tmp = *(p++);
+		tmp &= (~0UL << offset);
+		if (size < BITS_PER_LONG)
+			goto found_first;
+		if (tmp)
+			goto found_middle;
+		size -= BITS_PER_LONG;
+		result += BITS_PER_LONG;
+	}
+	while (size & ~(BITS_PER_LONG-1)) {
+		if ((tmp = *(p++)))
+			goto found_middle;
+		result += BITS_PER_LONG;
+		size -= BITS_PER_LONG;
+	}
+	if (!size)
+		return result;
+	tmp = *p;
+
+found_first:
+	tmp &= (~0UL >> (BITS_PER_LONG - size));
+	if (tmp == 0UL)		/* Are any bits set? */
+		return result + size;	/* Nope. */
+found_middle:
+	return result + __ffs(tmp);
+}
+
+unsigned long find_next_zero_bit_old(const unsigned long *addr, unsigned long size,
+				 unsigned long offset)
+{
+	const unsigned long *p = addr + BITOP_WORD(offset);
+	unsigned long result = offset & ~(BITS_PER_LONG-1);
+	unsigned long tmp;
+
+	if (offset >= size)
+		return size;
+	size -= result;
+	offset %= BITS_PER_LONG;
+	if (offset) {
+		tmp = *(p++);
+		tmp |= ~0UL >> (BITS_PER_LONG - offset);
+		if (size < BITS_PER_LONG)
+			goto found_first;
+		if (~tmp)
+			goto found_middle;
+		size -= BITS_PER_LONG;
+		result += BITS_PER_LONG;
+	}
+	while (size & ~(BITS_PER_LONG-1)) {
+		if (~(tmp = *(p++)))
+			goto found_middle;
+		result += BITS_PER_LONG;
+		size -= BITS_PER_LONG;
+	}
+	if (!size)
+		return result;
+	tmp = *p;
+
+found_first:
+	tmp |= ~0UL << size;
+	if (tmp == ~0UL)	/* Are any bits zero? */
+		return result + size;	/* Nope. */
+found_middle:
+	return result + ffz(tmp);
+}
+static unsigned long _find_next_bit(const unsigned long *addr,
+		unsigned long nbits, unsigned long start, unsigned long invert)
+{
+	unsigned long tmp;
+
+	if (!nbits || start >= nbits)
+		return nbits;
+
+	tmp = addr[start / BITS_PER_LONG] ^ invert;	/* invert: 0 finds set bits, ~0UL finds zero bits */
+
+	/* Handle 1st word. */
+	tmp &= BITMAP_FIRST_WORD_MASK(start);
+	start = round_down(start, BITS_PER_LONG);
+
+	while (!tmp) {
+		start += BITS_PER_LONG;
+		if (start >= nbits)
+			return nbits;
+
+		tmp = addr[start / BITS_PER_LONG] ^ invert;
+	}
+
+	return min(start + __ffs(tmp), nbits);	/* last word may have bits beyond nbits; clamp */
+}
+
+unsigned long find_next_bit_new(const unsigned long *addr, unsigned long size,
+			    unsigned long offset)
+{
+	return _find_next_bit(addr, size, offset, 0UL);
+}
+
+unsigned long find_next_zero_bit_new(const unsigned long *addr, unsigned long size,
+			    unsigned long offset)
+{
+	return _find_next_bit(addr, size, offset, ~0UL);
+}
+
+static int __init test_module_init(void)
+{
+	long long start, old, new;
+	unsigned long *addr;
+	unsigned long ret, i, nbits = 30*1024*1024;
+
+	addr = alloc_pages_exact(nbits/8, GFP_KERNEL);
+	if (!addr) {
+		pr_warn("No mem\n");
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < 10; i++) {
+		get_random_bytes(addr, nbits/8);
+
+		ret = 0;
+		start = jiffies;
+		while (ret < nbits)
+			ret = find_next_zero_bit_old(addr, nbits, ret + 1);
+
+		old = jiffies - start;
+
+		ret = 0;
+		start = jiffies;
+		while (ret < nbits)
+			ret = find_next_zero_bit_new(addr, nbits, ret + 1);
+
+		new = jiffies - start;
+		pr_warn("%lld\t%lld\n", old, new);
+	}
+
+	pr_warn("\n");
+
+	for (i = 0; i < 10; i++) {
+		get_random_bytes(addr, nbits/8);
+
+		ret = 0;
+		start = jiffies;
+		while (ret < nbits)
+			ret = find_next_bit_old(addr, nbits, ret + 1);
+
+		old = jiffies - start;
+
+		ret = 0;
+		start = jiffies;
+		while (ret < nbits)
+			ret = find_next_bit_new(addr, nbits, ret + 1);
+
+		new = jiffies - start;
+
+		pr_warn("%lld\t%lld\n", old, new);
+	}
+
+	free_pages_exact(addr, nbits/8);
+	return 0;
+}
+module_init(test_module_init);
+
+static void __exit test_module_exit(void) {}
+module_exit(test_module_exit);
+
+MODULE_AUTHOR("Yury Norov <yury.norov@gmail.com>");
+MODULE_LICENSE("GPL");
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2015-03-08 18:17 UTC | newest]

Thread overview: 27+ messages
2015-02-08 14:10 [PATCH v3 0/3] lib: find_*_bit reimplementation Yury Norov
2015-02-08 14:10 ` [PATCH v3 1/3] " Yury Norov
2015-02-08 18:48   ` George Spelvin
2015-02-09  8:32   ` George Spelvin
2015-02-09 11:53     ` Rasmus Villemoes
2015-02-09 16:45       ` George Spelvin
2015-02-11 22:14         ` Rasmus Villemoes
2015-02-11 23:05       ` Yury
2015-02-12  8:15         ` George Spelvin
2015-02-12  9:58           ` Rasmus Villemoes
2015-02-12 23:46             ` George Spelvin
2015-02-13 10:13               ` Rasmus Villemoes
2015-02-08 14:10 ` [PATCH v3 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
2015-02-08 14:10 ` [PATCH v3 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
2015-02-17  2:35 ` [PATCH v4 0/3] lib: find_*_bit reimplementation Yury Norov
2015-02-17  2:35   ` [PATCH v4 1/3] " Yury Norov
2015-02-18 17:57     ` Rasmus Villemoes
2015-02-17  2:35   ` [PATCH v4 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
2015-02-17  2:35   ` [PATCH v4 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
2015-02-22 17:24 ` [PATCH v5 0/3] lib: find_*_bit reimplementation Yury Norov
2015-02-22 17:24   ` [PATCH v5 1/3] " Yury Norov
2015-02-23 21:50     ` Rasmus Villemoes
2015-02-24  0:29       ` George Spelvin
2015-02-22 17:24   ` [PATCH v5 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
2015-02-22 17:24   ` [PATCH v5 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
2015-02-24  0:40   ` [PATCH v5 0/3] lib: find_*_bit reimplementation Andrew Morton
2015-03-08 18:17     ` Yury Norov
