linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86: Change x86 to use generic find_next_bit
@ 2008-03-09 20:01 Alexander van Heukelum
  2008-03-09 20:10 ` Ingo Molnar
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Alexander van Heukelum @ 2008-03-09 20:01 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML; +Cc: heukelum

x86: Change x86 to use the generic find_next_bit implementation

The versions with inline assembly are in fact slower on the machines I
tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD
Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid
crashing the benchmark.

Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size 
1...512, for each possible bitmap with one bit set, for each possible 
offset: find the position of the first bit starting at offset. If you 
follow ;). Times include setup of the bitmap and checking of the 
results.

		Athlon		Xeon		Opteron 32/64bit
x86-specific:	0m3.692s	0m2.820s	0m3.196s / 0m2.480s
generic:	0m2.622s	0m1.662s	0m2.100s / 0m1.572s
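The benchmark source itself is not attached to the mail; a minimal userspace sketch of the loop described above (a hypothetical reconstruction, using a portable C stand-in for the generic find_next_bit rather than the kernel code) could look like:

```c
#include <assert.h>
#include <string.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Portable stand-in for the generic find_next_bit(): index of the
 * first set bit at or after 'offset', or exactly 'size' if none. */
static unsigned long find_next_bit_ref(const unsigned long *addr,
				       unsigned long size,
				       unsigned long offset)
{
	unsigned long i;

	for (i = offset; i < size; i++)
		if (addr[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
			return i;
	return size;
}

/* For each bitmap size, each bitmap with exactly one bit set, and
 * each offset: search, then check against the known answer. */
static void bench(unsigned long max_size)
{
	unsigned long map[512 / 32 + 1];
	unsigned long size, bit, off, res;

	for (size = 1; size <= max_size; size++)
		for (bit = 0; bit < size; bit++) {
			memset(map, 0, sizeof(map));
			map[bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
			for (off = 0; off <= size; off++) {
				res = find_next_bit_ref(map, size, off);
				assert(res == (off <= bit ? bit : size));
			}
		}
}
```

In the real benchmark the function under test would be the inline-assembly or generic kernel implementation instead of find_next_bit_ref, timed over sizes 1..512.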

If the bitmap size is not a multiple of BITS_PER_LONG, and no set 
(cleared) bit is found, find_next_bit (find_next_zero_bit) returns a 
value outside of the range [0,size]. The generic version always returns 
exactly size. The generic version also uses unsigned long everywhere, 
while the x86 versions use a mishmash of int, unsigned (int), long and 
unsigned long.
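This "returns exactly size" contract is what makes the usual bit-iteration idiom safe even for sizes that are not a multiple of BITS_PER_LONG. A small standalone illustration (portable C, not the kernel implementation):

```c
#include <assert.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Same contract as the generic find_next_bit: returns 'size',
 * never a value beyond it, when no set bit is found. */
static unsigned long next_set(const unsigned long *addr,
			      unsigned long size, unsigned long off)
{
	for (; off < size; off++)
		if (addr[off / BITS_PER_LONG] & (1UL << (off % BITS_PER_LONG)))
			return off;
	return size;
}

/* for_each_set_bit-style loop: terminates cleanly because the
 * search can never return a value outside [0, size]. */
static int count_set_bits(const unsigned long *addr, unsigned long size)
{
	unsigned long bit;
	int n = 0;

	for (bit = next_set(addr, size, 0); bit < size;
	     bit = next_set(addr, size, bit + 1))
		n++;
	return n;
}
```

With the old x86 behavior, a search that found nothing could return past `size`, so `bit < size` still terminated the loop, but any caller comparing the result with `== size` was broken.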

Using the generic version does give a slightly bigger kernel, though.

defconfig:	   text    data     bss     dec     hex filename
x86-specific:	4738555  481232  626688 5846475  5935cb vmlinux (32 bit)
generic:	4738621  481232  626688 5846541  59360d vmlinux (32 bit)
x86-specific:	5392395  846568  724424 6963387  6a40bb vmlinux (64 bit)
generic:	5392458  846568  724424 6963450  6a40fa vmlinux (64 bit)

 arch/x86/Kconfig            |    3 ++
 arch/x86/lib/Makefile       |    2 +-
 arch/x86/lib/bitops_32.c    |   70 -------------------------------------------
 arch/x86/lib/bitops_64.c    |   68 -----------------------------------------
 include/asm-x86/bitops.h    |    6 ++++
 include/asm-x86/bitops_32.h |   16 ----------
 include/asm-x86/bitops_64.h |    2 -
 lib/find_next_bit.c         |    2 +
 8 files changed, 12 insertions(+), 157 deletions(-)

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>

---

Hi,

I think it would be a good idea to use the generic versions of
find_next_bit and find_next_zero_bit. The i386 versions have
a bug, and the generic ones are faster (in my probably 
not-so-representative micro-benchmark, that is).

Patch is against -x86#testing. It compiles.

Greetings,
	Alexander

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ab85a04..29a5a38 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -83,6 +83,9 @@ config GENERIC_BUG
 	def_bool y
 	depends on BUG
 
+config GENERIC_FIND_NEXT_BIT
+	def_bool y
+
 config GENERIC_HWEIGHT
 	def_bool y
 
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 25df1c1..4360932 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -11,7 +11,7 @@ lib-y += memcpy_$(BITS).o
 ifeq ($(CONFIG_X86_32),y)
         lib-y += checksum_32.o
         lib-y += strstr_32.o
-        lib-y += bitops_32.o semaphore_32.o string_32.o
+        lib-y += semaphore_32.o string_32.o
 
         lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
 else
diff --git a/arch/x86/lib/bitops_32.c b/arch/x86/lib/bitops_32.c
deleted file mode 100644
index b654404..0000000
--- a/arch/x86/lib/bitops_32.c
+++ /dev/null
@@ -1,70 +0,0 @@
-#include <linux/bitops.h>
-#include <linux/module.h>
-
-/**
- * find_next_bit - find the next set bit in a memory region
- * @addr: The address to base the search on
- * @offset: The bitnumber to start searching at
- * @size: The maximum size to search
- */
-int find_next_bit(const unsigned long *addr, int size, int offset)
-{
-	const unsigned long *p = addr + (offset >> 5);
-	int set = 0, bit = offset & 31, res;
-
-	if (bit) {
-		/*
-		 * Look for nonzero in the first 32 bits:
-		 */
-		__asm__("bsfl %1,%0\n\t"
-			"jne 1f\n\t"
-			"movl $32, %0\n"
-			"1:"
-			: "=r" (set)
-			: "r" (*p >> bit));
-		if (set < (32 - bit))
-			return set + offset;
-		set = 32 - bit;
-		p++;
-	}
-	/*
-	 * No set bit yet, search remaining full words for a bit
-	 */
-	res = find_first_bit (p, size - 32 * (p - addr));
-	return (offset + set + res);
-}
-EXPORT_SYMBOL(find_next_bit);
-
-/**
- * find_next_zero_bit - find the first zero bit in a memory region
- * @addr: The address to base the search on
- * @offset: The bitnumber to start searching at
- * @size: The maximum size to search
- */
-int find_next_zero_bit(const unsigned long *addr, int size, int offset)
-{
-	const unsigned long *p = addr + (offset >> 5);
-	int set = 0, bit = offset & 31, res;
-
-	if (bit) {
-		/*
-		 * Look for zero in the first 32 bits.
-		 */
-		__asm__("bsfl %1,%0\n\t"
-			"jne 1f\n\t"
-			"movl $32, %0\n"
-			"1:"
-			: "=r" (set)
-			: "r" (~(*p >> bit)));
-		if (set < (32 - bit))
-			return set + offset;
-		set = 32 - bit;
-		p++;
-	}
-	/*
-	 * No zero yet, search remaining full bytes for a zero
-	 */
-	res = find_first_zero_bit(p, size - 32 * (p - addr));
-	return (offset + set + res);
-}
-EXPORT_SYMBOL(find_next_zero_bit);
diff --git a/arch/x86/lib/bitops_64.c b/arch/x86/lib/bitops_64.c
index 0e8f491..0eeb704 100644
--- a/arch/x86/lib/bitops_64.c
+++ b/arch/x86/lib/bitops_64.c
@@ -1,9 +1,7 @@
 #include <linux/bitops.h>
 
 #undef find_first_zero_bit
-#undef find_next_zero_bit
 #undef find_first_bit
-#undef find_next_bit
 
 static inline long
 __find_first_zero_bit(const unsigned long * addr, unsigned long size)
@@ -57,39 +55,6 @@ long find_first_zero_bit(const unsigned long * addr, unsigned long size)
 	return __find_first_zero_bit (addr, size);
 }
 
-/**
- * find_next_zero_bit - find the next zero bit in a memory region
- * @addr: The address to base the search on
- * @offset: The bitnumber to start searching at
- * @size: The maximum size to search
- */
-long find_next_zero_bit (const unsigned long * addr, long size, long offset)
-{
-	const unsigned long * p = addr + (offset >> 6);
-	unsigned long set = 0;
-	unsigned long res, bit = offset&63;
-
-	if (bit) {
-		/*
-		 * Look for zero in first word
-		 */
-		asm("bsfq %1,%0\n\t"
-		    "cmoveq %2,%0"
-		    : "=r" (set)
-		    : "r" (~(*p >> bit)), "r"(64L));
-		if (set < (64 - bit))
-			return set + offset;
-		set = 64 - bit;
-		p++;
-	}
-	/*
-	 * No zero yet, search remaining full words for a zero
-	 */
-	res = __find_first_zero_bit (p, size - 64 * (p - addr));
-
-	return (offset + set + res);
-}
-
 static inline long
 __find_first_bit(const unsigned long * addr, unsigned long size)
 {
@@ -136,40 +101,7 @@ long find_first_bit(const unsigned long * addr, unsigned long size)
 	return __find_first_bit(addr,size);
 }
 
-/**
- * find_next_bit - find the first set bit in a memory region
- * @addr: The address to base the search on
- * @offset: The bitnumber to start searching at
- * @size: The maximum size to search
- */
-long find_next_bit(const unsigned long * addr, long size, long offset)
-{
-	const unsigned long * p = addr + (offset >> 6);
-	unsigned long set = 0, bit = offset & 63, res;
-
-	if (bit) {
-		/*
-		 * Look for nonzero in the first 64 bits:
-		 */
-		asm("bsfq %1,%0\n\t"
-		    "cmoveq %2,%0\n\t"
-		    : "=r" (set)
-		    : "r" (*p >> bit), "r" (64L));
-		if (set < (64 - bit))
-			return set + offset;
-		set = 64 - bit;
-		p++;
-	}
-	/*
-	 * No set bit yet, search remaining full words for a bit
-	 */
-	res = __find_first_bit (p, size - 64 * (p - addr));
-	return (offset + set + res);
-}
-
 #include <linux/module.h>
 
-EXPORT_SYMBOL(find_next_bit);
 EXPORT_SYMBOL(find_first_bit);
 EXPORT_SYMBOL(find_first_zero_bit);
-EXPORT_SYMBOL(find_next_zero_bit);
diff --git a/include/asm-x86/bitops.h b/include/asm-x86/bitops.h
index 1a23ce1..7fbf93a 100644
--- a/include/asm-x86/bitops.h
+++ b/include/asm-x86/bitops.h
@@ -312,6 +312,12 @@ static int test_bit(int nr, const volatile unsigned long *addr);
 
 #undef ADDR
 
+unsigned long find_next_bit(const unsigned long *addr,
+		unsigned long size, unsigned long offset);
+unsigned long find_next_zero_bit(const unsigned long *addr,
+		unsigned long size, unsigned long offset);
+
+
 #ifdef CONFIG_X86_32
 # include "bitops_32.h"
 #else
diff --git a/include/asm-x86/bitops_32.h b/include/asm-x86/bitops_32.h
index e4d75fc..570f0fa 100644
--- a/include/asm-x86/bitops_32.h
+++ b/include/asm-x86/bitops_32.h
@@ -38,14 +38,6 @@ static inline int find_first_zero_bit(const unsigned long *addr, unsigned size)
 }
 
 /**
- * find_next_zero_bit - find the first zero bit in a memory region
- * @addr: The address to base the search on
- * @offset: The bit number to start searching at
- * @size: The maximum size to search
- */
-int find_next_zero_bit(const unsigned long *addr, int size, int offset);
-
-/**
  * __ffs - find first bit in word.
  * @word: The word to search
  *
@@ -81,14 +73,6 @@ static inline unsigned find_first_bit(const unsigned long *addr, unsigned size)
 }
 
 /**
- * find_next_bit - find the first set bit in a memory region
- * @addr: The address to base the search on
- * @offset: The bit number to start searching at
- * @size: The maximum size to search
- */
-int find_next_bit(const unsigned long *addr, int size, int offset);
-
-/**
  * ffz - find first zero in word.
  * @word: The word to search
  *
diff --git a/include/asm-x86/bitops_64.h b/include/asm-x86/bitops_64.h
index aaf1519..40bf0f8 100644
--- a/include/asm-x86/bitops_64.h
+++ b/include/asm-x86/bitops_64.h
@@ -6,9 +6,7 @@
  */
 
 extern long find_first_zero_bit(const unsigned long *addr, unsigned long size);
-extern long find_next_zero_bit(const unsigned long *addr, long size, long offset);
 extern long find_first_bit(const unsigned long *addr, unsigned long size);
-extern long find_next_bit(const unsigned long *addr, long size, long offset);
 
 /* return index of first bit set in val or max when no bit is set */
 static inline long __scanbit(unsigned long val, unsigned long max)
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 78ccd73..5820e07 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -15,6 +15,8 @@
 #include <asm/byteorder.h>
 
 #define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
+#undef find_next_bit
+#undef find_next_zero_bit
 
 /**
  * find_next_bit - find the next set bit in a memory region


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86: Change x86 to use generic find_next_bit
  2008-03-09 20:01 [PATCH] x86: Change x86 to use generic find_next_bit Alexander van Heukelum
@ 2008-03-09 20:10 ` Ingo Molnar
  2008-03-09 21:03   ` Andi Kleen
  2008-03-09 21:13   ` Alexander van Heukelum
  2008-03-09 20:11 ` Ingo Molnar
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 19+ messages in thread
From: Ingo Molnar @ 2008-03-09 20:10 UTC (permalink / raw)
  To: Alexander van Heukelum; +Cc: Thomas Gleixner, H. Peter Anvin, LKML, heukelum


* Alexander van Heukelum <heukelum@mailshack.com> wrote:

> x86: Change x86 to use the generic find_next_bit implementation
> 
> The versions with inline assembly are in fact slower on the machines I 
> tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, 
> AMD Opteron 270). The i386-version needed a fix similar to 06024f21 to 
> avoid crashing the benchmark.
> 
> Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size 
> 1...512, for each possible bitmap with one bit set, for each possible 
> offset: find the position of the first bit starting at offset. If you 
> follow ;). Times include setup of the bitmap and checking of the 
> results.
> 
> 		Athlon		Xeon		Opteron 32/64bit
> x86-specific:	0m3.692s	0m2.820s	0m3.196s / 0m2.480s
> generic:	0m2.622s	0m1.662s	0m2.100s / 0m1.572s

ok, that's rather convincing.

the generic version in lib/find_next_bit.c is open-coded C which gcc can 
optimize pretty nicely.

the hand-coded assembly versions in arch/x86/lib/bitops_32.c mostly use 
the special x86 'bit search forward' (BSF) instruction - which i know 
from the days when the scheduler relied on it has some non-trivial setup 
costs. So especially when there's _small_ bitmasks involved, it's more 
expensive.

> If the bitmap size is not a multiple of BITS_PER_LONG, and no set 
> (cleared) bit is found, find_next_bit (find_next_zero_bit) returns a 
> value outside of the range [0,size]. The generic version always 
> returns exactly size. The generic version also uses unsigned long 
> everywhere, while the x86 versions use a mishmash of int, unsigned 
> (int), long and unsigned long.

i'm not surprised that the hand-coded assembly versions had a bug ...

[ this means we have to test it quite carefully though, as lots of code 
  only ever gets tested on x86 so code could have built dependency on 
  the buggy behavior. ]

> Using the generic version does give a slightly bigger kernel, though.
> 
> defconfig:	   text    data     bss     dec     hex filename
> x86-specific:	4738555  481232  626688 5846475  5935cb vmlinux (32 bit)
> generic:	4738621  481232  626688 5846541  59360d vmlinux (32 bit)
> x86-specific:	5392395  846568  724424 6963387  6a40bb vmlinux (64 bit)
> generic:	5392458  846568  724424 6963450  6a40fa vmlinux (64 bit)

i'd not worry about that too much. Have you tried to build with:

  CONFIG_CC_OPTIMIZE_FOR_SIZE=y
  CONFIG_OPTIMIZE_INLINING=y

(the latter only available in x86.git)

> Patch is against -x86#testing. It compiles.

i've picked it up into x86.git, lets see how it goes in practice.

	Ingo


* Re: [PATCH] x86: Change x86 to use generic find_next_bit
  2008-03-09 20:01 [PATCH] x86: Change x86 to use generic find_next_bit Alexander van Heukelum
  2008-03-09 20:10 ` Ingo Molnar
@ 2008-03-09 20:11 ` Ingo Molnar
  2008-03-09 20:31   ` Alexander van Heukelum
  2008-03-09 20:28 ` [PATCH] x86: Change x86 to use generic find_next_bit Andi Kleen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2008-03-09 20:11 UTC (permalink / raw)
  To: Alexander van Heukelum; +Cc: Thomas Gleixner, H. Peter Anvin, LKML, heukelum


* Alexander van Heukelum <heukelum@mailshack.com> wrote:

> --- a/lib/find_next_bit.c
> +++ b/lib/find_next_bit.c
> @@ -15,6 +15,8 @@
>  #include <asm/byteorder.h>
>  
>  #define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
> +#undef find_next_bit
> +#undef find_next_zero_bit

this bit looks weird - did you need it for testing?

	Ingo


* Re: [PATCH] x86: Change x86 to use generic find_next_bit
  2008-03-09 20:01 [PATCH] x86: Change x86 to use generic find_next_bit Alexander van Heukelum
  2008-03-09 20:10 ` Ingo Molnar
  2008-03-09 20:11 ` Ingo Molnar
@ 2008-03-09 20:28 ` Andi Kleen
  2008-03-09 21:31 ` Andi Kleen
  2008-03-13 12:44 ` Aneesh Kumar K.V
  4 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2008-03-09 20:28 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML, heukelum

Alexander van Heukelum <heukelum@mailshack.com> writes:
> 
> If the bitmap size is not a multiple of BITS_PER_LONG, and no set 
> (cleared) bit is found, find_next_bit (find_next_zero_bit) returns a 
> value outside of the range [0,size]. The generic version always returns 
> exactly size.

With that change it is likely possible to remove the min(NR_CPUS, ...)
clamping in lib/cpumask.c's __first_cpu/__next_cpu. iirc it was just a
workaround for the x86 quirk. I suspect with such a change it would be
possible to inline those again, possibly speeding up some loops that do
cpumask walking (iirc the scheduler used to do that frequently)

-Andi



* Re: [PATCH] x86: Change x86 to use generic find_next_bit
  2008-03-09 20:11 ` Ingo Molnar
@ 2008-03-09 20:31   ` Alexander van Heukelum
  2008-03-09 20:51     ` Ingo Molnar
  0 siblings, 1 reply; 19+ messages in thread
From: Alexander van Heukelum @ 2008-03-09 20:31 UTC (permalink / raw)
  To: Ingo Molnar, Alexander van Heukelum; +Cc: Thomas Gleixner, H. Peter Anvin, LKML


On Sun, 9 Mar 2008 21:11:52 +0100, "Ingo Molnar" <mingo@elte.hu> said:
> * Alexander van Heukelum <heukelum@mailshack.com> wrote:
> > --- a/lib/find_next_bit.c
> > +++ b/lib/find_next_bit.c
> > @@ -15,6 +15,8 @@
> >  #include <asm/byteorder.h>
> >  
> >  #define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
> > +#undef find_next_bit
> > +#undef find_next_zero_bit
> 
> this bit looks weird - did you need it for testing?

Worse, it's needed to get x86_64 to compile.

They are defined in include/asm-x86/bitops_64.h (which gets
included). They are used to optimize the case where the
bitmap size is known at compile time and not larger than
BITS_PER_LONG. Undeffing them here is the easiest way to get
things to compile.
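What happens here is the function-shadowing-macro pattern: the header defines a function-like macro whose fallback branch expands to the real function's name (function-like macros do not recurse into themselves), so the one file that defines the real function must #undef the macro first, or the definition itself would be mangled by expansion. A standalone illustration with hypothetical names:

```c
#include <assert.h>

unsigned long popcount_ul(const unsigned long *addr, unsigned long size);

/* Header-style wrapper: open-code the trivial constant case, fall
 * back to the out-of-line function of the same name otherwise.
 * The self-reference in the fallback branch is not expanded again. */
#define popcount_ul(addr, size)					\
	(__builtin_constant_p(size) && (size) == 0 ?		\
		0UL : popcount_ul(addr, size))

/* The defining file must remove the macro -- this is exactly what
 * the #undef lines added to lib/find_next_bit.c do. */
#undef popcount_ul

unsigned long popcount_ul(const unsigned long *addr, unsigned long size)
{
	unsigned long i, n = 0;

	for (i = 0; i < size; i++)
		if (addr[i / (8 * sizeof(unsigned long))] &
		    (1UL << (i % (8 * sizeof(unsigned long)))))
			n++;
	return n;
}
```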

Greetings,
    Alexander

-- 
http://www.fastmail.fm - The way an email service should be



* Re: [PATCH] x86: Change x86 to use generic find_next_bit
  2008-03-09 20:31   ` Alexander van Heukelum
@ 2008-03-09 20:51     ` Ingo Molnar
  2008-03-09 21:29       ` Andi Kleen
  2008-03-10 23:17       ` [RFC/PATCH] x86: Optimize find_next_(zero_)bit for small constant-size bitmaps Alexander van Heukelum
  0 siblings, 2 replies; 19+ messages in thread
From: Ingo Molnar @ 2008-03-09 20:51 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Alexander van Heukelum, Thomas Gleixner, H. Peter Anvin, LKML


* Alexander van Heukelum <heukelum@fastmail.fm> wrote:

> > >  #define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
> > > +#undef find_next_bit
> > > +#undef find_next_zero_bit
> > 
> > this bit looks weird - did you need it for testing?
> 
> Worse, it's needed to get x86_64 to compile.
> 
> They are defined in include/asm-x86/bitops_64.h (which gets included). 
> They are used to optimize the case where the bitmap size is known at 
> compile time and not larger than BITS_PER_LONG. Undeffing them here is 
> the easiest way to get things to compile.

ok - but this needs to be solved in a cleaner way. That build-time 
optimization needs to be pushed into generic code so that 32-bit x86 and 
other architectures can make use of it as well. The lib/find_next_bit.c 
functions should be named __find_next_bit() or so.

	Ingo


* Re: [PATCH] x86: Change x86 to use generic find_next_bit
  2008-03-09 20:10 ` Ingo Molnar
@ 2008-03-09 21:03   ` Andi Kleen
  2008-03-09 21:32     ` Andi Kleen
  2008-03-09 21:13   ` Alexander van Heukelum
  1 sibling, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2008-03-09 21:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexander van Heukelum, Thomas Gleixner, H. Peter Anvin, LKML, heukelum

Ingo Molnar <mingo@elte.hu> writes:
> 
> the generic version in lib/find_next_bit.c is open-coded C which gcc can 
> optimize pretty nicely.
> 
> the hand-coded assembly versions in arch/x86/lib/bitops_32.c mostly use 
> the special x86 'bit search forward' (BSF) instruction - which i know 
> from the days when the scheduler relied on it has some non-trivial setup 

~14 cycles on K8 for memory, but if you stay in a register it is 8 cycles

> costs. So especially when there's _small_ bitmasks involved, it's more 
> expensive.

I had a patchkit some time ago to special case the max_bit <= 63 case
and always use directly inlined, streamlined single-instruction
assembler for that. There was still some issue and I dropped it then,
but doing something like that still makes sense. Even if the BSF
is slightly slower than the open-coded version, just getting rid
of the CALL will make it a win, and it could also be kept in a register
so you get the 8-cycle variant (for which I doubt you can do
it faster open coded).

The result would be that a standard for_each_cpu() in a NR_CPUS <= 64
kernel wouldn't have any unnecessary calls.

In general the problem of walking cpu masks is quite different
from searching ext2 bitmaps, so they likely should be special-cased
and optimized separately.
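The word-sized special case described here can be sketched in C with the compiler's ctz builtin standing in for the raw BSF/CMOV sequence (a hedged stand-in mirroring the __scanbit() contract from bitops_64.h: index of the first set bit, or max when none is found):

```c
#include <assert.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* __scanbit()-like helper: first set bit in val, or max when no
 * bit below max is set.  __builtin_ctzl replaces bsf/cmov. */
static inline unsigned long scanbit(unsigned long val, unsigned long max)
{
	unsigned long bit;

	if (!val)
		return max;
	bit = (unsigned long)__builtin_ctzl(val);
	return bit < max ? bit : max;
}

/* Inline find_next_bit for bitmaps known at compile time to fit in
 * one word: shift away the bits below 'off', scan, re-add 'off'. */
static inline unsigned long
find_next_bit_small(unsigned long word, unsigned long size,
		    unsigned long off)
{
	return off + scanbit(word >> off, size - off);
}
```

Because everything inlines to a handful of instructions, a loop over a NR_CPUS <= 64 cpumask never leaves the caller.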

-Andi



* Re: [PATCH] x86: Change x86 to use generic find_next_bit
  2008-03-09 20:10 ` Ingo Molnar
  2008-03-09 21:03   ` Andi Kleen
@ 2008-03-09 21:13   ` Alexander van Heukelum
  2008-03-10  6:29     ` Ingo Molnar
  1 sibling, 1 reply; 19+ messages in thread
From: Alexander van Heukelum @ 2008-03-09 21:13 UTC (permalink / raw)
  To: Ingo Molnar, Alexander van Heukelum; +Cc: Thomas Gleixner, H. Peter Anvin, LKML

On Sun, 9 Mar 2008 21:10:16 +0100, "Ingo Molnar" <mingo@elte.hu> said:
> > 		Athlon		Xeon		Opteron 32/64bit
> > x86-specific:	0m3.692s	0m2.820s	0m3.196s / 0m2.480s
> > generic:	0m2.622s	0m1.662s	0m2.100s / 0m1.572s
> 
> ok, that's rather convincing.
> 
> the generic version in lib/find_next_bit.c is open-coded C which gcc can 
> optimize pretty nicely.
> 
> the hand-coded assembly versions in arch/x86/lib/bitops_32.c mostly use 
> the special x86 'bit search forward' (BSF) instruction - which i know 
> from the days when the scheduler relied on it has some non-trivial setup 
> costs. So especially when there's _small_ bitmasks involved, it's more 
> expensive.

Hi,

BSF is fine, it doesn't need any special setup. The problem is probably
that the old versions use find_first_bit and find_first_zero_bit,
which are also hand optimized versions... and they use "repe scasl/q".
That's another little project ;).

> > If the bitmap size is not a multiple of BITS_PER_LONG, and no set 
> > (cleared) bit is found, find_next_bit (find_next_zero_bit) returns a 
> > value outside of the range [0,size]. The generic version always 
> > returns exactly size. The generic version also uses unsigned long 
> > everywhere, while the x86 versions use a mishmash of int, unsigned 
> > (int), long and unsigned long.
> 
> i'm not surprised that the hand-coded assembly versions had a bug ...

Not surprised about the bug, but it was in fact noticed, and fixed
in x86_64!

> [ this means we have to test it quite carefully though, as lots of code 
>   only ever gets tested on x86 so code could have built dependency on 
>   the buggy behavior. ]

Agreed.

> > Using the generic version does give a slightly bigger kernel, though.
> > 
> > defconfig:	   text    data     bss     dec     hex filename
> > x86-specific:	4738555  481232  626688 5846475  5935cb vmlinux (32 bit)
> > generic:	4738621  481232  626688 5846541  59360d vmlinux (32 bit)
> > x86-specific:	5392395  846568  724424 6963387  6a40bb vmlinux (64 bit)
> > generic:	5392458  846568  724424 6963450  6a40fa vmlinux (64 bit)
> 
> i'd not worry about that too much. Have you tried to build with:

I don't worry, but I needed to compile something to test the build anyhow ;)

>   CONFIG_CC_OPTIMIZE_FOR_SIZE=y
>   CONFIG_OPTIMIZE_INLINING=y

This was defconfig in -x86#testing, they were both already enabled. 
Here is what you get with those options turned off ;).

                   text    data     bss     dec     hex filename
x86-specific:   5543996  481232  626688 6651916  65800c vmlinux (32 bit)
generic:        5543880  481232  626688 6651800  657f98 vmlinux (32 bit)
x86-specific:   6111834  846568  724424 7682826  753b0a vmlinux (64 bit)
generic:        6111882  846568  724424 7682874  753b3a vmlinux (64 bit)

(and I double-checked the i386 results)

> (the latter only available in x86.git)
> 
> > Patch is against -x86#testing. It compiles.
> 
> i've picked it up into x86.git, lets see how it goes in practice.

Thanks,
    Alexander

> 	Ingo
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - And now for something completely different…



* Re: [PATCH] x86: Change x86 to use generic find_next_bit
  2008-03-09 20:51     ` Ingo Molnar
@ 2008-03-09 21:29       ` Andi Kleen
  2008-03-10 23:17       ` [RFC/PATCH] x86: Optimize find_next_(zero_)bit for small constant-size bitmaps Alexander van Heukelum
  1 sibling, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2008-03-09 21:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexander van Heukelum, Alexander van Heukelum, Thomas Gleixner,
	H. Peter Anvin, LKML

Ingo Molnar <mingo@elte.hu> writes:
> 
> ok - but this needs to be solved in a cleaner way. That build-time 
> optimization needs to be pushed into generic code so that 32-bit x86 and 

It probably doesn't make sense in generic code because it will be
too large when open-coded.

-Andi




* Re: [PATCH] x86: Change x86 to use generic find_next_bit
  2008-03-09 20:01 [PATCH] x86: Change x86 to use generic find_next_bit Alexander van Heukelum
                   ` (2 preceding siblings ...)
  2008-03-09 20:28 ` [PATCH] x86: Change x86 to use generic find_next_bit Andi Kleen
@ 2008-03-09 21:31 ` Andi Kleen
  2008-03-13 12:44 ` Aneesh Kumar K.V
  4 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2008-03-09 21:31 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML, heukelum

Alexander van Heukelum <heukelum@mailshack.com> writes:
> 
> Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size 
> 1...512, for each possible bitmap with one bit set, for each possible 
> offset: find the position of the first bit starting at offset. If you 
> follow ;). Times include setup of the bitmap and checking of the 
> results.

BTW another comment: it would make far more sense if you did
some profiling on what bitmap sizes a real kernel uses
(should be easy enough with some systemtap) and benchmarked
only those.

I doubt 1 bit bitmap searches are common for example ...

-Andi


* Re: [PATCH] x86: Change x86 to use generic find_next_bit
  2008-03-09 21:03   ` Andi Kleen
@ 2008-03-09 21:32     ` Andi Kleen
  0 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2008-03-09 21:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexander van Heukelum, Thomas Gleixner, H. Peter Anvin, LKML, heukelum

Andi Kleen <andi@firstfloor.org> writes:
> 
> I had a patchkit some time ago to special case the max_bit <= 63 case

... never mind ... the patchkit is actually included. I had only
dropped it at some point, but it was fixed later.

-Andi


* Re: [PATCH] x86: Change x86 to use generic find_next_bit
  2008-03-09 21:13   ` Alexander van Heukelum
@ 2008-03-10  6:29     ` Ingo Molnar
  0 siblings, 0 replies; 19+ messages in thread
From: Ingo Molnar @ 2008-03-10  6:29 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Alexander van Heukelum, Thomas Gleixner, H. Peter Anvin, LKML


* Alexander van Heukelum <heukelum@fastmail.fm> wrote:

> > ok, that's rather convincing.
> > 
> > the generic version in lib/find_next_bit.c is open-coded C which gcc 
> > can optimize pretty nicely.
> > 
> > the hand-coded assembly versions in arch/x86/lib/bitops_32.c mostly 
> > use the special x86 'bit search forward' (BSF) instruction - which i 
> > know from the days when the scheduler relied on it has some 
> > non-trivial setup costs. So especially when there's _small_ bitmasks 
> > involved, it's more expensive.
> 
> Hi,
> 
> BSF is fine, it doesn't need any special setup. [...]

under "setup costs" i mean cycles spent by the CPU itself - the 
instruction itself is simple (of course) and needs no setup. If you look 
at BSF performance you'll see that it has nontrivial overhead.

	Ingo


* [RFC/PATCH] x86: Optimize find_next_(zero_)bit for small constant-size bitmaps
  2008-03-09 20:51     ` Ingo Molnar
  2008-03-09 21:29       ` Andi Kleen
@ 2008-03-10 23:17       ` Alexander van Heukelum
  2008-03-11  9:56         ` Ingo Molnar
  1 sibling, 1 reply; 19+ messages in thread
From: Alexander van Heukelum @ 2008-03-10 23:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexander van Heukelum, Andi Kleen, Thomas Gleixner,
	H. Peter Anvin, LKML

x86: Optimize find_next_(zero_)bit for small constant-size bitmaps

NOTE: This is not well tested. I'm also quite unsure if this makes the 
pre-processor situation any better.

On an i386 defconfig (the x86#testing one), the size of vmlinux hardly
changes with this applied. I have observed only four places where this
optimization avoids a call into find_next_bit:

In the functions return_unused_surplus_pages, alloc_fresh_huge_page,
and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap.
In __next_cpu a call is avoided for a 32-bit bitmap. That's it.

On x86_64 I observed the following (I did not look closely yet, though).

Current #testing defconfig: 
	146 x bsf, 27 x find_next_*bit
   text    data     bss     dec     hex filename
   5392637  846592  724424 6963653  6a41c5 vmlinux

After removing the x86_64 specific optimization for find_next_*bit:
	94 x bsf, 79 x find_next_*bit
   text    data     bss     dec     hex filename
   5392358  846592  724424 6963374  6a40ae vmlinux

After this patch (making the optimization generic):
	146 x bsf, 27 x find_next_*bit
   text    data     bss     dec     hex filename
   5392396  846592  724424 6963412  6a40d4 vmlinux

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>

---

> ok - but this needs to be solved in a cleaner way. That build-time 
> optimization needs to be pushed into generic code so that 32-bit x86 and 
> other architectures can make use of it as well. The lib/find_next_bit.c 
> functions should be named __find_next_bit() or so.
> 
> 	Ingo

I think this is what you had in mind? It seems to work.

Alternative implementation ideas? Comments? In particular: does
this break non-x86?

Greetings,
	Alexander

 include/asm-x86/bitops.h    |    4 +-
 include/asm-x86/bitops_64.h |   10 -------
 include/linux/bitops.h      |   60 +++++++++++++++++++++++++++++++++++++++++++
 lib/find_next_bit.c         |   10 +++----
 4 files changed, 66 insertions(+), 18 deletions(-)

diff --git a/include/asm-x86/bitops.h b/include/asm-x86/bitops.h
index 7fbf93a..cbbe329 100644
--- a/include/asm-x86/bitops.h
+++ b/include/asm-x86/bitops.h
@@ -312,9 +312,9 @@ static int test_bit(int nr, const volatile unsigned long *addr);
 
 #undef ADDR
 
-unsigned long find_next_bit(const unsigned long *addr,
+unsigned long __find_next_bit(const unsigned long *addr,
 		unsigned long size, unsigned long offset);
-unsigned long find_next_zero_bit(const unsigned long *addr,
+unsigned long __find_next_zero_bit(const unsigned long *addr,
 		unsigned long size, unsigned long offset);
 
 
diff --git a/include/asm-x86/bitops_64.h b/include/asm-x86/bitops_64.h
index 40bf0f8..87e1a17 100644
--- a/include/asm-x86/bitops_64.h
+++ b/include/asm-x86/bitops_64.h
@@ -20,21 +20,11 @@ static inline long __scanbit(unsigned long val, unsigned long max)
   (__scanbit(*(unsigned long *)addr,(size))) : \
   find_first_bit(addr,size)))
 
-#define find_next_bit(addr,size,off) \
-((__builtin_constant_p(size) && (size) <= BITS_PER_LONG ? 	  \
-  ((off) + (__scanbit((*(unsigned long *)addr) >> (off),(size)-(off)))) : \
-	find_next_bit(addr,size,off)))
-
 #define find_first_zero_bit(addr,size) \
 ((__builtin_constant_p(size) && (size) <= BITS_PER_LONG ? \
   (__scanbit(~*(unsigned long *)addr,(size))) : \
   	find_first_zero_bit(addr,size)))
 	
-#define find_next_zero_bit(addr,size,off) \
-((__builtin_constant_p(size) && (size) <= BITS_PER_LONG ? 	  \
-  ((off)+(__scanbit(~(((*(unsigned long *)addr)) >> (off)),(size)-(off)))) : \
-	find_next_zero_bit(addr,size,off)))
-
 static inline void set_bit_string(unsigned long *bitmap, unsigned long i, 
 				  int len) 
 { 
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 69c1edb..ba78ca1 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -72,4 +72,64 @@ static inline unsigned fls_long(unsigned long l)
 	return fls64(l);
 }
 
+#ifdef __KERNEL__
+#ifdef CONFIG_GENERIC_FIND_NEXT_BIT
+unsigned long __find_next_bit(const unsigned long *addr,
+		unsigned long size, unsigned long offset);
+
+static __always_inline unsigned long
+find_next_bit(const unsigned long *addr, unsigned long size,
+		unsigned long offset)
+{
+	unsigned long value;
+
+	/* Here we would like to use a version of ffs that */
+	/* returns BITS_PER_LONG if its argument is zero. */
+	if (__builtin_constant_p(size) && (size == BITS_PER_LONG)) {
+		value = (*addr) & ((~0ul) << offset);
+		return (value == 0) ? BITS_PER_LONG : __ffs(value);
+	}
+
+	/* Less than BITS_PER_LONG: put in sentinel, then __ffs */
+	/* Here we would like to have a __set_bit/__ffs combo */
+	if (__builtin_constant_p(size) && (size < BITS_PER_LONG)) {
+		value = (*addr) & ((~0ul) << offset);
+		value |= (1ul << size);
+		return __ffs(value);
+	}
+
+	/* size not constant or too big */
+	return __find_next_bit(addr, size, offset);
+}
+
+unsigned long __find_next_zero_bit(const unsigned long *addr,
+		unsigned long size, unsigned long offset);
+
+static __always_inline unsigned long
+find_next_zero_bit(const unsigned long *addr, unsigned long size,
+		unsigned long offset)
+{
+	unsigned long value;
+
+	/* Here we would like to use a version of ffs that */
+	/* returns BITS_PER_LONG if its argument is zero. */
+	if (__builtin_constant_p(size) && (size == BITS_PER_LONG)) {
+		value = (~(*addr)) & ((~0ul) << offset);
+		return (value == 0) ? BITS_PER_LONG : __ffs(value);
+	}
+
+	/* Less than BITS_PER_LONG: put in sentinel, then __ffs */
+	/* Here we would like to have a __set_bit/__ffs combo. */
+	if (__builtin_constant_p(size) && (size < BITS_PER_LONG)) {
+		value = (~(*addr)) & ((~0ul) << offset);
+		value |= (1ul << size);
+		return __ffs(value);
+	}
+
+	/* size not constant or too big */
+	return __find_next_zero_bit(addr, size, offset);
+}
+
+#endif /* CONFIG_GENERIC_FIND_NEXT_BIT */
+#endif /* __KERNEL__ */
 #endif
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 5820e07..5249f4a 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -15,8 +15,6 @@
 #include <asm/byteorder.h>
 
 #define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
-#undef find_next_bit
-#undef find_next_zero_bit
 
 /**
  * find_next_bit - find the next set bit in a memory region
@@ -24,8 +22,8 @@
  * @offset: The bitnumber to start searching at
  * @size: The maximum size to search
  */
-unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
-		unsigned long offset)
+unsigned long __find_next_bit(const unsigned long *addr,
+		unsigned long size, unsigned long offset)
 {
 	const unsigned long *p = addr + BITOP_WORD(offset);
 	unsigned long result = offset & ~(BITS_PER_LONG-1);
@@ -69,8 +67,8 @@ EXPORT_SYMBOL(find_next_bit);
  * This implementation of find_{first,next}_zero_bit was stolen from
  * Linus' asm-alpha/bitops.h.
  */
-unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
-		unsigned long offset)
+unsigned long __find_next_zero_bit(const unsigned long *addr,
+		unsigned long size, unsigned long offset)
 {
 	const unsigned long *p = addr + BITOP_WORD(offset);
 	unsigned long result = offset & ~(BITS_PER_LONG-1);


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] x86: Optimize find_next_(zero_)bit for small constant-size bitmaps
  2008-03-10 23:17       ` [RFC/PATCH] x86: Optimize find_next_(zero_)bit for small constant-size bitmaps Alexander van Heukelum
@ 2008-03-11  9:56         ` Ingo Molnar
  2008-03-11 15:17           ` [PATCH] " Alexander van Heukelum
  0 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2008-03-11  9:56 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Alexander van Heukelum, Andi Kleen, Thomas Gleixner,
	H. Peter Anvin, LKML


* Alexander van Heukelum <heukelum@mailshack.com> wrote:

> x86: Optimize find_next_(zero_)bit for small constant-size bitmaps

thanks, looks nice, except for a small detail:

> +#ifdef CONFIG_GENERIC_FIND_NEXT_BIT
> +unsigned long __find_next_bit(const unsigned long *addr,
> +		unsigned long size, unsigned long offset);
> +
> +static __always_inline unsigned long
> +find_next_bit(const unsigned long *addr, unsigned long size,
> +		unsigned long offset)

> +#endif /* CONFIG_GENERIC_FIND_NEXT_BIT */

there should be an #else here i think, which maps find_next_bit to 
__find_next_bit / etc.

small syntactic nit:

> +unsigned long __find_next_bit(const unsigned long *addr,
> +		unsigned long size, unsigned long offset);

should be explicitly "extern", so that we see that it's a prototype 
declaration.

	Ingo


* [PATCH] x86: Optimize find_next_(zero_)bit for small constant-size bitmaps
  2008-03-11  9:56         ` Ingo Molnar
@ 2008-03-11 15:17           ` Alexander van Heukelum
  2008-03-11 15:22             ` [RFC] non-x86: " Alexander van Heukelum
  2008-03-11 15:23             ` [PATCH] x86: " Ingo Molnar
  0 siblings, 2 replies; 19+ messages in thread
From: Alexander van Heukelum @ 2008-03-11 15:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexander van Heukelum, Andi Kleen, Thomas Gleixner,
	H. Peter Anvin, LKML

x86: Optimize find_next_(zero_)bit for small constant-size bitmaps.

This moves an optimization for searching constant-sized small
bitmaps from x86_64-specific to generic code.

On an i386 defconfig (the x86#testing one), the size of vmlinux hardly
changes with this applied. I have observed only four places where this
optimization avoids a call into find_next_bit:

In the functions return_unused_surplus_pages, alloc_fresh_huge_page,
and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap.
In __next_cpu a call is avoided for a 32-bit bitmap. That's it.

On x86_64, 52 locations are optimized with a minimal increase in
code size:

Current #testing defconfig: 
	146 x bsf, 27 x find_next_*bit
   text    data     bss     dec     hex filename
   5392637  846592  724424 6963653  6a41c5 vmlinux

After removing the x86_64 specific optimization for find_next_*bit:
	94 x bsf, 79 x find_next_*bit
   text    data     bss     dec     hex filename
   5392358  846592  724424 6963374  6a40ae vmlinux

After this patch (making the optimization generic):
	146 x bsf, 27 x find_next_*bit
   text    data     bss     dec     hex filename
   5392396  846592  724424 6963412  6a40d4 vmlinux

---

Hi Ingo,

I have now tested (in user-space) the one-long versions of
find_next_(zero_)bit in the patch. They give the expected
results.

> > +#ifdef CONFIG_GENERIC_FIND_NEXT_BIT
> > +unsigned long __find_next_bit(const unsigned long *addr,
> > +		unsigned long size, unsigned long offset);
> > +
> > +static __always_inline unsigned long
> > +find_next_bit(const unsigned long *addr, unsigned long size,
> > +		unsigned long offset)
> 
> > +#endif /* CONFIG_GENERIC_FIND_NEXT_BIT */
> 
> there should be an #else here i think, which maps find_next_bit to 
> __find_next_bit / etc.

No, that is intentional: architectures are expected to either set
CONFIG_GENERIC_FIND_NEXT_BIT, or provide their own implementation
of find_next_bit.

An alternative would be to have all architectures providing
__find_next_bit/__find_next_zero_bit and do the optimization
unconditionally.

> > +unsigned long __find_next_bit(const unsigned long *addr,
> > +		unsigned long size, unsigned long offset);
> 
> should be explicitly "extern", so that we see that it's a prototype 
> declaration.

ok.

This version runs fine on qemu.

Greetings,
	Alexander

diff --git a/include/asm-x86/bitops.h b/include/asm-x86/bitops.h
index 7fbf93a..1a23ce1 100644
--- a/include/asm-x86/bitops.h
+++ b/include/asm-x86/bitops.h
@@ -312,12 +312,6 @@ static int test_bit(int nr, const volatile unsigned long *addr);
 
 #undef ADDR
 
-unsigned long find_next_bit(const unsigned long *addr,
-		unsigned long size, unsigned long offset);
-unsigned long find_next_zero_bit(const unsigned long *addr,
-		unsigned long size, unsigned long offset);
-
-
 #ifdef CONFIG_X86_32
 # include "bitops_32.h"
 #else
diff --git a/include/asm-x86/bitops_64.h b/include/asm-x86/bitops_64.h
index 40bf0f8..87e1a17 100644
--- a/include/asm-x86/bitops_64.h
+++ b/include/asm-x86/bitops_64.h
@@ -20,21 +20,11 @@ static inline long __scanbit(unsigned long val, unsigned long max)
   (__scanbit(*(unsigned long *)addr,(size))) : \
   find_first_bit(addr,size)))
 
-#define find_next_bit(addr,size,off) \
-((__builtin_constant_p(size) && (size) <= BITS_PER_LONG ? 	  \
-  ((off) + (__scanbit((*(unsigned long *)addr) >> (off),(size)-(off)))) : \
-	find_next_bit(addr,size,off)))
-
 #define find_first_zero_bit(addr,size) \
 ((__builtin_constant_p(size) && (size) <= BITS_PER_LONG ? \
   (__scanbit(~*(unsigned long *)addr,(size))) : \
   	find_first_zero_bit(addr,size)))
 	
-#define find_next_zero_bit(addr,size,off) \
-((__builtin_constant_p(size) && (size) <= BITS_PER_LONG ? 	  \
-  ((off)+(__scanbit(~(((*(unsigned long *)addr)) >> (off)),(size)-(off)))) : \
-	find_next_zero_bit(addr,size,off)))
-
 static inline void set_bit_string(unsigned long *bitmap, unsigned long i, 
 				  int len) 
 { 
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 69c1edb..15360f9 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -72,4 +72,81 @@ static inline unsigned fls_long(unsigned long l)
 	return fls64(l);
 }
 
+#ifdef __KERNEL__
+#ifdef CONFIG_GENERIC_FIND_NEXT_BIT
+extern unsigned long __find_next_bit(const unsigned long *addr,
+		unsigned long size, unsigned long offset);
+
+/**
+ * find_next_bit - find the next set bit in a memory region
+ * @addr: The address to base the search on
+ * @offset: The bitnumber to start searching at
+ * @size: The bitmap size in bits
+ */
+static __always_inline unsigned long
+find_next_bit(const unsigned long *addr, unsigned long size,
+		unsigned long offset)
+{
+	unsigned long value;
+
+	/* Avoid a function call if the bitmap size is a constant */
+	/* and not bigger than BITS_PER_LONG. */
+
+	/* insert a sentinel so that __ffs returns size if there */
+	/* are no set bits in the bitmap */
+	if (__builtin_constant_p(size) && (size < BITS_PER_LONG)) {
+		value = (*addr) & ((~0ul) << offset);
+		value |= (1ul << size);
+		return __ffs(value);
+	}
+
+	/* the result of __ffs(0) is undefined, so it needs to be */
+	/* handled separately */
+	if (__builtin_constant_p(size) && (size == BITS_PER_LONG)) {
+		value = (*addr) & ((~0ul) << offset);
+		return (value == 0) ? BITS_PER_LONG : __ffs(value);
+	}
+
+	/* size is not constant or too big */
+	return __find_next_bit(addr, size, offset);
+}
+
+extern unsigned long __find_next_zero_bit(const unsigned long *addr,
+		unsigned long size, unsigned long offset);
+
+/**
+ * find_next_zero_bit - find the next cleared bit in a memory region
+ * @addr: The address to base the search on
+ * @offset: The bitnumber to start searching at
+ * @size: The bitmap size in bits
+ */
+static __always_inline unsigned long
+find_next_zero_bit(const unsigned long *addr, unsigned long size,
+		unsigned long offset)
+{
+	unsigned long value;
+
+	/* Avoid a function call if the bitmap size is a constant */
+	/* and not bigger than BITS_PER_LONG. */
+
+	/* insert a sentinel so that __ffs returns size if there */
+	/* are no set bits in the bitmap */
+	if (__builtin_constant_p(size) && (size < BITS_PER_LONG)) {
+		value = (~(*addr)) & ((~0ul) << offset);
+		value |= (1ul << size);
+		return __ffs(value);
+	}
+
+	/* the result of __ffs(0) is undefined, so it needs to be */
+	/* handled separately */
+	if (__builtin_constant_p(size) && (size == BITS_PER_LONG)) {
+		value = (~(*addr)) & ((~0ul) << offset);
+		return (value == 0) ? BITS_PER_LONG : __ffs(value);
+	}
+
+	/* size is not constant or too big */
+	return __find_next_zero_bit(addr, size, offset);
+}
+#endif /* CONFIG_GENERIC_FIND_NEXT_BIT */
+#endif /* __KERNEL__ */
 #endif
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 5820e07..ce94c4c 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -15,17 +15,12 @@
 #include <asm/byteorder.h>
 
 #define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
-#undef find_next_bit
-#undef find_next_zero_bit
-
-/**
- * find_next_bit - find the next set bit in a memory region
- * @addr: The address to base the search on
- * @offset: The bitnumber to start searching at
- * @size: The maximum size to search
+
+/*
+ * Find the next set bit in a memory region.
  */
-unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
-		unsigned long offset)
+unsigned long __find_next_bit(const unsigned long *addr,
+		unsigned long size, unsigned long offset)
 {
 	const unsigned long *p = addr + BITOP_WORD(offset);
 	unsigned long result = offset & ~(BITS_PER_LONG-1);
@@ -62,15 +57,14 @@ found_first:
 found_middle:
 	return result + __ffs(tmp);
 }
-
-EXPORT_SYMBOL(find_next_bit);
+EXPORT_SYMBOL(__find_next_bit);
 
 /*
  * This implementation of find_{first,next}_zero_bit was stolen from
  * Linus' asm-alpha/bitops.h.
  */
-unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
-		unsigned long offset)
+unsigned long __find_next_zero_bit(const unsigned long *addr,
+		unsigned long size, unsigned long offset)
 {
 	const unsigned long *p = addr + BITOP_WORD(offset);
 	unsigned long result = offset & ~(BITS_PER_LONG-1);
@@ -107,8 +101,7 @@ found_first:
 found_middle:
 	return result + ffz(tmp);
 }
-
-EXPORT_SYMBOL(find_next_zero_bit);
+EXPORT_SYMBOL(__find_next_zero_bit);
 
 #ifdef __BIG_ENDIAN
 



* [RFC] non-x86: Optimize find_next_(zero_)bit for small constant-size bitmaps
  2008-03-11 15:17           ` [PATCH] " Alexander van Heukelum
@ 2008-03-11 15:22             ` Alexander van Heukelum
  2008-03-11 15:23             ` [PATCH] x86: " Ingo Molnar
  1 sibling, 0 replies; 19+ messages in thread
From: Alexander van Heukelum @ 2008-03-11 15:22 UTC (permalink / raw)
  To: linux-arch
  Cc: Alexander van Heukelum, Ingo Molnar, Andi Kleen, Thomas Gleixner,
	H. Peter Anvin, LKML

non-x86: Optimize small bitmap searches for the other archs.

Concerns archs that have their own implementation of find_next_bit
and find_next_zero_bit.

For archs that define CONFIG_GENERIC_FIND_NEXT_BIT, a patch is
submitted to be included in Ingo's x86-testing tree. It moves
an optimization for searching small, constant-sized bitmaps
from x86_64-specific to generic code. It works by creating
a new __always_inline-marked function in linux/bitops.h, and
renaming find_next_(zero_)bit to __find_next_(zero_)bit in
the generic code. The new code is currently guarded by
CONFIG_GENERIC_FIND_NEXT_BIT, so other archs are expected to
work as before.

To enable the optimization globally, all other archs need to
implement __find_next_bit and __find_next_zero_bit either
by exporting these symbols, or by implementing them as inline
functions in their asm/bitops.h.

It would also be nice if every implementation had the
same declaration:

	unsigned long __find_next_(zero_)bit(unsigned long *,
			unsigned long, unsigned long);

I added a totally untested patch for reference.

Did I miss an arch?
Does the assembly still match the (changed) prototype?
Can we get consensus on doing the optimization in generic code?

Greetings,
	Alexander

 arch/arm/lib/findbit.S          |    6 ++++--
 arch/avr32/kernel/avr32_ksyms.c |    4 ++--
 arch/avr32/lib/findbit.S        |    4 ++--
 include/asm-arm/bitops.h        |   20 ++++++++++++--------
 include/asm-m68k/bitops.h       |    8 ++++----
 include/asm-powerpc/bitops.h    |    4 ----
 include/asm-s390/bitops.h       |   12 ++++++------
 include/linux/bitops.h          |    2 --
 8 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/arch/arm/lib/findbit.S b/arch/arm/lib/findbit.S
index a5ca024..5c92967 100644
--- a/arch/arm/lib/findbit.S
+++ b/arch/arm/lib/findbit.S
@@ -36,7 +36,8 @@ ENTRY(_find_first_zero_bit_le)
 
 /*
  * Purpose  : Find next 'zero' bit
- * Prototype: int find_next_zero_bit(void *addr, unsigned int maxbit, int offset)
+ * Prototype: unsigned long find_next_zero_bit(unsigned long *addr,
+ *			unsigned long maxbit, unsigned long offset);
  */
 ENTRY(_find_next_zero_bit_le)
 		teq	r1, #0
@@ -70,7 +71,8 @@ ENTRY(_find_first_bit_le)
 
 /*
  * Purpose  : Find next 'one' bit
- * Prototype: int find_next_zero_bit(void *addr, unsigned int maxbit, int offset)
+ * Prototype: unsigned long find_next_bit(unsigned long *addr,
+ *			unsigned long maxbit, unsigned long offset);
  */
 ENTRY(_find_next_bit_le)
 		teq	r1, #0
diff --git a/arch/avr32/kernel/avr32_ksyms.c b/arch/avr32/kernel/avr32_ksyms.c
index 80f55f8..c228ed1 100644
--- a/arch/avr32/kernel/avr32_ksyms.c
+++ b/arch/avr32/kernel/avr32_ksyms.c
@@ -51,9 +51,9 @@ EXPORT_SYMBOL(__const_udelay);
 
 /* Bit operations (lib/findbit.S) */
 EXPORT_SYMBOL(find_first_zero_bit);
-EXPORT_SYMBOL(find_next_zero_bit);
+EXPORT_SYMBOL(__find_next_zero_bit);
 EXPORT_SYMBOL(find_first_bit);
-EXPORT_SYMBOL(find_next_bit);
+EXPORT_SYMBOL(__find_next_bit);
 EXPORT_SYMBOL(generic_find_next_zero_le_bit);
 
 /* I/O primitives (lib/io-*.S) */
diff --git a/arch/avr32/lib/findbit.S b/arch/avr32/lib/findbit.S
index c6b91de..729deab 100644
--- a/arch/avr32/lib/findbit.S
+++ b/arch/avr32/lib/findbit.S
@@ -29,7 +29,7 @@ ENTRY(find_first_zero_bit)
 	 *				    unsigned long size,
 	 *				    unsigned long offset)
 	 */
-ENTRY(find_next_zero_bit)
+ENTRY(__find_next_zero_bit)
 	lsr	r8, r10, 5
 	sub	r9, r11, r10
 	retle	r11
@@ -93,7 +93,7 @@ ENTRY(find_first_bit)
 	 *			       unsigned long size,
 	 *			       unsigned long offset)
 	 */
-ENTRY(find_next_bit)
+ENTRY(__find_next_bit)
 	lsr	r8, r10, 5
 	sub	r9, r11, r10
 	retle	r11
diff --git a/include/asm-arm/bitops.h b/include/asm-arm/bitops.h
index 5c60bfc..08b3163 100644
--- a/include/asm-arm/bitops.h
+++ b/include/asm-arm/bitops.h
@@ -158,9 +158,11 @@ extern int _test_and_set_bit_le(int nr, volatile unsigned long * p);
 extern int _test_and_clear_bit_le(int nr, volatile unsigned long * p);
 extern int _test_and_change_bit_le(int nr, volatile unsigned long * p);
 extern int _find_first_zero_bit_le(const void * p, unsigned size);
-extern int _find_next_zero_bit_le(const void * p, int size, int offset);
+extern unsigned long _find_next_zero_bit_le(const unsigned long * p,
+		unsigned long size, unsigned long offset);
 extern int _find_first_bit_le(const unsigned long *p, unsigned size);
-extern int _find_next_bit_le(const unsigned long *p, int size, int offset);
+extern unsigned long _find_next_bit_le(const unsigned long *p,
+		unsigned long size, unsigned long offset);
 
 /*
  * Big endian assembly bitops.  nr = 0 -> byte 3 bit 0.
@@ -172,9 +174,11 @@ extern int _test_and_set_bit_be(int nr, volatile unsigned long * p);
 extern int _test_and_clear_bit_be(int nr, volatile unsigned long * p);
 extern int _test_and_change_bit_be(int nr, volatile unsigned long * p);
 extern int _find_first_zero_bit_be(const void * p, unsigned size);
-extern int _find_next_zero_bit_be(const void * p, int size, int offset);
+extern unsigned long _find_next_zero_bit_be(const unsigned long * p,
+		unsigned long size, unsigned long offset);
 extern int _find_first_bit_be(const unsigned long *p, unsigned size);
-extern int _find_next_bit_be(const unsigned long *p, int size, int offset);
+extern unsigned long _find_next_bit_be(const unsigned long *p,
+		unsigned long size, unsigned long offset);
 
 #ifndef CONFIG_SMP
 /*
@@ -208,9 +212,9 @@ extern int _find_next_bit_be(const unsigned long *p, int size, int offset);
 #define test_and_clear_bit(nr,p)	ATOMIC_BITOP_LE(test_and_clear_bit,nr,p)
 #define test_and_change_bit(nr,p)	ATOMIC_BITOP_LE(test_and_change_bit,nr,p)
 #define find_first_zero_bit(p,sz)	_find_first_zero_bit_le(p,sz)
-#define find_next_zero_bit(p,sz,off)	_find_next_zero_bit_le(p,sz,off)
+#define __find_next_zero_bit(p,sz,off)	_find_next_zero_bit_le(p,sz,off)
 #define find_first_bit(p,sz)		_find_first_bit_le(p,sz)
-#define find_next_bit(p,sz,off)		_find_next_bit_le(p,sz,off)
+#define __find_next_bit(p,sz,off)	_find_next_bit_le(p,sz,off)
 
 #define WORD_BITOFF_TO_LE(x)		((x))
 
@@ -226,9 +230,9 @@ extern int _find_next_bit_be(const unsigned long *p, int size, int offset);
 #define test_and_clear_bit(nr,p)	ATOMIC_BITOP_BE(test_and_clear_bit,nr,p)
 #define test_and_change_bit(nr,p)	ATOMIC_BITOP_BE(test_and_change_bit,nr,p)
 #define find_first_zero_bit(p,sz)	_find_first_zero_bit_be(p,sz)
-#define find_next_zero_bit(p,sz,off)	_find_next_zero_bit_be(p,sz,off)
+#define __find_next_zero_bit(p,sz,off)	_find_next_zero_bit_be(p,sz,off)
 #define find_first_bit(p,sz)		_find_first_bit_be(p,sz)
-#define find_next_bit(p,sz,off)		_find_next_bit_be(p,sz,off)
+#define __find_next_bit(p,sz,off)	_find_next_bit_be(p,sz,off)
 
 #define WORD_BITOFF_TO_LE(x)		((x) ^ 0x18)
 
diff --git a/include/asm-m68k/bitops.h b/include/asm-m68k/bitops.h
index 83d1f28..c320c7a 100644
--- a/include/asm-m68k/bitops.h
+++ b/include/asm-m68k/bitops.h
@@ -199,8 +199,8 @@ out:
 	return ((long)p - (long)vaddr - 4) * 8 + res;
 }
 
-static inline int find_next_zero_bit(const unsigned long *vaddr, int size,
-				     int offset)
+static inline unsigned long __find_next_zero_bit(const unsigned long *vaddr,
+		unsigned long size, unsigned long offset)
 {
 	const unsigned long *p = vaddr + (offset >> 5);
 	int bit = offset & 31UL, res;
@@ -246,8 +246,8 @@ out:
 	return ((long)p - (long)vaddr - 4) * 8 + res;
 }
 
-static inline int find_next_bit(const unsigned long *vaddr, int size,
-				int offset)
+static inline unsigned long __find_next_bit(const unsigned long *vaddr,
+		unsigned long size, unsigned long offset)
 {
 	const unsigned long *p = vaddr + (offset >> 5);
 	int bit = offset & 31UL, res;
diff --git a/include/asm-powerpc/bitops.h b/include/asm-powerpc/bitops.h
index 220d9a7..8ae6667 100644
--- a/include/asm-powerpc/bitops.h
+++ b/include/asm-powerpc/bitops.h
@@ -317,8 +317,6 @@ static __inline__ int fls(unsigned int x)
 #include <asm-generic/bitops/hweight.h>
 
 #define find_first_zero_bit(addr, size) find_next_zero_bit((addr), (size), 0)
-unsigned long find_next_zero_bit(const unsigned long *addr,
-				 unsigned long size, unsigned long offset);
 /**
  * find_first_bit - find the first set bit in a memory region
  * @addr: The address to start the search at
@@ -328,8 +326,6 @@ unsigned long find_next_zero_bit(const unsigned long *addr,
  * containing a bit.
  */
 #define find_first_bit(addr, size) find_next_bit((addr), (size), 0)
-unsigned long find_next_bit(const unsigned long *addr,
-			    unsigned long size, unsigned long offset);
 
 /* Little-endian versions */
 
diff --git a/include/asm-s390/bitops.h b/include/asm-s390/bitops.h
index 965394e..13a5c33 100644
--- a/include/asm-s390/bitops.h
+++ b/include/asm-s390/bitops.h
@@ -691,9 +691,9 @@ static inline unsigned long find_first_bit(const unsigned long * addr,
  * @offset: The bitnumber to start searching at
  * @size: The maximum size to search
  */
-static inline int find_next_zero_bit (const unsigned long * addr,
-				      unsigned long size,
-				      unsigned long offset)
+static inline unsigned long __find_next_zero_bit(const unsigned long * addr,
+						 unsigned long size,
+						 unsigned long offset)
 {
         const unsigned long *p;
 	unsigned long bit, set;
@@ -727,9 +727,9 @@ static inline int find_next_zero_bit (const unsigned long * addr,
  * @offset: The bitnumber to start searching at
  * @size: The maximum size to search
  */
-static inline int find_next_bit (const unsigned long * addr,
-				 unsigned long size,
-				 unsigned long offset)
+static inline unsigned long __find_next_bit(const unsigned long * addr,
+					    unsigned long size,
+					    unsigned long offset)
 {
         const unsigned long *p;
 	unsigned long bit, set;
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 15360f9..68be268 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -73,7 +73,6 @@ static inline unsigned fls_long(unsigned long l)
 }
 
 #ifdef __KERNEL__
-#ifdef CONFIG_GENERIC_FIND_NEXT_BIT
 extern unsigned long __find_next_bit(const unsigned long *addr,
 		unsigned long size, unsigned long offset);
 
@@ -147,6 +146,5 @@ find_next_zero_bit(const unsigned long *addr, unsigned long size,
 	/* size is not constant or too big */
 	return __find_next_zero_bit(addr, size, offset);
 }
-#endif /* CONFIG_GENERIC_FIND_NEXT_BIT */
 #endif /* __KERNEL__ */
 #endif



* Re: [PATCH] x86: Optimize find_next_(zero_)bit for small constant-size bitmaps
  2008-03-11 15:17           ` [PATCH] " Alexander van Heukelum
  2008-03-11 15:22             ` [RFC] non-x86: " Alexander van Heukelum
@ 2008-03-11 15:23             ` Ingo Molnar
  1 sibling, 0 replies; 19+ messages in thread
From: Ingo Molnar @ 2008-03-11 15:23 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Alexander van Heukelum, Andi Kleen, Thomas Gleixner,
	H. Peter Anvin, LKML


* Alexander van Heukelum <heukelum@mailshack.com> wrote:

> > > +#endif /* CONFIG_GENERIC_FIND_NEXT_BIT */
> > 
> > there should be an #else here i think, which maps find_next_bit to 
> > __find_next_bit / etc.
> 
> No, that is intentional: architectures are expected to either set 
> CONFIG_GENERIC_FIND_NEXT_BIT, or provide their own implementation of 
> find_next_bit.

indeed, you are right.

> This version runs fine on qemu.

great - i've queued it up into x86.git.

	Ingo


* Re: [PATCH] x86: Change x86 to use generic find_next_bit
  2008-03-09 20:01 [PATCH] x86: Change x86 to use generic find_next_bit Alexander van Heukelum
                   ` (3 preceding siblings ...)
  2008-03-09 21:31 ` Andi Kleen
@ 2008-03-13 12:44 ` Aneesh Kumar K.V
  2008-03-13 14:27   ` Alexander van Heukelum
  4 siblings, 1 reply; 19+ messages in thread
From: Aneesh Kumar K.V @ 2008-03-13 12:44 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML, heukelum

On Sun, Mar 09, 2008 at 09:01:04PM +0100, Alexander van Heukelum wrote:
> x86: Change x86 to use the generic find_next_bit implementation
> 
> The versions with inline assembly are in fact slower on the machines I
> tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD
> Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid
> crashing the benchmark.
> 
> Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size 
> 1...512, for each possible bitmap with one bit set, for each possible 
> offset: find the position of the first bit starting at offset. If you 
> follow ;). Times include setup of the bitmap and checking of the 
> results.
> 
> 		Athlon		Xeon		Opteron 32/64bit
> x86-specific:	0m3.692s	0m2.820s	0m3.196s / 0m2.480s
> generic:	0m2.622s	0m1.662s	0m2.100s / 0m1.572s
> 
> If the bitmap size is not a multiple of BITS_PER_LONG, and no set 
> (cleared) bit is found, find_next_bit (find_next_zero_bit) returns a 
> value outside of the range [0,size]. The generic version always returns 
> exactly size. The generic version also uses unsigned long everywhere, 
> while the x86 versions use a mishmash of int, unsigned (int), long and 
> unsigned long.
> 

This problem is observed on x86_64 and powerpc also. We need a
long-aligned address for test_bit, set_bit, find_bit, etc.  In ext4 we
have to make sure we align the address passed to

ext4_test_bit
ext4_set_bit
ext4_find_next_zero_bit
ext4_find_next_bit

fs/ext4/mballoc.c has some examples.

-aneesh


* Re: [PATCH] x86: Change x86 to use generic find_next_bit
  2008-03-13 12:44 ` Aneesh Kumar K.V
@ 2008-03-13 14:27   ` Alexander van Heukelum
  0 siblings, 0 replies; 19+ messages in thread
From: Alexander van Heukelum @ 2008-03-13 14:27 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Alexander van Heukelum
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML


On Thu, 13 Mar 2008 18:14:24 +0530, "Aneesh Kumar K.V"
<aneesh.kumar@linux.vnet.ibm.com> said:
> On Sun, Mar 09, 2008 at 09:01:04PM +0100, Alexander van Heukelum wrote:
> > x86: Change x86 to use the generic find_next_bit implementation
> > 
> > The versions with inline assembly are in fact slower on the machines I
> > tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD
> > Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid
> > crashing the benchmark.
> > 
> > Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size 
> > 1...512, for each possible bitmap with one bit set, for each possible 
> > offset: find the position of the first bit starting at offset. If you 
> > follow ;). Times include setup of the bitmap and checking of the 
> > results.
> > 
> > 		Athlon		Xeon		Opteron 32/64bit
> > x86-specific:	0m3.692s	0m2.820s	0m3.196s / 0m2.480s
> > generic:	0m2.622s	0m1.662s	0m2.100s / 0m1.572s
> > 
> > If the bitmap size is not a multiple of BITS_PER_LONG, and no set 
> > (cleared) bit is found, find_next_bit (find_next_zero_bit) returns a 
> > value outside of the range [0,size]. The generic version always returns 
> > exactly size. The generic version also uses unsigned long everywhere, 
> > while the x86 versions use a mishmash of int, unsigned (int), long and 
> > unsigned long.
> 
> This problem is observed on x86_64 and powerpc also.

I'm not entirely sure if it is a problem. In some cases it
certainly is an inconvenience, though ;). I mentioned the
difference between the old and generic versions because of
the possibility that code depends on this behaviour.

Indeed I see, for example (in fs/ext4/mballoc.c):

    bit = mb_find_next_zero_bit(bitmap_bh->b_data, end, bit);
    if (bit >= end)
        break;
    next = mb_find_next_bit(bitmap_bh->b_data, end, bit);
    if (next > end)
        next = end;
    free += next - bit;

So here it needed to adjust the value.

> We need a
> long-aligned address for test_bit, set_bit, find_bit, etc.  In ext4 we
> have to make sure we align the address passed to
> 
> ext4_test_bit
> ext4_set_bit
> ext4_find_next_zero_bit
> ext4_find_next_bit
> 
> fs/ext4/mballoc.c has some examples.

This is a different 'problem'. find_next_bit works on arrays of long,
while the bitmaps in ext4_find_next_bit are of type void * and seem
not to have any alignment restrictions. ext4 implements wrappers
around find_next_bit to solve that 'problem'.

The question that arises is: do we want find_first_bit, find_next_bit,
etc. to always return a value in the range [0, size], or do we want
to allow implementations that return [0, size-1] if there is a bit
found, and something else (roundup(size,bitsperlong) or ulongmax, for
example) if none were found?

The current x86_64 versions of find_first_bit and find_next_bit return
roundup(size,bitsperlong) if no bits were found, but on the other hand
I guess most bitmaps are a multiple of bitsperlong bits in size, which
hides the difference.

Greetings,
    Alexander

> -aneesh
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - Or how I learned to stop worrying and
                          love email again



end of thread, other threads:[~2008-03-13 14:49 UTC | newest]

Thread overview: 19+ messages
2008-03-09 20:01 [PATCH] x86: Change x86 to use generic find_next_bit Alexander van Heukelum
2008-03-09 20:10 ` Ingo Molnar
2008-03-09 21:03   ` Andi Kleen
2008-03-09 21:32     ` Andi Kleen
2008-03-09 21:13   ` Alexander van Heukelum
2008-03-10  6:29     ` Ingo Molnar
2008-03-09 20:11 ` Ingo Molnar
2008-03-09 20:31   ` Alexander van Heukelum
2008-03-09 20:51     ` Ingo Molnar
2008-03-09 21:29       ` Andi Kleen
2008-03-10 23:17       ` [RFC/PATCH] x86: Optimize find_next_(zero_)bit for small constant-size bitmaps Alexander van Heukelum
2008-03-11  9:56         ` Ingo Molnar
2008-03-11 15:17           ` [PATCH] " Alexander van Heukelum
2008-03-11 15:22             ` [RFC] non-x86: " Alexander van Heukelum
2008-03-11 15:23             ` [PATCH] x86: " Ingo Molnar
2008-03-09 20:28 ` [PATCH] x86: Change x86 to use generic find_next_bit Andi Kleen
2008-03-09 21:31 ` Andi Kleen
2008-03-13 12:44 ` Aneesh Kumar K.V
2008-03-13 14:27   ` Alexander van Heukelum
