* Re: [PATCH 8/12] generic hweight{32,16,8}()
@ 2006-01-31 16:49 linux
2006-01-31 18:14 ` Grant Grundler
2006-02-02 9:34 ` Balbir Singh
0 siblings, 2 replies; 13+ messages in thread
From: linux @ 2006-01-31 16:49 UTC (permalink / raw)
To: linux-ia64, linux-kernel, mita
This is an extremely well-known technique. You can see a similar version
that uses a multiply for the last few steps at
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
which refers to
"Software Optimization Guide for AMD Athlon 64 and Opteron Processors"
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
It's section 8.6, "Efficient Implementation of Population-Count Function
in 32-bit Mode", pages 179-180.
It uses the name that I am more familiar with, "popcount" (population count),
although "Hamming weight" also makes sense.
Anyway, the proof of correctness proceeds as follows:
b = a - ((a >> 1) & 0x55555555);
c = (b & 0x33333333) + ((b >> 2) & 0x33333333);
d = (c + (c >> 4)) & 0x0f0f0f0f;
#if SLOW_MULTIPLY
e = d + (d >> 8);
f = e + (e >> 16);
return f & 63;
#else
/* Useful if multiply takes at most 4 cycles */
return (d * 0x01010101) >> 24;
#endif
The input value a can be thought of as 32 1-bit fields, each holding its
own Hamming weight. Now look at it as 16 2-bit fields.
Each 2-bit field a1..a0 has the value 2*a1 + a0. This can be converted
into the Hamming weight of the 2-bit field, a1 + a0, by subtracting a1.
That's what the (a >> 1) & mask subtraction does. Since there can be no
borrows, you can just do it all at once.
Enumerating the 4 possible cases:
0b00 = 0 -> 0 - 0 = 0
0b01 = 1 -> 1 - 0 = 1
0b10 = 2 -> 2 - 1 = 1
0b11 = 3 -> 3 - 1 = 2
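For example, take a = 0b1011, whose two 2-bit fields have weights 1 and 2:

	a                 = 0b1011
	(a >> 1) & 0b0101 = 0b0101
	b                 = 0b1011 - 0b0101 = 0b0110

and indeed the upper 2-bit field of b holds 1 and the lower one holds 2.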
The next step consists of breaking up b (made of 16 2-bit fields) into
even and odd halves and adding them into 4-bit fields. Since the largest
possible sum is 2+2 = 4, which will not fit into a 4-bit field, the 2-bit
fields have to be masked before they are added.
After this point, the masking can be delayed. Each 4-bit field holds
a population count from 0..4, taking at most 3 bits. These numbers can
be added without overflowing a 4-bit field, so we can compute
c + (c >> 4), and only then mask off the unwanted bits.
This produces d, a number made up of four 8-bit fields, each in the range 0..8.
From this point, we can shift and add d multiple times without overflowing
an 8-bit field, and only do a final mask at the end.
The number to mask with has to be at least 63 (so that 32 won't be truncated),
but can also be 128 or 255. The x86 has a special encoding for signed
immediate byte values -128..127, so the value of 255 is slower. On
other processors, a special "sign extend byte" instruction might be faster.
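As an end-to-end check, the all-ones input a = 0xffffffff (weight 32)
traces through the steps like this:

	b = 0xffffffff - 0x55555555 = 0xaaaaaaaa      (each 2-bit field = 2)
	c = 0x22222222 + 0x22222222 = 0x44444444      (each 4-bit field = 4)
	d = (0x44444444 + 0x04444444) & 0x0f0f0f0f
	  = 0x08080808                                (each byte = 8)
	e = 0x08080808 + 0x00080808 = 0x08101010
	f = 0x08101010 + 0x00000810 = 0x08101820
	f & 63 = 0x20 = 32

so the maximum value 32 really does arrive in the low byte, and the final
mask has to let all six of its bits through.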
On a processor with fast integer multiplies (Athlon but not P4), you can
reduce the final few serially dependent instructions to a single integer
multiply. Consider d to be four 8-bit values d3, d2, d1 and d0, each in the
range 0..8. The multiply forms the partial products:
           d3 d2 d1 d0
        d3 d2 d1 d0
     d3 d2 d1 d0
+ d3 d2 d1 d0
----------------------
           e3 e2 e1 e0
Where e3 = d3 + d2 + d1 + d0. Since each di is at most 8, the lower columns
e2, e1 and e0 sum to at most 24 apiece and so cannot generate any carries
into e3; the partial products to the left of e3 simply fall off the top of
the 32-bit result, and the final >> 24 extracts e3, the total count.
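If you'd rather let a machine check the algebra, the claim is cheap to
verify exhaustively. The following stand-alone program (written for this
explanation, not taken from any patch; it assumes a 32-bit unsigned int)
compares the multiply variant against a naive per-bit loop over all 2^32
inputs:

#include <assert.h>
#include <stdio.h>

/* Naive reference: count the bits one at a time. */
static unsigned int popcount_ref(unsigned int a)
{
	unsigned int n = 0;

	while (a) {
		n += a & 1;
		a >>= 1;
	}
	return n;
}

/* The SWAR reduction proved above, multiply variant. */
static unsigned int popcount_mul(unsigned int a)
{
	unsigned int b = a - ((a >> 1) & 0x55555555);
	unsigned int c = (b & 0x33333333) + ((b >> 2) & 0x33333333);
	unsigned int d = (c + (c >> 4)) & 0x0f0f0f0f;

	return (d * 0x01010101) >> 24;
}

int main(void)
{
	unsigned int a = 0;

	/* The loop terminates when a wraps back to 0 after 0xffffffff. */
	do {
		assert(popcount_mul(a) == popcount_ref(a));
	} while (++a != 0);
	printf("all 2^32 inputs agree\n");
	return 0;
}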
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 8/12] generic hweight{32,16,8}()
  2006-01-31 16:49 [PATCH 8/12] generic hweight{32,16,8}() linux
@ 2006-01-31 18:14 ` Grant Grundler
  2006-02-02  9:34 ` Balbir Singh
  1 sibling, 0 replies; 13+ messages in thread
From: Grant Grundler @ 2006-01-31 18:14 UTC (permalink / raw)
To: linux; +Cc: linux-ia64, linux-kernel, mita

On Tue, Jan 31, 2006 at 11:49:49AM -0500, linux@horizon.com wrote:
> This is an extremely well-known technique. You can see a similar version
> that uses a multiply for the last few steps at
> http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
> which refers to
> "Software Optimization Guide for AMD Athlon 64 and Opteron Processors"
> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
...
> The next step consists of breaking up b (made of 16 2-bit fields) into
> even and odd halves and adding them into 4-bit fields. Since the largest
> possible sum is 2+2 = 4, which will not fit into a 4-bit field, the 2-bit
> fields have to be masked before they are added.

Up to here, things were clear.
My guess is you meant "which will not fit into a 2-bit field".

thanks,
grant

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 8/12] generic hweight{32,16,8}()
  2006-01-31 16:49 [PATCH 8/12] generic hweight{32,16,8}() linux
  2006-01-31 18:14 ` Grant Grundler
@ 2006-02-02  9:34 ` Balbir Singh
  1 sibling, 0 replies; 13+ messages in thread
From: Balbir Singh @ 2006-02-02  9:34 UTC (permalink / raw)
To: linux; +Cc: linux-ia64, linux-kernel, mita

> This is an extremely well-known technique. You can see a similar version
> that uses a multiply for the last few steps at
> http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
> which refers to
> "Software Optimization Guide for AMD Athlon 64 and Opteron Processors"
> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
>
> It's section 8.6, "Efficient Implementation of Population-Count Function
> in 32-bit Mode", pages 179-180.

Thanks for doing this. The proof looks good except for what has already
been pointed out by Grant Grundler.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [PATCH 0/6] RFC: use include/asm-generic/bitops.h
@ 2006-01-25 11:26 Akinobu Mita
  2006-01-25 11:32 ` [PATCH 3/6] C-language equivalents of include/asm-*/bitops.h Akinobu Mita
  0 siblings, 1 reply; 13+ messages in thread
From: Akinobu Mita @ 2006-01-25 11:26 UTC (permalink / raw)
To: linux-kernel
Cc: Richard Henderson, Ivan Kokshaysky, Russell King, Ian Molton,
	dev-etrax, David Howells, Yoshinori Sato, Linus Torvalds,
	linux-ia64, Hirokazu Takata, linux-m68k, Greg Ungerer, linux-mips,
	parisc-linux, linuxppc-dev, linux390, linuxsh-dev,
	linuxsh-shmedia-dev, sparclinux, ultralinux, Miles Bader,
	Andi Kleen, Chris Zankel

A large number of boilerplate bit operations written in C are scattered
around include/asm-*/bitops.h. This patch series gathers them into
include/asm-generic/bitops.h in order to:

- kill duplicated code and comments (about 4000 lines)
- use better C-language equivalents
- help porting of new architectures (currently include/asm-generic/bitops.h
  is not referenced from anywhere)

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [PATCH 3/6] C-language equivalents of include/asm-*/bitops.h
  2006-01-25 11:26 [PATCH 0/6] RFC: use include/asm-generic/bitops.h Akinobu Mita
@ 2006-01-25 11:32 ` Akinobu Mita
  2006-01-25 20:02   ` Russell King
  0 siblings, 1 reply; 13+ messages in thread
From: Akinobu Mita @ 2006-01-25 11:32 UTC (permalink / raw)
To: linux-kernel
Cc: Richard Henderson, Ivan Kokshaysky, Russell King, Ian Molton,
	dev-etrax, David Howells, Yoshinori Sato, Linus Torvalds,
	linux-ia64, Hirokazu Takata, linux-m68k, Greg Ungerer, linux-mips,
	parisc-linux, linuxppc-dev, linux390, linuxsh-dev,
	linuxsh-shmedia-dev, sparclinux, ultralinux, Miles Bader,
	Andi Kleen, Chris Zankel

o generic {,test_and_}{set,clear,change}_bit() (atomic bitops)

This patch introduces the C-language equivalents of the functions below:

void set_bit(int nr, volatile unsigned long *addr);
void clear_bit(int nr, volatile unsigned long *addr);
void change_bit(int nr, volatile unsigned long *addr);
int test_and_set_bit(int nr, volatile unsigned long *addr);
int test_and_clear_bit(int nr, volatile unsigned long *addr);
int test_and_change_bit(int nr, volatile unsigned long *addr);

HAVE_ARCH_ATOMIC_BITOPS is defined when the architecture has its own
version of these functions.

This code largely copied from:
include/asm-powerpc/bitops.h
include/asm-parisc/bitops.h
include/asm-parisc/atomic.h

o generic __{,test_and_}{set,clear,change}_bit() and test_bit()

This patch introduces the C-language equivalents of the functions below:

void __set_bit(int nr, volatile unsigned long *addr);
void __clear_bit(int nr, volatile unsigned long *addr);
void __change_bit(int nr, volatile unsigned long *addr);
int __test_and_set_bit(int nr, volatile unsigned long *addr);
int __test_and_clear_bit(int nr, volatile unsigned long *addr);
int __test_and_change_bit(int nr, volatile unsigned long *addr);
int test_bit(int nr, const volatile unsigned long *addr);

HAVE_ARCH_NON_ATOMIC_BITOPS is defined when the architecture has its own
version of these functions.

This code largely copied from:
asm-powerpc/bitops.h

o generic __ffs()

This patch introduces the C-language equivalent of the function:
unsigned long __ffs(unsigned long word);

HAVE_ARCH___FFS_BITOPS is defined when the architecture has its own
version of this function.

This code largely copied from:
include/asm-sparc64/bitops.h

o generic ffz()

This patch introduces the C-language equivalent of the function:
unsigned long ffz(unsigned long word);

HAVE_ARCH_FFZ_BITOPS is defined when the architecture has its own
version of this function.

This code largely copied from:
include/asm-sparc64/bitops.h

o generic fls()

This patch introduces the C-language equivalent of the function:
int fls(int x);

HAVE_ARCH_FLS_BITOPS is defined when the architecture has its own
version of this function.

This code largely copied from:
include/linux/bitops.h

o generic fls64()

This patch introduces the C-language equivalent of the function:
int fls64(__u64 x);

HAVE_ARCH_FLS64_BITOPS is defined when the architecture has its own
version of this function.

This code largely copied from:
include/linux/bitops.h

o generic find_{next,first}{,_zero}_bit()

This patch introduces the C-language equivalents of the functions below:

unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
		unsigned long offset);
unsigned long find_next_zero_bit(const unsigned long *addr,
		unsigned long size, unsigned long offset);
unsigned long find_first_zero_bit(const unsigned long *addr,
		unsigned long size);
unsigned long find_first_bit(const unsigned long *addr, unsigned long size);

HAVE_ARCH_FIND_BITOPS is defined when the architecture has its own
version of these functions.

This code largely copied from:
arch/powerpc/lib/bitops.c

==== KERNEL

o generic sched_find_first_bit()

This patch introduces the C-language equivalent of the function:
int sched_find_first_bit(const unsigned long *b);

HAVE_ARCH_SCHED_BITOPS is defined when the architecture has its own
version of this function.

This code largely copied from:
include/asm-powerpc/bitops.h

o generic ffs()

This patch introduces the C-language equivalent of the function:
int ffs(int x);

HAVE_ARCH_FFS_BITOPS is defined when the architecture has its own
version of this function.

This code largely copied from:
include/linux/bitops.h

o generic hweight{32,16,8}()

This patch introduces the C-language equivalents of the functions below:

unsigned int hweight32(unsigned int w);
unsigned int hweight16(unsigned int w);
unsigned int hweight8(unsigned int w);

HAVE_ARCH_HWEIGHT_BITOPS is defined when the architecture has its own
version of these functions.

This code largely copied from:
include/linux/bitops.h

o generic hweight64()

This patch introduces the C-language equivalent of the function:
unsigned long hweight64(__u64 w);

HAVE_ARCH_HWEIGHT64_BITOPS is defined when the architecture has its own
version of this function.

This code largely copied from:
include/linux/bitops.h

o generic ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()

This patch introduces the C-language equivalents of the functions below:

int ext2_set_bit(int nr, volatile unsigned long *addr);
int ext2_clear_bit(int nr, volatile unsigned long *addr);
int ext2_test_bit(int nr, const volatile unsigned long *addr);
unsigned long ext2_find_first_zero_bit(const unsigned long *addr,
		unsigned long size);
unsigned long ext2_find_next_zero_bit(const unsigned long *addr,
		unsigned long size, unsigned long offset);

HAVE_ARCH_EXT2_NON_ATOMIC_BITOPS is defined when the architecture has its
own version of these functions.

This code largely copied from:
include/asm-powerpc/bitops.h
include/asm-parisc/bitops.h

o generic ext2_{set,clear}_bit_atomic()

This patch introduces the C-language equivalents of the functions below:

int ext2_set_bit_atomic(int nr, volatile unsigned long *addr);
int ext2_clear_bit_atomic(int nr, volatile unsigned long *addr);

HAVE_ARCH_EXT2_ATOMIC_BITOPS is defined when the architecture has its own
version of these functions.

This code largely copied from:
include/asm-sparc/bitops.h

o generic minix_{test_and_set,set,test_and_clear,test,find_first_zero}_bit()

This patch introduces the C-language equivalents of the functions below:

int minix_test_and_set_bit(int nr, volatile unsigned long *addr);
int minix_set_bit(int nr, volatile unsigned long *addr);
int minix_test_and_clear_bit(int nr, volatile unsigned long *addr);
int minix_test_bit(int nr, const volatile unsigned long *addr);
unsigned long minix_find_first_zero_bit(const unsigned long *addr,
		unsigned long size);

HAVE_ARCH_MINIX_BITOPS is defined when the architecture has its own
version of these functions.

This code largely copied from:
include/asm-sparc/bitops.h

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
---

 bitops.h |  677 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 641 insertions(+), 36 deletions(-)

Index: work/include/asm-generic/bitops.h
===================================================================
--- work.orig/include/asm-generic/bitops.h	2006-01-25 19:14:27.000000000 +0900
+++ work/include/asm-generic/bitops.h	2006-01-25 19:32:48.000000000 +0900
@@ -1,81 +1,686 @@
 #ifndef _ASM_GENERIC_BITOPS_H_
 #define _ASM_GENERIC_BITOPS_H_
 
+#include <asm/types.h>
+
+#define BITOP_MASK(nr)		(1UL << ((nr) % BITS_PER_LONG))
+#define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
+#define BITOP_LE_SWIZZLE	((BITS_PER_LONG-1) & ~0x7)
+
+#ifndef HAVE_ARCH_ATOMIC_BITOPS
+
+#ifdef CONFIG_SMP
+#include <asm/spinlock.h>
+#include <asm/cache.h>		/* we use L1_CACHE_BYTES */
+
+/* Use an array of spinlocks for our atomic_ts.
+ * Hash function to index into a different SPINLOCK.
+ * Since "a" is usually an address, use one spinlock per cacheline.
+ */
+# define ATOMIC_HASH_SIZE 4
+# define ATOMIC_HASH(a) (&(__atomic_hash[ (((unsigned long) a)/L1_CACHE_BYTES) & (ATOMIC_HASH_SIZE-1) ]))
+
+extern raw_spinlock_t __atomic_hash[ATOMIC_HASH_SIZE] __lock_aligned;
+
+/* Can't use raw_spin_lock_irq because of #include problems, so
+ * this is the substitute */
+#define _atomic_spin_lock_irqsave(l,f) do {	\
+	raw_spinlock_t *s = ATOMIC_HASH(l);	\
+	local_irq_save(f);			\
+	__raw_spin_lock(s);			\
+} while(0)
+
+#define _atomic_spin_unlock_irqrestore(l,f) do {	\
+	raw_spinlock_t *s = ATOMIC_HASH(l);		\
+	__raw_spin_unlock(s);				\
+	local_irq_restore(f);				\
+} while(0)
+
+
+#else
+# define _atomic_spin_lock_irqsave(l,f) do { local_irq_save(f); } while (0)
+# define _atomic_spin_unlock_irqrestore(l,f) do { local_irq_restore(f); } while (0)
+#endif
+
 /*
  * For the benefit of those who are trying to port Linux to another
  * architecture, here are some C-language equivalents.  You should
  * recode these in the native assembly language, if at all possible.
- * To guarantee atomicity, these routines call cli() and sti() to
- * disable interrupts while they operate.  (You have to provide inline
- * routines to cli() and sti().)
  *
- * Also note, these routines assume that you have 32 bit longs.
- * You will have to change this if you are trying to port Linux to the
- * Alpha architecture or to a Cray.  :-)
- *
  * C language equivalents written by Theodore Ts'o, 9/26/92
  */
 
-extern __inline__ int set_bit(int nr,long * addr)
+static __inline__ void set_bit(int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BITOP_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr);
+	unsigned long flags;
+
+	_atomic_spin_lock_irqsave(p, flags);
+	*p |= mask;
+	_atomic_spin_unlock_irqrestore(p, flags);
+}
+
+static __inline__ void clear_bit(int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BITOP_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr);
+	unsigned long flags;
+
+	_atomic_spin_lock_irqsave(p, flags);
+	*p &= ~mask;
+	_atomic_spin_unlock_irqrestore(p, flags);
+}
+
+static __inline__ void change_bit(int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BITOP_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr);
+	unsigned long flags;
+
+	_atomic_spin_lock_irqsave(p, flags);
+	*p ^= mask;
+	_atomic_spin_unlock_irqrestore(p, flags);
+}
+
+static __inline__ int test_and_set_bit(int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BITOP_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr);
+	unsigned long old;
+	unsigned long flags;
+
+	_atomic_spin_lock_irqsave(p, flags);
+	old = *p;
+	*p = old | mask;
+	_atomic_spin_unlock_irqrestore(p, flags);
+
+	return (old & mask) != 0;
+}
+
+static __inline__ int test_and_clear_bit(int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BITOP_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr);
+	unsigned long old;
+	unsigned long flags;
+
+	_atomic_spin_lock_irqsave(p, flags);
+	old = *p;
+	*p = old & ~mask;
+	_atomic_spin_unlock_irqrestore(p, flags);
+
+	return (old & mask) != 0;
+}
+
+static __inline__ int test_and_change_bit(int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BITOP_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr);
+	unsigned long old;
+	unsigned long flags;
+
+	_atomic_spin_lock_irqsave(p, flags);
+	old = *p;
+	*p = old ^ mask;
+	_atomic_spin_unlock_irqrestore(p, flags);
+
+	return (old & mask) != 0;
+}
+
+#endif /* HAVE_ARCH_ATOMIC_BITOPS */
+
+#ifndef HAVE_ARCH_NON_ATOMIC_BITOPS
+
+static __inline__ void __set_bit(int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BITOP_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr);
+
+	*p |= mask;
+}
+
+static __inline__ void __clear_bit(int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BITOP_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr);
+
+	*p &= ~mask;
+}
+
+static __inline__ void __change_bit(int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BITOP_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr);
+
+	*p ^= mask;
+}
+
+static __inline__ int __test_and_set_bit(int nr, volatile unsigned long *addr)
 {
-	int mask, retval;
+	unsigned long mask = BITOP_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr);
+	unsigned long old = *p;
 
-	addr += nr >> 5;
-	mask = 1 << (nr & 0x1f);
-	cli();
-	retval = (mask & *addr) != 0;
-	*addr |= mask;
-	sti();
-	return retval;
+	*p = old | mask;
+	return (old & mask) != 0;
 }
 
-extern __inline__ int clear_bit(int nr, long * addr)
+static __inline__ int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 {
-	int mask, retval;
+	unsigned long mask = BITOP_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr);
+	unsigned long old = *p;
 
-	addr += nr >> 5;
-	mask = 1 << (nr & 0x1f);
-	cli();
-	retval = (mask & *addr) != 0;
-	*addr &= ~mask;
-	sti();
-	return retval;
+	*p = old & ~mask;
+	return (old & mask) != 0;
 }
 
-extern __inline__ int test_bit(int nr, const unsigned long * addr)
+static __inline__ int __test_and_change_bit(int nr,
+					    volatile unsigned long *addr)
+{
+	unsigned long mask = BITOP_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old ^ mask;
+	return (old & mask) != 0;
+}
+
+static __inline__ int test_bit(int nr, __const__ volatile unsigned long *addr)
+{
+	return 1UL & (addr[BITOP_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
+}
+
+#endif /* HAVE_ARCH_NON_ATOMIC_BITOPS */
+
+#ifndef HAVE_ARCH___FFS_BITOPS
+
+/**
+ * __ffs - find first bit in word.
+ * @word: The word to search
+ *
+ * Returns 0..BITS_PER_LONG-1
+ * Undefined if no bit exists, so code should check against 0 first.
+ */
+static inline unsigned long __ffs(unsigned long word)
 {
-	int mask;
+	int b = 0, s;
 
-	addr += nr >> 5;
-	mask = 1 << (nr & 0x1f);
-	return ((mask & *addr) != 0);
+#if BITS_PER_LONG == 32
+	s = 16; if (word << 16 != 0) s = 0; b += s; word >>= s;
+	s =  8; if (word << 24 != 0) s = 0; b += s; word >>= s;
+	s =  4; if (word << 28 != 0) s = 0; b += s; word >>= s;
+	s =  2; if (word << 30 != 0) s = 0; b += s; word >>= s;
+	s =  1; if (word << 31 != 0) s = 0; b += s;
+
+	return b;
+#elif BITS_PER_LONG == 64
+	s = 32; if (word << 32 != 0) s = 0; b += s; word >>= s;
+	s = 16; if (word << 48 != 0) s = 0; b += s; word >>= s;
+	s =  8; if (word << 56 != 0) s = 0; b += s; word >>= s;
+	s =  4; if (word << 60 != 0) s = 0; b += s; word >>= s;
+	s =  2; if (word << 62 != 0) s = 0; b += s; word >>= s;
+	s =  1; if (word << 63 != 0) s = 0; b += s;
+
+	return b;
+#else
+#error BITS_PER_LONG not defined
+#endif
 }
 
+#endif /* HAVE_ARCH___FFS_BITOPS */
+
+#ifndef HAVE_ARCH_FFZ_BITOPS
+
+/* Undefined if no bit is zero. */
+#define ffz(x)	__ffs(~x)
+
+#endif /* HAVE_ARCH_FFZ_BITOPS */
+
+#ifndef HAVE_ARCH_FLS_BITOPS
+
 /*
  * fls: find last bit set.
  */
 
-#define fls(x) generic_fls(x)
-#define fls64(x) generic_fls64(x)
+static __inline__ int fls(int x)
+{
+	int r = 32;
+
+	if (!x)
+		return 0;
+	if (!(x & 0xffff0000u)) {
+		x <<= 16;
+		r -= 16;
+	}
+	if (!(x & 0xff000000u)) {
+		x <<= 8;
+		r -= 8;
+	}
+	if (!(x & 0xf0000000u)) {
+		x <<= 4;
+		r -= 4;
+	}
+	if (!(x & 0xc0000000u)) {
+		x <<= 2;
+		r -= 2;
+	}
+	if (!(x & 0x80000000u)) {
+		x <<= 1;
+		r -= 1;
+	}
+	return r;
+}
+
+#endif /* HAVE_ARCH_FLS_BITOPS */
+
+#ifndef HAVE_ARCH_FLS64_BITOPS
+
+static inline int fls64(__u64 x)
+{
+	__u32 h = x >> 32;
+	if (h)
+		return fls(h) + 32;
+	return fls(x);
+}
+
+#endif /* HAVE_ARCH_FLS64_BITOPS */
+
+#ifndef HAVE_ARCH_FIND_BITOPS
+
+/**
+ * find_next_bit - find the next set bit in a memory region
+ * @addr: The address to base the search on
+ * @offset: The bitnumber to start searching at
+ * @size: The maximum size to search
+ */
+static inline unsigned long find_next_bit(const unsigned long *addr,
+		unsigned long size, unsigned long offset)
+{
+	const unsigned long *p = addr + BITOP_WORD(offset);
+	unsigned long result = offset & ~(BITS_PER_LONG-1);
+	unsigned long tmp;
+
+	if (offset >= size)
+		return size;
+	size -= result;
+	offset %= BITS_PER_LONG;
+	if (offset) {
+		tmp = *(p++);
+		tmp &= (~0UL << offset);
+		if (size < BITS_PER_LONG)
+			goto found_first;
+		if (tmp)
+			goto found_middle;
+		size -= BITS_PER_LONG;
+		result += BITS_PER_LONG;
+	}
+	while (size & ~(BITS_PER_LONG-1)) {
+		if ((tmp = *(p++)))
+			goto found_middle;
+		result += BITS_PER_LONG;
+		size -= BITS_PER_LONG;
+	}
+	if (!size)
+		return result;
+	tmp = *p;
+
+found_first:
+	tmp &= (~0UL >> (BITS_PER_LONG - size));
+	if (tmp == 0UL)		/* Are any bits set? */
+		return result + size;	/* Nope. */
+found_middle:
+	return result + __ffs(tmp);
+}
+
+/*
+ * This implementation of find_{first,next}_zero_bit was stolen from
+ * Linus' asm-alpha/bitops.h.
+ */
+static inline unsigned long find_next_zero_bit(const unsigned long *addr,
+		unsigned long size, unsigned long offset)
+{
+	const unsigned long *p = addr + BITOP_WORD(offset);
+	unsigned long result = offset & ~(BITS_PER_LONG-1);
+	unsigned long tmp;
+
+	if (offset >= size)
+		return size;
+	size -= result;
+	offset %= BITS_PER_LONG;
+	if (offset) {
+		tmp = *(p++);
+		tmp |= ~0UL >> (BITS_PER_LONG - offset);
+		if (size < BITS_PER_LONG)
+			goto found_first;
+		if (~tmp)
+			goto found_middle;
+		size -= BITS_PER_LONG;
+		result += BITS_PER_LONG;
+	}
+	while (size & ~(BITS_PER_LONG-1)) {
+		if (~(tmp = *(p++)))
+			goto found_middle;
+		result += BITS_PER_LONG;
+		size -= BITS_PER_LONG;
+	}
+	if (!size)
+		return result;
+	tmp = *p;
+
+found_first:
+	tmp |= ~0UL << size;
+	if (tmp == ~0UL)	/* Are any bits zero? */
+		return result + size;	/* Nope. */
+found_middle:
+	return result + ffz(tmp);
+}
+
+#define find_first_zero_bit(addr, size) find_next_zero_bit((addr), (size), 0)
+#define find_first_bit(addr, size) find_next_bit((addr), (size), 0)
+
+#endif /* HAVE_ARCH_FIND_BITOPS */
+
 #ifdef __KERNEL__
 
+#ifndef HAVE_ARCH_SCHED_BITOPS
+
+#include <linux/compiler.h>	/* unlikely() */
+
+/*
+ * Every architecture must define this function. It's the fastest
+ * way of searching a 140-bit bitmap where the first 100 bits are
+ * unlikely to be set. It's guaranteed that at least one of the 140
+ * bits is cleared.
+ */
+static inline int sched_find_first_bit(const unsigned long *b)
+{
+#if BITS_PER_LONG == 64
+	if (unlikely(b[0]))
+		return __ffs(b[0]);
+	if (unlikely(b[1]))
+		return __ffs(b[1]) + 64;
+	return __ffs(b[2]) + 128;
+#elif BITS_PER_LONG == 32
+	if (unlikely(b[0]))
+		return __ffs(b[0]);
+	if (unlikely(b[1]))
+		return __ffs(b[1]) + 32;
+	if (unlikely(b[2]))
+		return __ffs(b[2]) + 64;
+	if (b[3])
+		return __ffs(b[3]) + 96;
+	return __ffs(b[4]) + 128;
+#else
+#error BITS_PER_LONG not defined
+#endif
+}
+
+#endif /* HAVE_ARCH_SCHED_BITOPS */
+
+#ifndef HAVE_ARCH_FFS_BITOPS
+
 /*
  * ffs: find first bit set. This is defined the same way as
  * the libc and compiler builtin ffs routines, therefore
  * differs in spirit from the above ffz (man ffs).
  */
 
-#define ffs(x) generic_ffs(x)
+static inline int ffs(int x)
+{
+	int r = 1;
+
+	if (!x)
+		return 0;
+	if (!(x & 0xffff)) {
+		x >>= 16;
+		r += 16;
+	}
+	if (!(x & 0xff)) {
+		x >>= 8;
+		r += 8;
+	}
+	if (!(x & 0xf)) {
+		x >>= 4;
+		r += 4;
+	}
+	if (!(x & 3)) {
+		x >>= 2;
+		r += 2;
+	}
+	if (!(x & 1)) {
+		x >>= 1;
+		r += 1;
+	}
+	return r;
+}
+
+#endif /* HAVE_ARCH_FFS_BITOPS */
+
+
+#ifndef HAVE_ARCH_HWEIGHT_BITOPS
+
 /*
  * hweightN: returns the hamming weight (i.e. the number
  * of bits set) of a N-bit word
  */
 
-#define hweight32(x) generic_hweight32(x)
-#define hweight16(x) generic_hweight16(x)
-#define hweight8(x) generic_hweight8(x)
+static inline unsigned int hweight32(unsigned int w)
+{
+	unsigned int res = (w & 0x55555555) + ((w >> 1) & 0x55555555);
+	res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
+	res = (res & 0x0F0F0F0F) + ((res >> 4) & 0x0F0F0F0F);
+	res = (res & 0x00FF00FF) + ((res >> 8) & 0x00FF00FF);
+	return (res & 0x0000FFFF) + ((res >> 16) & 0x0000FFFF);
+}
+
+static inline unsigned int hweight16(unsigned int w)
+{
+	unsigned int res = (w & 0x5555) + ((w >> 1) & 0x5555);
+	res = (res & 0x3333) + ((res >> 2) & 0x3333);
+	res = (res & 0x0F0F) + ((res >> 4) & 0x0F0F);
+	return (res & 0x00FF) + ((res >> 8) & 0x00FF);
+}
+
+static inline unsigned int hweight8(unsigned int w)
+{
+	unsigned int res = (w & 0x55) + ((w >> 1) & 0x55);
+	res = (res & 0x33) + ((res >> 2) & 0x33);
+	return (res & 0x0F) + ((res >> 4) & 0x0F);
+}
+
+#endif /* HAVE_ARCH_HWEIGHT_BITOPS */
+
+#ifndef HAVE_ARCH_HWEIGHT64_BITOPS
+
+static inline unsigned long hweight64(__u64 w)
+{
+#if BITS_PER_LONG < 64
+	return hweight32((unsigned int)(w >> 32)) + hweight32((unsigned int)w);
+#else
+	u64 res;
+	res = (w & 0x5555555555555555ul) + ((w >> 1) & 0x5555555555555555ul);
+	res = (res & 0x3333333333333333ul) + ((res >> 2) & 0x3333333333333333ul);
+	res = (res & 0x0F0F0F0F0F0F0F0Ful) + ((res >> 4) & 0x0F0F0F0F0F0F0F0Ful);
+	res = (res & 0x00FF00FF00FF00FFul) + ((res >> 8) & 0x00FF00FF00FF00FFul);
+	res = (res & 0x0000FFFF0000FFFFul) + ((res >> 16) & 0x0000FFFF0000FFFFul);
+	return (res & 0x00000000FFFFFFFFul) + ((res >> 32) & 0x00000000FFFFFFFFul);
+#endif
+}
+
+#endif /* HAVE_ARCH_HWEIGHT64_BITOPS */
+
+#ifndef HAVE_ARCH_EXT2_NON_ATOMIC_BITOPS
+
+#include <asm/byteorder.h>
+
+#if defined(__LITTLE_ENDIAN)
+
+static __inline__ int generic_test_le_bit(unsigned long nr,
+					  __const__ unsigned long *addr)
+{
+	__const__ unsigned char *tmp = (__const__ unsigned char *) addr;
+	return (tmp[nr >> 3] >> (nr & 7)) & 1;
+}
+
+#define generic___set_le_bit(nr, addr) __set_bit(nr, addr)
+#define generic___clear_le_bit(nr, addr) __clear_bit(nr, addr)
+
+#define generic_test_and_set_le_bit(nr, addr) test_and_set_bit(nr, addr)
+#define generic_test_and_clear_le_bit(nr, addr) test_and_clear_bit(nr, addr)
+
+#define generic___test_and_set_le_bit(nr, addr) __test_and_set_bit(nr, addr)
+#define generic___test_and_clear_le_bit(nr, addr) __test_and_clear_bit(nr, addr)
+
+#define generic_find_next_zero_le_bit(addr, size, offset) find_next_zero_bit(addr, size, offset)
+
+#elif defined(__BIG_ENDIAN)
+
+static __inline__ int generic_test_le_bit(unsigned long nr,
+					  __const__ unsigned long *addr)
+{
+	__const__ unsigned char *tmp = (__const__ unsigned char *) addr;
+	return (tmp[nr >> 3] >> (nr & 7)) & 1;
+}
+
+#define generic___set_le_bit(nr, addr) \
+	__set_bit((nr) ^ BITOP_LE_SWIZZLE, (addr))
+#define generic___clear_le_bit(nr, addr) \
+	__clear_bit((nr) ^ BITOP_LE_SWIZZLE, (addr))
+
+#define generic_test_and_set_le_bit(nr, addr) \
+	test_and_set_bit((nr) ^ BITOP_LE_SWIZZLE, (addr))
+#define generic_test_and_clear_le_bit(nr, addr) \
+	test_and_clear_bit((nr) ^ BITOP_LE_SWIZZLE, (addr))
+
+#define generic___test_and_set_le_bit(nr, addr) \
+	__test_and_set_bit((nr) ^ BITOP_LE_SWIZZLE, (addr))
+#define generic___test_and_clear_le_bit(nr, addr) \
+	__test_and_clear_bit((nr) ^ BITOP_LE_SWIZZLE, (addr))
+
+/* include/linux/byteorder does not support "unsigned long" type */
+static inline unsigned long ext2_swabp(const unsigned long * x)
+{
+#if BITS_PER_LONG == 64
+	return (unsigned long) __swab64p((u64 *) x);
+#elif BITS_PER_LONG == 32
+	return (unsigned long) __swab32p((u32 *) x);
+#else
+#error BITS_PER_LONG not defined
+#endif
+}
+
+/* include/linux/byteorder doesn't support "unsigned long" type */
+static inline unsigned long ext2_swab(const unsigned long y)
+{
+#if BITS_PER_LONG == 64
+	return (unsigned long) __swab64((u64) y);
+#elif BITS_PER_LONG == 32
+	return (unsigned long) __swab32((u32) y);
+#else
+#error BITS_PER_LONG not defined
+#endif
+}
+
+static __inline__ unsigned long generic_find_next_zero_le_bit(const unsigned long *addr,
+		unsigned long size, unsigned long offset)
+{
+	const unsigned long *p = addr + BITOP_WORD(offset);
+	unsigned long result = offset & ~(BITS_PER_LONG - 1);
+	unsigned long tmp;
+
+	if (offset >= size)
+		return size;
+	size -= result;
+	offset &= (BITS_PER_LONG - 1UL);
+	if (offset) {
+		tmp = ext2_swabp(p++);
+		tmp |= (~0UL >> (BITS_PER_LONG - offset));
+		if (size < BITS_PER_LONG)
+			goto found_first;
+		if (~tmp)
+			goto found_middle;
+		size -= BITS_PER_LONG;
+		result += BITS_PER_LONG;
+	}
+
+	while (size & ~(BITS_PER_LONG - 1)) {
+		if (~(tmp = *(p++)))
+			goto found_middle_swap;
+		result += BITS_PER_LONG;
+		size -= BITS_PER_LONG;
+	}
+	if (!size)
+		return result;
+	tmp = ext2_swabp(p);
+found_first:
+	tmp |= ~0UL << size;
+	if (tmp == ~0UL)	/* Are any bits zero? */
+		return result + size; /* Nope. Skip ffz */
+found_middle:
+	return result + ffz(tmp);
+
+found_middle_swap:
+	return result + ffz(ext2_swab(tmp));
+}
+#else
+#error "Please fix <asm/byteorder.h>"
+#endif
+
+#define generic_find_first_zero_le_bit(addr, size) \
+	generic_find_next_zero_le_bit((addr), (size), 0)
+
+#define ext2_set_bit(nr,addr)	\
+	generic___test_and_set_le_bit((nr),(unsigned long *)(addr))
+#define ext2_clear_bit(nr,addr)	\
+	generic___test_and_clear_le_bit((nr),(unsigned long *)(addr))
+
+#define ext2_test_bit(nr,addr)	\
+	generic_test_le_bit((nr),(unsigned long *)(addr))
+#define ext2_find_first_zero_bit(addr, size) \
+	generic_find_first_zero_le_bit((unsigned long *)(addr), (size))
+#define ext2_find_next_zero_bit(addr, size, off) \
+	generic_find_next_zero_le_bit((unsigned long *)(addr), (size), (off))
+
+#endif /* HAVE_ARCH_EXT2_NON_ATOMIC_BITOPS */
+
+#ifndef HAVE_ARCH_EXT2_ATOMIC_BITOPS
+
+#define ext2_set_bit_atomic(lock, nr, addr)		\
+	({						\
+		int ret;				\
+		spin_lock(lock);			\
+		ret = ext2_set_bit((nr), (unsigned long *)(addr)); \
+		spin_unlock(lock);			\
+		ret;					\
+	})
+
+#define ext2_clear_bit_atomic(lock, nr, addr)		\
+	({						\
+		int ret;				\
+		spin_lock(lock);			\
+		ret = ext2_clear_bit((nr), (unsigned long *)(addr)); \
+		spin_unlock(lock);			\
+		ret;					\
+	})
+
+#endif /* HAVE_ARCH_EXT2_ATOMIC_BITOPS */
+
+#ifndef HAVE_ARCH_MINIX_BITOPS
+
+#define minix_test_and_set_bit(nr,addr)	\
+	__test_and_set_bit((nr),(unsigned long *)(addr))
+#define minix_set_bit(nr,addr)		\
+	__set_bit((nr),(unsigned long *)(addr))
+#define minix_test_and_clear_bit(nr,addr) \
+	__test_and_clear_bit((nr),(unsigned long *)(addr))
+#define minix_test_bit(nr,addr)		\
+	test_bit((nr),(unsigned long *)(addr))
+#define minix_find_first_zero_bit(addr,size) \
+	find_first_zero_bit((unsigned long *)(addr),(size))
+
+#endif /* HAVE_ARCH_MINIX_BITOPS */
 
 #endif /* __KERNEL__ */

^ permalink raw reply	[flat|nested] 13+ messages in thread
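To make the intended usage concrete: an architecture port defines the
relevant HAVE_ARCH_* macros for whatever it implements natively, before
including the generic header, and inherits C versions of everything else.
A hypothetical asm-foo/bitops.h might look like this (an illustrative
sketch only, not part of the patch; "foo" and its ffs() are invented):

#ifndef _ASM_FOO_BITOPS_H_
#define _ASM_FOO_BITOPS_H_

/*
 * Pretend this arch has a fast count-trailing-zeros instruction, so it
 * supplies its own ffs() and takes every other bitop from the generic
 * header included below.
 */
#define HAVE_ARCH_FFS_BITOPS

static inline int ffs(int x)
{
	/* stand-in for the arch's native instruction */
	return x ? __builtin_ctz(x) + 1 : 0;
}

#include <asm-generic/bitops.h>

#endif /* _ASM_FOO_BITOPS_H_ */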
* Re: [PATCH 3/6] C-language equivalents of include/asm-*/bitops.h
  2006-01-25 11:32 ` [PATCH 3/6] C-language equivalents of include/asm-*/bitops.h Akinobu Mita
@ 2006-01-25 20:02 ` Russell King
  2006-01-25 20:59   ` Grant Grundler
  0 siblings, 1 reply; 13+ messages in thread
From: Russell King @ 2006-01-25 20:02 UTC (permalink / raw)
To: Akinobu Mita
Cc: linux-kernel, Richard Henderson, Ivan Kokshaysky, Ian Molton,
	dev-etrax, David Howells, Yoshinori Sato, Linus Torvalds,
	linux-ia64, Hirokazu Takata, linux-m68k, Greg Ungerer, linux-mips,
	parisc-linux, linuxppc-dev, linux390, linuxsh-dev,
	linuxsh-shmedia-dev, sparclinux, ultralinux, Miles Bader,
	Andi Kleen, Chris Zankel

On Wed, Jan 25, 2006 at 08:32:06PM +0900, Akinobu Mita wrote:
> +#ifndef HAVE_ARCH___FFS_BITOPS
> +
> +/**
> + * __ffs - find first bit in word.
> + * @word: The word to search
> + *
> + * Returns 0..BITS_PER_LONG-1
> + * Undefined if no bit exists, so code should check against 0 first.
> + */
> +static inline unsigned long __ffs(unsigned long word)
>  {
> -	int mask;
> +	int b = 0, s;
>
> -	addr += nr >> 5;
> -	mask = 1 << (nr & 0x1f);
> -	return ((mask & *addr) != 0);
> +#if BITS_PER_LONG == 32
> +	s = 16; if (word << 16 != 0) s = 0; b += s; word >>= s;
> +	s =  8; if (word << 24 != 0) s = 0; b += s; word >>= s;
> +	s =  4; if (word << 28 != 0) s = 0; b += s; word >>= s;
> +	s =  2; if (word << 30 != 0) s = 0; b += s; word >>= s;
> +	s =  1; if (word << 31 != 0) s = 0; b += s;
> +
> +	return b;
> +#elif BITS_PER_LONG == 64
> +	s = 32; if (word << 32 != 0) s = 0; b += s; word >>= s;
> +	s = 16; if (word << 48 != 0) s = 0; b += s; word >>= s;
> +	s =  8; if (word << 56 != 0) s = 0; b += s; word >>= s;
> +	s =  4; if (word << 60 != 0) s = 0; b += s; word >>= s;
> +	s =  2; if (word << 62 != 0) s = 0; b += s; word >>= s;
> +	s =  1; if (word << 63 != 0) s = 0; b += s;
> +
> +	return b;
> +#else
> +#error BITS_PER_LONG not defined
> +#endif

This code generates more expensive shifts than our (ARM's) existing C
version. This is a backward step. Basically, shifts which depend on a
variable are more expensive than constant-based shifts.

I've not really looked at the rest because I haven't figured out which
bits will be used on ARM and which won't - which I think is another
problem with this patch set. I'll look again later tonight.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 13+ messages in thread
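For contrast, a 32-bit formulation in the style Russell is describing,
which tests the low bits in place so that every shift amount is a
compile-time constant (a sketch for comparison, not code from the thread;
behaviour is undefined for word == 0, as with the original):

static inline unsigned long __ffs(unsigned long word)
{
	int b = 0;

	/* every shift amount below is a compile-time constant */
	if (!(word & 0xffff)) { word >>= 16; b += 16; }
	if (!(word & 0xff))   { word >>= 8;  b += 8;  }
	if (!(word & 0xf))    { word >>= 4;  b += 4;  }
	if (!(word & 0x3))    { word >>= 2;  b += 2;  }
	if (!(word & 0x1))    b += 1;
	return b;
}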
* Re: [PATCH 3/6] C-language equivalents of include/asm-*/bitops.h
  2006-01-25 20:02 ` Russell King
@ 2006-01-25 20:59 ` Grant Grundler
  2006-01-26  3:27   ` Akinobu Mita
  0 siblings, 1 reply; 13+ messages in thread
From: Grant Grundler @ 2006-01-25 20:59 UTC (permalink / raw)
To: Akinobu Mita; +Cc: Linux Kernel Development, linux-ia64

On Wed, Jan 25, 2006 at 08:02:50PM +0000, Russell King wrote:
> I've not really looked at the rest because I haven't figured out which
> bits will be used on ARM and which won't - which I think is another
> problem with this patch set. I'll look again later tonight.

Russell,
I have the same problem. This file is 920 lines long and contains
7 distinct changes according to the (well written) notes.

Akinobu,
I appreciate your work - but could this particular piece be
split up into 7 chunks? That would make checking the behavior of
something like HAVE_ARCH_FFZ_BITOPS easier for each arch.

thanks,
grant

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 3/6] C-language equivalents of include/asm-*/bitops.h
  2006-01-25 20:59 ` Grant Grundler
@ 2006-01-26  3:27 ` Akinobu Mita
  2006-01-26  3:36   ` [PATCH 8/12] generic hweight{32,16,8}() Akinobu Mita
  0 siblings, 1 reply; 13+ messages in thread
From: Akinobu Mita @ 2006-01-26  3:27 UTC (permalink / raw)
To: Grant Grundler; +Cc: Linux Kernel Development, linux-ia64

On Wed, Jan 25, 2006 at 12:59:07PM -0800, Grant Grundler wrote:
> Akinobu,
> I appreciate your work - but could this particular piece be
> split up into 7 chunks?

I have 12 patches.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [PATCH 8/12] generic hweight{32,16,8}()
  2006-01-26  3:27 ` Akinobu Mita
@ 2006-01-26  3:36 ` Akinobu Mita
  2006-01-26  7:12   ` Balbir Singh
  2006-01-26 18:57   ` Bryan O'Sullivan
  0 siblings, 2 replies; 13+ messages in thread
From: Akinobu Mita @ 2006-01-26  3:36 UTC (permalink / raw)
To: Grant Grundler; +Cc: Linux Kernel Development, linux-ia64

This patch introduces the C-language equivalents of the functions below:

unsigned int hweight32(unsigned int w);
unsigned int hweight16(unsigned int w);
unsigned int hweight8(unsigned int w);

HAVE_ARCH_HWEIGHT_BITOPS is defined when the architecture has its own
version of these functions.

This code largely copied from:
include/linux/bitops.h

Index: 2.6-git/include/asm-generic/bitops.h
===================================================================
--- 2.6-git.orig/include/asm-generic/bitops.h	2006-01-25 19:14:10.000000000 +0900
+++ 2.6-git/include/asm-generic/bitops.h	2006-01-25 19:14:11.000000000 +0900
@@ -458,14 +458,38 @@
 
 #endif /* HAVE_ARCH_FFS_BITOPS */
 
 
+#ifndef HAVE_ARCH_HWEIGHT_BITOPS
+
 /*
  * hweightN: returns the hamming weight (i.e. the number
  * of bits set) of a N-bit word
  */
 
-#define hweight32(x) generic_hweight32(x)
-#define hweight16(x) generic_hweight16(x)
-#define hweight8(x) generic_hweight8(x)
+static inline unsigned int hweight32(unsigned int w)
+{
+	unsigned int res = (w & 0x55555555) + ((w >> 1) & 0x55555555);
+	res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
+	res = (res & 0x0F0F0F0F) + ((res >> 4) & 0x0F0F0F0F);
+	res = (res & 0x00FF00FF) + ((res >> 8) & 0x00FF00FF);
+	return (res & 0x0000FFFF) + ((res >> 16) & 0x0000FFFF);
+}
+
+static inline unsigned int hweight16(unsigned int w)
+{
+	unsigned int res = (w & 0x5555) + ((w >> 1) & 0x5555);
+	res = (res & 0x3333) + ((res >> 2) & 0x3333);
+	res = (res & 0x0F0F) + ((res >> 4) & 0x0F0F);
+	return (res & 0x00FF) + ((res >> 8) & 0x00FF);
+}
+
+static inline unsigned int hweight8(unsigned int w)
+{
+	unsigned int res = (w & 0x55) + ((w >> 1) & 0x55);
+	res = (res & 0x33) + ((res >> 2) & 0x33);
+	return (res & 0x0F) + ((res >> 4) & 0x0F);
+}
+
+#endif /* HAVE_ARCH_HWEIGHT_BITOPS */
 
 #endif /* __KERNEL__ */

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 8/12] generic hweight{32,16,8}()
  2006-01-26  3:36 ` [PATCH 8/12] generic hweight{32,16,8}() Akinobu Mita
@ 2006-01-26  7:12 ` Balbir Singh
  2006-01-26 10:04   ` Rutger Nijlunsing
  2006-01-27  4:55   ` Akinobu Mita
  2006-01-26 18:57 ` Bryan O'Sullivan
  1 sibling, 2 replies; 13+ messages in thread
From: Balbir Singh @ 2006-01-26  7:12 UTC (permalink / raw)
To: Akinobu Mita; +Cc: Grant Grundler, Linux Kernel Development, linux-ia64

On 1/26/06, Akinobu Mita <mita@miraclelinux.com> wrote:
> This patch introduces the C-language equivalents of the functions below:
> unsigned int hweight32(unsigned int w);
> unsigned int hweight16(unsigned int w);
> unsigned int hweight8(unsigned int w);
>
> HAVE_ARCH_HWEIGHT_BITOPS is defined when the architecture has its own
> version of these functions.
>
> This code largely copied from:
> include/linux/bitops.h
>
> Index: 2.6-git/include/asm-generic/bitops.h
> ===================================================================
> --- 2.6-git.orig/include/asm-generic/bitops.h	2006-01-25 19:14:10.000000000 +0900
> +++ 2.6-git/include/asm-generic/bitops.h	2006-01-25 19:14:11.000000000 +0900
> @@ -458,14 +458,38 @@
>  #endif /* HAVE_ARCH_FFS_BITOPS */
>
>
> +#ifndef HAVE_ARCH_HWEIGHT_BITOPS
> +
>  /*
>   * hweightN: returns the hamming weight (i.e. the number
>   * of bits set) of a N-bit word
>   */
>
> -#define hweight32(x) generic_hweight32(x)
> -#define hweight16(x) generic_hweight16(x)
> -#define hweight8(x) generic_hweight8(x)
> +static inline unsigned int hweight32(unsigned int w)
> +{
> +	unsigned int res = (w & 0x55555555) + ((w >> 1) & 0x55555555);
> +	res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
> +	res = (res & 0x0F0F0F0F) + ((res >> 4) & 0x0F0F0F0F);
> +	res = (res & 0x00FF00FF) + ((res >> 8) & 0x00FF00FF);
> +	return (res & 0x0000FFFF) + ((res >> 16) & 0x0000FFFF);
> +}
> +

This can be replaced with

	register int res = w;
	res = res - ((res >> 1) & 0x55555555);
	res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
	res = (res + (res >> 4)) & 0x0f0f0f0f;
	res = res + (res >> 8);
	return (res + (res >> 16)) & 0xff;

Similar optimizations can be applied to the routines below. Please see
http://www-cs-faculty.stanford.edu/~knuth/mmixware.html errata and the
code in mmix-arith.w for the complete set of optimizations and credits.

http://www.jjj.de/fxt/fxtbook.pdf is another inspirational source for
such algorithms.

> +static inline unsigned int hweight16(unsigned int w)
> +{
> +	unsigned int res = (w & 0x5555) + ((w >> 1) & 0x5555);
> +	res = (res & 0x3333) + ((res >> 2) & 0x3333);
> +	res = (res & 0x0F0F) + ((res >> 4) & 0x0F0F);
> +	return (res & 0x00FF) + ((res >> 8) & 0x00FF);
> +}
> +
> +static inline unsigned int hweight8(unsigned int w)
> +{
> +	unsigned int res = (w & 0x55) + ((w >> 1) & 0x55);
> +	res = (res & 0x33) + ((res >> 2) & 0x33);
> +	return (res & 0x0F) + ((res >> 4) & 0x0F);
> +}
> +
> +#endif /* HAVE_ARCH_HWEIGHT_BITOPS */
>
>  #endif /* __KERNEL__ */

Regards,
Balbir

^ permalink raw reply	[flat|nested] 13+ messages in thread
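Short of the formal proof requested later in the thread, the equivalence
of the two hweight32() variants is easy to machine-check. A throwaway
user-space program along these lines (function names invented; the
suggested variant is transcribed with unsigned arithmetic to avoid
implementation-defined signed shifts) exhausts all 2^32 inputs:

#include <assert.h>

/* hweight32() as posted in the patch. */
static unsigned int hweight32_patch(unsigned int w)
{
	unsigned int res = (w & 0x55555555) + ((w >> 1) & 0x55555555);
	res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
	res = (res & 0x0F0F0F0F) + ((res >> 4) & 0x0F0F0F0F);
	res = (res & 0x00FF00FF) + ((res >> 8) & 0x00FF00FF);
	return (res & 0x0000FFFF) + ((res >> 16) & 0x0000FFFF);
}

/* The Knuth/MMIX-style variant suggested above. */
static unsigned int hweight32_knuth(unsigned int w)
{
	unsigned int res = w - ((w >> 1) & 0x55555555);
	res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
	res = (res + (res >> 4)) & 0x0f0f0f0f;
	res = res + (res >> 8);
	return (res + (res >> 16)) & 0xff;
}

int main(void)
{
	unsigned int w = 0;

	/* Terminates when w wraps back to 0 after 0xffffffff. */
	do {
		assert(hweight32_patch(w) == hweight32_knuth(w));
	} while (++w != 0);
	return 0;
}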
* Re: [PATCH 8/12] generic hweight{32,16,8}()
  2006-01-26  7:12 ` Balbir Singh
@ 2006-01-26 10:04 ` Rutger Nijlunsing
  0 siblings, 0 replies; 13+ messages in thread
From: Rutger Nijlunsing @ 2006-01-26 10:04 UTC (permalink / raw)
To: Balbir Singh
Cc: Akinobu Mita, Grant Grundler, Linux Kernel Development, linux-ia64

[snip]

> > /*
> >  * hweightN: returns the hamming weight (i.e. the number
> >  * of bits set) of a N-bit word
> >  */
> >
> > -#define hweight32(x) generic_hweight32(x)
> > -#define hweight16(x) generic_hweight16(x)
> > -#define hweight8(x) generic_hweight8(x)
> > +static inline unsigned int hweight32(unsigned int w)
> > +{
> > +	unsigned int res = (w & 0x55555555) + ((w >> 1) & 0x55555555);
> > +	res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
> > +	res = (res & 0x0F0F0F0F) + ((res >> 4) & 0x0F0F0F0F);
> > +	res = (res & 0x00FF00FF) + ((res >> 8) & 0x00FF00FF);
> > +	return (res & 0x0000FFFF) + ((res >> 16) & 0x0000FFFF);
> > +}
> > +
>
> This can be replaced with
>
>	register int res = w;
>	res = res - ((res >> 1) & 0x55555555);
>	res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
>	res = (res + (res >> 4)) & 0x0f0f0f0f;
>	res = res + (res >> 8);
>	return (res + (res >> 16)) & 0xff;
>
> Similar optimizations can be applied to the routines below. Please see
> http://www-cs-faculty.stanford.edu/~knuth/mmixware.html errata and the
> code in mmix-arith.w for the complete set of optimizations and credits.
>
> http://www.jjj.de/fxt/fxtbook.pdf is another inspirational source for
> such algorithms.

Ah, the joys of bit twiddling!

http://graphics.stanford.edu/~seander/bithacks.html

...has some more.

-- 
Rutger Nijlunsing ---------------------------------- eludias ed dse.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 8/12] generic hweight{32,16,8}()
  2006-01-26  7:12 ` Balbir Singh
  2006-01-26 10:04   ` Rutger Nijlunsing
@ 2006-01-27  4:55 ` Akinobu Mita
  2006-01-27  5:40   ` Balbir Singh
  1 sibling, 1 reply; 13+ messages in thread
From: Akinobu Mita @ 2006-01-27  4:55 UTC (permalink / raw)
To: Balbir Singh; +Cc: Grant Grundler, Linux Kernel Development, linux-ia64

On Thu, Jan 26, 2006 at 12:42:09PM +0530, Balbir Singh wrote:
> > +static inline unsigned int hweight32(unsigned int w)
> > +{
> > +	unsigned int res = (w & 0x55555555) + ((w >> 1) & 0x55555555);
> > +	res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
> > +	res = (res & 0x0F0F0F0F) + ((res >> 4) & 0x0F0F0F0F);
> > +	res = (res & 0x00FF00FF) + ((res >> 8) & 0x00FF00FF);
> > +	return (res & 0x0000FFFF) + ((res >> 16) & 0x0000FFFF);
> > +}
> > +
>
> This can be replaced with
>
>	register int res = w;
>	res = res - ((res >> 1) & 0x55555555);
>	res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
>	res = (res + (res >> 4)) & 0x0f0f0f0f;
>	res = res + (res >> 8);
>	return (res + (res >> 16)) & 0xff;

Probably you are right.
Unfortunately, it is difficult for me to prove that equivalence.

Anyway, those hweight*() functions are copied from include/linux/bitops.h:
generic_hweight*(). So you can optimize those functions instead.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 8/12] generic hweight{32,16,8}()
  2006-01-27  4:55 ` Akinobu Mita
@ 2006-01-27  5:40 ` Balbir Singh
  2006-01-27  6:40   ` Akinobu Mita
  0 siblings, 1 reply; 13+ messages in thread
From: Balbir Singh @ 2006-01-27  5:40 UTC (permalink / raw)
To: Akinobu Mita; +Cc: Grant Grundler, Linux Kernel Development, linux-ia64

On 1/27/06, Akinobu Mita <mita@miraclelinux.com> wrote:
> On Thu, Jan 26, 2006 at 12:42:09PM +0530, Balbir Singh wrote:
> > > +static inline unsigned int hweight32(unsigned int w)
> > > +{
> > > +	unsigned int res = (w & 0x55555555) + ((w >> 1) & 0x55555555);
> > > +	res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
> > > +	res = (res & 0x0F0F0F0F) + ((res >> 4) & 0x0F0F0F0F);
> > > +	res = (res & 0x00FF00FF) + ((res >> 8) & 0x00FF00FF);
> > > +	return (res & 0x0000FFFF) + ((res >> 16) & 0x0000FFFF);
> > > +}
> > > +
> >
> > This can be replaced with
> >
> >	register int res = w;
> >	res = res - ((res >> 1) & 0x55555555);
> >	res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
> >	res = (res + (res >> 4)) & 0x0f0f0f0f;
> >	res = res + (res >> 8);
> >	return (res + (res >> 16)) & 0xff;
>
> Probably you are right.
> Unfortunately, it is difficult for me to prove that equivalence.

Well, a proof is not difficult. This is a well tested, proven piece of
code published by Don Knuth. If you need a proof, I can provide one.

> Anyway, those hweight*() functions are copied from include/linux/bitops.h:
> generic_hweight*(). So you can optimize those functions instead.

You are right, even those functions can be optimized.

Balbir

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 8/12] generic hweight{32,16,8}()
  2006-01-27  5:40 ` Balbir Singh
@ 2006-01-27  6:40 ` Akinobu Mita
  2006-01-31 11:14   ` Balbir Singh
  0 siblings, 1 reply; 13+ messages in thread
From: Akinobu Mita @ 2006-01-27  6:40 UTC (permalink / raw)
To: Balbir Singh; +Cc: Grant Grundler, Linux Kernel Development, linux-ia64

On Fri, Jan 27, 2006 at 11:10:29AM +0530, Balbir Singh wrote:
> Well, a proof is not difficult. This is a well tested, proven piece of
> code published by Don Knuth. If you need a proof, I can provide one.

Thanks, I would like one.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 8/12] generic hweight{32,16,8}()
  2006-01-27  6:40 ` Akinobu Mita
@ 2006-01-31 11:14 ` Balbir Singh
  0 siblings, 0 replies; 13+ messages in thread
From: Balbir Singh @ 2006-01-31 11:14 UTC (permalink / raw)
To: Akinobu Mita; +Cc: Grant Grundler, Linux Kernel Development, linux-ia64

> > Well, a proof is not difficult. This is a well tested, proven piece of
> > code published by Don Knuth. If you need a proof, I can provide one.
>
> Thanks, I would like one.

Please bear with me, I am caught up with other things. I will try and
provide one soon.

Balbir

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 8/12] generic hweight{32,16,8}()
  2006-01-26  3:36 ` [PATCH 8/12] generic hweight{32,16,8}() Akinobu Mita
  2006-01-26  7:12   ` Balbir Singh
@ 2006-01-26 18:57 ` Bryan O'Sullivan
  2006-01-27  4:43   ` Akinobu Mita
  1 sibling, 1 reply; 13+ messages in thread
From: Bryan O'Sullivan @ 2006-01-26 18:57 UTC (permalink / raw)
To: Akinobu Mita; +Cc: Grant Grundler, Linux Kernel Development, linux-ia64

On Thu, 2006-01-26 at 12:36 +0900, Akinobu Mita wrote:

> HAVE_ARCH_HWEIGHT_BITOPS is defined when the architecture has its own
> version of these functions.

All of this HAVE_ARCH_xxx stuff gave Linus heartburn a few weeks ago,
and you're massively increasing its proliferation.

How about putting each class of bitop into its own header file in
asm-generic, and getting the arches that need each one to include the
specific files they need in their own bitops.h headers?

For example, the hweight stuff would go into
asm-generic/bitops-hweight.h, and then asm-foo/bitops.h would just use

	#include <asm-generic/bitops-hweight.h>

or else define its own if it didn't need the generic versions.

	<b

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 8/12] generic hweight{32,16,8}()
  2006-01-26 18:57 ` Bryan O'Sullivan
@ 2006-01-27  4:43 ` Akinobu Mita
  2006-01-27  5:23   ` Bryan O'Sullivan
  0 siblings, 1 reply; 13+ messages in thread
From: Akinobu Mita @ 2006-01-27  4:43 UTC (permalink / raw)
To: Bryan O'Sullivan; +Cc: Grant Grundler, Linux Kernel Development, linux-ia64

On Thu, Jan 26, 2006 at 10:57:47AM -0800, Bryan O'Sullivan wrote:
> How about putting each class of bitop into its own header file in
> asm-generic, and getting the arches that need each one to include the
> specific files they need in their own bitops.h headers?

I think it's better than adding many HAVE_ARCH_*_BITOPS.
I will have 14 new headers. So I want to make a new directory,
include/asm-generic/bitops/:

include/asm-generic/bitops/atomic.h
include/asm-generic/bitops/nonatomic.h
include/asm-generic/bitops/__ffs.h
include/asm-generic/bitops/ffz.h
include/asm-generic/bitops/fls.h
include/asm-generic/bitops/fls64.h
include/asm-generic/bitops/find.h
include/asm-generic/bitops/ffs.h
include/asm-generic/bitops/sched-ffs.h
include/asm-generic/bitops/hweight.h
include/asm-generic/bitops/hweight64.h
include/asm-generic/bitops/ext2-atomic.h
include/asm-generic/bitops/ext2-nonatomic.h
include/asm-generic/bitops/minix.h

^ permalink raw reply	[flat|nested] 13+ messages in thread
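Under that layout, a port's bitops.h reduces to a list of includes for
whatever it does not implement natively, e.g. (a hypothetical composition,
assuming the file split listed above):

/* hypothetical asm-foo/bitops.h built from the split headers */
#include <asm-generic/bitops/__ffs.h>
#include <asm-generic/bitops/ffz.h>
#include <asm-generic/bitops/fls.h>
#include <asm-generic/bitops/fls64.h>
#include <asm-generic/bitops/hweight.h>
/* atomic and non-atomic bit ops are provided by arch code instead */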
* Re: [PATCH 8/12] generic hweight{32,16,8}()
  2006-01-27  4:43 ` Akinobu Mita
@ 2006-01-27  5:23 ` Bryan O'Sullivan
  0 siblings, 0 replies; 13+ messages in thread
From: Bryan O'Sullivan @ 2006-01-27  5:23 UTC (permalink / raw)
To: Akinobu Mita; +Cc: Grant Grundler, Linux Kernel Development, linux-ia64

On Fri, 2006-01-27 at 13:43 +0900, Akinobu Mita wrote:

> I think it's better than adding many HAVE_ARCH_*_BITOPS.
> I will have 14 new headers. So I want to make a new directory,
> include/asm-generic/bitops/:

While you're thrashing all that stuff, have you thought about adding
generic support for the atomic_*_mask functions? Only eight of almost
30 arches actually implement them, which makes them worthless for
portable drivers.

The same approach you're using now with other bitops will work equally
well, or be just as broken, depending on the arch in question :-)

	<b

^ permalink raw reply	[flat|nested] 13+ messages in thread
end of thread, other threads:[~2006-02-02  9:34 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-01-31 16:49 [PATCH 8/12] generic hweight{32,16,8}() linux
2006-01-31 18:14 ` Grant Grundler
2006-02-02  9:34 ` Balbir Singh
  -- strict thread matches above, loose matches on Subject: below --
2006-01-25 11:26 [PATCH 0/6] RFC: use include/asm-generic/bitops.h Akinobu Mita
2006-01-25 11:32 ` [PATCH 3/6] C-language equivalents of include/asm-*/bitops.h Akinobu Mita
2006-01-25 20:02   ` Russell King
2006-01-25 20:59     ` Grant Grundler
2006-01-26  3:27       ` Akinobu Mita
2006-01-26  3:36         ` [PATCH 8/12] generic hweight{32,16,8}() Akinobu Mita
2006-01-26  7:12           ` Balbir Singh
2006-01-26 10:04             ` Rutger Nijlunsing
2006-01-27  4:55             ` Akinobu Mita
2006-01-27  5:40               ` Balbir Singh
2006-01-27  6:40                 ` Akinobu Mita
2006-01-31 11:14                   ` Balbir Singh
2006-01-26 18:57           ` Bryan O'Sullivan
2006-01-27  4:43             ` Akinobu Mita
2006-01-27  5:23               ` Bryan O'Sullivan