linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/6] RFC: use include/asm-generic/bitops.h
@ 2006-01-25 11:26 Akinobu Mita
  2006-01-25 11:28 ` [PATCH 1/6] {set,clear,test}_bit() related cleanup Akinobu Mita
                   ` (5 more replies)
  0 siblings, 6 replies; 84+ messages in thread
From: Akinobu Mita @ 2006-01-25 11:26 UTC
  To: linux-kernel
  Cc: Richard Henderson, Ivan Kokshaysky, Russell King, Ian Molton,
	dev-etrax, David Howells, Yoshinori Sato, Linus Torvalds,
	linux-ia64, Hirokazu Takata, linux-m68k, Greg Ungerer,
	linux-mips, parisc-linux, linuxppc-dev, linux390, linuxsh-dev,
	linuxsh-shmedia-dev, sparclinux, ultralinux, Miles Bader,
	Andi Kleen, Chris Zankel

A large number of boilerplate bit operations written in C are
scattered around include/asm-*/bitops.h.
This patch series gathers them into include/asm-generic/bitops.h, and:

- kills duplicated code and comments (about 4000 lines)
- uses better C-language equivalents
- helps porting to new architectures (currently
  include/asm-generic/bitops.h is not referenced from anywhere)


* Re: [PATCH 8/12] generic hweight{32,16,8}()
@ 2006-01-31 16:49 linux
  2006-01-31 18:14 ` Grant Grundler
  2006-02-02  9:34 ` Balbir Singh
  0 siblings, 2 replies; 84+ messages in thread
From: linux @ 2006-01-31 16:49 UTC
  To: linux-ia64, linux-kernel, mita

This is an extremely well-known technique.  You can see a similar version
that uses a multiply for the last few steps at
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
which refers to
"Software Optimization Guide for AMD Athlon 64 and Opteron Processors"
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF

It's section 8.6, "Efficient Implementation of Population-Count Function
in 32-bit Mode", pages 179-180.

It uses the name that I am more familiar with, "popcount" (population count),
although "Hamming weight" also makes sense.

Anyway, the proof of correctness proceeds as follows:

	b = a - ((a >> 1) & 0x55555555);
	c = (b & 0x33333333) + ((b >> 2) & 0x33333333);
	d = (c + (c >> 4)) & 0x0f0f0f0f;
#if SLOW_MULTIPLY
	e = d + (d >> 8);
	f = e + (e >> 16);
	return f & 63;
#else
	/* Useful if multiply takes at most 4 cycles */
	return (d * 0x01010101) >> 24;
#endif
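
For anyone who wants to run the fragment above, here is a minimal
self-contained sketch.  The function names (popcount32_shift,
popcount32_mul, popcount32_naive, popcount32_selfcheck) are made up for
illustration and are not the names used in the patch; the naive loop is
only a reference to compare against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift-and-add tail, as in the SLOW_MULTIPLY branch. */
uint32_t popcount32_shift(uint32_t a)
{
	uint32_t b = a - ((a >> 1) & 0x55555555u);
	uint32_t c = (b & 0x33333333u) + ((b >> 2) & 0x33333333u);
	uint32_t d = (c + (c >> 4)) & 0x0f0f0f0fu;
	uint32_t e = d + (d >> 8);
	uint32_t f = e + (e >> 16);

	return f & 63;
}

/* Multiply tail, as in the #else branch. */
uint32_t popcount32_mul(uint32_t a)
{
	uint32_t b = a - ((a >> 1) & 0x55555555u);
	uint32_t c = (b & 0x33333333u) + ((b >> 2) & 0x33333333u);
	uint32_t d = (c + (c >> 4)) & 0x0f0f0f0fu;

	return (d * 0x01010101u) >> 24;
}

/* Reference: clear the lowest set bit until none remain. */
uint32_t popcount32_naive(uint32_t a)
{
	uint32_t n = 0;

	while (a) {
		a &= a - 1;
		n++;
	}
	return n;
}

/* Compares both variants against the reference on a few samples. */
int popcount32_selfcheck(void)
{
	static const uint32_t samples[] = {
		0u, 1u, 0x80000000u, 0x55555555u, 0xdeadbeefu, 0xffffffffu
	};

	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		uint32_t want = popcount32_naive(samples[i]);

		assert(popcount32_shift(samples[i]) == want);
		assert(popcount32_mul(samples[i]) == want);
	}
	return 1;
}
```

Calling popcount32_selfcheck() from a test program aborts on the first
mismatch and returns 1 otherwise.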

The input value a can be thought of as 32 1-bit fields each holding
their own Hamming weight.  Now look at it as 16 2-bit fields.
Each 2-bit field a1..a0 has the value 2*a1 + a0.  This can be converted
into the Hamming weight of the 2-bit field, a1 + a0, by subtracting a1.

That's what the (a >> 1) & mask subtraction does.  Since there can be no
borrows, you can just do it all at once.

Enumerating the 4 possible cases:

0b00 = 0  ->  0 - 0 = 0
0b01 = 1  ->  1 - 0 = 1
0b10 = 2  ->  2 - 1 = 1
0b11 = 3  ->  3 - 1 = 2
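
The same check can be done on all 16 fields of a word at once.  A small
sketch, with hypothetical helper names (count_pairs, check_pairs) that
do not appear in the patch:

```c
#include <assert.h>
#include <stdint.h>

/* First step: each 2-bit field of the result holds a1 + a0 of that field. */
uint32_t count_pairs(uint32_t a)
{
	return a - ((a >> 1) & 0x55555555u);
}

/* Asserts that every 2-bit field of count_pairs(a) equals a1 + a0. */
void check_pairs(uint32_t a)
{
	uint32_t b = count_pairs(a);

	for (int i = 0; i < 32; i += 2) {
		uint32_t a0 = (a >> i) & 1;
		uint32_t a1 = (a >> (i + 1)) & 1;

		assert(((b >> i) & 3) == a1 + a0);
	}
}
```

Because each field only ever subtracts its own top bit, no field borrows
from its neighbour, which is what check_pairs() confirms field by field.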


The next step consists of breaking up b (made of 16 2-bit fields) into
even and odd halves and adding them into 4-bit fields.  Since the largest
possible sum is 2+2 = 4, which will not fit into a 2-bit field, the 2-bit
fields have to be masked before they are added.


After this point, the masking can be delayed.  Each 4-bit field holds
a population count from 0..4, taking at most 3 bits.  These numbers can
be added without overflowing a 4-bit field, so we can compute
c + (c >> 4), and only then mask off the unwanted bits.
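
To make the difference between the two steps concrete, here is a sketch
that compares masking before the add with masking after it.  The helper
names are invented for illustration; nibbles_masked_late() is a
deliberately wrong variant and is not in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Step 2 as posted: mask both halves first, then add. */
uint32_t nibbles_masked_first(uint32_t b)
{
	return (b & 0x33333333u) + ((b >> 2) & 0x33333333u);
}

/* Hypothetical variant: add first, mask afterwards.  Wrong for step 2. */
uint32_t nibbles_masked_late(uint32_t b)
{
	return (b + (b >> 2)) & 0x33333333u;
}

/* Step 3 as posted: here the late mask is safe. */
uint32_t bytes_masked_late(uint32_t c)
{
	return (c + (c >> 4)) & 0x0f0f0f0fu;
}

void masking_demo(void)
{
	/* a = 0xf gives b = 0xa: two 2-bit fields holding 2 each. */
	assert(nibbles_masked_first(0xau) == 4);
	/* 2 + 2 = 4 spills out of the 2-bit field and the mask drops it. */
	assert(nibbles_masked_late(0xau) == 0);
	/* Two 4-bit fields holding 4 each sum to 8, which fits in 4 bits. */
	assert(bytes_masked_late(0x44u) == 8);
}
```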


This produces d, a number made of 4 8-bit fields, each in the range 0..8.
From this point, we can shift and add d multiple times without overflowing
an 8-bit field, and only do a final mask at the end.

The number to mask with has to be at least 63 (so that 32 won't be truncated),
but can also be 127 or 255.  The x86 has a special encoding for signed
immediate byte values -128..127, so the value of 255 is slower.  On
other processors, a special "sign extend byte" instruction might be faster.
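
A quick sketch of why the mask is needed and which masks work.  The
names fold_bytes and final_mask_demo are made up for this example.  The
upper bytes of the folded value hold partial sums that are not part of
the answer, so only the low bits are kept.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add tail without the final mask. */
uint32_t fold_bytes(uint32_t d)
{
	uint32_t e = d + (d >> 8);

	return e + (e >> 16);
}

void final_mask_demo(void)
{
	/* d for a = 0xffffffff: four byte fields holding 8 each. */
	uint32_t f = fold_bytes(0x08080808u);

	/* The upper bytes contain leftover partial sums. */
	assert(f == 0x08101820u);

	/* Any mask of the form 2^k - 1 with k >= 6 keeps the count intact. */
	assert((f & 63) == 32);
	assert((f & 127) == 32);
	assert((f & 255) == 32);

	/* A 5-bit mask is too narrow: 32 is truncated to 0. */
	assert((f & 31) == 0);
}
```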


On a processor with fast integer multiplies (Athlon but not P4), you can
reduce the final few serially dependent instructions to a single integer
multiply.  Consider d to be 4 8-bit values d3, d2, d1 and d0, each in the
range 0..8.  The multiply forms the partial products:

           d3 d2 d1 d0
        d3 d2 d1 d0
     d3 d2 d1 d0
+ d3 d2 d1 d0
----------------------
           e3 e2 e1 e0

Where e3 = d3 + d2 + d1 + d0.   e2, e1 and e0 obviously cannot generate
any carries.
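
The partial products can be written out as shifts to check this.  A
sketch, assuming 32-bit unsigned arithmetic that wraps modulo 2^32; the
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply tail as posted. */
uint32_t sum_bytes_mul(uint32_t d)
{
	return (d * 0x01010101u) >> 24;
}

/* The same product spelled out as the four shifted copies of d. */
uint32_t sum_bytes_shifts(uint32_t d)
{
	uint32_t p = d + (d << 8) + (d << 16) + (d << 24);

	return p >> 24;
}

/* Reference: add the four byte fields directly. */
uint32_t sum_bytes_direct(uint32_t d)
{
	return (d & 0xff) + ((d >> 8) & 0xff) + ((d >> 16) & 0xff) + (d >> 24);
}

void multiply_demo(void)
{
	/* d3..d0 = 1, 2, 3, 4 gives e3..e0 = 10, 9, 7, 4. */
	assert(0x01020304u * 0x01010101u == 0x0a090704u);
	assert(sum_bytes_mul(0x01020304u) == 10);

	/* Worst case: every byte field holds 8, so e3 = 32. */
	assert(sum_bytes_mul(0x08080808u) == 32);
	assert(sum_bytes_shifts(0x08080808u) == 32);
	assert(sum_bytes_direct(0x08080808u) == 32);
}
```

With each field at most 8, e2 is at most 24 and never carries into e3,
so the top byte is exactly the sum.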

