Hi, On March 1, hugang wrote: > On Thu, 1 May 2003 15:52:04 +0200 > > Willy TARREAU wrote: > > However, I have good news for you, the table code is 15% faster than my > > tree on an Alpha EV6 ;-) > > It may be because gcc can use different tricks on this arch. > > > > Cheers > > Willy > > However, The most code changed has increase a little peformance, Do we > really need it? > > Here is test in my machine, The faster is still table. I think the table version is so fast as a normal distribution of numbers is tested. With this distribution half of the occuring values have the MSB set, one quarter the next significant bit and so on... The tree version is approximately equally fast for every number, but the table version is fast if a very significant bit is set and slow if a less significant bit is set... If small values occour more often than big ones the tree version should be preferred, else following version may be very fast, too. static inline int fls_shift(int x) { int bit = 32; while(x > 0) { --bit; x <<= 1; } return x ? bit : 0; } I get following times on my K6-III (compiled with -O3 -march=k6) for 4294967296 iterations: fls_bsrl: (uses the 'bsrl' command via inline assembler) real 3m39.059s user 3m27.416s sys 0m0.345s fls_shift: real 1m47.337s user 1m41.801s sys 0m0.185s fls_tree: real 1m56.746s user 1m50.681s sys 0m0.165s The 32Bit tree is a bit slower here, too: real 1m59.447s user 1m53.375s sys 0m0.200s I did not test the table version as I think it cannot be faster than the shift version... Best regards Thomas Schlichter P.S.: I'd be happy to see how this performs on an Athlon or P-IV...