linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Russell King <rmk+lkml@arm.linux.org.uk>
To: Grant Grundler <grundler@parisc-linux.org>
Cc: Akinobu Mita <mita@miraclelinux.com>,
	linux-kernel@vger.kernel.org,
	Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
	Ian Molton <spyro@f2s.com>,
	dev-etrax@axis.com, David Howells <dhowells@redhat.com>,
	Yoshinori Sato <ysato@users.sourceforge.jp>,
	Linus Torvalds <torvalds@osdl.org>,
	linux-ia64@vger.kernel.org,
	Hirokazu Takata <takata@linux-m32r.org>,
	linux-m68k@vger.kernel.org, Greg Ungerer <gerg@uclinux.org>,
	linux-mips@linux-mips.org, parisc-linux@parisc-linux.org,
	linuxppc-dev@ozlabs.org, linux390@de.ibm.com,
	linuxsh-dev@lists.sourceforge.net,
	linuxsh-shmedia-dev@lists.sourceforge.net,
	sparclinux@vger.kernel.org, ultralinux@vger.kernel.org,
	Miles Bader <uclinux-v850@lsi.nec.co.jp>, Andi Kleen <ak@suse.de>,
	Chris Zankel <chris@zankel.net>
Subject: Re: [parisc-linux] Re: [PATCH 3/6] C-language equivalents of include/asm-*/bitops.h
Date: Thu, 26 Jan 2006 16:40:21 +0000	[thread overview]
Message-ID: <20060126164020.GA27222@flint.arm.linux.org.uk> (raw)
In-Reply-To: <20060126161849.GA13632@colo.lackof.org>

On Thu, Jan 26, 2006 at 09:18:49AM -0700, Grant Grundler wrote:
> On Thu, Jan 26, 2006 at 08:55:41AM +0000, Russell King wrote:
> > Unfortunately that's not correct.  You do not appear to have checked
> > the compiler output like I did - this code does _not_ generate
> > constant shifts.
> 
> Russell,
> By "written stupidly", I thought Richard meant they could have
> used constants instead of "s".  e.g.:
> 	if (word << 16 == 0) { b += 16; word >>= 16); }
> 	if (word << 24 == 0) { b +=  8; word >>=  8); }
> 	if (word << 28 == 0) { b +=  4; word >>=  4); }
> 
> But I prefer what Edgar Toernig suggested.

Ok, I can see I'm going to lose this, but what the hell.

Firstly though, an out of line function call on ARM clobbers six out
of 11 CPU registers.

Let's compare the implementations, which are:

int toernig_ffs(unsigned long word)
{
    int bit = 0;
    word &= -word; // only keep the lsb.
    if (word & 0xffff0000) bit |= 16;
    if (word & 0xff00ff00) bit |=  8;
    if (word & 0xf0f0f0f0) bit |=  4;
    if (word & 0xcccccccc) bit |=  2;
    if (word & 0xaaaaaaaa) bit |=  1;
    return bit;
}

toernig_ffs:
        rsb     r3, r0, #0
        and     r0, r0, r3
        mov     r3, r0, lsr #16
        bic     r2, r0, #16711680
        str     lr, [sp, #-4]!
        mov     r3, r3, asl #16
        ldr     lr, .L7
        ldr     r1, .L7+4
        ldr     ip, .L7+8
        cmp     r3, #0
        bic     r2, r2, #255
        and     lr, r0, lr
        and     r1, r0, r1
        and     ip, r0, ip
        movne   r0, #16
        moveq   r0, #0
        cmp     r2, #0
        orrne   r0, r0, #8
        cmp     r1, #0
        orrne   r0, r0, #4
        cmp     ip, #0
        orrne   r0, r0, #2
        cmp     lr, #0
        orrne   r0, r0, #1
        ldr     pc, [sp], #4
.L8:
        .align  2
.L7:
        .word   -1431655766
        .word   -252645136
        .word   -858993460

25 instructions.  3 words of additional data.  5 registers.  0 register
based shifts.

I feel that this is far too expensive to sanely inline - at least three
words of additional data for a use in a function, and has a high register
usage comparable to that of an out of line function.

int mita_ffs(unsigned long word)
{
     int b = 0, s;
     s = 16; if (word << 16 != 0) s = 0; b += s; word >>= s;
     s =  8; if (word << 24 != 0) s = 0; b += s; word >>= s;
     s =  4; if (word << 28 != 0) s = 0; b += s; word >>= s;
     s =  2; if (word << 30 != 0) s = 0; b += s; word >>= s;
     s =  1; if (word << 31 != 0) s = 0; b += s;
     return b;
}

mita_ffs:
        movs    r1, r0, asl #16
        moveq   r2, #16
        movne   r2, #0
        mov     r0, r0, lsr r2		@ register-based shift
        mov     r3, r2
        movs    r2, r0, asl #24
        moveq   r2, #8
        movne   r2, #0
        mov     r0, r0, lsr r2		@ register-based shift
        movs    r1, r0, asl #28
        add     r3, r3, r2
        moveq   r2, #4
        movne   r2, #0
        mov     r0, r0, lsr r2		@ register-based shift
        movs    r1, r0, asl #30
        add     r3, r3, r2
        moveq   r2, #2
        movne   r2, #0
        mov     r0, r0, lsr r2		@ register-based shift
        tst     r0, #1
        add     r3, r3, r2
        moveq   r2, #1
        movne   r2, #0
        add     r3, r3, r2
        mov     r0, r3
        mov     pc, lr

26 instructions.  4 registers used.  4 unconditional register-based
shifts (expensive).

Better, but uses inefficient register based shifts (which can take twice
as many cycles as non-register based shifts depending on the CPU).  Still
has a high usage on CPU registers though.  Could possibly be a candidate
for inlining.

int arm_ffs(unsigned long word)
{
     int k = 31;
     if (word & 0x0000ffff) { k -= 16; word <<= 16; }
     if (word & 0x00ff0000) { k -= 8;  word <<= 8;  }
     if (word & 0x0f000000) { k -= 4;  word <<= 4;  }
     if (word & 0x30000000) { k -= 2;  word <<= 2;  }
     if (word & 0x40000000) { k -= 1; }
     return k;
}

arm_ffs:
        mov     r3, r0, asl #16
        mov     r3, r3, lsr #16
        cmp     r3, #0
        movne   r0, r0, asl #16
        mov     r3, #31
        movne   r3, #15
        tst     r0, #16711680
        movne   r0, r0, asl #8
        subne   r3, r3, #8
        tst     r0, #251658240
        movne   r0, r0, asl #4
        subne   r3, r3, #4
        tst     r0, #805306368
        movne   r0, r0, asl #2
        subne   r3, r3, #2
        tst     r0, #1073741824
        subne   r3, r3, #1
        mov     r0, r3
        mov     pc, lr

19 instructions.  2 registers.  0 register based shifts.  More reasonable
for inlining.

Clearly the smallest of the lot with the smallest register pressure,
being the best candidate out of the lot, whether we inline it or not.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

  parent reply	other threads:[~2006-01-26 16:40 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-01-25 11:26 [PATCH 0/6] RFC: use include/asm-generic/bitops.h Akinobu Mita
2006-01-25 11:28 ` [PATCH 1/6] {set,clear,test}_bit() related cleanup Akinobu Mita
2006-01-25 11:46   ` Andi Kleen
2006-01-26 16:14   ` Pavel Machek
2006-01-26 16:47     ` Russell King
2006-01-26 19:14     ` Paul Jackson
2006-01-25 11:30 ` [PATCH 2/6] use non atomic operations for minix_*_bit() and ext2_*_bit() Akinobu Mita
2006-01-25 11:32 ` [PATCH 3/6] C-language equivalents of include/asm-*/bitops.h Akinobu Mita
2006-01-25 11:54   ` Keith Owens
2006-01-26  2:13     ` Akinobu Mita
2006-01-26  2:19       ` Akinobu Mita
2006-01-25 20:02   ` Russell King
2006-01-25 20:59     ` Grant Grundler
2006-01-26  3:27       ` Akinobu Mita
2006-01-26  3:29         ` [PATCH 1/12] generic *_bit() Akinobu Mita
2006-02-01 15:11           ` Chen, Kenneth W
2006-02-01 18:02             ` Christoph Hellwig
2006-02-01 18:07               ` Chen, Kenneth W
2006-02-01 19:19                 ` Russell King
2006-02-01 19:25                   ` Chen, Kenneth W
2006-02-01 19:35                     ` Russell King
2006-02-03 10:24                   ` Geert Uytterhoeven
2006-02-03 10:27                     ` Russell King
2006-02-01 19:39                 ` Grant Grundler
2006-02-01 21:41                   ` Chen, Kenneth W
2006-02-01 22:09                     ` Grant Grundler
2006-02-01 22:49                       ` Anton Altaparmakov
2006-02-02  0:08                         ` Grant Grundler
2006-02-02  8:52                           ` Anton Altaparmakov
2006-02-02 10:13                             ` Andreas Schwab
2006-02-02 22:43                 ` Paul Mackerras
2006-01-26  3:30         ` [PATCH 2/12] generic __ffs() Akinobu Mita
2006-01-26  3:31         ` [PATCH 3/12] generic ffz() Akinobu Mita
2006-01-26  8:21           ` Michael Tokarev
2006-01-27  6:39             ` [PATCH] parisc: add ()-pair in __ffs() Akinobu Mita
2006-01-26  3:32         ` [PATCH 4/12] generic fls() and fls64() Akinobu Mita
2006-01-26  3:33         ` [PATCH 5/12] generic find_{next,first}{,_zero}_bit() Akinobu Mita
2006-01-26  3:34         ` [PATCH 6/12] generic sched_find_first_bit() Akinobu Mita
2006-01-26  3:35         ` [PATCH 7/12] generic ffs() Akinobu Mita
2006-01-26  3:36         ` [PATCH 8/12] generic hweight{32,16,8}() Akinobu Mita
2006-01-26  7:12           ` Balbir Singh
2006-01-26 10:04             ` Rutger Nijlunsing
2006-01-27  4:55             ` Akinobu Mita
2006-01-27  5:40               ` Balbir Singh
2006-01-27  6:40                 ` Akinobu Mita
2006-01-31 11:14                   ` Balbir Singh
2006-01-26 18:57           ` Bryan O'Sullivan
2006-01-27  4:43             ` Akinobu Mita
2006-01-27  5:23               ` Bryan O'Sullivan
2006-01-26  3:36         ` [PATCH 9/12] generic hweight64() Akinobu Mita
2006-01-26  7:05           ` Balbir Singh
2006-01-26  3:38         ` [PATCH 10/12] generic ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() Akinobu Mita
2006-01-26  3:38         ` [PATCH 11/12] generic ext2_{set,clear}_bit_atomic() Akinobu Mita
2006-01-26  3:39         ` [PATCH 12/12] generic minix_{test,set,test_and_clear,test,find_first_zero}_bit() Akinobu Mita
2006-01-25 23:25     ` [PATCH 3/6] C-language equivalents of include/asm-*/bitops.h Ian Molton
2006-01-26  0:06     ` Richard Henderson
2006-01-26  4:34       ` Edgar Toernig
2006-01-26 17:30         ` Richard Henderson
2006-01-26  8:55       ` Russell King
2006-01-26 16:18         ` [parisc-linux] " Grant Grundler
2006-01-26 16:30           ` Nicolas Pitre
2006-01-26 16:40           ` Russell King [this message]
2006-01-26 23:04             ` Grant Grundler
2006-01-26 23:03               ` Russell King
2006-01-29  7:12                 ` Stuart Brady
2006-01-30  4:03                   ` David S. Miller
2006-01-30 17:06                   ` Ralf Baechle
2006-01-30 19:50                     ` Stuart Brady
2006-01-30 23:02                       ` David S. Miller
2006-01-27  0:28               ` [parisc-linux] Re: [PATCH 3/6] C-language equivalents of John David Anglin
2006-01-27 12:51   ` [PATCH 3/6] C-language equivalents of include/asm-*/bitops.h Hirokazu Takata
2006-01-30  3:29     ` Akinobu Mita
2006-01-25 11:34 ` [PATCH 5/6] fix warning on test_ti_thread_flag() Akinobu Mita
2006-01-25 12:28   ` Geert Uytterhoeven
2006-01-25 22:28   ` Paul Mackerras
2006-01-26  0:04     ` David S. Miller
2006-01-25 11:35 ` [PATCH 6/6] remove unused generic bitops in include/linux/bitops.h Akinobu Mita
     [not found] ` <20060125113336.GE18584@miraclelinux.com>
2006-01-26  1:49   ` [PATCH 4/6] use include/asm-generic/bitops for each architecture Akinobu Mita
2006-01-26  2:37     ` Grant Grundler
2006-01-27 13:04     ` Hirokazu Takata
2006-01-30  3:15       ` Akinobu Mita

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060126164020.GA27222@flint.arm.linux.org.uk \
    --to=rmk+lkml@arm.linux.org.uk \
    --cc=ak@suse.de \
    --cc=chris@zankel.net \
    --cc=dev-etrax@axis.com \
    --cc=dhowells@redhat.com \
    --cc=gerg@uclinux.org \
    --cc=grundler@parisc-linux.org \
    --cc=ink@jurassic.park.msu.ru \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-m68k@vger.kernel.org \
    --cc=linux-mips@linux-mips.org \
    --cc=linux390@de.ibm.com \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=linuxsh-dev@lists.sourceforge.net \
    --cc=linuxsh-shmedia-dev@lists.sourceforge.net \
    --cc=mita@miraclelinux.com \
    --cc=parisc-linux@parisc-linux.org \
    --cc=sparclinux@vger.kernel.org \
    --cc=spyro@f2s.com \
    --cc=takata@linux-m32r.org \
    --cc=torvalds@osdl.org \
    --cc=uclinux-v850@lsi.nec.co.jp \
    --cc=ultralinux@vger.kernel.org \
    --cc=ysato@users.sourceforge.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).