From: Gabriel Paubert
Date: Mon, 3 Mar 2008 10:54:43 +0100
To: Steven Rostedt
Cc: Benjamin Herrenschmidt, linuxppc-dev@ozlabs.org, paulus@samba.org, LKML
Subject: Re: [PATCH] add strncmp to PowerPC
Message-ID: <20080303095443.GB27105@iram.es>
References: <1204301097.14759.6.camel@localhost.localdomain> <1204340690.15052.457.camel@pasglop>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Feb 29, 2008 at 10:56:45PM -0500, Steven Rostedt wrote:
>
> On Sat, 1 Mar 2008, Benjamin Herrenschmidt wrote:
>
> >
> > Do we have any indication that it performs better than the C one?
>
> See below.
>
> >
> > Ben.
> >
> > >
> > > +_GLOBAL(strncmp)
> > > +        mtctr   r5
> > > +        addi    r5,r3,-1
> > > +        addi    r4,r4,-1
> > > +1:      lbzu    r3,1(r5)
> > > +        cmpwi   1,r3,0
> > > +        lbzu    r0,1(r4)
> > > +        subf.   r3,r0,r3
> > > +        beqlr   1
> > > +        bdnzt   eq,1b
> > > +        blr
> > > +
>
> And here's the objdump of the C version:
>
> 0000000000000080 <.strncmp>:
>   80:   fb e1 ff f0     std     r31,-16(r1)
>   84:   f8 21 ff c1     stdu    r1,-64(r1)
>   88:   7c 69 1b 78     mr      r9,r3
>   8c:   7c a0 2b 79     mr.     r0,r5
>   90:   38 60 00 00     li      r3,0
>   94:   7c 09 03 a6     mtctr   r0
>   98:   7c 3f 0b 78     mr      r31,r1
>   9c:   41 82 00 68     beq-    104 <.strncmp+0x84>
>   a0:   89 69 00 00     lbz     r11,0(r9)
>   a4:   88 04 00 00     lbz     r0,0(r4)
>   a8:   7c 00 58 50     subf    r0,r0,r11
>   ac:   78 00 06 20     clrldi  r0,r0,56
>   b0:   2f a0 00 00     cmpdi   cr7,r0,0
>   b4:   7c 00 07 74     extsb   r0,r0
>   b8:   7c 03 03 78     mr      r3,r0
>   bc:   40 9e 00 48     bne-    cr7,104 <.strncmp+0x84>
>   c0:   2f ab 00 00     cmpdi   cr7,r11,0
>   c4:   41 9e 00 40     beq-    cr7,104 <.strncmp+0x84>
>   c8:   38 84 00 01     addi    r4,r4,1
>   cc:   38 69 00 01     addi    r3,r9,1
>   d0:   42 40 00 30     bdz-    100 <.strncmp+0x80>
>   d4:   88 03 00 00     lbz     r0,0(r3)
>   d8:   89 24 00 00     lbz     r9,0(r4)
>   dc:   38 63 00 01     addi    r3,r3,1
>   e0:   38 84 00 01     addi    r4,r4,1
>   e4:   2f 20 00 00     cmpdi   cr6,r0,0
>   e8:   7c 09 00 50     subf    r0,r9,r0
>   ec:   78 00 06 20     clrldi  r0,r0,56
>   f0:   2f a0 00 00     cmpdi   cr7,r0,0
>   f4:   7c 00 07 74     extsb   r0,r0
>   f8:   40 9e 00 08     bne-    cr7,100 <.strncmp+0x80>
>   fc:   40 9a ff d4     bne+    cr6,d0 <.strncmp+0x50>
>  100:   7c 03 03 78     mr      r3,r0
>  104:   e8 21 00 00     ld      r1,0(r1)
>  108:   eb e1 ff f0     ld      r31,-16(r1)
>  10c:   4e 80 00 20     blr
>
>
> I'll let you decide ;-)
>
> Even if it was logically faster (which I still doubt) it's a hell of a lot
> of cache lines to waste.

Indeed, but there are some corner cases that the C code handles, like a
length of 0, which may lead to an infinite loop in the asm code.

OTOH, I'm a bit surprised by the extsb instructions in the compiler-generated
code. We don't compile with -fsigned-char, do we? The clrldi instructions
are also extremely stupid.

Now that I think a bit more about it, I believe that the C version is
incorrect: the clrldi/extsb dance takes a value between -255 and +255 and
collapses it into the -128 to +127 range, meaning that the return value
may be wrong if we rely on the sign of the result (for instance, 0x01
minus 0xff is -254, which comes out as +2).

So unless I am missing something, the problem is much more serious than
just stupid code (I just had a look at the libc version in C, and there
the characters are cast to unsigned char before the comparison).

Regards,
Gabriel