From: Scott Wood <scottwood@freescale.com>
To: christophe leroy <christophe.leroy@c-s.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
    Paul Mackerras <paulus@samba.org>,
    <linuxppc-dev@lists.ozlabs.org>,
    <linux-kernel@vger.kernel.org>
Subject: Re: [v2,2/2] powerpc32: add support for csum_add()
Date: Fri, 1 May 2015 20:00:14 -0500
Message-ID: <1430528414.16357.201.camel@freescale.com>
In-Reply-To: <553FD904.8000309@c-s.fr>

On Tue, 2015-04-28 at 21:01 +0200, christophe leroy wrote:
> On 25/03/2015 02:30, Scott Wood wrote:
> > On Tue, Feb 03, 2015 at 12:39:27PM +0100, LEROY Christophe wrote:
> > > The C version of csum_add() as defined in include/net/checksum.h gives the
> > > following assembly:
> > >        0:       7c 04 1a 14     add     r0,r4,r3
> > >        4:       7c 64 00 10     subfc   r3,r4,r0
> > >        8:       7c 63 19 10     subfe   r3,r3,r3
> > >        c:       7c 63 00 50     subf    r3,r3,r0
> > >
> > > include/net/checksum.h also offers the possibility to define an arch specific
> > > function.
> > > This patch provides a ppc32 specific csum_add() inline function.
> > What makes it 32-bit specific?
> >
> As far as I understand, the 64-bit will do a 64 bit addition, so we
> will have to handle differently the carry, can't just be an addze like
> in 32-bit.

OK.  Before I couldn't find where this was ifdeffed to 32-bit, but it's
in patch 1/2.

> The generated code is most likely different on ppc64. I have no ppc64
> compiler so I can't check what gcc generates for the following code:
>
> __wsum csum_add(__wsum csum, __wsum addend)
> {
>      u32 res = (__force u32)csum;
>      res += (__force u32)addend;
>      return (__force __wsum)(res + (res < (__force u32)addend));
> }
>
> Can someone with a ppc64 compiler tell what we get ?

With CONFIG_GENERIC_CPU:

   0xc000000000001af8 <+0>:     add     r3,r3,r4
   0xc000000000001afc <+4>:     cmplw   cr7,r3,r4
   0xc000000000001b00 <+8>:     mfcr    r4
   0xc000000000001b04 <+12>:    rlwinm  r4,r4,29,31,31
   0xc000000000001b08 <+16>:    add     r3,r4,r3
   0xc000000000001b0c <+20>:    clrldi  r3,r3,32
   0xc000000000001b10 <+24>:    blr

The mfcr is particularly nasty, at least on our chips.

With CONFIG_CPU_E6500:

   0xc000000000001b30 <+0>:     add     r3,r3,r4
   0xc000000000001b34 <+4>:     cmplw   cr7,r3,r4
   0xc000000000001b38 <+8>:     mfocrf  r4,1
   0xc000000000001b3c <+12>:    rlwinm  r4,r4,29,31,31
   0xc000000000001b40 <+16>:    add     r3,r4,r3
   0xc000000000001b44 <+20>:    clrldi  r3,r3,32
   0xc000000000001b48 <+24>:    blr

Ideal (short of a 64-bit __wsum) would probably be something like
(untested):

        add     r3,r3,r4
        srdi    r5,r3,32
        add     r3,r3,r5
        clrldi  r3,r3,32

Or in C code (which would let the compiler schedule it better):

static inline __wsum csum_add(__wsum csum, __wsum addend)
{
        u64 res = (__force u64)csum;
        res += (__force u32)addend;
        return (__force __wsum)((u32)res + (res >> 32));
}

-Scott
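For readers who want to try the two C variants from this message outside the kernel, here is a minimal userspace sketch. It is not kernel code: `__wsum` and `__force` are stand-ins defined locally (in the kernel they come from the sparse-annotated type headers), and the function names `csum_add_generic`, `csum_add_u64` and `csum_add_self_check` are hypothetical names chosen for this illustration. It only checks that the 32-bit compare-based version and the 64-bit shift-based version return the same end-around-carry sum on a few edge cases; it says nothing about which instructions a given compiler emits.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins (assumptions): in the kernel, __wsum is a
 * sparse-checked 32-bit type and __force silences sparse on casts. */
#define __force
typedef uint32_t __wsum;

/* The compare-based variant quoted in the thread: add in 32 bits,
 * then add 1 back if the addition wrapped around. */
static inline __wsum csum_add_generic(__wsum csum, __wsum addend)
{
	uint32_t res = (__force uint32_t)csum;

	res += (__force uint32_t)addend;
	return (__force __wsum)(res + (res < (__force uint32_t)addend));
}

/* The 64-bit variant suggested in the thread: add in 64 bits, then
 * fold the carry (bit 32) back into the low 32 bits. */
static inline __wsum csum_add_u64(__wsum csum, __wsum addend)
{
	uint64_t res = (__force uint64_t)csum;

	res += (__force uint32_t)addend;
	return (__force __wsum)((uint32_t)res + (res >> 32));
}

/* Hypothetical helper: assert that both variants agree on a few
 * inputs, including ones that generate a carry out of bit 31. */
static void csum_add_self_check(void)
{
	static const uint32_t v[] = {
		0x00000000u, 0x00000001u, 0x12345678u,
		0x80000000u, 0x9abcdef0u, 0xffffffffu,
	};
	unsigned int i, j;

	for (i = 0; i < sizeof(v) / sizeof(v[0]); i++)
		for (j = 0; j < sizeof(v) / sizeof(v[0]); j++)
			assert(csum_add_generic(v[i], v[j]) ==
			       csum_add_u64(v[i], v[j]));
}
```

Calling `csum_add_self_check()` from a small test program exercises both variants. The fold in the 64-bit version cannot overflow: when the carry bit is set, the low 32 bits of the sum are at most 0xfffffffe, so adding 1 stays within 32 bits.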
Thread overview: 13+ messages
2015-02-03 11:39 [PATCH v2 2/2] powerpc32: add support for csum_add() Christophe Leroy
2015-02-03 11:39 ` Christophe Leroy
2015-03-25  1:30 ` [v2,2/2] " Scott Wood
2015-03-25  1:30 ` Scott Wood
2015-04-28 19:01 ` christophe leroy
2015-05-02  1:00 ` Scott Wood [this message]
2015-05-02  1:00 ` Scott Wood
2015-05-04 22:10 ` Segher Boessenkool
2015-05-04 22:10 ` Segher Boessenkool
2015-05-19 11:37 ` leroy christophe
2015-05-19 11:37 ` leroy christophe
2015-03-31  3:14 ` Scott Wood
2015-03-31  3:14 ` Scott Wood