From: Segher Boessenkool <segher@kernel.crashing.org>
To: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] powerpc: Force inlining of csum_add()
Date: Tue, 11 May 2021 05:51:54 -0500
Message-ID: <20210511105154.GJ10366@gate.crashing.org>
In-Reply-To: <f7f4d4e364de6e473da874468b903da6e5d97adc.1620713272.git.christophe.leroy@csgroup.eu>

Hi!

On Tue, May 11, 2021 at 06:08:06AM +0000, Christophe Leroy wrote:
> Commit 328e7e487a46 ("powerpc: force inlining of csum_partial() to
> avoid multiple csum_partial() with GCC10") inlined csum_partial().
>
> Now that csum_partial() is inlined, GCC outlines csum_add() when
> called by csum_partial().

> c064fb28 <csum_add>:
> c064fb28:	7c 63 20 14 	addc    r3,r3,r4
> c064fb2c:	7c 63 01 94 	addze   r3,r3
> c064fb30:	4e 80 00 20 	blr

Could you build this with -fdump-tree-einline-all and send me the
results?  Or open a GCC PR yourself :-)

Something seems to have decided this asm is more expensive than it is.
That isn't always avoidable -- the compiler cannot look inside asms --
but it seems it could be improved here.

Do you have (or can make) a self-contained testcase?

> The sum with 0 is useless, should have been skipped.

That isn't something the compiler can do anything about (not sure if you
were suggesting that); it has to be done in the user code (and it tries
to already, see below).

> And there is even one completely unused instance of csum_add().

That is strange, that should never happen.

> ./arch/powerpc/include/asm/checksum.h: In function '__ip6_tnl_rcv':
> ./arch/powerpc/include/asm/checksum.h:94:22: warning: inlining failed in call to 'csum_add': call is unlikely and code size would grow [-Winline]
>    94 | static inline __wsum csum_add(__wsum csum, __wsum addend)
>       |                      ^~~~~~~~
> ./arch/powerpc/include/asm/checksum.h:172:31: note: called from here
>   172 |     sum = csum_add(sum, (__force __wsum)*(const u32 *)buff);
>       |           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

At least we say what happened.  Progress!  :-)

> In the non-inlined version, the first sum with 0 was performed.
> Here it is skipped.

That is because of how __builtin_constant_p works, most likely.  As we
discussed elsewhere it is evaluated before all forms of loop unrolling.

The patch looks perfect of course :-)

Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>


Segher


> --- a/arch/powerpc/include/asm/checksum.h
> +++ b/arch/powerpc/include/asm/checksum.h
> @@ -91,7 +91,7 @@ static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr, __u32 len,
>  }
>  
>  #define HAVE_ARCH_CSUM_ADD
> -static inline __wsum csum_add(__wsum csum, __wsum addend)
> +static __always_inline __wsum csum_add(__wsum csum, __wsum addend)
>  {
>  #ifdef __powerpc64__
>  	u64 res = (__force u64)csum;
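
For readers following along without the kernel tree, here is a rough, portable sketch of the pattern under discussion: an end-around-carry add that is forced inline and that skips the addition when the compiler can prove an operand is the constant 0. It is not the kernel's code. The names demo_always_inline, demo_csum_add and demo_self_check are made up for illustration, plain uint32_t stands in for __wsum, and a 64-bit intermediate replaces the addc/addze asm. It assumes GCC or Clang, since __builtin_constant_p and the always_inline attribute are compiler extensions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's __always_inline macro. */
#define demo_always_inline inline __attribute__((__always_inline__))

/*
 * 32-bit add with end-around carry, the same result the addc/addze
 * pair in the quoted disassembly computes. Names and types here are
 * assumptions for illustration only.
 */
static demo_always_inline uint32_t demo_csum_add(uint32_t csum, uint32_t addend)
{
	/*
	 * The zero checks only fold away when the argument is known to be
	 * a compile-time constant at the point __builtin_constant_p is
	 * evaluated; otherwise the full add below is emitted.
	 */
	if (__builtin_constant_p(csum) && csum == 0)
		return addend;
	if (__builtin_constant_p(addend) && addend == 0)
		return csum;

	uint64_t res = (uint64_t)csum + addend;

	/* Fold the carry out of bit 31 back into the low 32 bits. */
	return (uint32_t)res + (uint32_t)(res >> 32);
}

/* Small sanity check of the arithmetic (hypothetical helper). */
static void demo_self_check(void)
{
	assert(demo_csum_add(0xffffffffu, 2u) == 2u);   /* carry wraps around */
	assert(demo_csum_add(0u, 0x1234u) == 0x1234u);  /* add of 0 is a no-op */
}
```

Calling demo_self_check() from a main() is enough to exercise it. Whether the zero checks actually disappear from the generated code depends on when the compiler evaluates __builtin_constant_p relative to inlining and unrolling, which is the point made in the reply above.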
Thread overview: 13+ messages
2021-05-11  6:08 [PATCH] powerpc: Force inlining of csum_add() Christophe Leroy
2021-05-11 10:51 ` Segher Boessenkool [this message]
2021-05-12 12:56 ` Christophe Leroy
2021-05-12 14:31 ` Segher Boessenkool
2021-05-12 14:43 ` Christophe Leroy
2021-05-12 18:21 ` Segher Boessenkool
2021-06-18  3:51 ` Michael Ellerman