linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10
@ 2020-10-15 10:52 Christophe Leroy
  2020-10-15 13:25 ` Segher Boessenkool
  2020-12-21 11:03 ` Michael Ellerman
  0 siblings, 2 replies; 5+ messages in thread
From: Christophe Leroy @ 2020-10-15 10:52 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev

	ppc-linux-objdump -d vmlinux | grep -e "<csum_partial>" -e "<__csum_partial>"

With gcc9 I get:

	c0017ef8 <__csum_partial>:
	c00182fc:	4b ff fb fd 	bl      c0017ef8 <__csum_partial>
	c0018478:	4b ff fa 80 	b       c0017ef8 <__csum_partial>
	c03e8458:	4b c2 fa a0 	b       c0017ef8 <__csum_partial>
	c03e8518:	4b c2 f9 e1 	bl      c0017ef8 <__csum_partial>
	c03ef410:	4b c2 8a e9 	bl      c0017ef8 <__csum_partial>
	c03f0b24:	4b c2 73 d5 	bl      c0017ef8 <__csum_partial>
	c04279a4:	4b bf 05 55 	bl      c0017ef8 <__csum_partial>
	c0429820:	4b be e6 d9 	bl      c0017ef8 <__csum_partial>
	c0429944:	4b be e5 b5 	bl      c0017ef8 <__csum_partial>
	c042b478:	4b be ca 81 	bl      c0017ef8 <__csum_partial>
	c042b554:	4b be c9 a5 	bl      c0017ef8 <__csum_partial>
	c045f15c:	4b bb 8d 9d 	bl      c0017ef8 <__csum_partial>
	c0492190:	4b b8 5d 69 	bl      c0017ef8 <__csum_partial>
	c0492310:	4b b8 5b e9 	bl      c0017ef8 <__csum_partial>
	c0495594:	4b b8 29 65 	bl      c0017ef8 <__csum_partial>
	c049c420:	4b b7 ba d9 	bl      c0017ef8 <__csum_partial>
	c049c870:	4b b7 b6 89 	bl      c0017ef8 <__csum_partial>
	c049c930:	4b b7 b5 c9 	bl      c0017ef8 <__csum_partial>
	c04a9ca0:	4b b6 e2 59 	bl      c0017ef8 <__csum_partial>
	c04bdde4:	4b b5 a1 15 	bl      c0017ef8 <__csum_partial>
	c04be480:	4b b5 9a 79 	bl      c0017ef8 <__csum_partial>
	c04be710:	4b b5 97 e9 	bl      c0017ef8 <__csum_partial>
	c04c969c:	4b b4 e8 5d 	bl      c0017ef8 <__csum_partial>
	c04ca2fc:	4b b4 db fd 	bl      c0017ef8 <__csum_partial>
	c04cf5bc:	4b b4 89 3d 	bl      c0017ef8 <__csum_partial>
	c04d0440:	4b b4 7a b9 	bl      c0017ef8 <__csum_partial>

With gcc10 I get:

	c0018d08 <__csum_partial>:
	c0019020 <csum_partial>:
	c0019020:	4b ff fc e8 	b       c0018d08 <__csum_partial>
	c001914c:	4b ff fe d4 	b       c0019020 <csum_partial>
	c0019250:	4b ff fd d1 	bl      c0019020 <csum_partial>
	c03e404c <csum_partial>:
	c03e404c:	4b c3 4c bc 	b       c0018d08 <__csum_partial>
	c03e4050:	4b ff ff fc 	b       c03e404c <csum_partial>
	c03e40fc:	4b ff ff 51 	bl      c03e404c <csum_partial>
	c03e6680:	4b ff d9 cd 	bl      c03e404c <csum_partial>
	c03e68c4:	4b ff d7 89 	bl      c03e404c <csum_partial>
	c03e7934:	4b ff c7 19 	bl      c03e404c <csum_partial>
	c03e7bf8:	4b ff c4 55 	bl      c03e404c <csum_partial>
	c03eb148:	4b ff 8f 05 	bl      c03e404c <csum_partial>
	c03ecf68:	4b c2 bd a1 	bl      c0018d08 <__csum_partial>
	c04275b8 <csum_partial>:
	c04275b8:	4b bf 17 50 	b       c0018d08 <__csum_partial>
	c0427884:	4b ff fd 35 	bl      c04275b8 <csum_partial>
	c0427b18:	4b ff fa a1 	bl      c04275b8 <csum_partial>
	c0427bd8:	4b ff f9 e1 	bl      c04275b8 <csum_partial>
	c0427cd4:	4b ff f8 e5 	bl      c04275b8 <csum_partial>
	c0427e34:	4b ff f7 85 	bl      c04275b8 <csum_partial>
	c045a1c0:	4b bb eb 49 	bl      c0018d08 <__csum_partial>
	c0489464 <csum_partial>:
	c0489464:	4b b8 f8 a4 	b       c0018d08 <__csum_partial>
	c04896b0:	4b ff fd b5 	bl      c0489464 <csum_partial>
	c048982c:	4b ff fc 39 	bl      c0489464 <csum_partial>
	c0490158:	4b b8 8b b1 	bl      c0018d08 <__csum_partial>
	c0492f0c <csum_partial>:
	c0492f0c:	4b b8 5d fc 	b       c0018d08 <__csum_partial>
	c049326c:	4b ff fc a1 	bl      c0492f0c <csum_partial>
	c049333c:	4b ff fb d1 	bl      c0492f0c <csum_partial>
	c0493b18:	4b ff f3 f5 	bl      c0492f0c <csum_partial>
	c0493f50:	4b ff ef bd 	bl      c0492f0c <csum_partial>
	c0493ffc:	4b ff ef 11 	bl      c0492f0c <csum_partial>
	c04a0f78:	4b b7 7d 91 	bl      c0018d08 <__csum_partial>
	c04b3e3c:	4b b6 4e cd 	bl      c0018d08 <__csum_partial>
	c04b40d0 <csum_partial>:
	c04b40d0:	4b b6 4c 38 	b       c0018d08 <__csum_partial>
	c04b4448:	4b ff fc 89 	bl      c04b40d0 <csum_partial>
	c04b46f4:	4b ff f9 dd 	bl      c04b40d0 <csum_partial>
	c04bf448:	4b b5 98 c0 	b       c0018d08 <__csum_partial>
	c04c5264:	4b b5 3a a5 	bl      c0018d08 <__csum_partial>
	c04c61e4:	4b b5 2b 25 	bl      c0018d08 <__csum_partial>

gcc10 defines multiple versions of csum_partial() which are just
an unconditionnal branch to __csum_partial().

To enforce inlining of that branch to __csum_partial(),
mark csum_partial() as __always_inline.

With this patch with gcc10:

	c0018d08 <__csum_partial>:
	c0019148:	4b ff fb c0 	b       c0018d08 <__csum_partial>
	c001924c:	4b ff fa bd 	bl      c0018d08 <__csum_partial>
	c03e40ec:	4b c3 4c 1d 	bl      c0018d08 <__csum_partial>
	c03e4120:	4b c3 4b e8 	b       c0018d08 <__csum_partial>
	c03eb004:	4b c2 dd 05 	bl      c0018d08 <__csum_partial>
	c03ecef4:	4b c2 be 15 	bl      c0018d08 <__csum_partial>
	c0427558:	4b bf 17 b1 	bl      c0018d08 <__csum_partial>
	c04286e4:	4b bf 06 25 	bl      c0018d08 <__csum_partial>
	c0428cd8:	4b bf 00 31 	bl      c0018d08 <__csum_partial>
	c0428d84:	4b be ff 85 	bl      c0018d08 <__csum_partial>
	c045a17c:	4b bb eb 8d 	bl      c0018d08 <__csum_partial>
	c0489450:	4b b8 f8 b9 	bl      c0018d08 <__csum_partial>
	c0491860:	4b b8 74 a9 	bl      c0018d08 <__csum_partial>
	c0492eec:	4b b8 5e 1d 	bl      c0018d08 <__csum_partial>
	c04a0eac:	4b b7 7e 5d 	bl      c0018d08 <__csum_partial>
	c04b3e34:	4b b6 4e d5 	bl      c0018d08 <__csum_partial>
	c04b426c:	4b b6 4a 9d 	bl      c0018d08 <__csum_partial>
	c04b463c:	4b b6 46 cd 	bl      c0018d08 <__csum_partial>
	c04c004c:	4b b5 8c bd 	bl      c0018d08 <__csum_partial>
	c04c0368:	4b b5 89 a1 	bl      c0018d08 <__csum_partial>
	c04c5254:	4b b5 3a b5 	bl      c0018d08 <__csum_partial>
	c04c60d4:	4b b5 2c 35 	bl      c0018d08 <__csum_partial>

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/checksum.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h
index 9cce06194dcc..cab8fec60005 100644
--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -164,7 +164,7 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
  */
 __wsum __csum_partial(const void *buff, int len, __wsum sum);
 
-static inline __wsum csum_partial(const void *buff, int len, __wsum sum)
+static __always_inline __wsum csum_partial(const void *buff, int len, __wsum sum)
 {
 	if (__builtin_constant_p(len) && len <= 16 && (len & 1) == 0) {
 		if (len == 2)
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10
  2020-10-15 10:52 [PATCH] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10 Christophe Leroy
@ 2020-10-15 13:25 ` Segher Boessenkool
  2020-10-15 13:32   ` Christophe Leroy
  2020-10-15 13:59   ` Christophe Leroy
  2020-12-21 11:03 ` Michael Ellerman
  1 sibling, 2 replies; 5+ messages in thread
From: Segher Boessenkool @ 2020-10-15 13:25 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev, linux-kernel

Hi!

On Thu, Oct 15, 2020 at 10:52:20AM +0000, Christophe Leroy wrote:
> With gcc9 I get:

<snip>

> With gcc10 I get:

<snip>

> gcc10 defines multiple versions of csum_partial() which are just
> an unconditionnal branch to __csum_partial().

It doesn't inline it, yes.

Could you open a GCC PR for this please?

> To enforce inlining of that branch to __csum_partial(),
> mark csum_partial() as __always_inline.

That should be fine of course, but it would be good if we could fix this
in the compiler :-)

Reviewed-by: Segher Boessenkool  <segher@kernel.crashing.org>


Segher

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10
  2020-10-15 13:25 ` Segher Boessenkool
@ 2020-10-15 13:32   ` Christophe Leroy
  2020-10-15 13:59   ` Christophe Leroy
  1 sibling, 0 replies; 5+ messages in thread
From: Christophe Leroy @ 2020-10-15 13:32 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev, linux-kernel



Le 15/10/2020 à 15:25, Segher Boessenkool a écrit :
> Hi!
> 
> On Thu, Oct 15, 2020 at 10:52:20AM +0000, Christophe Leroy wrote:
>> With gcc9 I get:
> 
> <snip>
> 
>> With gcc10 I get:
> 
> <snip>
> 
>> gcc10 defines multiple versions of csum_partial() which are just
>> an unconditionnal branch to __csum_partial().
> 
> It doesn't inline it, yes.
> 
> Could you open a GCC PR for this please?

Sure.

I also have get_order() 75 times in my vmlinux, all the same as the following:

c0016790 <get_order>:
c0016790:	38 63 ff ff 	addi    r3,r3,-1
c0016794:	54 63 a3 3e 	rlwinm  r3,r3,20,12,31
c0016798:	7c 63 00 34 	cntlzw  r3,r3
c001679c:	20 63 00 20 	subfic  r3,r3,32
c00167a0:	4e 80 00 20 	blr

Christophe

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10
  2020-10-15 13:25 ` Segher Boessenkool
  2020-10-15 13:32   ` Christophe Leroy
@ 2020-10-15 13:59   ` Christophe Leroy
  1 sibling, 0 replies; 5+ messages in thread
From: Christophe Leroy @ 2020-10-15 13:59 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev, linux-kernel



Le 15/10/2020 à 15:25, Segher Boessenkool a écrit :
> Hi!
> 
> On Thu, Oct 15, 2020 at 10:52:20AM +0000, Christophe Leroy wrote:
>> With gcc9 I get:
> 
> <snip>
> 
>> With gcc10 I get:
> 
> <snip>
> 
>> gcc10 defines multiple versions of csum_partial() which are just
>> an unconditionnal branch to __csum_partial().
> 
> It doesn't inline it, yes.
> 
> Could you open a GCC PR for this please?
> 

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

Christophe

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10
  2020-10-15 10:52 [PATCH] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10 Christophe Leroy
  2020-10-15 13:25 ` Segher Boessenkool
@ 2020-12-21 11:03 ` Michael Ellerman
  1 sibling, 0 replies; 5+ messages in thread
From: Michael Ellerman @ 2020-12-21 11:03 UTC (permalink / raw)
  To: Christophe Leroy, Paul Mackerras, Benjamin Herrenschmidt,
	Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

On Thu, 15 Oct 2020 10:52:20 +0000 (UTC), Christophe Leroy wrote:
> 	ppc-linux-objdump -d vmlinux | grep -e "<csum_partial>" -e "<__csum_partial>"
> 
> With gcc9 I get:
> 
> 	c0017ef8 <__csum_partial>:
> 	c00182fc:	4b ff fb fd 	bl      c0017ef8 <__csum_partial>
> 	c0018478:	4b ff fa 80 	b       c0017ef8 <__csum_partial>
> 	c03e8458:	4b c2 fa a0 	b       c0017ef8 <__csum_partial>
> 	c03e8518:	4b c2 f9 e1 	bl      c0017ef8 <__csum_partial>
> 	c03ef410:	4b c2 8a e9 	bl      c0017ef8 <__csum_partial>
> 	c03f0b24:	4b c2 73 d5 	bl      c0017ef8 <__csum_partial>
> 	c04279a4:	4b bf 05 55 	bl      c0017ef8 <__csum_partial>
> 	c0429820:	4b be e6 d9 	bl      c0017ef8 <__csum_partial>
> 	c0429944:	4b be e5 b5 	bl      c0017ef8 <__csum_partial>
> 	c042b478:	4b be ca 81 	bl      c0017ef8 <__csum_partial>
> 	c042b554:	4b be c9 a5 	bl      c0017ef8 <__csum_partial>
> 	c045f15c:	4b bb 8d 9d 	bl      c0017ef8 <__csum_partial>
> 	c0492190:	4b b8 5d 69 	bl      c0017ef8 <__csum_partial>
> 	c0492310:	4b b8 5b e9 	bl      c0017ef8 <__csum_partial>
> 	c0495594:	4b b8 29 65 	bl      c0017ef8 <__csum_partial>
> 	c049c420:	4b b7 ba d9 	bl      c0017ef8 <__csum_partial>
> 	c049c870:	4b b7 b6 89 	bl      c0017ef8 <__csum_partial>
> 	c049c930:	4b b7 b5 c9 	bl      c0017ef8 <__csum_partial>
> 	c04a9ca0:	4b b6 e2 59 	bl      c0017ef8 <__csum_partial>
> 	c04bdde4:	4b b5 a1 15 	bl      c0017ef8 <__csum_partial>
> 	c04be480:	4b b5 9a 79 	bl      c0017ef8 <__csum_partial>
> 	c04be710:	4b b5 97 e9 	bl      c0017ef8 <__csum_partial>
> 	c04c969c:	4b b4 e8 5d 	bl      c0017ef8 <__csum_partial>
> 	c04ca2fc:	4b b4 db fd 	bl      c0017ef8 <__csum_partial>
> 	c04cf5bc:	4b b4 89 3d 	bl      c0017ef8 <__csum_partial>
> 	c04d0440:	4b b4 7a b9 	bl      c0017ef8 <__csum_partial>
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10
      https://git.kernel.org/powerpc/c/328e7e487a464aad024fbde6663b7859df082b7b

cheers

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-12-21 11:04 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-15 10:52 [PATCH] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10 Christophe Leroy
2020-10-15 13:25 ` Segher Boessenkool
2020-10-15 13:32   ` Christophe Leroy
2020-10-15 13:59   ` Christophe Leroy
2020-12-21 11:03 ` Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).