linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] net: Allow csum_sub() to be provided in arch
@ 2022-02-11 10:24 Christophe Leroy
  2022-02-11 10:24 ` [PATCH 2/2] powerpc/32: Implement csum_sub Christophe Leroy
  0 siblings, 1 reply; 4+ messages in thread
From: Christophe Leroy @ 2022-02-11 10:24 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	David S. Miller, Jakub Kicinski
  Cc: netdev, linuxppc-dev, linux-kernel

In the same spirit as commit 07064c6e022b ("net: Allow csum_add to be
provided in arch"), allow csum_sub() to be provided by arch.

The generic implementation of csum_sub() call csum_add() with the
complement of the addendum.

Some architectures can do it directly.

This will also avoid getting several copies of csum_sub() outlined
when building with -Os.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 include/net/checksum.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/net/checksum.h b/include/net/checksum.h
index 9badcd5532ef..735d98724145 100644
--- a/include/net/checksum.h
+++ b/include/net/checksum.h
@@ -62,10 +62,12 @@ static inline __wsum csum_add(__wsum csum, __wsum addend)
 }
 #endif
 
+#ifndef HAVE_ARCH_CSUM_SUB
 static inline __wsum csum_sub(__wsum csum, __wsum addend)
 {
 	return csum_add(csum, ~addend);
 }
+#endif
 
 static inline __sum16 csum16_add(__sum16 csum, __be16 addend)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2] powerpc/32: Implement csum_sub
  2022-02-11 10:24 [PATCH 1/2] net: Allow csum_sub() to be provided in arch Christophe Leroy
@ 2022-02-11 10:24 ` Christophe Leroy
  2022-02-13  3:01   ` David Laight
  0 siblings, 1 reply; 4+ messages in thread
From: Christophe Leroy @ 2022-02-11 10:24 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	David S. Miller, Jakub Kicinski
  Cc: netdev, linuxppc-dev, linux-kernel

When building kernel with CONFIG_CC_OPTIMISE_FOR_SIZE, several
copies of csum_sub() are generated, with the following code:

	00000170 <csum_sub>:
	     170:	7c 84 20 f8 	not     r4,r4
	     174:	7c 63 20 14 	addc    r3,r3,r4
	     178:	7c 63 01 94 	addze   r3,r3
	     17c:	4e 80 00 20 	blr

Let's define a PPC32 version with subc/addme, and for it's inlining.

It will return 0 instead of 0xffffffff when subtracting 0x80000000 to itself,
this is not an issue as 0 and ~0 are equivalent, refer to RFC 1624.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/checksum.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h
index 350de8f90250..3288a1bf5e8d 100644
--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -112,6 +112,22 @@ static __always_inline __wsum csum_add(__wsum csum, __wsum addend)
 #endif
 }
 
+#ifdef CONFIG_PPC32
+#define HAVE_ARCH_CSUM_SUB
+static __always_inline __wsum csum_sub(__wsum csum, __wsum addend)
+{
+	if (__builtin_constant_p(csum) && (csum == 0 || csum == ~0))
+		return ~addend;
+	if (__builtin_constant_p(addend) && (addend == 0 || addend == ~0))
+		return csum;
+
+	asm("subc %0,%0,%1;"
+	    "addme %0,%0;"
+	    : "+r" (csum) : "r" (addend) : "xer");
+	return csum;
+}
+#endif
+
 /*
  * This is a version of ip_compute_csum() optimized for IP headers,
  * which always checksum on 4 octet boundaries.  ihl is the number
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* RE: [PATCH 2/2] powerpc/32: Implement csum_sub
  2022-02-11 10:24 ` [PATCH 2/2] powerpc/32: Implement csum_sub Christophe Leroy
@ 2022-02-13  3:01   ` David Laight
  2022-02-17 10:13     ` Christophe Leroy
  0 siblings, 1 reply; 4+ messages in thread
From: David Laight @ 2022-02-13  3:01 UTC (permalink / raw)
  To: 'Christophe Leroy',
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	David S. Miller, Jakub Kicinski
  Cc: netdev, linuxppc-dev, linux-kernel

From: Christophe Leroy
> Sent: 11 February 2022 10:25
> 
> When building kernel with CONFIG_CC_OPTIMISE_FOR_SIZE, several
> copies of csum_sub() are generated, with the following code:
> 
> 	00000170 <csum_sub>:
> 	     170:	7c 84 20 f8 	not     r4,r4
> 	     174:	7c 63 20 14 	addc    r3,r3,r4
> 	     178:	7c 63 01 94 	addze   r3,r3
> 	     17c:	4e 80 00 20 	blr
> 
> Let's define a PPC32 version with subc/addme, and for it's inlining.
> 
> It will return 0 instead of 0xffffffff when subtracting 0x80000000 to itself,
> this is not an issue as 0 and ~0 are equivalent, refer to RFC 1624.

They are not always equivalent.
In particular in the UDP checksum field one of them is (0?) 'checksum not calculated'.

I think all the Linux functions have to return a non-zero value (for non-zero input).

If the csum is going to be converted to 16 bit, inverted, and put into a packet
the code usually has to have a check that changes 0 to 0xffff.
However if the csum functions guarantee never to return zero they can feed
an extra 1 into the first csum_partial() then just invert and add 1 at the end.
Because (~csum_partion(buffer, 1) + 1) is the same as ~csum_partial(buffer, 0)
except when the buffer's csum is 0xffffffff.

I did do some experiments and the 64bit value can be reduced directly to
16bits using '% 0xffff'.
This is different because it returns 0 not 0xffff.
However gcc 'randomly' picks between the fast 'multiply by reciprocal'
and slow divide instruction paths.
The former is (probably) faster than reducing using shifts and adc.
The latter definitely slower.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] powerpc/32: Implement csum_sub
  2022-02-13  3:01   ` David Laight
@ 2022-02-17 10:13     ` Christophe Leroy
  0 siblings, 0 replies; 4+ messages in thread
From: Christophe Leroy @ 2022-02-17 10:13 UTC (permalink / raw)
  To: David Laight, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, David S. Miller, Jakub Kicinski
  Cc: netdev, linuxppc-dev, linux-kernel



Le 13/02/2022 à 04:01, David Laight a écrit :
> From: Christophe Leroy
>> Sent: 11 February 2022 10:25
>>
>> When building kernel with CONFIG_CC_OPTIMISE_FOR_SIZE, several
>> copies of csum_sub() are generated, with the following code:
>>
>> 	00000170 <csum_sub>:
>> 	     170:	7c 84 20 f8 	not     r4,r4
>> 	     174:	7c 63 20 14 	addc    r3,r3,r4
>> 	     178:	7c 63 01 94 	addze   r3,r3
>> 	     17c:	4e 80 00 20 	blr
>>
>> Let's define a PPC32 version with subc/addme, and for it's inlining.
>>
>> It will return 0 instead of 0xffffffff when subtracting 0x80000000 to itself,
>> this is not an issue as 0 and ~0 are equivalent, refer to RFC 1624.
> 
> They are not always equivalent.
> In particular in the UDP checksum field one of them is (0?) 'checksum not calculated'.
> 
> I think all the Linux functions have to return a non-zero value (for non-zero input).
> 
> If the csum is going to be converted to 16 bit, inverted, and put into a packet
> the code usually has to have a check that changes 0 to 0xffff.
> However if the csum functions guarantee never to return zero they can feed
> an extra 1 into the first csum_partial() then just invert and add 1 at the end.
> Because (~csum_partion(buffer, 1) + 1) is the same as ~csum_partial(buffer, 0)
> except when the buffer's csum is 0xffffffff.
> 
> I did do some experiments and the 64bit value can be reduced directly to
> 16bits using '% 0xffff'.
> This is different because it returns 0 not 0xffff.
> However gcc 'randomly' picks between the fast 'multiply by reciprocal'
> and slow divide instruction paths.
> The former is (probably) faster than reducing using shifts and adc.
> The latter definitely slower.
> 

Ok, I submitted a patch to force inlining of all checksum helpers in 
net/checksum.h instead.

Christophe

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-02-17 10:14 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-11 10:24 [PATCH 1/2] net: Allow csum_sub() to be provided in arch Christophe Leroy
2022-02-11 10:24 ` [PATCH 2/2] powerpc/32: Implement csum_sub Christophe Leroy
2022-02-13  3:01   ` David Laight
2022-02-17 10:13     ` Christophe Leroy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).