* [PATCH] arm64: csum: Optimise IPv6 header checksum
@ 2020-01-20 18:52 Robin Murphy
  2020-01-21 10:34 ` Will Deacon
  2020-03-09 18:09 ` Catalin Marinas
  0 siblings, 2 replies; 5+ messages in thread
From: Robin Murphy @ 2020-01-20 18:52 UTC (permalink / raw)
  To: will, catalin.marinas; +Cc: linux-arm-kernel

Throwing our __uint128_t idioms at csum_ipv6_magic() makes it
about 1.3x-2x faster across a range of microarchitecture/compiler
combinations. Not much in absolute terms, but every little helps.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

Before I move on, this seemed like it might be worth touching as well,
judging by what other architectures do.
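
For anyone reading along, the idiom can be tried outside the kernel. The
following is a rough userspace sketch, not the patch itself, and it makes
several assumptions: ipv6_magic_u128(), ipv6_magic_ref() and
ipv6_magic_selfcheck() are made-up names, plain uint32_t/uint16_t stand in
for __wsum/__sum16, memcpy() replaces the pointer cast, and htonl(proto)
replaces the endian #ifdef. accumulate() is written to mirror the helper
of that name already in csum.c. It needs a compiler with __uint128_t
(GCC or Clang on a 64-bit target).

```c
#include <arpa/inet.h>	/* htonl() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 64-bit add with end-around carry (mirrors accumulate() in csum.c). */
static uint64_t accumulate(uint64_t sum, uint64_t data)
{
	__uint128_t tmp = (__uint128_t)sum + data;

	return (uint64_t)(tmp + (tmp >> 64));
}

/* Sketch of the 128-bit approach; hypothetical userspace name. */
uint16_t ipv6_magic_u128(const uint8_t saddr[16], const uint8_t daddr[16],
			 uint32_t len, uint8_t proto, uint32_t csum)
{
	__uint128_t src, dst;
	uint64_t sum = csum;
	uint32_t s32;

	memcpy(&src, saddr, sizeof(src));
	memcpy(&dst, daddr, sizeof(dst));

	/* Length and next-header fields as they sit in the pseudo-header. */
	sum += htonl(len);
	sum += htonl(proto);

	/* Rotate-and-add folds 128 -> 64 bits; the result is the top half. */
	src += (src >> 64) | (src << 64);
	dst += (dst >> 64) | (dst << 64);
	sum = accumulate(sum, (uint64_t)(src >> 64));
	sum = accumulate(sum, (uint64_t)(dst >> 64));

	/* Same trick for 64 -> 32 and 32 -> 16 bits, then complement. */
	sum += (sum >> 32) | (sum << 32);
	s32 = (uint32_t)(sum >> 32);
	s32 += (s32 >> 16) | (s32 << 16);
	return (uint16_t)~(s32 >> 16);
}

/* Reference: sum the same data one native 16-bit word at a time. */
uint16_t ipv6_magic_ref(const uint8_t saddr[16], const uint8_t daddr[16],
			uint32_t len, uint8_t proto, uint32_t csum)
{
	uint32_t l = htonl(len), p = htonl(proto);
	uint64_t sum = (csum & 0xffff) + (csum >> 16);
	uint16_t w;
	int i;

	for (i = 0; i < 16; i += 2) {
		memcpy(&w, saddr + i, sizeof(w));
		sum += w;
		memcpy(&w, daddr + i, sizeof(w));
		sum += w;
	}
	sum += (l & 0xffff) + (l >> 16);
	sum += (p & 0xffff) + (p >> 16);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Both versions should agree on any input; spot-check a couple. */
void ipv6_magic_selfcheck(void)
{
	/* 2001:db8::1 -> 2001:db8::2 (documentation prefix) */
	const uint8_t s[16] = { 0x20, 0x01, 0x0d, 0xb8, [15] = 0x01 };
	const uint8_t d[16] = { 0x20, 0x01, 0x0d, 0xb8, [15] = 0x02 };

	assert(ipv6_magic_u128(s, d, 40, 6, 0) ==
	       ipv6_magic_ref(s, d, 40, 6, 0));
	assert(ipv6_magic_u128(s, d, 65535, 17, 0xffffffffu) ==
	       ipv6_magic_ref(s, d, 65535, 17, 0xffffffffu));
}
```

Calling ipv6_magic_selfcheck() from a small main() is enough to exercise
it. The point of the rotate-and-add steps is that each one halves the
width while keeping the end-around carry, so no explicit carry loop is
needed.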

 arch/arm64/include/asm/checksum.h |  7 ++++++-
 arch/arm64/lib/csum.c             | 27 +++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/checksum.h b/arch/arm64/include/asm/checksum.h
index 8d2a7de39744..b6f7bc6da5fb 100644
--- a/arch/arm64/include/asm/checksum.h
+++ b/arch/arm64/include/asm/checksum.h
@@ -5,7 +5,12 @@
 #ifndef __ASM_CHECKSUM_H
 #define __ASM_CHECKSUM_H
 
-#include <linux/types.h>
+#include <linux/in6.h>
+
+#define _HAVE_ARCH_IPV6_CSUM
+__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+			const struct in6_addr *daddr,
+			__u32 len, __u8 proto, __wsum sum);
 
 static inline __sum16 csum_fold(__wsum csum)
 {
diff --git a/arch/arm64/lib/csum.c b/arch/arm64/lib/csum.c
index 847eb725ce09..4a522e45f23b 100644
--- a/arch/arm64/lib/csum.c
+++ b/arch/arm64/lib/csum.c
@@ -121,3 +121,30 @@ unsigned int do_csum(const unsigned char *buff, int len)
 
 	return sum >> 16;
 }
+
+__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+			const struct in6_addr *daddr,
+			__u32 len, __u8 proto, __wsum csum)
+{
+	__uint128_t src, dst;
+	u64 sum = (__force u64)csum;
+
+	src = *(const __uint128_t *)saddr->s6_addr;
+	dst = *(const __uint128_t *)daddr->s6_addr;
+
+	sum += (__force u32)htonl(len);
+#ifdef __LITTLE_ENDIAN
+	sum += (u32)proto << 24;
+#else
+	sum += proto;
+#endif
+	src += (src >> 64) | (src << 64);
+	dst += (dst >> 64) | (dst << 64);
+
+	sum = accumulate(sum, src >> 64);
+	sum = accumulate(sum, dst >> 64);
+
+	sum += ((sum >> 32) | (sum << 32));
+	return csum_fold((__force __wsum)(sum >> 32));
+}
+EXPORT_SYMBOL(csum_ipv6_magic);
-- 
2.23.0.dirty


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH] arm64: csum: Optimise IPv6 header checksum
  2020-01-20 18:52 [PATCH] arm64: csum: Optimise IPv6 header checksum Robin Murphy
@ 2020-01-21 10:34 ` Will Deacon
  2020-02-03  9:29   ` Shaokun Zhang
  2020-03-09 18:09 ` Catalin Marinas
  1 sibling, 1 reply; 5+ messages in thread
From: Will Deacon @ 2020-01-21 10:34 UTC (permalink / raw)
  To: Robin Murphy
  Cc: zhangshaokun, catalin.marinas, linux-arm-kernel, huanglingyan2

[+ Shaokun and Lingyan for review and testing feedback]


* Re: [PATCH] arm64: csum: Optimise IPv6 header checksum
  2020-01-21 10:34 ` Will Deacon
@ 2020-02-03  9:29   ` Shaokun Zhang
  2020-02-11  8:35     ` Chen Zhou
  0 siblings, 1 reply; 5+ messages in thread
From: Shaokun Zhang @ 2020-02-03  9:29 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy
  Cc: catalin.marinas, linux-arm-kernel, huanglingyan2

Hi Will/Robin,

My apologies for the slow reply; it was the Spring Festival in China.

Robin's idea sounds nice. We will test it later, because our machine
has broken down.

Thanks,
Shaokun


* Re: [PATCH] arm64: csum: Optimise IPv6 header checksum
  2020-02-03  9:29   ` Shaokun Zhang
@ 2020-02-11  8:35     ` Chen Zhou
  0 siblings, 0 replies; 5+ messages in thread
From: Chen Zhou @ 2020-02-11  8:35 UTC (permalink / raw)
  To: Shaokun Zhang, Will Deacon, Robin Murphy
  Cc: wxf.wang, catalin.marinas, Hanjun Guo, linux-arm-kernel, huanglingyan2

Hi Will/Robin/Shaokun,

Shaokun's machine has broken down, so I tested it.

On a KunPeng920 board, the optimised IPv6 header checksum gives about
a 1.2x performance gain. My gcc version is 7.3.0.
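
The thread does not say how this figure was measured. For anyone wanting
to reproduce a comparison like it, a userspace timing loop along the
following lines is one option. It is only a sketch under stated
assumptions and not the test used for the number above: csum_fn,
csum_baseline() and ns_per_call() are made-up names, and the baseline is
a plain 16-bit-word sum over the pseudo-header rather than the kernel's
generic csum_ipv6_magic().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef uint16_t (*csum_fn)(const uint8_t *saddr, const uint8_t *daddr,
			    uint32_t len, uint8_t proto, uint32_t csum);

/*
 * Hypothetical baseline: build the 40-byte IPv6 pseudo-header and sum
 * it as native 16-bit words. Not the kernel's generic implementation.
 */
uint16_t csum_baseline(const uint8_t *saddr, const uint8_t *daddr,
		       uint32_t len, uint8_t proto, uint32_t csum)
{
	uint8_t ph[40] = { 0 };
	uint64_t sum = (csum & 0xffff) + (csum >> 16);
	uint16_t w;
	size_t i;

	memcpy(ph, saddr, 16);
	memcpy(ph + 16, daddr, 16);
	ph[32] = (uint8_t)(len >> 24);	/* upper-layer length, big-endian */
	ph[33] = (uint8_t)(len >> 16);
	ph[34] = (uint8_t)(len >> 8);
	ph[35] = (uint8_t)len;
	ph[39] = proto;			/* next header, after 3 zero bytes */

	for (i = 0; i < sizeof(ph); i += 2) {
		memcpy(&w, ph + i, sizeof(w));
		sum += w;
	}
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Average CPU time per call in nanoseconds, measured with clock(). */
double ns_per_call(csum_fn fn, unsigned long iters)
{
	uint8_t s[16], d[16];
	volatile uint16_t sink = 0;	/* keeps the calls from being elided */
	unsigned long i;
	clock_t t0, t1;

	assert(iters > 0);
	memset(s, 0x20, sizeof(s));
	memset(d, 0xfe, sizeof(d));

	t0 = clock();
	for (i = 0; i < iters; i++) {
		s[15] = (uint8_t)i;	/* vary the input each iteration */
		sink = fn(s, d, 1280, 6, sink);
	}
	t1 = clock();

	return (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / (double)iters;
}
```

A candidate implementation with the same signature can be passed to
ns_per_call() alongside csum_baseline() and the two results compared.
Feeding the previous result back in as csum makes each call depend on the
last, so this measures latency rather than throughput; numbers from a
loop like this are not directly comparable with in-kernel measurements.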

Thanks,
Chen Zhou


* Re: [PATCH] arm64: csum: Optimise IPv6 header checksum
  2020-01-20 18:52 [PATCH] arm64: csum: Optimise IPv6 header checksum Robin Murphy
  2020-01-21 10:34 ` Will Deacon
@ 2020-03-09 18:09 ` Catalin Marinas
  1 sibling, 0 replies; 5+ messages in thread
From: Catalin Marinas @ 2020-03-09 18:09 UTC (permalink / raw)
  To: Robin Murphy; +Cc: will, linux-arm-kernel

On Mon, Jan 20, 2020 at 06:52:29PM +0000, Robin Murphy wrote:
> Throwing our __uint128_t idioms at csum_ipv6_magic() makes it
> about 1.3x-2x faster across a range of microarchitecture/compiler
> combinations. Not much in absolute terms, but every little helps.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

Queued for 5.7. Thanks.

-- 
Catalin

