* [PATCH] arm64: csum: Optimise IPv6 header checksum
@ 2020-01-20 18:52 Robin Murphy
2020-01-21 10:34 ` Will Deacon
2020-03-09 18:09 ` Catalin Marinas
0 siblings, 2 replies; 5+ messages in thread
From: Robin Murphy @ 2020-01-20 18:52 UTC (permalink / raw)
To: will, catalin.marinas; +Cc: linux-arm-kernel

Throwing our __uint128_t idioms at csum_ipv6_magic() makes it
about 1.3x-2x faster across a range of microarchitecture/compiler
combinations. Not much in absolute terms, but every little helps.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

Before I move on, this seemed like it might be worth touching as well,
comparing what other architectures do.

 arch/arm64/include/asm/checksum.h |  7 ++++++-
 arch/arm64/lib/csum.c             | 27 +++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/checksum.h b/arch/arm64/include/asm/checksum.h
index 8d2a7de39744..b6f7bc6da5fb 100644
--- a/arch/arm64/include/asm/checksum.h
+++ b/arch/arm64/include/asm/checksum.h
@@ -5,7 +5,12 @@
 #ifndef __ASM_CHECKSUM_H
 #define __ASM_CHECKSUM_H

-#include <linux/types.h>
+#include <linux/in6.h>
+
+#define _HAVE_ARCH_IPV6_CSUM
+__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+			const struct in6_addr *daddr,
+			__u32 len, __u8 proto, __wsum sum);

 static inline __sum16 csum_fold(__wsum csum)
 {
diff --git a/arch/arm64/lib/csum.c b/arch/arm64/lib/csum.c
index 847eb725ce09..4a522e45f23b 100644
--- a/arch/arm64/lib/csum.c
+++ b/arch/arm64/lib/csum.c
@@ -121,3 +121,30 @@ unsigned int do_csum(const unsigned char *buff, int len)

 	return sum >> 16;
 }
+
+__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+			const struct in6_addr *daddr,
+			__u32 len, __u8 proto, __wsum csum)
+{
+	__uint128_t src, dst;
+	u64 sum = (__force u64)csum;
+
+	src = *(const __uint128_t *)saddr->s6_addr;
+	dst = *(const __uint128_t *)daddr->s6_addr;
+
+	sum += (__force u32)htonl(len);
+#ifdef __LITTLE_ENDIAN
+	sum += (u32)proto << 24;
+#else
+	sum += proto;
+#endif
+	src += (src >> 64) | (src << 64);
+	dst += (dst >> 64) | (dst << 64);
+
+	sum = accumulate(sum, src >> 64);
+	sum = accumulate(sum, dst >> 64);
+
+	sum += ((sum >> 32) | (sum << 32));
+	return csum_fold((__force __wsum)(sum >> 32));
+}
+EXPORT_SYMBOL(csum_ipv6_magic);
--
2.23.0.dirty
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
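
[Editorial sketch, not part of the original message.] The rotate-and-add
folding that the patch relies on can be tried out in user space. The
following is a minimal sketch under stated assumptions: accumulate() is not
shown in the patch, so the version here is a guess at a 64-bit add with
end-around carry; fold128(), fold32(), csum_ipv6_fast() and csum_ipv6_ref()
are hypothetical names introduced for illustration; memcpy() stands in for
the kernel's pointer casts; and __uint128_t requires GCC or Clang on a
64-bit target. The reference routine sums the 40-byte IPv6 pseudo-header
one 16-bit word at a time so the two results can be compared.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htonl(), ntohs() */

/* Assumed helper: 64-bit add with end-around carry (not shown in the patch). */
static uint64_t accumulate(uint64_t sum, uint64_t data)
{
	__uint128_t tmp = (__uint128_t)sum + data;

	return (uint64_t)tmp + (uint64_t)(tmp >> 64);
}

/* Fold 128 bits to 64: adding the half-rotated value leaves hi + lo + carry
 * in the upper half. */
static uint64_t fold128(__uint128_t v)
{
	v += (v >> 64) | (v << 64);
	return (uint64_t)(v >> 64);
}

/* Fold 32 bits to 16 and complement, in the style of csum_fold(). */
static uint16_t fold32(uint32_t sum)
{
	sum += (sum >> 16) | (sum << 16);
	return (uint16_t)~(sum >> 16);
}

/* User-space rendition of the patch's approach (hypothetical name). */
static uint16_t csum_ipv6_fast(const uint8_t saddr[16], const uint8_t daddr[16],
			       uint32_t len, uint8_t proto, uint32_t csum)
{
	__uint128_t src, dst;
	uint64_t sum = csum;

	memcpy(&src, saddr, sizeof(src));
	memcpy(&dst, daddr, sizeof(dst));

	sum += htonl(len);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	sum += (uint32_t)proto << 24;	/* bytes 0,0,0,proto loaded natively */
#else
	sum += proto;
#endif
	sum = accumulate(sum, fold128(src));
	sum = accumulate(sum, fold128(dst));

	sum += (sum >> 32) | (sum << 32);
	return fold32((uint32_t)(sum >> 32));
}

/* Reference: build the pseudo-header and add it 16 bits at a time. */
static uint16_t csum_ipv6_ref(const uint8_t saddr[16], const uint8_t daddr[16],
			      uint32_t len, uint8_t proto, uint32_t csum)
{
	uint8_t pseudo[40] = { 0 };
	uint32_t be_len = htonl(len);
	uint64_t sum = (uint64_t)(csum & 0xffff) + (csum >> 16);
	uint16_t word;

	memcpy(pseudo, saddr, 16);
	memcpy(pseudo + 16, daddr, 16);
	memcpy(pseudo + 32, &be_len, 4);
	pseudo[39] = proto;		/* bytes 36..38 stay zero */

	for (int i = 0; i < 40; i += 2) {
		memcpy(&word, pseudo + i, sizeof(word));
		sum += word;
	}
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Both routines return the checksum in the same in-memory representation as
the input words, so on a little-endian machine the value needs ntohs()
before it is compared with a number written in network order. The wide
folds are valid because 2^16 - 1 divides both 2^32 - 1 and 2^64 - 1, so
each end-around-carry step preserves the one's complement sum.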
* Re: [PATCH] arm64: csum: Optimise IPv6 header checksum
2020-01-20 18:52 [PATCH] arm64: csum: Optimise IPv6 header checksum Robin Murphy
@ 2020-01-21 10:34 ` Will Deacon
2020-02-03 9:29 ` Shaokun Zhang
2020-03-09 18:09 ` Catalin Marinas
1 sibling, 1 reply; 5+ messages in thread
From: Will Deacon @ 2020-01-21 10:34 UTC (permalink / raw)
To: Robin Murphy
Cc: zhangshaokun, catalin.marinas, linux-arm-kernel, huanglingyan2
[+ Shaokun and Lingyan for review and testing feedback]
* Re: [PATCH] arm64: csum: Optimise IPv6 header checksum
2020-01-21 10:34 ` Will Deacon
@ 2020-02-03 9:29 ` Shaokun Zhang
2020-02-11 8:35 ` Chen Zhou
0 siblings, 1 reply; 5+ messages in thread
From: Shaokun Zhang @ 2020-02-03 9:29 UTC (permalink / raw)
To: Will Deacon, Robin Murphy
Cc: catalin.marinas, linux-arm-kernel, huanglingyan2
Hi Will/Robin,
My apologies for the slow reply; it was the Spring Festival in China.
Robin's idea sounds nice. We will test it later, because our machine
has broken down.
Thanks,
Shaokun
* Re: [PATCH] arm64: csum: Optimise IPv6 header checksum
2020-02-03 9:29 ` Shaokun Zhang
@ 2020-02-11 8:35 ` Chen Zhou
0 siblings, 0 replies; 5+ messages in thread
From: Chen Zhou @ 2020-02-11 8:35 UTC (permalink / raw)
To: Shaokun Zhang, Will Deacon, Robin Murphy
Cc: wxf.wang, catalin.marinas, Hanjun Guo, linux-arm-kernel, huanglingyan2
Hi Will/Robin/Shaokun,
Shaokun's machine broke down, so I tested it.
On a KunPeng920 board, the optimised IPv6 header checksum gives about a
1.2x performance gain. My gcc version is 7.3.0.
Thanks,
Chen Zhou
* Re: [PATCH] arm64: csum: Optimise IPv6 header checksum
2020-01-20 18:52 [PATCH] arm64: csum: Optimise IPv6 header checksum Robin Murphy
2020-01-21 10:34 ` Will Deacon
@ 2020-03-09 18:09 ` Catalin Marinas
1 sibling, 0 replies; 5+ messages in thread
From: Catalin Marinas @ 2020-03-09 18:09 UTC (permalink / raw)
To: Robin Murphy; +Cc: will, linux-arm-kernel
On Mon, Jan 20, 2020 at 06:52:29PM +0000, Robin Murphy wrote:
> Throwing our __uint128_t idioms at csum_ipv6_magic() makes it
> about 1.3x-2x faster across a range of microarchitecture/compiler
> combinations. Not much in absolute terms, but every little helps.
>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Queued for 5.7. Thanks.
--
Catalin
Thread overview: 5+ messages
2020-01-20 18:52 [PATCH] arm64: csum: Optimise IPv6 header checksum Robin Murphy
2020-01-21 10:34 ` Will Deacon
2020-02-03 9:29 ` Shaokun Zhang
2020-02-11 8:35 ` Chen Zhou
2020-03-09 18:09 ` Catalin Marinas