Re: [PATCH v3] arm64: lib: accelerate do_csum with NEON instruction

From: "huanglingyan (A)" <huanglingyan2@huawei.com>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Zhangshaokun <zhangshaokun@hisilicon.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v3] arm64: lib: accelerate do_csum with NEON instruction
Date: Tue, 12 Feb 2019 10:26:26 +0800
Message-ID: <d97f1ba1-1b73-1bde-cd8f-de55115acd9e@huawei.com>
In-Reply-To: <CAKv+Gu-MUDT-pAE4kwHbCsW2MSYBCDB3N1reRgeFL1EwiNQvxQ@mail.gmail.com>

On 2019/1/18 19:14, Ard Biesheuvel wrote:
> On Fri, 18 Jan 2019 at 02:07, huanglingyan (A) <huanglingyan2@huawei.com> wrote:
>>
>> On 2019/1/17 0:46, Will Deacon wrote:
>>> On Wed, Jan 09, 2019 at 10:03:05AM +0800, huanglingyan (A) wrote:
>>>> On 2019/1/8 21:54, Will Deacon wrote:
>>>>> [re-adding Ard and LAKML -- not sure why the headers are so munged]
>>>>>
>>>>> On Mon, Jan 07, 2019 at 10:38:55AM +0800, huanglingyan (A) wrote:
>>>>>> On 2019/1/6 16:26, Ard Biesheuvel wrote:
>>>>>>     Please change this into
>>>>>>
>>>>>>     if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) &&
>>>>>>         len >= CSUM_NEON_THRESHOLD &&
>>>>>>         may_use_simd()) {
>>>>>>             kernel_neon_begin();
>>>>>>             res = do_csum_neon(buff, len);
>>>>>>             kernel_neon_end();
>>>>>>         }
>>>>>>
>>>>>>     and drop the intermediate do_csum_arm()
>>>>>>
>>>>>>
>>>>>>         +               return do_csum_arm(buff, len);
>>>>>>         +#endif  /* CONFIG_KERNEL_MODE_NEON */
>>>>>>
>>>>>>     No else? What happens if len < CSUM_NEON_THRESHOLD ?
>>>>>>
>>>>>>
>>>>>>         +#undef do_csum
>>>>>>
>>>>>>     Can we drop this?
>>>>>>
>>>>>> Using NEON instructions has a cost: kernel_neon_begin()/kernel_neon_end()
>>>>>> may have to save and restore the NEON registers. Therefore the NEON code
>>>>>> is only used when the length exceeds CSUM_NEON_THRESHOLD, and the generic
>>>>>> do_csum() code in lib/checksum.c is used for shorter lengths. To achieve
>>>>>> this, I use "#undef do_csum" in the else clause so that the generic code
>>>>>> can still be used.
>>>>> I don't think that's how it works :/
>>>>>
>>>>> Before we get deeper into the implementation, please could you justify the
>>>>> need for a CPU-optimised checksum implementation at all? I thought this was
>>>>> usually offloaded to the NIC?
>>>>>
>>>>> Will
>>>>>
>>>>> .
>>>> This problem showed up when testing an Intel X710 network card on my ARM server.
>>>> IP forwarding is enabled for ease of testing. A Tesgine machine then sends lots of
>>>> packets to the server and receives them back.
>>> In the marketing blurb, that card boasts:
>>>
>>>   `Tx/Rx IP, SCTP, TCP, and UDP checksum offloading (IPv4, IPv6) capabilities'
>>>
>>> so we shouldn't need to run this on the CPU. Again, I'm not keen to optimise
>>> this given that it /really/ shouldn't be used on arm64 machines that care
>>> about network performance.
>>>
>>> Will
>>>
>>> .
>> Yeah, you are right. Someone familiar with NICs told me that the checksum is
>> usually computed by the network card. However, the software path may still be
>> used in testing scenarios and with some basic network cards. I think there is
>> no harm in optimizing this code, given that other architectures have their own
>> optimized versions.
> I disagree. If this code path is never exercised, we should not
> include it. We can revisit this decision when there is a use case
> where the checksumming performance is an actual bottleneck.
>
> .
Mainstream network cards have an option to switch the checksum mode.
Users can choose whether the checksum is calculated by hardware or by software:

        ethtool -K eth0 rx-checksum off
        ethtool -K eth0 tx-checksum-ip-generic off

What's more, some network features, such as GSO (I am not so sure about
this one), may prevent hardware checksumming from working. This means the
software checksum still has its place.
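
For reference, the software path under discussion can be sketched in
userspace C. This is only an illustration of the "fast path above a
threshold, generic path below it" idea from earlier in the thread, not the
patch itself: csum_wide() is a plain-C stand-in for the NEON routine, the
names do_csum_sketch(), csum_generic(), csum_wide() and the value of
CSUM_THRESHOLD_SKETCH are all hypothetical, and the result is the folded
16-bit ones' complement sum over big-endian words (RFC 1071 style), which is
not necessarily the exact value the kernel's do_csum() returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-over length; the thread does not state the real value. */
#define CSUM_THRESHOLD_SKETCH 64

/* Fold a 64-bit accumulator down to a 16-bit ones' complement sum. */
static uint16_t csum_fold64(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Generic path: add big-endian 16-bit words, two bytes at a time. */
static uint16_t csum_generic(const unsigned char *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)		/* odd trailing byte is padded with zero */
		sum += (uint32_t)buf[i] << 8;
	return csum_fold64(sum);
}

/*
 * Stand-in for the accelerated path: add big-endian 32-bit words into a
 * 64-bit accumulator, then let the generic path handle the tail. Folding
 * gives the same result because 2^16 is congruent to 1 modulo 0xffff.
 */
static uint16_t csum_wide(const unsigned char *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 4 <= len; i += 4)
		sum += ((uint32_t)buf[i] << 24) | ((uint32_t)buf[i + 1] << 16) |
		       ((uint32_t)buf[i + 2] << 8) | buf[i + 3];
	return csum_fold64(sum + csum_generic(buf + i, len - i));
}

/* Dispatch: short buffers always fall back to the generic path. */
uint16_t do_csum_sketch(const unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	if (len >= CSUM_THRESHOLD_SKETCH)
		return csum_wide(buf, len);
	return csum_generic(buf, len);
}
```

The point of the explicit fallback is that every length gets a result from
one of the two paths, and both paths agree on the same input, so the
threshold only affects speed.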


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel