From: "huanglingyan (A)" <huanglingyan2@huawei.com>
To: Will Deacon <will.deacon@arm.com>
Cc: Zhangshaokun <zhangshaokun@hisilicon.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	linux-arm-kernel@lists.infradead.org, ard.biesheuvel@linaro.org
Subject: Re: [PATCH v3] arm64: lib: accelerate do_csum with NEON instruction
Date: Wed, 9 Jan 2019 10:03:05 +0800
Message-ID: <cd5bb83e-bb0e-b348-5365-095c5fcd9648@huawei.com>
In-Reply-To: <20190108135444.GB14476@fuggles.cambridge.arm.com>


On 2019/1/8 21:54, Will Deacon wrote:
> [re-adding Ard and LAKML -- not sure why the headers are so munged]
>
> On Mon, Jan 07, 2019 at 10:38:55AM +0800, huanglingyan (A) wrote:
>> On 2019/1/6 16:26, Ard Biesheuvel wrote:
>>     Please change this into
>>
>>     if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) &&
>>         len >= CSUM_NEON_THRESHOLD &&
>>         may_use_simd()) {
>>             kernel_neon_begin();
>>             res = do_csum_neon(buff, len);
>>             kernel_neon_end();
>>         }
>>
>>     and drop the intermediate do_csum_arm()
>>
>>
>>         +               return do_csum_arm(buff, len);
>>         +#endif  /* CONFIG_KERNEL_MODE_NEON */
>>
>>     No else? What happens if len < CSUM_NEON_THRESHOLD ?
>>
>>
>>         +#undef do_csum
>>
>>     Can we drop this?
>>
>> Using NEON instructions has a cost: kernel_neon_begin()/kernel_neon_end() may
>> have to save and restore the NEON registers. The NEON code is therefore used
>> only when the length exceeds CSUM_NEON_THRESHOLD, and the generic do_csum()
>> code in lib/checksum.c is used for shorter lengths. To achieve this, I use
>> "#undef do_csum" in the else clause so that the generic code can be used.
> I don't think that's how it works :/
>
> Before we get deeper into the implementation, please could you justify the
> need for a CPU-optimised checksum implementation at all? I thought this was
> usually offloaded to the NIC?
>
> Will
>
> .
I ran into this problem while testing an Intel X710 network card on my ARM server.
IP forwarding is enabled for ease of testing. A Tesgine machine sends a large number
of packets to the server and then receives the forwarded traffic.

The bandwidth is 9.5 Gbps on an Intel 8180 but only 5.8 Gbps on ARM. perf shows
do_csum() at 36% on ARM and only 6% on Intel. That is why I decided to modify the
do_csum() function on ARM.
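
To make the intended behaviour concrete, here is a minimal user-space sketch of a
length-based dispatch with an explicit fallback for short buffers. It is not the
patch itself: csum_scalar(), csum_wide(), csum_dispatch() and
CSUM_SKETCH_THRESHOLD are hypothetical names, the "wide" path is plain C standing
in for a NEON routine, and the sum is taken over big-endian 16-bit words as in
RFC 1071 rather than matching the kernel's do_csum() return convention. In the
kernel, the wide path would additionally be guarded by may_use_simd() and wrapped
in kernel_neon_begin()/kernel_neon_end(), as Ard suggested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff; the real CSUM_NEON_THRESHOLD value is not shown here. */
#define CSUM_SKETCH_THRESHOLD 128

/* Fold a wide accumulator down to 16 bits with end-around carry. */
static uint16_t csum_fold64(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Scalar path: add big-endian 16-bit words, zero-padding an odd last byte. */
static uint16_t csum_scalar(const unsigned char *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return csum_fold64(sum);
}

/*
 * Stand-in for a vector path: four independent accumulators, eight bytes
 * per iteration. A real NEON routine would use SIMD registers instead.
 */
static uint16_t csum_wide(const unsigned char *buf, size_t len)
{
	uint64_t acc[4] = { 0, 0, 0, 0 };
	size_t i, lane;

	for (i = 0; i + 8 <= len; i += 8)
		for (lane = 0; lane < 4; lane++)
			acc[lane] += ((uint32_t)buf[i + 2 * lane] << 8) |
				     buf[i + 2 * lane + 1];

	/* i is a multiple of 8, so the tail keeps its 16-bit word alignment. */
	return csum_fold64(acc[0] + acc[1] + acc[2] + acc[3] +
			   csum_scalar(buf + i, len - i));
}

/* Runtime dispatch: long buffers take the wide path, short ones fall back. */
uint16_t csum_dispatch(const unsigned char *buf, size_t len)
{
	if (len >= CSUM_SKETCH_THRESHOLD)
		return csum_wide(buf, len);
	return csum_scalar(buf, len);
}
```

The point of the sketch is that the choice between the two paths is made at run
time by an if/else on len, so the short-buffer case always has a code path.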

As a newbie to the Linux kernel, I have little knowledge of how this situation is
normally handled. I look forward to your help in improving this patch.

Lingyan Huang

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

