From: Robin Murphy <robin.murphy@arm.com>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	Catalin Marinas <catalin.marinas@arm.com>
Cc: "<netdev@vger.kernel.org>" <netdev@vger.kernel.org>,
	Steve Capper <steve.capper@arm.com>,
	Will Deacon <will.deacon@arm.com>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	"huanglingyan (A)" <huanglingyan2@huawei.com>
Subject: Re: [PATCH] arm64: do_csum: implement accelerated scalar version
Date: Thu, 28 Feb 2019 15:13:59 +0000
Message-ID: <93697477-4dcc-4ab2-c838-2f487d334c56@arm.com>
In-Reply-To: <CAKv+Gu-oH44z16com1+c__7UoipJA-1ZpThKuvTpLdR6kjgyDA@mail.gmail.com>

Hi Ard,

On 28/02/2019 14:16, Ard Biesheuvel wrote:
> (+ Catalin)
> 
> On Tue, 19 Feb 2019 at 16:08, Ilias Apalodimas
> <ilias.apalodimas@linaro.org> wrote:
>>
>> On Tue, Feb 19, 2019 at 12:08:42AM +0100, Ard Biesheuvel wrote:
>>> It turns out that the IP checksumming code is still exercised often,
>>> even though one might expect that modern NICs with checksum offload
>>> have no use for it. However, as Lingyan points out, there are
>>> combinations of features where the network stack may still fall back
>>> to software checksumming, and so it makes sense to provide an
>>> optimized implementation in software as well.
>>>
>>> So provide an implementation of do_csum() in scalar assembler, which,
>>> unlike C, gives direct access to the carry flag, making the code run
>>> substantially faster. The routine uses overlapping 64 byte loads for
>>> all input size > 64 bytes, in order to reduce the number of branches
>>> and improve performance on cores with deep pipelines.
>>>
>>> On Cortex-A57, this implementation is on par with Lingyan's NEON
>>> implementation, and roughly 7x as fast as the generic C code.
>>>
>>> Cc: "huanglingyan (A)" <huanglingyan2@huawei.com>
>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ...
>>
>> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> 
> Full patch here
> 
> https://lore.kernel.org/linux-arm-kernel/20190218230842.11448-1-ard.biesheuvel@linaro.org/
> 
> This was a follow-up to some discussions about Lingyan's NEON code,
> CC'ed to netdev@ so people could chime in as to whether we need
> accelerated checksumming code in the first place.

FWIW ever since we did ip_fast_csum() I've been meaning to see how well
I can do with a similar tweaked C implementation for this (mostly for 
fun). Since I've recently dug out my RK3328 box for other reasons I'll 
give this a test - that's a weedy little quad-A53 whose GbE hardware 
checksumming is slightly busted and has to be turned off, so the 
do_csum() overhead under heavy network load is comparatively massive. 
(plus it's non-EFI so I should be able to try big-endian easily too)
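
To make the "tweaked C implementation" idea concrete, here is a rough
sketch of the general shape such a routine might take. It is not the
patch's code and not what any kernel currently does; the names
csum_sketch(), add64_carry() and fold64() are hypothetical. It sums
64-bit words with an end-around carry (the C stand-in for what ADCS gives
the asm for free) and folds down to 16 bits at the end. Like do_csum(),
it returns the sum in native memory order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Ones'-complement add: wrap any carry out of bit 63 back into bit 0. */
static uint64_t add64_carry(uint64_t sum, uint64_t data)
{
	sum += data;
	return sum + (sum < data);
}

/* Fold a 64-bit ones'-complement sum down to 16 bits. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum >> 32) + (sum & 0xffffffff);
	sum = (sum >> 32) + (sum & 0xffffffff);
	sum = (sum >> 16) + (sum & 0xffff);
	sum = (sum >> 16) + (sum & 0xffff);
	return (uint16_t)sum;
}

/* Hypothetical name: checksum len bytes at buf, 8 bytes per step. */
static uint16_t csum_sketch(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t sum = 0, word;

	while (len >= 8) {
		memcpy(&word, p, 8);	/* alignment-safe 64-bit load */
		sum = add64_carry(sum, word);
		p += 8;
		len -= 8;
	}
	if (len) {
		word = 0;
		memcpy(&word, p, len);	/* tail, zero-padded in memory order */
		sum = add64_carry(sum, word);
	}
	return fold64(sum);
}
```

Because the ones'-complement sum is byte-order independent, storing the
returned value to memory yields the same two bytes on little- and
big-endian, which makes the big-endian comparison straightforward.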

The asm looks pretty reasonable to me - instinct says there's *possibly* 
some value for out-of-order cores in doing the 8-way accumulations in a 
more pairwise fashion, but I guess either way the carry flag dependency 
is going to dominate, so it may well be moot. What may be more 
worthwhile is taking the effort to align the source pointer, at least 
for larger inputs, so as to be kinder to little cores - according to its 
optimisation guide, A55 is fairly sensitive to unaligned loads, so I'd 
assume that's true of its older/smaller friends too. I'll see what I can 
measure in practice - until proven otherwise I'd have no great objection 
to merging this patch as-is if the need is real. Improvements can always 
come later :)
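
For anyone wondering what "more pairwise" means here, a small C sketch of
the two accumulation orders follows. This is illustrative only: the
function names are made up, and it says nothing about how the asm in the
patch is actually scheduled. The chain version makes every add depend on
the previous one; the pairwise version exposes four independent adds up
front, which an out-of-order core could in principle overlap. End-around
carry addition is associative and commutative, so both give the same
64-bit result.

```c
#include <stdint.h>

/* Ones'-complement add of two 64-bit values (end-around carry). */
static uint64_t eac_add64(uint64_t a, uint64_t b)
{
	uint64_t s = a + b;

	return s + (s < b);
}

/* Serial chain: each add waits on the one before it. */
static uint64_t sum8_chain(const uint64_t w[8], uint64_t acc)
{
	for (int i = 0; i < 8; i++)
		acc = eac_add64(acc, w[i]);
	return acc;
}

/* Pairwise tree: the first four adds have no mutual dependencies. */
static uint64_t sum8_pairwise(const uint64_t w[8], uint64_t acc)
{
	uint64_t s01 = eac_add64(w[0], w[1]);
	uint64_t s23 = eac_add64(w[2], w[3]);
	uint64_t s45 = eac_add64(w[4], w[5]);
	uint64_t s67 = eac_add64(w[6], w[7]);

	return eac_add64(acc, eac_add64(eac_add64(s01, s23),
					eac_add64(s45, s67)));
}
```

In the asm the carry flag is a single shared resource, so splitting the
chain means keeping separate accumulators and collecting their carries
separately, which is why the benefit may well be moot.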

Robin.


