From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: Robin Murphy <robin.murphy@arm.com>
Cc: Steve Capper <steve.capper@arm.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Will Deacon <will.deacon@arm.com>,
"huanglingyan (A)" <huanglingyan2@huawei.com>,
"<netdev@vger.kernel.org>" <netdev@vger.kernel.org>,
linux-arm-kernel <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH] arm64: do_csum: implement accelerated scalar version
Date: Thu, 28 Feb 2019 16:28:30 +0100
Message-ID: <CAKv+Gu-HJ1fRoetsYMLKkpGa4QCRfCJ2WAhcX=gUfonR4F-bEQ@mail.gmail.com>
In-Reply-To: <93697477-4dcc-4ab2-c838-2f487d334c56@arm.com>

On Thu, 28 Feb 2019 at 16:14, Robin Murphy <robin.murphy@arm.com> wrote:
>
> Hi Ard,
>
> On 28/02/2019 14:16, Ard Biesheuvel wrote:
> > (+ Catalin)
> >
> > On Tue, 19 Feb 2019 at 16:08, Ilias Apalodimas
> > <ilias.apalodimas@linaro.org> wrote:
> >>
> >> On Tue, Feb 19, 2019 at 12:08:42AM +0100, Ard Biesheuvel wrote:
> >>> It turns out that the IP checksumming code is still exercised often,
> >>> even though one might expect that modern NICs with checksum offload
> >>> have no use for it. However, as Lingyan points out, there are
> >>> combinations of features where the network stack may still fall back
> >>> to software checksumming, and so it makes sense to provide an
> >>> optimized implementation in software as well.
> >>>
> >>> So provide an implementation of do_csum() in scalar assembler, which,
> >>> unlike C, gives direct access to the carry flag, making the code run
> >>> substantially faster. The routine uses overlapping 64 byte loads for
> >>> all input sizes > 64 bytes, in order to reduce the number of branches
> >>> and improve performance on cores with deep pipelines.
> >>>
> >>> On Cortex-A57, this implementation is on par with Lingyan's NEON
> >>> implementation, and roughly 7x as fast as the generic C code.
> >>>
> >>> Cc: "huanglingyan (A)" <huanglingyan2@huawei.com>
> >>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > ...
> >>
> >> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> >
> > Full patch here
> >
> > https://lore.kernel.org/linux-arm-kernel/20190218230842.11448-1-ard.biesheuvel@linaro.org/
> >
> > This was a follow-up to some discussions about Lingyan's NEON code,
> > CC'ed to netdev@ so people could chime in as to whether we need
> > accelerated checksumming code in the first place.

Thanks for taking a look.

> FWIW ever since we did ip_fast_csum() I've been meaning to see how well
> I can do with a similar tweaked C implementation for this (mostly for
> fun). Since I've recently dug out my RK3328 box for other reasons I'll
> give this a test - that's a weedy little quad-A53 whose GbE hardware
> checksumming is slightly busted and has to be turned off, so the
> do_csum() overhead under heavy network load is comparatively massive.
> (plus it's non-EFI so I should be able to try big-endian easily too)
>

Yes please. I've been meaning to run this on A72 myself, but ever
since my MacchiatoBin self-combusted, I've been relying on AWS for
this, which is a bit finicky.

As for the C implementation, not having access to the carry flag is
pretty limiting, so I wonder how you intend to get around that.

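For illustration only: one common way to cope without the carry flag in
portable C is to detect the wrap-around with an unsigned comparison and
add the carry back in. The sketch below shows that idea and nothing
more. It is not the patch's code nor anyone's proposed implementation,
the names add64_carry(), fold64() and csum_sketch() are made up for the
example, and it makes no attempt at the alignment or pipeline tuning
discussed in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * End-around-carry addition: if the 64-bit add wrapped, the result is
 * smaller than either operand, so (sum < x) is the lost carry bit.
 * Adding it back cannot wrap a second time.
 */
static uint64_t add64_carry(uint64_t sum, uint64_t x)
{
	sum += x;
	return sum + (sum < x);
}

/* Fold a 64-bit ones' complement sum down to 16 bits. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffff) + (sum >> 32);	/* at most 33 bits */
	sum = (sum & 0xffffffff) + (sum >> 32);	/* fits in 32 bits */
	sum = (sum & 0xffff) + (sum >> 16);	/* at most 17 bits */
	sum = (sum & 0xffff) + (sum >> 16);	/* fits in 16 bits */
	return (uint16_t)sum;
}

/*
 * Hypothetical helper: unfolded-then-folded 16-bit sum of buf in
 * native byte order, accumulated 8 bytes at a time.
 */
static uint16_t csum_sketch(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t sum = 0, w;

	for (; len >= 8; p += 8, len -= 8) {
		memcpy(&w, p, 8);	/* alignment-safe load */
		sum = add64_carry(sum, w);
	}
	if (len) {
		w = 0;			/* zero-pad the tail in memory order */
		memcpy(&w, p, len);
		sum = add64_carry(sum, w);
	}
	return fold64(sum);
}
```

Each addition here still depends on the compare from the previous one,
which is the limitation compared with a chain of adcs instructions.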
> The asm looks pretty reasonable to me - instinct says there's *possibly*
> some value for out-of-order cores in doing the 8-way accumulations in a
> more pairwise fashion, but I guess either way the carry flag dependency
> is going to dominate, so it may well be moot.

Yes. In fact, I was surprised the speedup is as dramatic as it is
despite this, but I guess they optimize for this rather well at the
uarch level.

> What may be more
> worthwhile is taking the effort to align the source pointer, at least
> for larger inputs, so as to be kinder to little cores - according to its
> optimisation guide, A55 is fairly sensitive to unaligned loads, so I'd
> assume that's true of its older/smaller friends too. I'll see what I can
> measure in practice - until proven otherwise I'd have no great objection
> to merging this patch as-is if the need is real. Improvements can always
> come later :)
>

Good point re alignment, I didn't consider that at all tbh.

I'll let the maintainers decide whether/when to merge this. I don't
feel strongly either way.
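
To make the alignment suggestion concrete, here is a rough C sketch of
one way it could be done: place the leading bytes at their natural
offset inside a zeroed 64-bit word, run the main loop on 8-byte-aligned
addresses, and byte-swap the folded result when the buffer started at an
odd address. This is an assumption-laden illustration, not code from the
patch or from anyone in this thread, and csum_aligned_sketch() and its
helpers are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* End-around-carry add: (sum < x) recovers the carry out of bit 63. */
static uint64_t add64_carry(uint64_t sum, uint64_t x)
{
	sum += x;
	return sum + (sum < x);
}

/* Fold a 64-bit ones' complement sum down to 16 bits. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Hypothetical helper: checksum with the bulk loads 8-byte aligned. */
static uint16_t csum_aligned_sketch(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t off = (uintptr_t)p & 7;	/* misalignment of the start */
	uint64_t sum = 0, w;
	uint16_t res;

	if (off && len) {
		/* Head: copy bytes to the same offset they have in memory. */
		size_t n = 8 - off;

		if (n > len)
			n = len;
		w = 0;
		memcpy((unsigned char *)&w + off, p, n);
		sum = w;
		p += n;
		len -= n;
	}
	for (; len >= 8; p += 8, len -= 8) {
		memcpy(&w, p, 8);	/* p is 8-byte aligned here */
		sum = add64_carry(sum, w);
	}
	if (len) {
		w = 0;			/* zero-padded tail */
		memcpy(&w, p, len);
		sum = add64_carry(sum, w);
	}
	res = fold64(sum);
	/*
	 * An odd start address shifts every byte into the other half of
	 * its 16-bit lane, which a byte swap of the folded sum undoes.
	 */
	if (off & 1)
		res = (uint16_t)((res << 8) | (res >> 8));
	return res;
}
```

Whether the extra head handling pays off on small inputs would need
measuring on the cores in question.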

Thread overview: 19+ messages
2019-02-18 23:08 [PATCH] arm64: do_csum: implement accelerated scalar version Ard Biesheuvel
2019-02-19 15:08 ` Ilias Apalodimas
2019-02-28 14:16 ` Ard Biesheuvel
2019-02-28 15:13 ` Robin Murphy
2019-02-28 15:28 ` Ard Biesheuvel [this message]
2019-04-12 2:31 ` Zhangshaokun
2019-04-12 9:52 ` Will Deacon
2019-04-15 18:18 ` Robin Murphy
2019-05-15 9:47 ` Will Deacon
2019-05-15 10:15 ` David Laight
2019-05-15 10:57 ` Robin Murphy
2019-05-15 11:13 ` David Laight
2019-05-15 12:39 ` Robin Murphy
2019-05-15 13:54 ` David Laight
2019-05-15 11:02 ` Robin Murphy
2019-05-16 3:14 ` Zhangshaokun
2019-08-15 16:46 ` Will Deacon
2019-08-16 8:15 ` Shaokun Zhang
2019-08-16 14:55 ` Robin Murphy