linux-arm-kernel.lists.infradead.org archive mirror
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: Robin Murphy <robin.murphy@arm.com>
Cc: Steve Capper <steve.capper@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	Will Deacon <will.deacon@arm.com>,
	"huanglingyan \(A\)" <huanglingyan2@huawei.com>,
	"<netdev@vger.kernel.org>" <netdev@vger.kernel.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH] arm64: do_csum: implement accelerated scalar version
Date: Thu, 28 Feb 2019 16:28:30 +0100
Message-ID: <CAKv+Gu-HJ1fRoetsYMLKkpGa4QCRfCJ2WAhcX=gUfonR4F-bEQ@mail.gmail.com>
In-Reply-To: <93697477-4dcc-4ab2-c838-2f487d334c56@arm.com>

On Thu, 28 Feb 2019 at 16:14, Robin Murphy <robin.murphy@arm.com> wrote:
>
> Hi Ard,
>
> On 28/02/2019 14:16, Ard Biesheuvel wrote:
> > (+ Catalin)
> >
> > On Tue, 19 Feb 2019 at 16:08, Ilias Apalodimas
> > <ilias.apalodimas@linaro.org> wrote:
> >>
> >> On Tue, Feb 19, 2019 at 12:08:42AM +0100, Ard Biesheuvel wrote:
> >>> It turns out that the IP checksumming code is still exercised often,
> >>> even though one might expect that modern NICs with checksum offload
> >>> have no use for it. However, as Lingyan points out, there are
> >>> combinations of features where the network stack may still fall back
> >>> to software checksumming, and so it makes sense to provide an
> >>> optimized implementation in software as well.
> >>>
> >>> So provide an implementation of do_csum() in scalar assembler, which,
> >>> unlike C, gives direct access to the carry flag, making the code run
> >>> substantially faster. The routine uses overlapping 64 byte loads for
> >>> all input sizes > 64 bytes, in order to reduce the number of branches
> >>> and improve performance on cores with deep pipelines.
> >>>
> >>> On Cortex-A57, this implementation is on par with Lingyan's NEON
> >>> implementation, and roughly 7x as fast as the generic C code.
> >>>
> >>> Cc: "huanglingyan (A)" <huanglingyan2@huawei.com>
> >>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > ...
> >>
> >> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> >
> > Full patch here
> >
> > https://lore.kernel.org/linux-arm-kernel/20190218230842.11448-1-ard.biesheuvel@linaro.org/
> >
> > This was a follow-up to some discussions about Lingyan's NEON code,
> > CC'ed to netdev@ so people could chime in as to whether we need
> > accelerated checksumming code in the first place.

Thanks for taking a look.

> FWIW ever since we did ip_fast_csum() I've been meaning to see how well
> I can do with a similar tweaked C implementation for this (mostly for
> fun). Since I've recently dug out my RK3328 box for other reasons I'll
> give this a test - that's a weedy little quad-A53 whose GbE hardware
> checksumming is slightly busted and has to be turned off, so the
> do_csum() overhead under heavy network load is comparatively massive.
> (plus it's non-EFI so I should be able to try big-endian easily too)
>

Yes please. I've been meaning to run this on A72 myself, but ever
since my MacchiatoBin self-combusted, I've been relying on AWS for
this, which is a bit finicky.

As for the C implementation, not having access to the carry flag is
pretty limiting, so I wonder how you intend to get around that.
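
For illustration, here is a minimal plain-C sketch of one common way around the missing carry flag (not code from the patch): after an unsigned 64-bit add, a carry out occurred exactly when the wrapped result is smaller than the addend, so it can be added straight back in (end-around carry). add_eac(), fold64() and csum_sketch() are hypothetical names. The result is the 16-bit ones' complement sum in native byte order without the final inversion, and no attempt is made at alignment or tuning.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Ones' complement (end-around carry) addition of two 64-bit values.
 * C exposes no carry flag, but unsigned addition wraps, and a carry out
 * happened exactly when the wrapped result is smaller than an operand.
 */
static uint64_t add_eac(uint64_t sum, uint64_t data)
{
	sum += data;
	/* After a carry the wrapped sum is at most 2^64 - 2, so +1 cannot overflow. */
	return sum + (sum < data);
}

/* Fold a 64-bit ones' complement sum down to 16 bits. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Hypothetical csum_sketch(): 16-bit ones' complement sum of buf in
 * native byte order, without the final inversion. No alignment handling.
 */
static uint16_t csum_sketch(const unsigned char *buf, size_t len)
{
	uint64_t sum = 0, w;

	for (; len >= 8; buf += 8, len -= 8) {
		memcpy(&w, buf, 8);	/* 64-bit load, no alignment assumed */
		sum = add_eac(sum, w);
	}
	if (len) {
		w = 0;
		memcpy(&w, buf, len);	/* zero-pad the trailing partial word */
		sum = add_eac(sum, w);
	}
	return fold64(sum);
}
```

The compare-and-add step presumably costs an extra instruction and lengthens the dependency chain for every word, which is roughly what direct access to the carry flag avoids in the assembler version.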

> The asm looks pretty reasonable to me - instinct says there's *possibly*
> some value for out-of-order cores in doing the 8-way accumulations in a
> more pairwise fashion, but I guess either way the carry flag dependency
> is going to dominate, so it may well be moot.

Yes. In fact, I was surprised the speedup is as dramatic as it is
despite this, but I guess they optimize for this rather well at the
uarch level.
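
To make the pairwise idea concrete in C, here is a sketch that assumes a GCC/Clang-style compiler providing unsigned __int128. Four independent 128-bit accumulators each form their own dependency chain; carries simply collect in the upper 64 bits instead of being folded back on every step, and everything is folded once at the end. csum_pairwise() is a hypothetical name, and whether this helps on any particular out-of-order core would need measuring.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the compiler provides unsigned __int128 (GCC/Clang on 64-bit targets). */
typedef unsigned __int128 u128;

/*
 * Hypothetical csum_pairwise(): plain 16-bit ones' complement sum (native
 * byte order, no final inversion) computed with four independent
 * accumulators. Carries collect in the upper 64 bits of each accumulator.
 */
static uint16_t csum_pairwise(const unsigned char *buf, size_t len)
{
	u128 s0 = 0, s1 = 0, s2 = 0, s3 = 0;
	uint64_t w[4], r;

	for (; len >= 32; buf += 32, len -= 32) {
		memcpy(w, buf, 32);
		s0 += w[0];	/* four chains, no dependency on each other */
		s1 += w[1];
		s2 += w[2];
		s3 += w[3];
	}
	s0 = (s0 + s1) + (s2 + s3);	/* combine the chains pairwise */

	for (; len >= 8; buf += 8, len -= 8) {
		memcpy(w, buf, 8);
		s0 += w[0];
	}
	if (len) {
		w[0] = 0;
		memcpy(w, buf, len);	/* zero-pad the trailing partial word */
		s0 += w[0];
	}

	/* Fold 128 -> 64 bits (2^64 is congruent to 1 mod 0xffff), then 64 -> 16. */
	s0 = (s0 & UINT64_MAX) + (s0 >> 64);
	s0 = (s0 & UINT64_MAX) + (s0 >> 64);
	r = (uint64_t)s0;
	r = (r & 0xffffffff) + (r >> 32);
	r = (r & 0xffffffff) + (r >> 32);
	r = (r & 0xffff) + (r >> 16);
	r = (r & 0xffff) + (r >> 16);
	return (uint16_t)r;
}
```

Because the accumulators are only combined after the loop, the order of the additions no longer matters for correctness, which is what leaves the core free to overlap them.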

> What may be more
> worthwhile is taking the effort to align the source pointer, at least
> for larger inputs, so as to be kinder to little cores - according to its
> optimisation guide, A55 is fairly sensitive to unaligned loads, so I'd
> assume that's true of its older/smaller friends too. I'll see what I can
> measure in practice - until proven otherwise I'd have no great objection
> to merging this patch as-is if the need is real. Improvements can always
> come later :)
>

Good point re alignment, I didn't consider that at all tbh.
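
A sketch of what aligning the source pointer could look like in portable C, assuming reads before the start of the buffer are not allowed. The unaligned head is copied into a zeroed 64-bit word at the same offset it would occupy in an aligned load, so every load in the main loop is then 8-byte aligned. When the buffer starts at an odd address, each byte sits in the opposite half of its 16-bit word, and byte-swapping the folded result corrects for that (the generic lib/checksum.c do_csum() applies the same swap for odd addresses). csum_aligned() and its helpers are hypothetical names, and whether this is actually faster on A53/A55 would need measuring.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint64_t add_eac(uint64_t sum, uint64_t data)
{
	sum += data;
	return sum + (sum < data);	/* end-around carry */
}

static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Hypothetical csum_aligned(): all main-loop loads are 8-byte aligned. */
static uint16_t csum_aligned(const unsigned char *buf, size_t len)
{
	size_t off = (uintptr_t)buf & 7;	/* misalignment of the start */
	int odd = off & 1;
	uint64_t sum = 0, w;
	uint16_t r;

	if (off && len) {
		/*
		 * Copy the head bytes to the offset they would have in an
		 * aligned 64-bit load, zeroing the rest, so each byte keeps
		 * its even/odd lane without reading before buf.
		 */
		size_t head = len < 8 - off ? len : 8 - off;

		w = 0;
		memcpy((unsigned char *)&w + off, buf, head);
		sum = add_eac(sum, w);
		buf += head;
		len -= head;
	}
	/* From here on buf is 8-byte aligned (or nothing is left). */
	for (; len >= 8; buf += 8, len -= 8) {
		memcpy(&w, buf, 8);	/* aligned address at runtime */
		sum = add_eac(sum, w);
	}
	if (len) {
		w = 0;
		memcpy(&w, buf, len);	/* zero-pad the trailing partial word */
		sum = add_eac(sum, w);
	}
	r = fold64(sum);
	/*
	 * An odd start puts every byte in the other half of its 16-bit word
	 * relative to the caller's view; swapping the folded sum undoes that.
	 */
	return odd ? (uint16_t)((r >> 8) | (r << 8)) : r;
}
```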

I'll let the maintainers decide whether/when to merge this. I don't
feel strongly either way.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Thread overview: 19+ messages
2019-02-18 23:08 [PATCH] arm64: do_csum: implement accelerated scalar version Ard Biesheuvel
2019-02-19 15:08 ` Ilias Apalodimas
2019-02-28 14:16   ` Ard Biesheuvel
2019-02-28 15:13     ` Robin Murphy
2019-02-28 15:28       ` Ard Biesheuvel [this message]
2019-04-12  2:31 ` Zhangshaokun
2019-04-12  9:52   ` Will Deacon
2019-04-15 18:18     ` Robin Murphy
2019-05-15  9:47       ` Will Deacon
2019-05-15 10:15         ` David Laight
2019-05-15 10:57           ` Robin Murphy
2019-05-15 11:13             ` David Laight
2019-05-15 12:39               ` Robin Murphy
2019-05-15 13:54                 ` David Laight
2019-05-15 11:02         ` Robin Murphy
2019-05-16  3:14         ` Zhangshaokun
2019-08-15 16:46           ` Will Deacon
2019-08-16  8:15             ` Shaokun Zhang
2019-08-16 14:55               ` Robin Murphy
