* Chacha-Poly performance on ARM64
@ 2019-09-26 14:55 Pascal Van Leeuwen
2019-09-26 14:59 ` Ard Biesheuvel
From: Pascal Van Leeuwen @ 2019-09-26 14:55 UTC
To: Linux Crypto Mailing List, Ard Biesheuvel
Hi,
I'm currently doing some performance benchmarking on a quad core Cortex
A72 (Macchiatobin dev board) for rfc7539esp (ChachaPoly) and the
relatively low performance kind of took me by surprise, considering how
everyone keeps shouting how efficient Chacha-Poly is in software on
modern CPUs.
Then I noticed that it was using chacha20-generic for the encrypt
direction, while a chacha20-neon implementation exists (it actually
DOES use that one for decryption). Why would that be?
It also uses poly1305-generic in both cases. Is that the best
possible on ARM64? I did a quick search in the codebase but couldn't
find any ARM64 optimized version ...
Regards,
Pascal van Leeuwen
Silicon IP Architect, Multi-Protocol Engines @ Verimatrix
www.insidesecure.com
* Re: Chacha-Poly performance on ARM64
From: Ard Biesheuvel @ 2019-09-26 14:59 UTC
To: Pascal Van Leeuwen; +Cc: Linux Crypto Mailing List
On Thu, 26 Sep 2019 at 16:55, Pascal Van Leeuwen
<pvanleeuwen@verimatrix.com> wrote:
>
> Hi,
>
> I'm currently doing some performance benchmarking on a quad core Cortex
> A72 (Macchiatobin dev board) for rfc7539esp (ChachaPoly) and the
> relatively low performance kind of took me by surprise, considering how
> everyone keeps shouting how efficient Chacha-Poly is in software on
> modern CPUs.
>
> Then I noticed that it was using chacha20-generic for the encrypt
> direction, while a chacha20-neon implementation exists (it actually
> DOES use that one for decryption). Why would that be?
>
> It also uses poly1305-generic in both cases. Is that the best
> possible on ARM64? I did a quick search in the codebase but couldn't
> find any ARM64 optimized version ...
>
The Poly1305 implementation is part of the 18-piece WireGuard series I
just sent out yesterday (which I know you have seen :-))
The Chacha20 code should be used in preference to the generic code, so
if you end up with the wrong version, there's a bug somewhere we need
to fix.
Also, how do you know which direction uses which transform? What are
the refcounts for the transforms in /proc/crypto?
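For reference, a minimal Python sketch of one way to answer that from
/proc/crypto. It assumes the usual layout of blank-line-separated
"key : value" entries; the helper names parse_proc_crypto and
implementations are made up for this illustration and are not part of
any kernel tooling.

```python
def parse_proc_crypto(text):
    """Parse /proc/crypto-style text into a list of dicts, one per entry."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def implementations(entries, name):
    """Return (driver, priority, refcnt) tuples for algorithm `name`,
    sorted with the highest priority first."""
    rows = [
        (e.get("driver", "?"), int(e.get("priority", 0)), int(e.get("refcnt", 0)))
        for e in entries
        if e.get("name") == name
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

On the board, something like
implementations(parse_proc_crypto(open("/proc/crypto").read()), "chacha20")
should list every registered chacha20 driver. The first row is the one
a plain lookup by name would normally be expected to select, and a
refcnt above its idle value hints at which driver is actually in use.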
* RE: Chacha-Poly performance on ARM64
From: Pascal Van Leeuwen @ 2019-09-26 20:04 UTC
To: Ard Biesheuvel; +Cc: Linux Crypto Mailing List
> -----Original Message-----
> From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Sent: Thursday, September 26, 2019 4:59 PM
> To: Pascal Van Leeuwen <pvanleeuwen@verimatrix.com>
> Cc: Linux Crypto Mailing List <linux-crypto@vger.kernel.org>
> Subject: Re: Chacha-Poly performance on ARM64
>
> On Thu, 26 Sep 2019 at 16:55, Pascal Van Leeuwen
> <pvanleeuwen@verimatrix.com> wrote:
> >
> > Hi,
> >
> > I'm currently doing some performance benchmarking on a quad core Cortex
> > A72 (Macchiatobin dev board) for rfc7539esp (ChachaPoly) and the
> > relatively low performance kind of took me by surprise, considering how
> > everyone keeps shouting how efficient Chacha-Poly is in software on
> > modern CPUs.
> >
> > Then I noticed that it was using chacha20-generic for the encrypt
> > direction, while a chacha20-neon implementation exists (it actually
> > DOES use that one for decryption). Why would that be?
> >
> > It also uses poly1305-generic in both cases. Is that the best
> > possible on ARM64? I did a quick search in the codebase but couldn't
> > find any ARM64 optimized version ...
> >
>
> The Poly1305 implementation is part of the 18-piece WireGuard series I
> just sent out yesterday (which I know you have seen :-))
>
I've seen the series but I must have missed that detail. I had a hunch you
would be the one working on it though :-) I'll look it up and try it
tomorrow.
> The Chacha20 code should be used in preference to the generic code, so
> if you end up with the wrong version, there's a bug somewhere we need
> to fix.
>
Yes, I think so too. In fact, I think it may be the same bug I reported
earlier regarding the selftests, where it also unexpectedly picked the
generic implementation. IIRC the response I got back was that this was
a known issue where for the very first use of a cipher, the generic
implementation gets chosen instead of the optimal one. I guess no one
has looked into that yet ...
> Also, how do you know which direction uses which transform?
>
Well, tcrypt just logs that to the message log.
> What are the refcounts for the transforms in /proc/crypto?
>
All refcnt values in /proc/crypto are 1.
Regards,
Pascal van Leeuwen
Silicon IP Architect, Multi-Protocol Engines @ Verimatrix
www.insidesecure.com