* Chacha-Poly performance on ARM64
@ 2019-09-26 14:55 Pascal Van Leeuwen
From: Pascal Van Leeuwen @ 2019-09-26 14:55 UTC
  To: Linux Crypto Mailing List, Ard Biesheuvel

Hi,

I'm currently doing some performance benchmarking on a quad-core
Cortex-A72 (Macchiatobin dev board) for rfc7539esp (ChaCha-Poly), and the
relatively low performance kind of took me by surprise, considering how
everyone keeps shouting how efficient ChaCha-Poly is in software on
modern CPUs.

Then I noticed that it was using chacha20-generic for the encrypt
direction, while a chacha20-neon implementation exists (it actually
DOES use that one for decryption). Why would that be?

It also uses poly1305-generic in both cases. Is that the best
possible on ARM64? I did a quick search in the codebase but couldn't
find any ARM64 optimized version ...

Regards,
Pascal van Leeuwen
Silicon IP Architect, Multi-Protocol Engines @ Verimatrix
www.insidesecure.com



* Re: Chacha-Poly performance on ARM64
@ 2019-09-26 14:59 ` Ard Biesheuvel
From: Ard Biesheuvel @ 2019-09-26 14:59 UTC
  To: Pascal Van Leeuwen; +Cc: Linux Crypto Mailing List

On Thu, 26 Sep 2019 at 16:55, Pascal Van Leeuwen
<pvanleeuwen@verimatrix.com> wrote:
>
> Hi,
>
> I'm currently doing some performance benchmarking on a quad core Cortex
> A72 (Macchiatobin dev board) for rfc7539esp (ChachaPoly) and the
> relatively low performance kind of took me by surprise, considering how
> everyone keeps shouting how efficient Chacha-Poly is in software on
> modern CPUs.
>
> Then I noticed that it was using chacha20-generic for the encrypt
> direction, while a chacha20-neon implementation exists (it actually
> DOES use that one for decryption). Why would that be?
>
> It also uses poly1305-generic in both cases. Is that the best
> possible on ARM64? I did a quick search in the codebase but couldn't
> find any ARM64 optimized version ...
>

The Poly1305 implementation is part of the 18-piece WireGuard series I
just sent out yesterday (which I know you have seen :-))

The NEON ChaCha20 code should be used in preference to the generic code,
so if you end up with the wrong version, there's a bug somewhere we need
to fix.

Also, how do you know which direction uses which transform? What are
the refcounts for the transforms in /proc/crypto?
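
For reference, a minimal sketch of how the refcounts asked about above
could be collected. It assumes the usual /proc/crypto layout of
"key : value" lines with a blank line between entries; the helper names
parse_proc_crypto, refcounts and report are hypothetical and not part of
any kernel tooling.

```python
def parse_proc_crypto(text):
    """Split /proc/crypto-style text into one dict per registered algorithm."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def refcounts(entries, needle):
    """Return (driver, priority, refcnt) for drivers whose name contains needle."""
    rows = []
    for entry in entries:
        driver = entry.get("driver", "")
        if needle in driver:
            rows.append((driver,
                         int(entry.get("priority", 0)),
                         int(entry.get("refcnt", 0))))
    return rows


def report(path="/proc/crypto"):
    """Print priority and refcnt for every chacha20 and poly1305 driver."""
    with open(path) as f:
        entries = parse_proc_crypto(f.read())
    for needle in ("chacha20", "poly1305"):
        for driver, priority, refcnt in refcounts(entries, needle):
            print(f"{driver:45s} priority={priority:<5d} refcnt={refcnt}")
```

Calling report() on the target board prints one line per matching
driver, including template instances whose driver name embeds
chacha20 or poly1305.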


* RE: Chacha-Poly performance on ARM64
@ 2019-09-26 20:04   ` Pascal Van Leeuwen
From: Pascal Van Leeuwen @ 2019-09-26 20:04 UTC
  To: Ard Biesheuvel; +Cc: Linux Crypto Mailing List

> -----Original Message-----
> From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Sent: Thursday, September 26, 2019 4:59 PM
> To: Pascal Van Leeuwen <pvanleeuwen@verimatrix.com>
> Cc: Linux Crypto Mailing List <linux-crypto@vger.kernel.org>
> Subject: Re: Chacha-Poly performance on ARM64
> 
> The Poly1305 implementation is part of the 18 piece WireGuard series I
> just sent out yesterday (which I know you have seen :-))
> 
I've seen the series but I must have missed that detail. I had a hunch you
would be the one working on it though :-) I'll look it up and try it
tomorrow.

> The Chacha20 code should be used in preference to the generic code, so
> if you end up with the wrong version, there's a bug somewhere we need
> to fix.
> 
Yes, I think so too. In fact, I think it may be the same bug I reported
earlier regarding the selftests, where it also unexpectedly picked the
generic implementation. IIRC the response I got back was that this was
a known issue where for the very first use of a cipher, the generic 
implementation gets chosen instead of the optimal one. I guess no one
has looked into that yet ...
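
To make the expectation concrete, here is a small sketch of the check
being described. It assumes the crypto API normally prefers, among all
implementations registered under one algorithm name, the one with the
highest priority. The function names and the priority numbers in the
example are illustrative assumptions, not values taken from this board.

```python
def expected_driver(entries, algorithm):
    """Return the highest-priority driver registered for `algorithm`, or None."""
    candidates = [e for e in entries if e["name"] == algorithm]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e["priority"])["driver"]


def check_selection(entries, algorithm, observed):
    """Compare the driver that was actually used against the expected one."""
    best = expected_driver(entries, algorithm)
    return best, best == observed


# Illustrative entries only; real values come from /proc/crypto.
entries = [
    {"name": "chacha20", "driver": "chacha20-generic", "priority": 100},
    {"name": "chacha20", "driver": "chacha20-neon", "priority": 300},
]

# The encrypt direction was observed to use chacha20-generic.
print(check_selection(entries, "chacha20", "chacha20-generic"))
# prints ('chacha20-neon', False)
```

A False result for a driver reported in the log means the selection did
not follow priority order, which is the symptom described here.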

> Also, how do you know which direction uses which transform? 
>
Well, tcrypt just logs that to the message log.

> What are the refcounts for the transforms in /proc/crypto?
>
All refcnt values in /proc/crypto are 1.

Regards,
Pascal van Leeuwen
Silicon IP Architect, Multi-Protocol Engines @ Verimatrix
www.insidesecure.com
