LKML Archive on
 help / color / Atom feed
From: Eric Biggers <>
To: "Jason A. Donenfeld" <>
Cc: Eric Biggers <>,
	Linux Crypto Mailing List <>,
	LKML <>,
	Netdev <>,
	David Miller <>,
	Andrew Lutomirski <>,
	Greg Kroah-Hartman <>,
	Samuel Neves <>,
	"Daniel J . Bernstein" <>,
	Tanja Lange <>,
	Jean-Philippe Aumasson <>,
	Karthikeyan Bhargavan <>
Subject: Re: [PATCH v1 2/3] zinc: Introduce minimal cryptography library
Date: Tue, 14 Aug 2018 14:12:31 -0700
Message-ID: <> (raw)
In-Reply-To: <>

On Fri, Aug 03, 2018 at 04:33:50AM +0200, Jason A. Donenfeld wrote:
> > Also, earlier when I tested OpenSSL's ChaCha NEON implementation on ARM
> > Cortex-A7 it was actually quite a bit slower than the one in the Linux
> > kernel written by Ard Biesheuvel... I trust that when claiming the
> > performance of all implementations you're adding is "state-of-the-art
> > and unrivaled", you actually compared them to the ones already in the
> > Linux kernel which you're advocating replacing, right? :-)
> Yes, I have, and my results don't corroborate your findings. It will
> be interesting to get out a wider variety of hardware for comparisons.
> I suspect, also, that if the snarky emoticons subside, AndyP would be
> very interested in whatever we find and could have interest in
> improving implementations, should we actually find performance
> differences.

On ARM Cortex-A7, OpenSSL's ChaCha20 implementation is 13.9 cpb (cycles per
byte), whereas Linux's is faster: 11.9 cpb.  I've also recently improved the
Linux implementation to 11.3 cpb and would like to send out a patch soon...
I've also written a scalar ChaCha20 implementation (no NEON instructions!) that
is 12.2 cpb on one block at a time on Cortex-A7, taking advantage of the free
rotates; that would be useful for the single permutation used to compute
XChaCha's subkey, and also for the ends of messages.

The reason Linux's ChaCha20 NEON implementation is faster than OpenSSL's is that
Linux's does 4 blocks at once using NEON instructions, and the words are
de-interleaved so the rows don't need to be shifted between each round.
OpenSSL's implementation, on the other hand, only does 3 blocks at once with
NEON instructions and has to shift the rows between each round.  OpenSSL's
implementation also does a 4th block at the same time using regular ARM
instructions, but that doesn't help on Cortex-A7; it just makes it slower.

I understand there are tradeoffs, and different implementations can be faster on
different CPUs.  Just know that from my point of view, switching to the OpenSSL
implementation actually introduces a performance regression, and we care a *lot*
about this since we need ChaCha to be absolutely as fast as possible for HPolyC
disk encryption.  So if your proposal goes in, I'd likely need to write a patch
to get the old performance back, at least on Cortex-A7...

Also, I don't know whether Andy P. considered the 4xNEON implementation
technique.  It could even be fastest on other ARM CPUs too, I don't know.

- Eric

  reply index

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-01  7:22 Eric Biggers
2018-08-01 17:02 ` Andy Lutomirski
2018-08-03  2:48   ` Jason A. Donenfeld
2018-08-03 21:29     ` Andy Lutomirski
2018-08-03 22:10       ` Jason A. Donenfeld
2018-08-07 18:54         ` Jason A. Donenfeld
2018-08-07 19:43           ` Andy Lutomirski
2018-08-07 23:48             ` Jason A. Donenfeld
2018-08-08  1:48               ` Andy Lutomirski
2018-08-08  1:51                 ` Jason A. Donenfeld
2018-08-09 18:08                   ` Andy Lutomirski
2018-08-03  2:33 ` Jason A. Donenfeld
2018-08-14 21:12   ` Eric Biggers [this message]
2018-08-15 16:28     ` D. J. Bernstein
2018-08-15 19:57       ` Eric Biggers
2018-08-16  4:24         ` D. J. Bernstein
2018-08-16 19:46           ` Eric Biggers
2018-08-17  7:31             ` D. J. Bernstein
2018-08-18  8:13               ` Ard Biesheuvel
2018-08-16  6:31     ` Jason A. Donenfeld

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on

Archives are clonable:
	git clone --mirror lkml/git/0.git
	git clone --mirror lkml/git/1.git
	git clone --mirror lkml/git/2.git
	git clone --mirror lkml/git/3.git
	git clone --mirror lkml/git/4.git
	git clone --mirror lkml/git/5.git
	git clone --mirror lkml/git/6.git
	git clone --mirror lkml/git/7.git
	git clone --mirror lkml/git/8.git
	git clone --mirror lkml/git/9.git
	git clone --mirror lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ \
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone