LKML Archive on
 help / color / Atom feed
From: "D. J. Bernstein" <>
To: Eric Biggers <>,
	"Jason A. Donenfeld" <>,
	Eric Biggers <>,
	Linux Crypto Mailing List <>,
	LKML <>,
	Netdev <>,
	David Miller <>,
	Andrew Lutomirski <>,
	Greg Kroah-Hartman <>,
	Samuel Neves <>,
	Tanja Lange <>,
	Jean-Philippe Aumasson <>,
	Karthikeyan Bhargavan <>
Subject: Re: [PATCH v1 2/3] zinc: Introduce minimal cryptography library
Date: 15 Aug 2018 16:28:19 -0000
Message-ID: <> (raw)
In-Reply-To: <>

[-- Attachment #1: Type: text/plain, Size: 2707 bytes --]

Eric Biggers writes:
> I've also written a scalar ChaCha20 implementation (no NEON instructions!) that
> is 12.2 cpb on one block at a time on Cortex-A7, taking advantage of the free
> rotates; that would be useful for the single permutation used to compute
> XChaCha's subkey, and also for the ends of messages.

This is also how ends of messages are handled in the 2012 implementation
crypto_stream/salsa20/armneon6 (see "mainloop1") inside the SUPERCOP
benchmarking framework:

This code is marginally different from Eric's new code because the
occasional loads and stores are scheduled for the Cortex-A8 rather than
the Cortex-A7, and because it's Salsa20 rather than ChaCha20.

The bigger picture is that there are 63 implementations of Salsa20 and
ChaCha20 in SUPERCOP from 10 authors showing various implementation
techniques, including all the techniques that have been mentioned in
this thread; and centralized benchmarks on (e.g.)

showing what's fastest on various platforms, using well-developed
benchmarking tools that produce repeatable, meaningful measurements.
There are also various papers explaining the main techniques.

Of course it's possible that new code will do better, especially on
platforms with different performance characteristics from the platforms
previously targeted. Contributing new implementations to SUPERCOP is
easy---which is why SUPERCOP already has thousands of implementations of
hundreds of cryptographic functions---and is a more effective way to
advertise speedups than adding code merely to (e.g.) the Linux kernel.
Infrastructure is centralized in SUPERCOP to minimize per-implementation
work. There's no risk of being rejected on the basis of cryptographic
concerns (MD5, Speck, and RSA-512 are included in the benchmarks) or
code-style concerns. Users can then decide which implementations best
meet their requirements.

"Do better" seems to be what's happened for the Cortex-A7. The best
SUPERCOP speeds (from code targeting the Cortex-A8 etc.) are 13.42
cycles/byte for 4096 bytes for ChaCha20; 12.2, 11.9, and 11.3 sound
noticeably better. The Cortex-A7 is an interesting case because it's
simultaneously (1) widely deployed---more than a billion units sold---
but (2) poorly documented. If you want to know, e.g., which instructions
can dual-issue with loads/FPU moves/..., then you won't be able to find
anything from ARM giving the answer. I've started building an automated
tool to compute the full CPU pipeline structure from benchmarks, but
this isn't ready yet.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

  reply index

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-01  7:22 Eric Biggers
2018-08-01 17:02 ` Andy Lutomirski
2018-08-03  2:48   ` Jason A. Donenfeld
2018-08-03 21:29     ` Andy Lutomirski
2018-08-03 22:10       ` Jason A. Donenfeld
2018-08-07 18:54         ` Jason A. Donenfeld
2018-08-07 19:43           ` Andy Lutomirski
2018-08-07 23:48             ` Jason A. Donenfeld
2018-08-08  1:48               ` Andy Lutomirski
2018-08-08  1:51                 ` Jason A. Donenfeld
2018-08-09 18:08                   ` Andy Lutomirski
2018-08-03  2:33 ` Jason A. Donenfeld
2018-08-14 21:12   ` Eric Biggers
2018-08-15 16:28     ` D. J. Bernstein [this message]
2018-08-15 19:57       ` Eric Biggers
2018-08-16  4:24         ` D. J. Bernstein
2018-08-16 19:46           ` Eric Biggers
2018-08-17  7:31             ` D. J. Bernstein
2018-08-18  8:13               ` Ard Biesheuvel
2018-08-16  6:31     ` Jason A. Donenfeld

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on

Archives are clonable:
	git clone --mirror lkml/git/0.git
	git clone --mirror lkml/git/1.git
	git clone --mirror lkml/git/2.git
	git clone --mirror lkml/git/3.git
	git clone --mirror lkml/git/4.git
	git clone --mirror lkml/git/5.git
	git clone --mirror lkml/git/6.git
	git clone --mirror lkml/git/7.git
	git clone --mirror lkml/git/8.git
	git clone --mirror lkml/git/9.git
	git clone --mirror lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ \
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone