* ChaCha20 vs. AES performance
@ 2016-09-20 11:15 Kent Overstreet
  2016-09-20 14:23 ` Theodore Ts'o
  0 siblings, 1 reply; 5+ messages in thread
From: Kent Overstreet @ 2016-09-20 11:15 UTC (permalink / raw)
  To: tytso, linux-btrfs

Not on the list or I would've replied directly, but on Haswell, ChaCha20 (in
software) is over 2x as fast as AES (in hardware), at realistic (for a
filesystem) block sizes:
 
testing speed of ctr(aes) (ctr(aes-aesni)) decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 378 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 1130 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 3981 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 15458 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 122880 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 391 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 1193 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 4212 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 16388 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 131029 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 417 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 1222 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 4398 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 17114 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 137028 cycles (8192 bytes)

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 1 operation in 4356 cycles (16 bytes)
test 1 (256 bit key, 64 byte blocks): 1 operation in 4004 cycles (64 bytes)
test 2 (256 bit key, 256 byte blocks): 1 operation in 6524 cycles (256 bytes)
test 3 (256 bit key, 1024 byte blocks): 1 operation in 9248 cycles (1024 bytes)
test 4 (256 bit key, 8192 byte blocks): 1 operation in 60274 cycles (8192 bytes)
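
To put the two tables above on a common scale, the cycle counts can be
divided by the block size. The sketch below only re-uses the numbers printed
above; the names (SIZES, AES_CTR, CHACHA20, speedup) are made up for
illustration. By these figures ChaCha20 comes out at roughly 7.4 cycles/byte
for 8192-byte blocks against 15 to 16.7 for ctr(aes), i.e. about 2.0x to
2.3x. At 1024 bytes the gap is closer to 1.7x to 1.9x, and at 256 bytes and
below ctr(aes) is ahead.

```python
# Cycle counts copied from the benchmark output above (Haswell).
SIZES = [16, 64, 256, 1024, 8192]            # block sizes in bytes
AES_CTR = {                                   # ctr(aes-aesni), keyed by key size in bits
    128: [378, 1130, 3981, 15458, 122880],
    192: [391, 1193, 4212, 16388, 131029],
    256: [417, 1222, 4398, 17114, 137028],
}
CHACHA20 = [4356, 4004, 6524, 9248, 60274]    # chacha20-simd, 256-bit key


def cycles_per_byte(cycles, sizes=SIZES):
    """Convert per-operation cycle counts into cycles per byte."""
    return [c / s for c, s in zip(cycles, sizes)]


def speedup(aes_bits):
    """ctr(aes) cycles divided by chacha20 cycles, per block size.

    A value above 1 means ChaCha20 needed fewer cycles for that block size.
    """
    return {s: a / c for s, a, c in zip(SIZES, AES_CTR[aes_bits], CHACHA20)}


if __name__ == "__main__":
    for bits in AES_CTR:
        for size, ratio in speedup(bits).items():
            print(f"aes-{bits} vs chacha20, {size:5d} bytes: {ratio:.2f}x")
```

Note that the cipher-only tables have no 4096-byte row, so the ratio at that
size is not covered by this data.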

Poly1305 is also plenty fast:

testing speed of gcm(aes) (gcm_base(ctr-aes-aesni,ghash-generic)) encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 7567 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 9654 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 19010 cycles (256 bytes)
test 3 (128 bit key, 512 byte blocks): 1 operation in 33118 cycles (512 bytes)
test 4 (128 bit key, 1024 byte blocks): 1 operation in 59738 cycles (1024 bytes)
test 5 (128 bit key, 2048 byte blocks): 1 operation in 106545 cycles (2048 bytes)
test 6 (128 bit key, 4096 byte blocks): 1 operation in 211189 cycles (4096 bytes)
test 7 (128 bit key, 8192 byte blocks): 1 operation in 370439 cycles (8192 bytes)
test 8 (192 bit key, 16 byte blocks): 1 operation in 6780 cycles (16 bytes)
test 9 (192 bit key, 64 byte blocks): 1 operation in 8802 cycles (64 bytes)
test 10 (192 bit key, 256 byte blocks): 1 operation in 17352 cycles (256 bytes)
test 11 (192 bit key, 512 byte blocks): 1 operation in 28680 cycles (512 bytes)
test 12 (192 bit key, 1024 byte blocks): 1 operation in 51230 cycles (1024 bytes)
test 13 (192 bit key, 2048 byte blocks): 1 operation in 96662 cycles (2048 bytes)
test 14 (192 bit key, 4096 byte blocks): 1 operation in 187287 cycles (4096 bytes)
test 15 (192 bit key, 8192 byte blocks): 1 operation in 372570 cycles (8192 bytes)
test 16 (256 bit key, 16 byte blocks): 1 operation in 6273 cycles (16 bytes)
test 17 (256 bit key, 64 byte blocks): 1 operation in 8096 cycles (64 bytes)
test 18 (256 bit key, 256 byte blocks): 1 operation in 15895 cycles (256 bytes)
test 19 (256 bit key, 512 byte blocks): 1 operation in 26259 cycles (512 bytes)
test 20 (256 bit key, 1024 byte blocks): 1 operation in 47121 cycles (1024 bytes)
test 21 (256 bit key, 2048 byte blocks): 1 operation in 91003 cycles (2048 bytes)
test 22 (256 bit key, 4096 byte blocks): 1 operation in 175883 cycles (4096 bytes)
test 23 (256 bit key, 8192 byte blocks): 1 operation in 340904 cycles (8192 bytes)

testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-simd,poly1305-simd)) encryption
test 0 (288 bit key, 16 byte blocks): 1 operation in 12145 cycles (16 bytes)
test 1 (288 bit key, 64 byte blocks): 1 operation in 14538 cycles (64 bytes)
test 2 (288 bit key, 256 byte blocks): 1 operation in 16435 cycles (256 bytes)
test 3 (288 bit key, 512 byte blocks): 1 operation in 15622 cycles (512 bytes)
test 4 (288 bit key, 1024 byte blocks): 1 operation in 18671 cycles (1024 bytes)
test 5 (288 bit key, 2048 byte blocks): 1 operation in 23264 cycles (2048 bytes)
test 6 (288 bit key, 4096 byte blocks): 1 operation in 36480 cycles (4096 bytes)
test 7 (288 bit key, 8192 byte blocks): 1 operation in 75051 cycles (8192 bytes)

When AVX-512 comes out, ChaCha20 is going to get even faster - probably by more
than 2x, since they're adding a rotate instruction. I haven't tested on ARM, but
I'd be surprised if the situation is significantly different there (the kernel
lacks a NEON ChaCha20 implementation, but I could write one).
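
For context on why a rotate instruction matters: the ChaCha quarter round is
built only from 32-bit additions, XORs and rotations (by 16, 12, 8 and 7
bits). Without a native rotate, each rotation is typically composed of two
shifts and an OR, as in rotl32 below. This is a minimal Python sketch of the
quarter round, checked against the test vector in RFC 7539 section 2.1.1. It
is illustration only, not the kernel's SIMD code, and it says nothing about
how large the AVX-512 gain would actually be.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits using two shifts and an OR."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a, b, c, d):
    """One ChaCha quarter round on four 32-bit words."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# Input words from the quarter-round test vector in RFC 7539, section 2.1.1.
result = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
```

Each quarter round performs four rotations, so the cost of a rotation is a
large part of the inner loop.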

Just because it's implemented in hardware doesn't mean it's faster...


* Re: ChaCha20 vs. AES performance
  2016-09-20 11:15 ChaCha20 vs. AES performance Kent Overstreet
@ 2016-09-20 14:23 ` Theodore Ts'o
  2016-09-20 15:51   ` Kent Overstreet
  0 siblings, 1 reply; 5+ messages in thread
From: Theodore Ts'o @ 2016-09-20 14:23 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-btrfs

On Tue, Sep 20, 2016 at 03:15:19AM -0800, Kent Overstreet wrote:
> Not on the list or I would've replied directly, but on Haswell, ChaCha20 (in
> software) is over 2x as fast as AES (in hardware), at realistic (for a
> filesystem) block sizes:

On Skylake and Broadwell processors, AES is faster (the posting is
from a ChaCha20 enthusiast):

     https://blog.cloudflare.com/it-takes-two-to-chacha-poly/

My big worry though is that schemes that require that nonces/IVs must
**never** be reused are fragile.  It's for the same reason that DSA
makes my skin crawl.  If you ever screw up (maybe after a crash, or
a file system bug) and end up reusing a nonce, it's game over.
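
A minimal sketch of why nonce reuse is fatal for a stream cipher. The
keystream here is a stand-in (SHAKE-256 over key and nonce, an assumption for
illustration, not ChaCha20), but the failure mode is the same for any cipher
that XORs a keystream into the data: two blocks encrypted under the same key
and nonce leak the XOR of their plaintexts, and anyone who knows one block
recovers the other. The block contents and names are made up.

```python
import hashlib


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in keystream generator (NOT ChaCha20): SHAKE-256 of key || nonce."""
    return hashlib.shake_256(key + nonce).digest(length)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Stream-cipher encryption: plaintext XOR keystream(key, nonce)."""
    return xor_bytes(plaintext, toy_keystream(key, nonce, len(plaintext)))


key = b"k" * 32
nonce = b"n" * 12

old_block = b"balance=0000100;"
new_block = b"balance=9999999;"

# The same nonce is used for both writes, e.g. because a counter was
# rolled back after a crash.
c_old = encrypt(key, nonce, old_block)
c_new = encrypt(key, nonce, new_block)

# The keystream cancels out: the attacker learns old XOR new without the key.
leak = xor_bytes(c_old, c_new)

# Knowing (or guessing) one plaintext then reveals the other.
recovered = xor_bytes(leak, old_block)
```

With a fresh nonce for the second write, the two keystreams differ and the
XOR of the ciphertexts no longer cancels.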

So if there are hardware solutions which are faster or fast enough
that the crypto is no longer the dominant cost, why not use a cipher
scheme which is more robust?

						- Ted

P.S.  We're also both ignoring the cost of whatever changes are needed in
the file system to guarantee that the nonce is never, ever reused...


* Re: ChaCha20 vs. AES performance
  2016-09-20 14:23 ` Theodore Ts'o
@ 2016-09-20 15:51   ` Kent Overstreet
  2016-09-20 20:35     ` Alex Elsayed
  2016-09-20 22:40     ` Mathieu Chouquet-Stringer
  0 siblings, 2 replies; 5+ messages in thread
From: Kent Overstreet @ 2016-09-20 15:51 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-btrfs

On Tue, Sep 20, 2016 at 10:23:20AM -0400, Theodore Ts'o wrote:
> On Tue, Sep 20, 2016 at 03:15:19AM -0800, Kent Overstreet wrote:
> > Not on the list or I would've replied directly, but on Haswell, ChaCha20 (in
> > software) is over 2x as fast as AES (in hardware), at realistic (for a
> > filesystem) block sizes:
> 
> On Skylake and Broadwell processors, AES is faster (the posting is
> from a ChaCha20 enthusiast):
> 
>      https://blog.cloudflare.com/it-takes-two-to-chacha-poly/

The performance delta in his graphs isn't near as big as what I've measured,
which makes me suspect OpenSSL's ChaCha20 implementation isn't nearly as fast as
the kernel's.

> My big worry though is that schemes that require that nonces/IVs must
> **never** be reused are fragile.  It's for the same reason that DSA
> makes my skin crawl.  If you ever screw up (maybe after a crash, or
> a file system bug) and end up reusing a nonce, it's game over.
> 
> So if there are hardware solutions which are faster or fast enough
> that the crypto is no longer the dominant cost, why not use a cipher
> scheme which is more robust?

Block ciphers have their own downsides though - XTS is really a big pile of
hacks and workarounds. If you can get nonces right, a stream cipher
cryptosystem (and ChaCha20 especially) is on the whole drastically simpler,
and thus easier to understand and audit.

And if you can do nonces correctly, ChaCha20/Poly1305 is pretty much one of the
gold standards - it's secure against pretty much any vaguely realistic threat
model. XTS, not so much - it's just the best you can do given the constraints of
typical disk crypto. The gold standards of encryption today are the AEADs - and
AES/GCM fails badly with nonce reuse too; there aren't any AEADs yet that don't
fail badly with nonce reuse.

> P.S.  We're also both ignoring the cost of whatever changes are needed in
> the file system to guarantee that the nonce is never, ever reused...

I'm definitely not advocating for hacking stream ciphers into existing
filesystems - if you don't have the machinery you need to be 100% rigorous about
nonces, then definitely stick with XTS. But I already had most of what I needed
in bcachefs, and I can still break the on-disk format if I need to (and
encryption is a breaking change), so for me ChaCha20/Poly1305 was a no-brainer.

BTW though, if there do turn out to be platforms where AES is significantly
faster than ChaCha20 I can still add AES support pretty easily - I've already
got all the relevant switch statements, since encryption is handled as another
checksum type.


* Re: ChaCha20 vs. AES performance
  2016-09-20 15:51   ` Kent Overstreet
@ 2016-09-20 20:35     ` Alex Elsayed
  2016-09-20 22:40     ` Mathieu Chouquet-Stringer
  1 sibling, 0 replies; 5+ messages in thread
From: Alex Elsayed @ 2016-09-20 20:35 UTC (permalink / raw)
  To: linux-btrfs

On Tue, 20 Sep 2016 07:51:52 -0800, Kent Overstreet wrote:

> On Tue, Sep 20, 2016 at 10:23:20AM -0400, Theodore Ts'o wrote:
>> On Tue, Sep 20, 2016 at 03:15:19AM -0800, Kent Overstreet wrote:
>> > Not on the list or I would've replied directly, but on Haswell,
>> > ChaCha20 (in software) is over 2x as fast as AES (in hardware), at
>> > realistic (for a filesystem) block sizes:

Apologies if this doesn't CC you - replying via gmane, since (not being 
subscribed via email either) I can't try the same trick I did to include 
Ted (i.e., reply via my mail client).

One useful trick, though - if you have a Usenet client, gmane _will_ let 
you reply directly, even to old messages. That's what I'm doing.

>> On Skylake and Broadwell processors, AES is faster (the posting is from
>> a ChaCha20 enthusiast):
>> 
>>      https://blog.cloudflare.com/it-takes-two-to-chacha-poly/
> 
> The performance delta in his graphs isn't near as big as what I've
> measured, which makes me suspect OpenSSL's ChaCha20 implementation isn't
> nearly as fast as the kernel's.
> 
>> My big worry though is that schemes that require that nonces/IVs must
>> **never** be reused are fragile.  It's for the same reason that DSA
>> makes my skin crawl.  If you ever screw up (maybe after a crash, or
>> a file system bug) and end up reusing a nonce, it's game over.
>> 
>> So if there are hardware solutions which are faster or fast enough that
>> the crypto is no longer the dominant cost, why not use a cipher scheme
>> which is more robust?
> 
> Block ciphers have their own downsides though - XTS is really a big pile
> of hacks and workarounds. If you can get nonces right, a stream cipher
> cryptosystem (and ChaCha20 especially) is on the whole drastically
> simpler, and thus easier to understand and audit.

Yes, I would entirely agree with your assessment of XTS (in particular, 
the doubling of the length of the key is rooted in the original authors 
misunderstanding the XEX paper...).

> And if you can do nonces correctly, ChaCha20/Poly1305 is pretty much one
> of the gold standards - it's secure against pretty much any vaguely
> realistic threat model. XTS, not so much - it's just the best you can do
> given the constraints of typical disk crypto. The gold standards of
> encryption today are the AEADs - and AES/GCM fails badly with nonce
> reuse too; there aren't any AEADs yet that don't fail badly with nonce
> reuse.

Not true - SIV is a generic construction, which has been applied to AES 
(AES-SIV, RFC 5297) and ChaCha20 (HS1-SIV, submitted to CAESAR). There's 
also AES-GCM-SIV, which takes advantage of GCM hardware acceleration as 
well as AES acceleration.
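
A toy sketch of the synthetic-IV idea, to show why it tolerates a missing or
repeated nonce: the IV is derived from the message itself with a keyed PRF,
so two different plaintexts never share a keystream. Everything below is an
assumption made for illustration (HMAC-SHA256 as the PRF, SHAKE-256 as the
keystream, the function names, the sample data). It is not AES-SIV as
specified in RFC 5297, which uses S2V over CMAC together with AES-CTR.

```python
import hashlib
import hmac

TAG_LEN = 16


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def prf(mac_key: bytes, ad: bytes, plaintext: bytes) -> bytes:
    """Toy PRF: HMAC-SHA256 over length-prefixed inputs, truncated to 16 bytes."""
    mac = hmac.new(mac_key, digestmod=hashlib.sha256)
    for part in (ad, plaintext):
        mac.update(len(part).to_bytes(8, "big") + part)
    return mac.digest()[:TAG_LEN]


def keystream(enc_key: bytes, iv: bytes, length: int) -> bytes:
    """Stand-in keystream generator: SHAKE-256 of enc_key || iv."""
    return hashlib.shake_256(enc_key + iv).digest(length)


def siv_encrypt(mac_key, enc_key, ad, plaintext):
    # The synthetic IV doubles as the authentication tag.
    iv = prf(mac_key, ad, plaintext)
    return iv + xor_bytes(plaintext, keystream(enc_key, iv, len(plaintext)))


def siv_decrypt(mac_key, enc_key, ad, blob):
    iv, ciphertext = blob[:TAG_LEN], blob[TAG_LEN:]
    plaintext = xor_bytes(ciphertext, keystream(enc_key, iv, len(ciphertext)))
    # Recompute the IV from the decrypted data and compare in constant time.
    if not hmac.compare_digest(iv, prf(mac_key, ad, plaintext)):
        raise ValueError("authentication failed")
    return plaintext


k1, k2 = b"\x01" * 32, b"\x02" * 32
ad = b"inode 42, block 7"
pt_old, pt_new = b"balance=0000100;", b"balance=9999999;"

# No nonce is supplied for either write.
blob_old = siv_encrypt(k1, k2, ad, pt_old)
blob_new = siv_encrypt(k1, k2, ad, pt_new)
```

Encrypting the same (associated data, plaintext) pair twice gives the same
output, so an observer can tell that a block was rewritten with identical
contents, but that is all that leaks.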



* Re: ChaCha20 vs. AES performance
  2016-09-20 15:51   ` Kent Overstreet
  2016-09-20 20:35     ` Alex Elsayed
@ 2016-09-20 22:40     ` Mathieu Chouquet-Stringer
  1 sibling, 0 replies; 5+ messages in thread
From: Mathieu Chouquet-Stringer @ 2016-09-20 22:40 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Theodore Ts'o, linux-btrfs

kent.overstreet@gmail.com (Kent Overstreet) writes:
> On Tue, Sep 20, 2016 at 10:23:20AM -0400, Theodore Ts'o wrote:
>> On Tue, Sep 20, 2016 at 03:15:19AM -0800, Kent Overstreet wrote:
>> > Not on the list or I would've replied directly, but on Haswell, ChaCha20 (in
>> > software) is over 2x as fast as AES (in hardware), at realistic (for a
>> > filesystem) block sizes:
>> 
>> On Skylake and Broadwell processors, AES is faster (the posting is
>> from a ChaCha20 enthusiast):
>> 
>>      https://blog.cloudflare.com/it-takes-two-to-chacha-poly/
>
> The performance delta in his graphs isn't near as big as what I've measured,
> which makes me suspect OpenSSL's ChaCha20 implementation isn't nearly as fast as
> the kernel's.

The other thing to keep in mind is this (i.e., what's true for a big Intel
CPU isn't true everywhere): "The new cipher suites are fast. As Adam
Langley described, ChaCha20-Poly1305 is three times faster than
AES-128-GCM on mobile devices. Spending less time on decryption means
faster page rendering and better battery life."

https://blog.cloudflare.com/do-the-chacha-better-mobile-performance-with-cryptography/

The argument made by Bernstein is, in a nutshell, that "CPUs are optimized
for video games and thus ciphers should use the same instructions that
make games 'faster'" (I'd recommend reading his whole email to understand
what he means):
https://moderncrypto.org/mail-archive/noise/2016/000699.html

Or as one person commented on the net
https://news.ycombinator.com/item?id=12264321 :

Bernstein agrees with you. His point isn't that it's dumb that CPUs are
optimized for games. It's that cipher designers should have enough
awareness of trends in CPU development to design ciphers that take
advantage of the same features that games do. That's what he did with
Salsa/ChaCha. *His subtext is that over the medium term he believes his
ciphers will outperform AES, despite AES having AES-NI hardware
support.* (emphasis mine)

-- 
Mathieu Chouquet-Stringer
            The sun itself sees not till heaven clears.
	             -- William Shakespeare --

