linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* HalfSipHash Acceptable Usage
@ 2016-12-19 17:32 Jason A. Donenfeld
       [not found] ` <CAGiyFdduUNSGq24zfsk0ZU=hnOCmewAw8vw6XvDoS-3f+3UPKQ@mail.gmail.com>
  2016-12-20 21:36 ` Theodore Ts'o
  0 siblings, 2 replies; 25+ messages in thread
From: Jason A. Donenfeld @ 2016-12-19 17:32 UTC (permalink / raw)
  To: Jean-Philippe Aumasson
  Cc: Theodore Ts'o, Hannes Frederic Sowa, LKML, Eric Biggers,
	Daniel J . Bernstein, David Laight, David Miller, Andi Kleen,
	George Spelvin, kernel-hardening, Andy Lutomirski,
	Linux Crypto Mailing List, Tom Herbert, Vegard Nossum, Netdev,
	Linus Torvalds

Hi JP,

With the threads getting confusing, I've been urged to try and keep
the topics and threads more closely constrained. Here's where we're
at, and here's the current pressing security concern. It'd be helpful
to have a definitive statement on what you think is best, so we can
just build on top of that, instead of getting lost in the chorus of
opinions.

1) Anything that requires actual long-term security will use
SipHash2-4, with the 64-bit output and the 128-bit key. This includes
things like TCP sequence numbers. This seems pretty uncontroversial to
me. Seem okay to you?

2) People seem to want something competitive, performance-wise, with
jhash if it's going to replace jhash. The kernel community
instinctively pushes back on anything that could harm performance,
especially in networking and in critical data structures, so there
have been some calls for something faster than SipHash. So, questions
regarding this:

2a) George thinks that HalfSipHash on 32-bit systems will have roughly
comparable speed as SipHash on 64-bit systems, so the idea would be to
use HalfSipHash on 32-bit systems' hash tables and SipHash on 64-bit
systems' hash tables. The big obvious question is: does HalfSipHash
have a sufficient security margin for hashtable usage and hashtable
attacks? I'm not wondering about the security margin for other usages,
but just of the hashtable usage. In your opinion, does HalfSipHash cut
it?

2b) While I certainly wouldn't consider making the use case in
question (1) employ a weaker function, for this question (2), there
has been some discussion about using HalfSipHash1-3 (or SipHash1-3 on
64-bit) instead of 2-4. So, the same question is therefore posed:
would using HalfSipHash1-3 give a sufficient security margin for
hashtable usage and hashtable attacks?

My plan is essentially to implement things according to your security
recommendation. The thread started with me pushing a heavy duty
security solution -- SipHash2-4 -- for _everything_. I've received
understandable push back on that notion for certain use cases. So now
I'm trying to discover what the most acceptable compromise is. Your
answers on (2a) and (2b) will direct that compromise.

Thanks again,
Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
       [not found] ` <CAGiyFdduUNSGq24zfsk0ZU=hnOCmewAw8vw6XvDoS-3f+3UPKQ@mail.gmail.com>
@ 2016-12-19 21:00   ` Jason A. Donenfeld
  0 siblings, 0 replies; 25+ messages in thread
From: Jason A. Donenfeld @ 2016-12-19 21:00 UTC (permalink / raw)
  To: Jean-Philippe Aumasson
  Cc: Theodore Ts'o, Hannes Frederic Sowa, LKML, Eric Biggers,
	Daniel J . Bernstein, David Laight, David Miller, Andi Kleen,
	George Spelvin, kernel-hardening, Andy Lutomirski,
	Linux Crypto Mailing List, Tom Herbert, Vegard Nossum, Netdev,
	Linus Torvalds

Hi JP,

On Mon, Dec 19, 2016 at 9:49 PM, Jean-Philippe Aumasson
<jeanphilippe.aumasson@gmail.com> wrote:
>
> On Mon, Dec 19, 2016 at 6:32 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>>
>> Hi JP,
>>
>> With the threads getting confusing, I've been urged to try and keep
>> the topics and threads more closely constrained. Here's where we're
>> at, and here's the current pressing security concern. It'd be helpful
>> to have a definitive statement on what you think is best, so we can
>> just build on top of that, instead of getting lost in the chorus of
>> opinions.
>>
>> 1) Anything that requires actual long-term security will use
>> SipHash2-4, with the 64-bit output and the 128-bit key. This includes
>> things like TCP sequence numbers. This seems pretty uncontroversial to
>> me. Seem okay to you?
>
>
>
> Right, since 2012 when we published SipHash many cryptanalysts attempted to
> break SipHash-2-4 with a 128-bit key, for various notions of "break", and
> nothing worth worrying was ever found. I'm totally confident that
> SipHash-2-4 will live up to its security promises.
>
> Don't use something weaker for things like TCP sequence numbers or RNGs. Use
> SipHash2-4 for those. That is the correct choice.
>
>>
>>
>> 2) People seem to want something competitive, performance-wise, with
>> jhash if it's going to replace jhash. The kernel community
>> instinctively pushes back on anything that could harm performance,
>> especially in networking and in critical data structures, so there
>> have been some calls for something faster than SipHash. So, questions
>> regarding this:
>>
>
> No free lunch I guess: either go with a cryptographically secure,
> time-proved keyed hash such as SipHash, or go with some simpler hash deemed
> secure cos its designer can't break it :) #DontRollYourOwnCrypto
>
>> 2a) George thinks that HalfSipHash on 32-bit systems will have roughly
>> comparable speed as SipHash on 64-bit systems, so the idea would be to
>> use HalfSipHash on 32-bit systems' hash tables and SipHash on 64-bit
>> systems' hash tables. The big obvious question is: does HalfSipHash
>> have a sufficient security margin for hashtable usage and hashtable
>> attacks? I'm not wondering about the security margin for other usages,
>> but just of the hashtable usage. In your opinion, does HalfSipHash cut
>> it?
>
>
> HalfSipHash takes its core function from Chaskey and uses the same
> construction as SipHash, so it *should* be secure. Nonetheless it hasn't
> received the same amount of attention as 64-bit SipHash did. So I'm less
> confident about its security than about SipHash's, but it obviously inspires
> a lot more confidence than non-crypto hashes.
>
> Too, HalfSipHash only has a 64-bit key, not a 128-bit key like SipHash, so
> only use this as a mitigation for hash-flooding attacks, where the output of
> the hash function is never directly shown to the caller. Do not use
> HalfSipHash for TCP sequence numbers or RNGs.
>
>
>>
>>
>> 2b) While I certainly wouldn't consider making the use case in
>> question (1) employ a weaker function, for this question (2), there
>> has been some discussion about using HalfSipHash1-3 (or SipHash1-3 on
>> 64-bit) instead of 2-4. So, the same question is therefore posed:
>> would using HalfSipHash1-3 give a sufficient security margin for
>> hashtable usage and hashtable attacks?
>
>
> My educated guess is that yes, it will, but that it may not withhold
> cryptanalysis as a pseudorandom function (PRF). For example I wouldn't be
> surprised if there were a "distinguishing attack" that detects non-random
> patterns in HalfSipHash-1-3's output. But most of the non-crypto hashes I've
> seen have obvious distinguishing attacks. So the upshot is that HSH will get
> you better security that AnyWeakHash even with 1 & 3 rounds.
>
> So, if you're willing to compromise on security, but still want something
> not completely unreasonable, you might be able to get away with using
> HalfSipHash1-3 as a replacement for jhash—in circumstances where the output
> of the hash function is kept secret—in order to mitigate hash-flooding
> attacks.
>

Thanks for the detailed response. I will continue exactly how you've specified.

1. SipHash2-4 for TCP sequence numbers, syncookies, and RNG. IOW, the
things that MD5 is used for now.

2. HalfSipHash1-3 for hash tables where the output is not revealed,
for jhash replacements. On 64-bit this will alias to SipHash1-3.

3. I will write Documentation/siphash.txt detailing this.

4. I'll continue to discourage other kernel developers from rolling
their own crypto or departing from the tried&true in substantial ways.

Thanks again,
Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-19 17:32 HalfSipHash Acceptable Usage Jason A. Donenfeld
       [not found] ` <CAGiyFdduUNSGq24zfsk0ZU=hnOCmewAw8vw6XvDoS-3f+3UPKQ@mail.gmail.com>
@ 2016-12-20 21:36 ` Theodore Ts'o
  2016-12-20 23:07   ` George Spelvin
  2016-12-20 23:55   ` Eric Dumazet
  1 sibling, 2 replies; 25+ messages in thread
From: Theodore Ts'o @ 2016-12-20 21:36 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Jean-Philippe Aumasson, Hannes Frederic Sowa, LKML, Eric Biggers,
	Daniel J . Bernstein, David Laight, David Miller, Andi Kleen,
	George Spelvin, kernel-hardening, Andy Lutomirski,
	Linux Crypto Mailing List, Tom Herbert, Vegard Nossum, Netdev,
	Linus Torvalds

On Mon, Dec 19, 2016 at 06:32:44PM +0100, Jason A. Donenfeld wrote:
> 1) Anything that requires actual long-term security will use
> SipHash2-4, with the 64-bit output and the 128-bit key. This includes
> things like TCP sequence numbers. This seems pretty uncontroversial to
> me. Seem okay to you?

Um, why do TCP sequence numbers need long-term security?  So long as
you rekey every 5 minutes or so, TCP sequence numbers don't need any
more security than that, since even if you break the key used to
generate initial sequence numbers seven a minute or two later, any
pending TCP connections will have timed out long before.

See the security analysis done in RFC 6528[1], where among other
things, it points out why MD5 is acceptable with periodic rekeying,
although there is the concern that this could break certain hueristics
used when establishing new connections during the TIME-WAIT state.

[1] https://tools.ietf.org/html/rfc6528

						- Ted

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-20 21:36 ` Theodore Ts'o
@ 2016-12-20 23:07   ` George Spelvin
  2016-12-20 23:55   ` Eric Dumazet
  1 sibling, 0 replies; 25+ messages in thread
From: George Spelvin @ 2016-12-20 23:07 UTC (permalink / raw)
  To: Jason, tytso
  Cc: ak, davem, David.Laight, djb, ebiggers3, hannes,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, linux, luto, netdev, tom, torvalds, vegard.nossum

Theodore Ts'o wrote:
> On Mon, Dec 19, 2016 at 06:32:44PM +0100, Jason A. Donenfeld wrote:
>> 1) Anything that requires actual long-term security will use
>> SipHash2-4, with the 64-bit output and the 128-bit key. This includes
>> things like TCP sequence numbers. This seems pretty uncontroversial to
>> me. Seem okay to you?

> Um, why do TCP sequence numbers need long-term security?  So long as
> you rekey every 5 minutes or so, TCP sequence numbers don't need any
> more security than that, since even if you break the key used to
> generate initial sequence numbers seven a minute or two later, any
> pending TCP connections will have timed out long before.
> 
> See the security analysis done in RFC 6528[1], where among other
> things, it points out why MD5 is acceptable with periodic rekeying,
> although there is the concern that this could break certain hueristics
> used when establishing new connections during the TIME-WAIT state.

Because we don't rekey TCP sequence numbers, ever.  See commit
6e5714eaf77d79ae1c8b47e3e040ff5411b717ec

To rekey them requires dividing the sequence number base into a "random"
part and some "generation" msbits.  While we can do better than the
previous 8+24 split (I'd suggest 4+28 or 3+29), only 2 is tricks, and
1 generation bit isn't enough.

So while it helps in the long term, it reduces the security offered by
the random part in the short term.  (If I know 4 bits of your ISN,
I only need to send 256 MB to hit your TCP window.)

At the time, I objected, and suggested doing two hashes, with a fixed
32-bit base plus a split rekeyed portion, but that was vetoed on the
grounds of performance.

On further consideration, the fixed base doesn't help much.
(Details below for anyone that cares.)



Suppose we let the TCP initial sequence number be:

(Hash(<srcIP,dstIP,srcPort,dstPort>, fixed_key) & 0xffffffff) +
(i << 28) + (Hash(<srcIP,dstIP,srcPort,dstPort>, key[i]) & 0x0fffffff) +
(current_time_in_nanoseconds / 64)

It's not hugely difficult to mount an effective attack against a
64-bit fixed_key.

As an attacker, I can ask the target to send me these numbers for dstPort
values i control and other values I know.  I can (with high probability)
detect the large jumps when the generation changes, so I can make a
significant number of queries with the same generation.  After 23-ish
queries, I have enough information to identify a 64-bit fixed_key.

I don't know the current generation counter "i", but I know it's the
same for all my queries, so for any two queries, the maximum difference
between the 28-bit hash values is 29 bits.  (We can also add a small
margin to allow for timeing uncertainty, but that's even less.)

So if I guess a fixed key, hash my known plaintexts with that guess,
subtract the ciphertexts from the observed sequence numbers, and the
difference between the remaining (unknown) 28-bit hash values plus
timestamps exceeds what's possible, my guess is wrong.

I can then repeat with additional known plaintexts, reducing the space
of admissible keys by about 3 bits each time.

Assuming I can rent GPU horsepower from a bitcoin miner to do this in a
reasonable period of time, after 22 known plaintext differences, I have
uniquely identified the key.

Of course, in practice I'd do is a first pass with maybe 6 plaintexts
on the GPU, and then deal with the candidates found in a second pass.
But either way, it's about 2.3 SipHash evaluations per key tested.
As I noted earlier, a bitcoin blockchain block, worth 25 bitcoins,
currently costs 2^71 evaluations of SHA-2 (2^70 evaluations of double
SHA-2), and that's accomplished every 10 minutes, this is definitely
practical.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-20 21:36 ` Theodore Ts'o
  2016-12-20 23:07   ` George Spelvin
@ 2016-12-20 23:55   ` Eric Dumazet
  2016-12-21  3:28     ` George Spelvin
  1 sibling, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2016-12-20 23:55 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Jason A. Donenfeld, Jean-Philippe Aumasson, Hannes Frederic Sowa,
	LKML, Eric Biggers, Daniel J . Bernstein, David Laight,
	David Miller, Andi Kleen, George Spelvin, kernel-hardening,
	Andy Lutomirski, Linux Crypto Mailing List, Tom Herbert,
	Vegard Nossum, Netdev, Linus Torvalds

On Tue, 2016-12-20 at 16:36 -0500, Theodore Ts'o wrote:
> On Mon, Dec 19, 2016 at 06:32:44PM +0100, Jason A. Donenfeld wrote:
> > 1) Anything that requires actual long-term security will use
> > SipHash2-4, with the 64-bit output and the 128-bit key. This includes
> > things like TCP sequence numbers. This seems pretty uncontroversial to
> > me. Seem okay to you?
> 
> Um, why do TCP sequence numbers need long-term security?  So long as
> you rekey every 5 minutes or so, TCP sequence numbers don't need any
> more security than that, since even if you break the key used to
> generate initial sequence numbers seven a minute or two later, any
> pending TCP connections will have timed out long before.
> 
> See the security analysis done in RFC 6528[1], where among other
> things, it points out why MD5 is acceptable with periodic rekeying,
> although there is the concern that this could break certain hueristics
> used when establishing new connections during the TIME-WAIT state.
> 
> [1] https://tools.ietf.org/html/rfc6528


We do not use rekeying for TCP ISN, not anymore after commit
6e5714eaf77d79ae1 (where we switched from MD4 to MD5 )

It might hurt some common cases and I do not believe it is mandated by a
current (ie not obsolete) RFC.

Our clock has a 64 ns resolution and 274 second period (commit
9b42c336d0641) (compared to 4 usec one in RFC 6528)

I do not see why SipHash, if faster than MD5 and more secure, would be a
problem.

Same for syncookies.

BTW, we probably should add a ratelimit on SYNACK retransmits,
because it seems that attackers understood linux kernels resist to
synfloods, and they (the bad guys) use reflection attacks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-20 23:55   ` Eric Dumazet
@ 2016-12-21  3:28     ` George Spelvin
  2016-12-21  5:29       ` Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: George Spelvin @ 2016-12-21  3:28 UTC (permalink / raw)
  To: eric.dumazet, tytso
  Cc: ak, davem, David.Laight, djb, ebiggers3, hannes, Jason,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, linux, luto, netdev, tom, torvalds, vegard.nossum

> I do not see why SipHash, if faster than MD5 and more secure, would be a
> problem.

Because on 32-bit x86, it's slower.

Cycles per byte on 1024 bytes of data:
			Pentium	Core 2	Ivy
			4	Duo	Bridge
SipHash-2-4		38.9	 8.3	 5.8
HalfSipHash-2-4		12.7	 4.5	 3.2
MD5			 8.3	 5.7	 4.7

SipHash is more parallelizable and runs faster on superscalar processors,
but MD5 is optimized for 2000-era processors, and is faster on them than
HalfSipHash even.

Now, in the applications we care about, we're hashing short blocks, and
SipHash has the advantage that it can hash less than 64 bytes.  But it
also pays a penalty on short blocks for the finalization, equivalent to
two words (16 bytes) of input.

It turns out that on both Ivy Bridge and Core 2 Duo, the crossover happens
between 23 (SipHash is faster) and 24 (MD5 is faster) bytes of input.

This is assuming you're adding the 1 byte of length padding to SipHash's
input, so 24 bytes pads to 4 64-bit words, which makes 2*4+4 = 12 rounds,
vs. one block for MD5.  (MD5 takes a similar jump between 55 and 56 bytes.)

On a P4, SipHash is *never* faster; it takes 2.5x longer than MD5 on a
12-byte block (an IPv4 address/port pair).

This is why there was discussion of using HalfSipHash on these machines.
(On a P4, the HalfSipHash/MD5 crossover is somewhere between 24 and 31
bytes; I haven't benchmarked every possible size.)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-21  3:28     ` George Spelvin
@ 2016-12-21  5:29       ` Eric Dumazet
  2016-12-21  6:34         ` George Spelvin
  2016-12-21 14:42         ` Jason A. Donenfeld
  0 siblings, 2 replies; 25+ messages in thread
From: Eric Dumazet @ 2016-12-21  5:29 UTC (permalink / raw)
  To: George Spelvin
  Cc: tytso, ak, davem, David.Laight, djb, ebiggers3, hannes, Jason,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, luto, netdev, tom, torvalds, vegard.nossum

On Tue, 2016-12-20 at 22:28 -0500, George Spelvin wrote:
> > I do not see why SipHash, if faster than MD5 and more secure, would be a
> > problem.
> 
> Because on 32-bit x86, it's slower.
> 
> Cycles per byte on 1024 bytes of data:
> 			Pentium	Core 2	Ivy
> 			4	Duo	Bridge
> SipHash-2-4		38.9	 8.3	 5.8
> HalfSipHash-2-4		12.7	 4.5	 3.2
> MD5			 8.3	 5.7	 4.7

So definitely not faster.

38 cycles per byte is a problem, considering IPV6 is ramping up.

But TCP session establishment on P4 is probably not a big deal.
Nobody would expect a P4 to handle gazillions of TCP flows (using a
32bit kernel)

What about SHA performance (syncookies) on P4 ?

Synfloods are probably the only case we might take care of for 2000-era
cpus.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-21  5:29       ` Eric Dumazet
@ 2016-12-21  6:34         ` George Spelvin
  2016-12-21 14:24           ` Jason A. Donenfeld
  2016-12-21 14:42         ` Jason A. Donenfeld
  1 sibling, 1 reply; 25+ messages in thread
From: George Spelvin @ 2016-12-21  6:34 UTC (permalink / raw)
  To: eric.dumazet, linux
  Cc: ak, davem, David.Laight, djb, ebiggers3, hannes, Jason,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, luto, netdev, tom, torvalds, tytso, vegard.nossum

Eric Dumazet wrote:
> On Tue, 2016-12-20 at 22:28 -0500, George Spelvin wrote:
>> Cycles per byte on 1024 bytes of data:
>> 			Pentium	Core 2	Ivy
>> 			4	Duo	Bridge
>> SipHash-2-4		38.9	 8.3	 5.8
>> HalfSipHash-2-4	12.7	 4.5	 3.2
>> MD5			 8.3	 5.7	 4.7
>
> So definitely not faster.
> 
> 38 cycles per byte is a problem, considering IPV6 is ramping up.

As I said earlier, SipHash performance on 32-bit x86 really sucks,
because it wants an absolute minimum of 9 32-bit registers (8 for the
state plus one temporary for the rotates), and x86 has 7.

> What about SHA performance (syncookies) on P4 ?

I recompiled with -mtune=pentium4 and re-ran.  MD5 time went *up* by
0.3 cycles/byte, HalfSipHash went down by 1 cycle, and SipHash didn't
change:

Cycles per byte on 1024 bytes of data:
		Pentium	Core 2	Ivy
		4	Duo	Bridge
SipHash-2-4	38.9	 8.3	 5.8
HalfSipHash-2-4	11.5	 4.5	 3.2
MD5		 8.6	 5.7	 4.7
SHA-1		19.0	 8.0	 6.8

(This is with a verbatim copy of the lib/sha1.c code; I might be
able to optimize it with some asm hackery.)

Anyway, you see why we were looking longingly at HalfSipHash.


In fact, I have an idea.  Allow me to make the following concrete
suggestion for using HalfSipHash with 128 bits of key material:

- 64 bits are used as the key.
- The other 64 bits are used as an IV which is prepended to
  the message to be hashed.

As a matter of practical implementation, we precompute the effect
of hashing the IV and store the 128-bit HalfSipHash state, which
is used just like a 128-bit key.

Because of the way it is constructed, it is obviously no weaker than
standard HalfSipHash's 64-bit security claim.

I don't know the security of this, and it's almost certainly weaker than
128 bits, but I *hope* it's at least a few bits stronger than 64 bits.
80 would be enough to dissuade any attacker without a six-figure budget
(that's per attack, not a one-time capital investment).  96 would be
ample for our purposes.

What I do know is that it makes a brute-force attack without
significant cryptanalytic effort impossible.

To match the spec exactly, we'd need to add the 8-byte IV length to
the length byte which pads the final block, but from a security point
of view, it does not matter.  As long as we are consistent within any
single key, any unique mapping between padding byte and message length
(mod 256) is equally good.

We may choose based on implementation convenience.

(Also note my earlier comments about when it is okay to omit the padding
length byte entirely: any time all the data to be hashed with a given
key is fixed in format or self-delimiting (e.g. null-terminated).
This applies to many of the networking uses.)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-21  6:34         ` George Spelvin
@ 2016-12-21 14:24           ` Jason A. Donenfeld
  2016-12-21 15:55             ` George Spelvin
  0 siblings, 1 reply; 25+ messages in thread
From: Jason A. Donenfeld @ 2016-12-21 14:24 UTC (permalink / raw)
  To: George Spelvin
  Cc: Eric Dumazet, Andi Kleen, David Miller, David Laight,
	Daniel J . Bernstein, Eric Biggers, Hannes Frederic Sowa,
	Jean-Philippe Aumasson, kernel-hardening,
	Linux Crypto Mailing List, LKML, Andy Lutomirski, Netdev,
	Tom Herbert, Linus Torvalds, Theodore Ts'o, Vegard Nossum

Hi George,

On Wed, Dec 21, 2016 at 7:34 AM, George Spelvin
<linux@sciencehorizons.net> wrote:
> In fact, I have an idea.  Allow me to make the following concrete
> suggestion for using HalfSipHash with 128 bits of key material:
>
> - 64 bits are used as the key.
> - The other 64 bits are used as an IV which is prepended to
>   the message to be hashed.
>
> As a matter of practical implementation, we precompute the effect
> of hashing the IV and store the 128-bit HalfSipHash state, which
> is used just like a 128-bit key.
>
> Because of the way it is constructed, it is obviously no weaker than
> standard HalfSipHash's 64-bit security claim.
>
> I don't know the security of this, and it's almost certainly weaker than
> 128 bits, but I *hope* it's at least a few bits stronger than 64 bits.
> 80 would be enough to dissuade any attacker without a six-figure budget
> (that's per attack, not a one-time capital investment).  96 would be
> ample for our purposes.
>
> What I do know is that it makes a brute-force attack without
> significant cryptanalytic effort impossible.

Depends who's doing the cryptanalytic effort I guess.

Please don't roll your own crypto. It's a dangerous road. Putting
homebrew crypto into the kernel would be an error. Let's stick with
the constructions and security margins that the cryptographers give
us. JP made that fairly clear, I thought.

There are already people working on this problem who undergo peer
review and a career devoted to solving these problems. One result for
small systems that need 128-bit security is Chaskey, which you can go
read about if you're curious.

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-21  5:29       ` Eric Dumazet
  2016-12-21  6:34         ` George Spelvin
@ 2016-12-21 14:42         ` Jason A. Donenfeld
  2016-12-21 15:56           ` Eric Dumazet
  1 sibling, 1 reply; 25+ messages in thread
From: Jason A. Donenfeld @ 2016-12-21 14:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: George Spelvin, Theodore Ts'o, Andi Kleen, David Miller,
	David Laight, Daniel J . Bernstein, Eric Biggers,
	Hannes Frederic Sowa, Jean-Philippe Aumasson, kernel-hardening,
	Linux Crypto Mailing List, LKML, Andy Lutomirski, Netdev,
	Tom Herbert, Linus Torvalds, Vegard Nossum

Hi Eric,

I computed performance numbers for both 32-bit and 64-bit using the
actual functions in which talking about replacing MD5 with SipHash.
The basic harness is here [1] if you're curious. SipHash was a pretty
clear winner for both cases.

x86_64:
[    1.714302] secure_tcpv6_sequence_number_md5# cycles: 102373398
[    1.747685] secure_tcp_sequence_number_md5# cycles: 92042258
[    1.773522] secure_tcpv6_sequence_number_siphash# cycles: 70786533
[    1.798701] secure_tcp_sequence_number_siphash# cycles: 68941043

x86:
[    1.635749] secure_tcpv6_sequence_number_md5# cycles: 106016335
[    1.670259] secure_tcp_sequence_number_md5# cycles: 95670512
[    1.708387] secure_tcpv6_sequence_number_siphash# cycles: 105988635
[    1.740264] secure_tcp_sequence_number_siphash# cycles: 88225395

>>> 102373398 > 70786533
True
>>> 92042258 > 68941043
True
>>> 106016335 > 105988635
True
>>> 95670512 > 88225395
True

While MD5 is probably faster for some kind of large-data
cycles-per-byte, due to its 64-byte internal state, SipHash -- the
"Sip" part standing "Short Input PRF" -- is fast for shorter inputs.
In practice with the functions we're talking about replacing, there's
no need to hash 64-bytes. So, SipHash comes out faster and more
secure.

I also haven't begun to look focusedly at the assembly my SipHash
implemention is generating, which means there's still window for even
more performance improvements.

Jason


[1] https://git.zx2c4.com/linux-dev/tree/net/core/secure_seq.c?h=siphash-bench#n194

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-21 14:24           ` Jason A. Donenfeld
@ 2016-12-21 15:55             ` George Spelvin
  2016-12-21 16:37               ` Jason A. Donenfeld
                                 ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: George Spelvin @ 2016-12-21 15:55 UTC (permalink / raw)
  To: Jason, linux
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, hannes,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, luto, netdev, tom, torvalds, tytso, vegard.nossum

Actually, DJB just made a very relevant suggestion.

As I've mentioned, the 32-bit performance problems are an x86-specific
problem.  ARM does very well, and other processors aren't bad at all.

SipHash fits very nicely (and runs very fast) in the MMX registers.

They're 64 bits, and there are 8 of them, so the integer registers can
be reserved for pointers and loop counters and all that.  And there's
reference code available.

How much does kernel_fpu_begin()/kernel_fpu_end() cost?

Although there are a lot of pre-MMX x86es in embedded control applications,
I don't think anyone is worried about their networking performance.
(Specifically, all of this affects only connection setup, not throughput 
on established connections.)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-21 14:42         ` Jason A. Donenfeld
@ 2016-12-21 15:56           ` Eric Dumazet
  2016-12-21 16:33             ` Jason A. Donenfeld
                               ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Eric Dumazet @ 2016-12-21 15:56 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: George Spelvin, Theodore Ts'o, Andi Kleen, David Miller,
	David Laight, Daniel J . Bernstein, Eric Biggers,
	Hannes Frederic Sowa, Jean-Philippe Aumasson, kernel-hardening,
	Linux Crypto Mailing List, LKML, Andy Lutomirski, Netdev,
	Tom Herbert, Linus Torvalds, Vegard Nossum

On Wed, 2016-12-21 at 15:42 +0100, Jason A. Donenfeld wrote:
> Hi Eric,
> 
> I computed performance numbers for both 32-bit and 64-bit using the
> actual functions in which talking about replacing MD5 with SipHash.
> The basic harness is here [1] if you're curious. SipHash was a pretty
> clear winner for both cases.
> 
> x86_64:
> [    1.714302] secure_tcpv6_sequence_number_md5# cycles: 102373398
> [    1.747685] secure_tcp_sequence_number_md5# cycles: 92042258
> [    1.773522] secure_tcpv6_sequence_number_siphash# cycles: 70786533
> [    1.798701] secure_tcp_sequence_number_siphash# cycles: 68941043
> 
> x86:
> [    1.635749] secure_tcpv6_sequence_number_md5# cycles: 106016335
> [    1.670259] secure_tcp_sequence_number_md5# cycles: 95670512
> [    1.708387] secure_tcpv6_sequence_number_siphash# cycles: 105988635
> [    1.740264] secure_tcp_sequence_number_siphash# cycles: 88225395
> 
> >>> 102373398 > 70786533
> True
> >>> 92042258 > 68941043
> True
> >>> 106016335 > 105988635
> True
> >>> 95670512 > 88225395
> True
> 
> While MD5 is probably faster for some kind of large-data
> cycles-per-byte, due to its 64-byte internal state, SipHash -- the
> "Sip" part standing "Short Input PRF" -- is fast for shorter inputs.
> In practice with the functions we're talking about replacing, there's
> no need to hash 64-bytes. So, SipHash comes out faster and more
> secure.
> 
> I also haven't begun to look focusedly at the assembly my SipHash
> implemention is generating, which means there's still window for even
> more performance improvements.
> 
> Jason
> 
> 
> [1] https://git.zx2c4.com/linux-dev/tree/net/core/secure_seq.c?h=siphash-bench#n194

Now I am quite confused.

George said :

> Cycles per byte on 1024 bytes of data:
>                       Pentium Core 2  Ivy
>                       4       Duo     Bridge
> SipHash-2-4           38.9     8.3     5.8
> HalfSipHash-2-4               12.7     4.5     3.2
> MD5                    8.3     5.7     4.7


That really was for 1024 bytes blocks, so pretty much useless for our
discussion ?

Reading your numbers last week, I thought SipHash was faster, but George
numbers are giving the opposite impression.

I do not have a P4 to make tests, so I only can trust you or George.

Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-21 15:56           ` Eric Dumazet
@ 2016-12-21 16:33             ` Jason A. Donenfeld
  2016-12-21 16:39             ` [kernel-hardening] " Rik van Riel
  2016-12-21 18:37             ` George Spelvin
  2 siblings, 0 replies; 25+ messages in thread
From: Jason A. Donenfeld @ 2016-12-21 16:33 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: George Spelvin, Theodore Ts'o, Andi Kleen, David Miller,
	David Laight, Daniel J . Bernstein, Eric Biggers,
	Hannes Frederic Sowa, Jean-Philippe Aumasson, kernel-hardening,
	Linux Crypto Mailing List, LKML, Andy Lutomirski, Netdev,
	Tom Herbert, Linus Torvalds, Vegard Nossum

Hi Eric,

On Wed, Dec 21, 2016 at 4:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> That really was for 1024 bytes blocks, so pretty much useless for our
> discussion ?
>
> Reading your numbers last week, I thought SipHash was faster, but George
> numbers are giving the opposite impression.
>
> I do not have a P4 to make tests, so I only can trust you or George.

I'm not sure how George came up with those numbers, but the ones I
sent are output from that benchmark function in the last email. I'd be
interested in learning this too.

As mentioned in the last email, it looks like potential 32-bit issues
are really just specific to old Intel chips. Other 32-bit
architectures do fine. So, for new kernels, even if somehow there is a
tiny performance regression (though I couldn't see one) on old
architectures, I really doubt it will affect anybody in practice.

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-21 15:55             ` George Spelvin
@ 2016-12-21 16:37               ` Jason A. Donenfeld
  2016-12-21 16:41               ` [kernel-hardening] " Rik van Riel
  2016-12-21 17:25               ` Linus Torvalds
  2 siblings, 0 replies; 25+ messages in thread
From: Jason A. Donenfeld @ 2016-12-21 16:37 UTC (permalink / raw)
  To: George Spelvin
  Cc: Andi Kleen, David Miller, David Laight, Daniel J . Bernstein,
	Eric Biggers, Eric Dumazet, Hannes Frederic Sowa,
	Jean-Philippe Aumasson, kernel-hardening,
	Linux Crypto Mailing List, LKML, Andy Lutomirski, Netdev,
	Tom Herbert, Linus Torvalds, Theodore Ts'o, Vegard Nossum

Hi George,

On Wed, Dec 21, 2016 at 4:55 PM, George Spelvin
<linux@sciencehorizons.net> wrote:
> Actually, DJB just made a very relevant suggestion.
>
> As I've mentioned, the 32-bit performance problems are an x86-specific
> problem.  ARM does very well, and other processors aren't bad at all.
>
> SipHash fits very nicely (and runs very fast) in the MMX registers.
>
> They're 64 bits, and there are 8 of them, so the integer registers can
> be reserved for pointers and loop counters and all that.  And there's
> reference code available.
>
> How much does kernel_fpu_begin()/kernel_fpu_end() cost?

In my experience, these functions are only worth calling when
processing more significant amounts of data. I don't have any
benchmarks, but when I _remove_ all of these calls in a kernel,
accelerated crypto gets noticeably faster (until the system crashes).
We can measure it, though.

By the way, if somehow SipHash becomes acceptably fast on x86, would
you consider HalfSipHash for hash tables to be no longer needed? Or do
you suspect that HalfSipHash will always be faster even on, say,
32-bit ARM.

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage
  2016-12-21 15:56           ` Eric Dumazet
  2016-12-21 16:33             ` Jason A. Donenfeld
@ 2016-12-21 16:39             ` Rik van Riel
  2016-12-21 17:08               ` Eric Dumazet
  2016-12-21 18:37             ` George Spelvin
  2 siblings, 1 reply; 25+ messages in thread
From: Rik van Riel @ 2016-12-21 16:39 UTC (permalink / raw)
  To: kernel-hardening, Jason A. Donenfeld
  Cc: George Spelvin, Theodore Ts'o, Andi Kleen, David Miller,
	David Laight, Daniel J . Bernstein, Eric Biggers,
	Hannes Frederic Sowa, Jean-Philippe Aumasson,
	Linux Crypto Mailing List, LKML, Andy Lutomirski, Netdev,
	Tom Herbert, Linus Torvalds, Vegard Nossum

[-- Attachment #1: Type: text/plain, Size: 1167 bytes --]

On Wed, 2016-12-21 at 07:56 -0800, Eric Dumazet wrote:
> On Wed, 2016-12-21 at 15:42 +0100, Jason A. Donenfeld wrote:
> George said :
> 
> > Cycles per byte on 1024 bytes of data:
> >                       Pentium Core 2  Ivy
> >                       4       Duo     Bridge
> > SipHash-2-4           38.9     8.3     5.8
> > HalfSipHash-2-4       12.7     4.5     3.2
> > MD5                    8.3     5.7     4.7
> 
> 
> That really was for 1024 bytes blocks, so pretty much useless for our
> discussion ?
> 
> Reading your numbers last week, I thought SipHash was faster, but
> George
> numbers are giving the opposite impression.
> 
> I do not have a P4 to make tests, so I only can trust you or George.

Does anybody still have a P4?

If they do, they're probably better off replacing
it with an Atom. The reduced power bills will pay
for replacing that P4 within a year or two.

In short, I am not sure how important the P4
performance numbers are, especially if we can
improve security for everybody else...

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage
  2016-12-21 15:55             ` George Spelvin
  2016-12-21 16:37               ` Jason A. Donenfeld
@ 2016-12-21 16:41               ` Rik van Riel
  2016-12-21 17:25               ` Linus Torvalds
  2 siblings, 0 replies; 25+ messages in thread
From: Rik van Riel @ 2016-12-21 16:41 UTC (permalink / raw)
  To: kernel-hardening, Jason, linux
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, hannes,
	jeanphilippe.aumasson, linux-crypto, linux-kernel, luto, netdev,
	tom, torvalds, tytso, vegard.nossum

[-- Attachment #1: Type: text/plain, Size: 710 bytes --]

On Wed, 2016-12-21 at 10:55 -0500, George Spelvin wrote:
> Actually, DJB just made a very relevant suggestion.
> 
> As I've mentioned, the 32-bit performance problems are an x86-
> specific
> problem.  ARM does very well, and other processors aren't bad at all.
> 
> SipHash fits very nicely (and runs very fast) in the MMX registers.
> 
> They're 64 bits, and there are 8 of them, so the integer registers
> can
> be reserved for pointers and loop counters and all that.  And there's
> reference code available.
> 
> How much does kernel_fpu_begin()/kernel_fpu_end() cost?

Those can be very expensive. Almost certainly not
worth it for small amounts of data.

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage
  2016-12-21 16:39             ` [kernel-hardening] " Rik van Riel
@ 2016-12-21 17:08               ` Eric Dumazet
  0 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2016-12-21 17:08 UTC (permalink / raw)
  To: Rik van Riel
  Cc: kernel-hardening, Jason A. Donenfeld, George Spelvin,
	Theodore Ts'o, Andi Kleen, David Miller, David Laight,
	Daniel J . Bernstein, Eric Biggers, Hannes Frederic Sowa,
	Jean-Philippe Aumasson, Linux Crypto Mailing List, LKML,
	Andy Lutomirski, Netdev, Tom Herbert, Linus Torvalds,
	Vegard Nossum

On Wed, 2016-12-21 at 11:39 -0500, Rik van Riel wrote:

> Does anybody still have a P4?
> 
> If they do, they're probably better off replacing
> it with an Atom. The reduced power bills will pay
> for replacing that P4 within a year or two.

Well, maybe they have millions of units to replace.

> 
> In short, I am not sure how important the P4
> performance numbers are, especially if we can
> improve security for everybody else...

Worth adding that the ISN or syncookie generation are less than 10% of
the actual cost of handling a problematic (having to generate ISN or
syncookie) TCP packet anyway.

So we are talking of minors potential impact for '2000-era' cpus.

Definitely I vote for using SipHash in TCP ASAP.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-21 15:55             ` George Spelvin
  2016-12-21 16:37               ` Jason A. Donenfeld
  2016-12-21 16:41               ` [kernel-hardening] " Rik van Riel
@ 2016-12-21 17:25               ` Linus Torvalds
  2016-12-21 18:07                 ` George Spelvin
  2016-12-22  1:54                 ` Andy Lutomirski
  2 siblings, 2 replies; 25+ messages in thread
From: Linus Torvalds @ 2016-12-21 17:25 UTC (permalink / raw)
  To: George Spelvin
  Cc: Jason A. Donenfeld, Andi Kleen, David Miller, David Laight,
	Daniel J . Bernstein, Eric Biggers, Eric Dumazet,
	Hannes Frederic Sowa, Jean-Philippe Aumasson, kernel-hardening,
	Linux Crypto Mailing List, Linux Kernel Mailing List,
	Andy Lutomirski, Network Development, Tom Herbert,
	Theodore Ts'o, Vegard Nossum

On Wed, Dec 21, 2016 at 7:55 AM, George Spelvin
<linux@sciencehorizons.net> wrote:
>
> How much does kernel_fpu_begin()/kernel_fpu_end() cost?

It's now better than it used to be, but it's absolutely disastrous
still. We're talking easily many hundreds of cycles. Under some loads,
thousands.

And I warn you already: it will _benchmark_ a hell of a lot better
than it will work in reality. In benchmarks, you'll hit all the
optimizations ("oh, I've already saved away all the FP registers, no
need to do it again").

In contrast, in reality, especially with things like "do it once or
twice per incoming packet", you'll easily hit the absolute worst
cases, where not only does it take a few hundred cycles to save the FP
state, you'll then return to user space in between packets, which
triggers the slow-path return code and reloads the FP state, which is
another few hundred cycles plus.

Similarly, in benchmarks you'll hit the "modern CPU's power on the AVX
unit and keep it powered up for a while afterwards", while in real
life you would quite easily hit the "oh, AVX is powered down because
we were idle, now it powers up at half speed which is another latency
hit _and_ the AVX unit won't run full out anyway".

Don't do it. There are basically no real situations where the AVX
state optimizations help for the kernel. We just don't have the loop
counts to make up for the problems it causes.

The one exception is likely if you're doing things like
high-throughput disk IO encryption, and then you'd be much better off
using SHA256 instead (which often has hw encryption on modern CPU's -
both x86 and ARM).

(I'm sure that you could see it on some high-throughput network
benchmark too when the benchmark entirely saturates the CPU. And then
in real life it would suck horribly for all the reasons above).

                Linus

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-21 17:25               ` Linus Torvalds
@ 2016-12-21 18:07                 ` George Spelvin
  2016-12-22  1:54                 ` Andy Lutomirski
  1 sibling, 0 replies; 25+ messages in thread
From: George Spelvin @ 2016-12-21 18:07 UTC (permalink / raw)
  To: linux, torvalds
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, hannes,
	Jason, jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, luto, netdev, tom, tytso, vegard.nossum

Linus wrote:
>> How much does kernel_fpu_begin()/kernel_fpu_end() cost?
>
> It's now better than it used to be, but it's absolutely disastrous
> still. We're talking easily many hundreds of cycles. Under some loads,
> thousands.

I think I've been thoroughly dissuaded, but just to clarify one thing
that resembles a misunderstanding:

> In contrast, in reality, especially with things like "do it once or
> twice per incoming packet", you'll easily hit the absolute worst
> cases, where not only does it take a few hundred cycles to save the FP
> state, you'll then return to user space in between packets, which
> triggers the slow-path return code and reloads the FP state, which is
> another few hundred cycles plus.

Everything being discussed is per-TCP-connection overhead, *not* per
packet.  (Twice for outgoing connections, because one is to generate
the ephemeral port number.)

I know you know this, but I don't want anyone spectating to be confused
about it.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-21 15:56           ` Eric Dumazet
  2016-12-21 16:33             ` Jason A. Donenfeld
  2016-12-21 16:39             ` [kernel-hardening] " Rik van Riel
@ 2016-12-21 18:37             ` George Spelvin
  2016-12-21 18:40               ` Jason A. Donenfeld
  2016-12-21 22:27               ` Theodore Ts'o
  2 siblings, 2 replies; 25+ messages in thread
From: George Spelvin @ 2016-12-21 18:37 UTC (permalink / raw)
  To: eric.dumazet, Jason
  Cc: ak, davem, David.Laight, djb, ebiggers3, hannes,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, linux, luto, netdev, tom, torvalds, tytso,
	vegard.nossum

Eric Dumazet wrote:
> Now I am quite confused.
>
> George said :
>> Cycles per byte on 1024 bytes of data:
>>                       Pentium Core 2  Ivy
>>                       4       Duo     Bridge
>> SipHash-2-4           38.9     8.3     5.8
>> HalfSipHash-2-4       12.7     4.5     3.2
>> MD5                    8.3     5.7     4.7
>
> That really was for 1024 bytes blocks, so pretty much useless for our
> discussion ?

No, they're actually quite relevant, but you have to interpret them
correctly.  I thought I explained in the text following that table,
but let me make it clearer:

To find the time to compute the SipHash of N bytes, round (N+17) up to
the next multiple of 8 bytes and multiply by the numbers above.

To find the time to compute the HalfSipHash of N bytes, round (N+9) up to
the next multiple of 4 bytes and multiply by the numbers above.

To find the time to compute the MD5 of N bytes, round (N+9) up to the
next multiple of 64 bytes and multiply by the numbers above.

It's the different rounding rules that make all the difference.  For small
input blocks, SipHash can be slower per byte yet still faster because
it hashes fewer bytes.

> Reading your numbers last week, I thought SipHash was faster, but George
> numbers are giving the opposite impression.

SipHash annihilates the competition on 64-bit superscalar hardware.
SipHash dominates the field on 64-bit in-order hardware.
SipHash wins easily on 32-bit hardware *with enough registers*.
On register-starved 32-bit machines, it really struggles.

As I explained, in that last case, SipHash barely wins at all.
(On a P4, it actually *loses* to MD5, not that anyone cares.  Running
on a P4 and caring about performance are mutually exclusive.)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-21 18:37             ` George Spelvin
@ 2016-12-21 18:40               ` Jason A. Donenfeld
  2016-12-21 22:27               ` Theodore Ts'o
  1 sibling, 0 replies; 25+ messages in thread
From: Jason A. Donenfeld @ 2016-12-21 18:40 UTC (permalink / raw)
  To: George Spelvin
  Cc: Eric Dumazet, Andi Kleen, David Miller, David Laight,
	Daniel J . Bernstein, Eric Biggers, Hannes Frederic Sowa,
	Jean-Philippe Aumasson, kernel-hardening,
	Linux Crypto Mailing List, LKML, Andy Lutomirski, Netdev,
	Tom Herbert, Linus Torvalds, Theodore Ts'o, Vegard Nossum

On Wed, Dec 21, 2016 at 7:37 PM, George Spelvin
<linux@sciencehorizons.net> wrote:
> SipHash annihilates the competition on 64-bit superscalar hardware.
> SipHash dominates the field on 64-bit in-order hardware.
> SipHash wins easily on 32-bit hardware *with enough registers*.
> On register-starved 32-bit machines, it really struggles.
>
> As I explained, in that last case, SipHash barely wins at all.
> (On a P4, it actually *loses* to MD5, not that anyone cares.  Running
> on a P4 and caring about performance are mutually exclusive.)

>From the discussion off list which examined your benchmark code, it
looks like we're going to move ahead with SipHash.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-21 18:37             ` George Spelvin
  2016-12-21 18:40               ` Jason A. Donenfeld
@ 2016-12-21 22:27               ` Theodore Ts'o
  2016-12-22  0:18                 ` George Spelvin
  2016-12-22  1:13                 ` George Spelvin
  1 sibling, 2 replies; 25+ messages in thread
From: Theodore Ts'o @ 2016-12-21 22:27 UTC (permalink / raw)
  To: George Spelvin
  Cc: eric.dumazet, Jason, ak, davem, David.Laight, djb, ebiggers3,
	hannes, jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, luto, netdev, tom, torvalds, vegard.nossum

On Wed, Dec 21, 2016 at 01:37:51PM -0500, George Spelvin wrote:
> SipHash annihilates the competition on 64-bit superscalar hardware.
> SipHash dominates the field on 64-bit in-order hardware.
> SipHash wins easily on 32-bit hardware *with enough registers*.
> On register-starved 32-bit machines, it really struggles.

And "with enough registers" includes ARM and MIPS, right?  So the only
real problem is 32-bit x86, and you're right, at that point, only
people who might care are people who are using a space-radiation
hardened 386 --- and they're not likely to be doing high throughput
TCP connections.  :-)

					- Ted

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-21 22:27               ` Theodore Ts'o
@ 2016-12-22  0:18                 ` George Spelvin
  2016-12-22  1:13                 ` George Spelvin
  1 sibling, 0 replies; 25+ messages in thread
From: George Spelvin @ 2016-12-22  0:18 UTC (permalink / raw)
  To: linux, tytso
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, hannes,
	Jason, jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, luto, netdev, tom, torvalds, vegard.nossum

Theodore Ts'o wrote:
> On Wed, Dec 21, 2016 at 01:37:51PM -0500, George Spelvin wrote:
>> SipHash annihilates the competition on 64-bit superscalar hardware.
>> SipHash dominates the field on 64-bit in-order hardware.
>> SipHash wins easily on 32-bit hardware *with enough registers*.
>> On register-starved 32-bit machines, it really struggles.

> And "with enough registers" includes ARM and MIPS, right?

Yes.  As a matter of fact, 32-bit ARM does particularly well
on 64-bit SipHash due to its shift+op instructions.

There is a noticeable performance drop, but nothing catastrophic.

The main thing I've been worried about is all the flow tracking
and NAT done by small home routers, and that's addressed by using
HalfSipHash for the hash tables.  They don't *initiate* a lot of
TCP sessions.

> So the only
> real problem is 32-bit x86, and you're right, at that point, only
> people who might care are people who are using a space-radiation
> hardened 386 --- and they're not likely to be doing high throughput
> TCP connections.  :-)

The only requirement on performance is "don't make DaveM angry." :-)

I was just trying to answer the question of why we *worried* about the
performance, not specifically argue that we *should* use HalfSipHash.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-21 22:27               ` Theodore Ts'o
  2016-12-22  0:18                 ` George Spelvin
@ 2016-12-22  1:13                 ` George Spelvin
  1 sibling, 0 replies; 25+ messages in thread
From: George Spelvin @ 2016-12-22  1:13 UTC (permalink / raw)
  To: linux, tytso
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, hannes,
	Jason, jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, luto, netdev, tom, torvalds, vegard.nossum

As a separate message, to disentangle the threads, I'd like to
talk about get_random_long().

After some thinking, I still like the "state-preserving" construct
that's equivalent to the current MD5 code.  Yes, we could just do
siphash(current_cpu || per_cpu_counter, global_key), but it's nice to
preserve a bit more.

It requires library support from the SipHash code to return the full
SipHash state, but I hope that's a fair thing to ask for.

Here's my current straw man design for comment.  It's very similar to
the current MD5-based design, but feeds all the seed material in the
"correct" way, as opposed to Xring directly into the MD5 state.

* Each CPU has a (Half)SipHash state vector,
  "unsigned long get_random_int_hash[4]".  Unlike the current
  MD5 code, we take care to initialize it to an asymmetric state.

* There's a global 256-bit random_int_secret (which we could
  reseed periodically).

To generate a random number:
* If get_random_int_hash is all-zero, seed it with fresh a half-sized
  SipHash key and the appropriate XOR constants.
* Generate three words of random_get_entropy(), jiffies, and current->pid.
  (This is arbitary seed material, copied from the current code.)
* Crank through that with (Half)SipHash-1-0.
* Crank through the random_int_secret with (Half)SipHash-1-0.
* Return v1 ^ v3.

Here are the reasons:
* The first step is just paranoia, but SipHash's security promise depends
  on starting with an asymmetric state, we want unique per-CPU states,
  and it's a one-time cost.
* When the input words are themselves secret, there's no security
  advantage, and almost no speed advantage, to doing two rounds for one
  input word versus two words with one round each.  Thus, SipHash-1.
* The above is not exactly true, due to the before+after XOR pattern
  that SipHash uses, but I think it's true anyway.
* Likewise, there's no benefit to unkeyed finalization rounds over keyed
  ones.  That's why I just enlarged the global secret.
* The per-call seed material is hashed first on general principles,
  because that's the novel part that might have fresh entropy.
* To the extent the initial state is secret, the rounds processing the
  global secret are 4 finalization rounds for the initial state and
  the per-call entropy.
* The final word(s) of the global secret might be vulnerable to analysis,
  due to incomplete mixing, but since the global secret is always hashed
  in the same order, and larger that the desired security level, the
  initial words should be secure.
* By carrying forward the full internal state, we ensure that repeated
  calls return different results, and to the extent that the per-call
  seed material has entropy, it's preserved.
* The final return is all that's needed, since the last steps in the 
  SipRound are "v1 ^= v2" and "v3 ^= v0".  It's no security loss,
  and a very minor speedup.
* Also, this avoids directly "exposing" the final XOR with the last
  word of the global secret (which is made to v0).

If I'm allowed to use full SipHash, some shortcuts can be taken,
but I believe the above would be secure with HalfSipHash.

If additional performance is required, I'd consider shrinking the
global secret to 192 bits on 32-bit machines but I want more than
128 bits of ey material, and enough rounds to be equivalent to 4
finalization rounds.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: HalfSipHash Acceptable Usage
  2016-12-21 17:25               ` Linus Torvalds
  2016-12-21 18:07                 ` George Spelvin
@ 2016-12-22  1:54                 ` Andy Lutomirski
  1 sibling, 0 replies; 25+ messages in thread
From: Andy Lutomirski @ 2016-12-22  1:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: George Spelvin, Jason A. Donenfeld, Andi Kleen, David Miller,
	David Laight, Daniel J . Bernstein, Eric Biggers, Eric Dumazet,
	Hannes Frederic Sowa, Jean-Philippe Aumasson, kernel-hardening,
	Linux Crypto Mailing List, Linux Kernel Mailing List,
	Network Development, Tom Herbert, Theodore Ts'o,
	Vegard Nossum

On Wed, Dec 21, 2016 at 9:25 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Dec 21, 2016 at 7:55 AM, George Spelvin
> <linux@sciencehorizons.net> wrote:
>>
>> How much does kernel_fpu_begin()/kernel_fpu_end() cost?
>
> It's now better than it used to be, but it's absolutely disastrous
> still. We're talking easily many hundreds of cycles. Under some loads,
> thousands.
>
> And I warn you already: it will _benchmark_ a hell of a lot better
> than it will work in reality. In benchmarks, you'll hit all the
> optimizations ("oh, I've already saved away all the FP registers, no
> need to do it again").
>
> In contrast, in reality, especially with things like "do it once or
> twice per incoming packet", you'll easily hit the absolute worst
> cases, where not only does it take a few hundred cycles to save the FP
> state, you'll then return to user space in between packets, which
> triggers the slow-path return code and reloads the FP state, which is
> another few hundred cycles plus.

Hah, you're thinking that the x86 code works the way that Rik and I
want it to work, and you just made my day. :)  What actually happens
is that the state is saved in kernel_fpu_begin() and restored in
kernel_fpu_end(), and it'll take a few hundred cycles best case.  If
you do it a bunch of times in a loop, you *might* trigger a CPU
optimization that notices that the state being saved is the same state
that was just restored, but you're still going to pay the full restore
code each round trip no matter what.

The code is much clearer in 4.10 kernels now that I deleted the unused
"lazy" branches.

>
> Similarly, in benchmarks you'll hit the "modern CPU's power on the AVX
> unit and keep it powered up for a while afterwards", while in real
> life you would quite easily hit the "oh, AVX is powered down because
> we were idle, now it powers up at half speed which is another latency
> hit _and_ the AVX unit won't run full out anyway".

I *think* that was mostly fixed in Broadwell or thereabouts (in terms
of latency -- throughput and power consumption still suffers).

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2016-12-22  2:01 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-19 17:32 HalfSipHash Acceptable Usage Jason A. Donenfeld
     [not found] ` <CAGiyFdduUNSGq24zfsk0ZU=hnOCmewAw8vw6XvDoS-3f+3UPKQ@mail.gmail.com>
2016-12-19 21:00   ` Jason A. Donenfeld
2016-12-20 21:36 ` Theodore Ts'o
2016-12-20 23:07   ` George Spelvin
2016-12-20 23:55   ` Eric Dumazet
2016-12-21  3:28     ` George Spelvin
2016-12-21  5:29       ` Eric Dumazet
2016-12-21  6:34         ` George Spelvin
2016-12-21 14:24           ` Jason A. Donenfeld
2016-12-21 15:55             ` George Spelvin
2016-12-21 16:37               ` Jason A. Donenfeld
2016-12-21 16:41               ` [kernel-hardening] " Rik van Riel
2016-12-21 17:25               ` Linus Torvalds
2016-12-21 18:07                 ` George Spelvin
2016-12-22  1:54                 ` Andy Lutomirski
2016-12-21 14:42         ` Jason A. Donenfeld
2016-12-21 15:56           ` Eric Dumazet
2016-12-21 16:33             ` Jason A. Donenfeld
2016-12-21 16:39             ` [kernel-hardening] " Rik van Riel
2016-12-21 17:08               ` Eric Dumazet
2016-12-21 18:37             ` George Spelvin
2016-12-21 18:40               ` Jason A. Donenfeld
2016-12-21 22:27               ` Theodore Ts'o
2016-12-22  0:18                 ` George Spelvin
2016-12-22  1:13                 ` George Spelvin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).