linux-kernel.vger.kernel.org archive mirror
* Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage
@ 2016-12-21 22:29 Jason A. Donenfeld
  2016-12-22  3:55 ` George Spelvin
  0 siblings, 1 reply; 6+ messages in thread
From: Jason A. Donenfeld @ 2016-12-21 22:29 UTC (permalink / raw)
  To: kernel-hardening, Theodore Ts'o, George Spelvin,
	Eric Dumazet, Jason, Andi Kleen, David Miller, David Laight,
	Daniel J . Bernstein, Eric Biggers, Hannes Frederic Sowa,
	Jean-Philippe Aumasson, Linux Crypto Mailing List, LKML,
	Andy Lutomirski, Netdev, Tom Herbert, Linus Torvalds,
	Vegard Nossum

On Wed, Dec 21, 2016 at 11:27 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> And "with enough registers" includes ARM and MIPS, right?  So the only
> real problem is 32-bit x86, and you're right, at that point, only
> people who might care are people who are using a space-radiation
> hardened 386 --- and they're not likely to be doing high throughput
> TCP connections.  :-)

Plus the benchmark was bogus anyway, and when I built a more specific
harness -- actually comparing the TCP sequence number functions --
SipHash was faster than MD5, even on register-starved x86. So I think
we're fine and this chapter of the discussion can come to a close, in
order to move on to more interesting things.

* Re: HalfSipHash Acceptable Usage
@ 2016-12-21 15:55 George Spelvin
  2016-12-21 16:41 ` [kernel-hardening] " Rik van Riel
  0 siblings, 1 reply; 6+ messages in thread
From: George Spelvin @ 2016-12-21 15:55 UTC (permalink / raw)
  To: Jason, linux
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, hannes,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, luto, netdev, tom, torvalds, tytso, vegard.nossum

Actually, DJB just made a very relevant suggestion.

As I've mentioned, the 32-bit performance problems are specific to x86.
ARM does very well, and other processors aren't bad at all.

SipHash fits very nicely (and runs very fast) in the MMX registers.

They're 64 bits, and there are 8 of them, so the integer registers can
be reserved for pointers and loop counters and all that.  And there's
reference code available.

How much does kernel_fpu_begin()/kernel_fpu_end() cost?

Although there are a lot of pre-MMX x86es in embedded control applications,
I don't think anyone is worried about their networking performance.
(Specifically, all of this affects only connection setup, not throughput 
on established connections.)

* Re: HalfSipHash Acceptable Usage
@ 2016-12-21  3:28 George Spelvin
  2016-12-21  5:29 ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: George Spelvin @ 2016-12-21  3:28 UTC (permalink / raw)
  To: eric.dumazet, tytso
  Cc: ak, davem, David.Laight, djb, ebiggers3, hannes, Jason,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, linux, luto, netdev, tom, torvalds, vegard.nossum

> I do not see why SipHash, if faster than MD5 and more secure, would be a
> problem.

Because on 32-bit x86, it's slower.

Cycles per byte on 1024 bytes of data:

                    Pentium 4   Core 2 Duo   Ivy Bridge
SipHash-2-4           38.9         8.3          5.8
HalfSipHash-2-4       12.7         4.5          3.2
MD5                    8.3         5.7          4.7

SipHash is more parallelizable and runs faster on superscalar processors,
but MD5 is optimized for 2000-era processors, and on those it is faster
than even HalfSipHash.

Now, in the applications we care about, we're hashing short blocks, and
SipHash has the advantage that it can hash less than 64 bytes at a time,
where MD5 always processes a full 64-byte block.  But it also pays a
penalty on short blocks for the finalization, equivalent to two words
(16 bytes) of input.

It turns out that on both Ivy Bridge and Core 2 Duo, the crossover happens
between 23 (SipHash is faster) and 24 (MD5 is faster) bytes of input.

This is assuming you're adding the 1 byte of length padding to SipHash's
input, so 24 bytes pads to four 64-bit words, which makes 2*4+4 = 12 rounds,
vs. one block for MD5.  (MD5 takes a similar jump between 55 and 56 bytes.)
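
The round and block counts above can be checked with a short sketch.
This is illustrative only and not code from the thread: the helper names
(siphash_words, siphash_rounds, md5_blocks) are hypothetical, SipHash is
assumed to append a single length byte and pad to a 64-bit word boundary,
and MD5 is assumed to use its standard padding (one 0x80 byte plus an
8-byte length field).

```c
#include <stddef.h>

/* Hypothetical helper: number of 64-bit words SipHash absorbs for a
 * len-byte message. The final word always carries the length byte, so
 * an exact multiple of 8 bytes still costs one extra word. */
static size_t siphash_words(size_t len)
{
	return len / 8 + 1;
}

/* Hypothetical helper: total SipRounds for SipHash-c-d, i.e. c rounds
 * per absorbed word plus d finalization rounds. */
static size_t siphash_rounds(size_t len, unsigned c, unsigned d)
{
	return c * siphash_words(len) + d;
}

/* Hypothetical helper: number of 64-byte MD5 blocks, assuming standard
 * MD5 padding (1 marker byte + 8 length bytes after the message). */
static size_t md5_blocks(size_t len)
{
	return (len + 1 + 8 + 63) / 64;
}
```

Under these assumptions, siphash_rounds(23, 2, 4) gives 10 rounds and
siphash_rounds(24, 2, 4) gives 12, which is the step described above,
while md5_blocks() goes from 1 to 2 between 55 and 56 bytes.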

On a P4, SipHash is *never* faster; it takes 2.5x longer than MD5 on a
12-byte block (an IPv4 address/port pair).

This is why there was discussion of using HalfSipHash on these machines.
(On a P4, the HalfSipHash/MD5 crossover is somewhere between 24 and 31
bytes; I haven't benchmarked every possible size.)


end of thread, other threads:[~2016-12-22  4:40 UTC | newest]

Thread overview: 6+ messages
2016-12-21 22:29 [kernel-hardening] Re: HalfSipHash Acceptable Usage Jason A. Donenfeld
2016-12-22  3:55 ` George Spelvin
2016-12-22  4:40   ` Jason A. Donenfeld
  -- strict thread matches above, loose matches on Subject: below --
2016-12-21 15:55 George Spelvin
2016-12-21 16:41 ` [kernel-hardening] " Rik van Riel
2016-12-21  3:28 George Spelvin
2016-12-21  5:29 ` Eric Dumazet
2016-12-21 14:42   ` Jason A. Donenfeld
2016-12-21 15:56     ` Eric Dumazet
2016-12-21 16:39       ` [kernel-hardening] " Rik van Riel
2016-12-21 17:08         ` Eric Dumazet
