* Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage
From: Jason A. Donenfeld @ 2016-12-21 22:29 UTC
To: kernel-hardening, Theodore Ts'o, George Spelvin,
Eric Dumazet, Jason, Andi Kleen, David Miller, David Laight,
Daniel J . Bernstein, Eric Biggers, Hannes Frederic Sowa,
Jean-Philippe Aumasson, Linux Crypto Mailing List, LKML,
Andy Lutomirski, Netdev, Tom Herbert, Linus Torvalds,
Vegard Nossum
On Wed, Dec 21, 2016 at 11:27 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> And "with enough registers" includes ARM and MIPS, right? So the only
> real problem is 32-bit x86, and you're right, at that point, only
> people who might care are people who are using a space-radiation
> hardened 386 --- and they're not likely to be doing high throughput
> TCP connections. :-)
Plus the benchmark was bogus anyway, and when I built a more specific
harness -- actually comparing the TCP sequence number functions --
SipHash was faster than MD5, even on register starved x86. So I think
we're fine and this chapter of the discussion can come to a close, in
order to move on to more interesting things.
* Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage
From: George Spelvin @ 2016-12-22 3:55 UTC
To: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, hannes,
Jason, jeanphilippe.aumasson, kernel-hardening, linux-crypto,
linux-kernel, linux, luto, netdev, tom, torvalds, tytso,
vegard.nossum
> Plus the benchmark was bogus anyway, and when I built a more specific
> harness -- actually comparing the TCP sequence number functions --
> SipHash was faster than MD5, even on register starved x86. So I think
> we're fine and this chapter of the discussion can come to a close, in
> order to move on to more interesting things.
Do we have to go through this? No, the benchmark was *not* bogus.
Here are my results from *your* benchmark. I can't reboot some of my test
machines, so I took net/core/secure_seq.c, lib/siphash.c, lib/md5.c and
include/linux/siphash.h straight out of your test tree.
Then I replaced the kernel #includes with the necessary typedefs
and #defines to make it compile in user-space. (Voluminous but
straightforward.) E.g.
#define __aligned(x) __attribute__((__aligned__(x)))
#define ____cacheline_aligned __aligned(64)
#define CONFIG_INET 1
#define IS_ENABLED(x) 1
#define ktime_get_real_ns() 0
#define sysctl_tcp_timestamps 0
... etc.
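For readers who want to reproduce this, the type side of such a shim might look like the sketch below. This is not the header actually used here; the typedef and macro names mirror the kernel's, but the definitions are assumptions for a GCC or Clang user-space build, and demo_secret is a made-up array that only shows a kernel-style declaration compiling unchanged.

```c
/* Hypothetical user-space shim: stand-ins for kernel types and macros.
 * Names follow the kernel's; the definitions are assumptions. */
#include <stdint.h>

typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;

/* The endian-annotated types are plain integers outside of sparse. */
typedef u16 __be16;
typedef u32 __be32;

#define __aligned(x)           __attribute__((__aligned__(x)))
#define ____cacheline_aligned  __aligned(64)
#define ARRAY_SIZE(a)          (sizeof(a) / sizeof((a)[0]))

/* Made-up example: a kernel-style declaration that now builds as-is. */
static u32 demo_secret[16] ____cacheline_aligned;
```

The 64-byte cache line is the same assumption the #defines above already make.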
Then I modified your benchmark code into the appended code. The
differences are:
* I didn't iterate 100K times, I timed the functions *once*.
* I saved the times in a buffer and printed them all at the end
so printf() wouldn't pollute the caches.
* Before every even-numbered iteration, I flushed the I-cache
of everything from _init to _fini (i.e. all the non-library code).
This cold-cache case is what is going to happen in the kernel.
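The shape of such a harness can be sketched as follows. This is a simplified, portable illustration and not the code appended below: work() is a made-up stand-in for the function under test, evict_code() is an empty placeholder where a real harness would flush the code from the cache, and clock_gettime() replaces the cycle counter so the sketch builds on any POSIX system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define ITERATIONS 10

/* Hypothetical stand-in for the function under test. */
static uint32_t work(uint32_t x)
{
	return x * 2654435761u + 1;
}

/* Placeholder: a real harness would evict the code under test here. */
static void evict_code(void)
{
}

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Time work() exactly once per iteration; no I/O inside the loop. */
static uint32_t run_once_each(uint64_t times[ITERATIONS])
{
	uint32_t acc = 0;
	int i;

	for (i = 0; i < ITERATIONS; i++) {
		uint64_t start;

		if ((i & 1) == 0)	/* even iterations start cold */
			evict_code();
		start = now_ns();
		acc += work((uint32_t)i);
		times[i] = now_ns() - start;
	}
	return acc;
}

/* Print only after every measurement has been taken. */
static void report(const uint64_t times[ITERATIONS])
{
	int i;

	for (i = 0; i < ITERATIONS; i++)
		printf("%s %d: %llu ns\n", (i & 1) ? "hot" : "cold", i,
		       (unsigned long long)times[i]);
}
```

A caller would run run_once_each() into a local array and then call report() on it; returning the accumulated value keeps the compiler from discarding the timed calls.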
In the results below, note that I did *not* re-flush between phases
of the test. The effect of caching is clearly apparent in the tcpv4
results, where the tcpv6 code loaded the cache.
You can also see that the SipHash code benefits more from caching when
entered with a cold cache, as it iterates over the input words, while
the MD5 code is one big unrolled blob.
Order of computation is down the columns first, across second.
The P4 results were:
tcpv6 md5 cold:     4084 3488 3584 3584 3568
tcpv4 md5 cold:     1052  996  996 1060  996
tcpv6 siphash cold: 4080 3296 3312 3296 3312
tcpv4 siphash cold: 2968 2748 2972 2716 2716
tcpv6 md5 hot:       900  712  712  712  712
tcpv4 md5 hot:       632  672  672  672  672
tcpv6 siphash hot:  2484 2292 2340 2340 2340
tcpv4 siphash hot:  1660 1560 1564 2340 1564
SipHash actually wins slightly in the cold-cache case, because
it iterates more. In the hot-cache case, it loses horribly.
Core 2 duo:
tcpv6 md5 cold:     3396 2868 2964 3012 2832
tcpv4 md5 cold:     1368 1044 1320 1332 1308
tcpv6 siphash cold: 2940 2952 2916 2448 2604
tcpv4 siphash cold: 3192 2988 3576 3504 3624
tcpv6 md5 hot:      1116 1032  996 1008 1008
tcpv4 md5 hot:       936  936  936  936  936
tcpv6 siphash hot:  1200 1236 1236 1188 1188
tcpv4 siphash hot:   936  804  804  804  804
Pretty much a tie, honestly.
Ivy Bridge:
tcpv6 md5 cold:     6086 6136 6962 6358 6060
tcpv4 md5 cold:      816  732 1046 1054 1012
tcpv6 siphash cold: 3756 1886 2152 2390 2566
tcpv4 siphash cold: 3264 2108 3026 3120 3526
tcpv6 md5 hot:      1062  808  824  824  832
tcpv4 md5 hot:       730  730  740  748  748
tcpv6 siphash hot:   960  952  936 1112  926
tcpv4 siphash hot:   638  544  562  552  560
Modern processors *hate* cold caches. But notice how md5 is *faster*
than SipHash on hot-cache IPv6.
Ivy Bridge, -m64:
tcpv6 md5 cold:     4680 3672 3956 3616 3525
tcpv4 md5 cold:     1066 1416 1179 1179 1134
tcpv6 siphash cold:  940 1258 1995 1609 2255
tcpv4 siphash cold: 1440 1269 1292 1870 1621
tcpv6 md5 hot:      1372 1111 1122 1088 1088
tcpv4 md5 hot:       997  997  997  997  998
tcpv6 siphash hot:   340  340  340  352  340
tcpv4 siphash hot:   227  238  238  238  238
Of course, when you compile -m64, SipHash is unbeatable.
Here's the modified benchmark() code. The entire package is
a bit voluminous for the mailing list, but anyone is welcome to it.
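/* Evict every cache line from _init to _fini, i.e. all non-library code. */
static void clflush(void)
{
	extern char const _init, _fini;
	char const *p = &_init;

	while (p < &_fini) {
		asm("clflush %0" : : "m" (*p));
		p += 64;	/* assumes 64-byte cache lines */
	}
}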
typedef uint32_t cycles_t;

/* Read the TSC; only the low 32 bits (eax) are kept. */
static cycles_t get_cycles(void)
{
	uint32_t eax, edx;

	asm volatile("rdtsc" : "=a" (eax), "=d" (edx));
	return eax;
}
static int benchmark(void)
{
	cycles_t start, finish;
	int i;
	u32 seq_number = 0;
	__be32 saddr6[4] = { 1, 4, 182, 393 }, daddr6[4] = { 9192, 18288, 2222222, 0xffffff10 };
	__be32 saddr4 = 28888, daddr4 = 182112;
	__be16 sport = 22, dport = 41992;
	u32 tsoff;
	cycles_t result[4];

	printf("seq num benchmark\n");

	for (i = 0; i < 10; i++) {
		if ((i & 1) == 0)
			clflush();

		start = get_cycles();
		seq_number += secure_tcpv6_sequence_number_md5(saddr6, daddr6, sport, dport, &tsoff);
		finish = get_cycles();
		result[0] = finish - start;

		start = get_cycles();
		seq_number += secure_tcp_sequence_number_md5(saddr4, daddr4, sport, dport, &tsoff);
		finish = get_cycles();
		result[1] = finish - start;

		start = get_cycles();
		seq_number += secure_tcpv6_sequence_number(saddr6, daddr6, sport, dport, &tsoff);
		finish = get_cycles();
		result[2] = finish - start;

		start = get_cycles();
		seq_number += secure_tcp_sequence_number(saddr4, daddr4, sport, dport, &tsoff);
		finish = get_cycles();
		result[3] = finish - start;

		printf("* Iteration %d results:\n", i);
		printf("secure_tcpv6_sequence_number_md5# cycles: %u\n", result[0]);
		printf("secure_tcp_sequence_number_md5# cycles: %u\n", result[1]);
		printf("secure_tcpv6_sequence_number_siphash# cycles: %u\n", result[2]);
		printf("secure_tcp_sequence_number_siphash# cycles: %u\n", result[3]);
		printf("benchmark result: %u\n", seq_number);
	}

	printf("benchmark result: %u\n", seq_number);
	return 0;
}
//device_initcall(benchmark);
int
main(void)
{
	memset(net_secret, 0xff, sizeof net_secret);
	memset(net_secret_md5, 0xff, sizeof net_secret_md5);
	return benchmark();
}
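A note on get_cycles() in the listing above: it keeps only the low 32 bits of the timestamp counter. That is enough for these measurements as long as each timed interval is shorter than 2^32 cycles (a bit over a second at 3 GHz), because unsigned subtraction is carried out modulo 2^32 and so survives a wrap between the two samples. A minimal sketch of that property, with elapsed_cycles() being a helper name introduced only for this illustration:

```c
#include <stdint.h>

typedef uint32_t cycles_t;

/* Elapsed cycles between two truncated counter samples.
 * The result is correct modulo 2^32, even if the counter
 * wrapped between start and finish. */
static cycles_t elapsed_cycles(cycles_t start, cycles_t finish)
{
	return (cycles_t)(finish - start);
}
```

For example, a start sample of 0xfffffff0 and a finish sample of 0x00000010 give 0x20, the same as if no wrap had occurred.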
* Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage
From: Jason A. Donenfeld @ 2016-12-22 4:40 UTC
To: George Spelvin
Cc: Andi Kleen, David Miller, David Laight, Daniel J . Bernstein,
Eric Biggers, Eric Dumazet, Hannes Frederic Sowa,
Jean-Philippe Aumasson, kernel-hardening,
Linux Crypto Mailing List, LKML, Andy Lutomirski, Netdev,
Tom Herbert, Linus Torvalds, Theodore Ts'o, Vegard Nossum
Hi George,
On Thu, Dec 22, 2016 at 4:55 AM, George Spelvin
<linux@sciencehorizons.net> wrote:
> Do we have to go through this? No, the benchmark was *not* bogus.
> Then I replaced the kernel #includes with the necessary typedefs
> and #defines to make it compile in user-space.
> * I didn't iterate 100K times, I timed the functions *once*.
> * I saved the times in a buffer and printed them all at the end
> so printf() wouldn't pollute the caches.
> * Before every even-numbered iteration, I flushed the I-cache
> of everything from _init to _fini (i.e. all the non-library code).
> This cold-cache case is what is going to happen in the kernel.
Wow! Great. Thanks for the pointers on the right way to do this. Very
helpful, and enlightening results indeed. Think you could send me the
whole .c of what you finally came up with? I'd like to run this on
some other architectures; I've got a few odd boxes lying around here.
> The P4 results were:
> SipHash actually wins slightly in the cold-cache case, because
> it iterates more. In the hot-cache case, it loses
> Core 2 duo:
> Pretty much a tie, honestly.
> Ivy Bridge:
> Modern processors *hate* cold caches. But notice how md5 is *faster*
> than SipHash on hot-cache IPv6.
> Ivy Bridge, -m64:
> Of course, when you compile -m64, SipHash is unbeatable.
Okay, so I think these results are consistent with some of the
assessments from before -- that SipHash is really just fine as a
replacement for MD5. Not great on older 32-bit x86, but not too
horrible, and the performance improvements on every other architecture
and the security improvements everywhere are a net good.
> Here's the modified benchmark() code. The entire package is
> a bit voluminous for the mailing list, but anyone is welcome to it.
Please do send! I'm sure I'll learn from reading it. Thanks again for
doing the hard work of putting something proper together.
Thanks,
Jason
* Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage
From: Eric Dumazet @ 2016-12-21 17:08 UTC
To: Rik van Riel
Cc: kernel-hardening, Jason A. Donenfeld, George Spelvin,
Theodore Ts'o, Andi Kleen, David Miller, David Laight,
Daniel J . Bernstein, Eric Biggers, Hannes Frederic Sowa,
Jean-Philippe Aumasson, Linux Crypto Mailing List, LKML,
Andy Lutomirski, Netdev, Tom Herbert, Linus Torvalds,
Vegard Nossum
On Wed, 2016-12-21 at 11:39 -0500, Rik van Riel wrote:
> Does anybody still have a P4?
>
> If they do, they're probably better off replacing
> it with an Atom. The reduced power bills will pay
> for replacing that P4 within a year or two.
Well, maybe they have millions of units to replace.
>
> In short, I am not sure how important the P4
> performance numbers are, especially if we can
> improve security for everybody else...
Worth adding that ISN or syncookie generation is less than 10% of
the actual cost of handling a problematic (having to generate an ISN or
syncookie) TCP packet anyway.
So we are talking about a minor potential impact on '2000-era' CPUs.
Definitely I vote for using SipHash in TCP ASAP.
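To put a rough number on that, the arithmetic can be sketched as below. It assumes the full 10% upper bound for the hash's share of the per-packet cost, and takes the hot-cache P4 tcpv4 figures reported elsewhere in this thread (about 672 cycles for MD5 against about 1564 for SipHash); overall_slowdown() is a name made up for this illustration.

```c
/* Overall slowdown factor when a fraction `share` of the total cost
 * becomes `ratio` times slower and the rest is unchanged. */
static double overall_slowdown(double share, double ratio)
{
	return (1.0 - share) + share * ratio;
}
```

With share = 0.10 and ratio = 1564.0 / 672.0, the result is roughly 1.13, so under these assumptions even the worst case measured on the P4 costs about 13% per such packet.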
* Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage
From: Rik van Riel @ 2016-12-21 16:41 UTC
To: kernel-hardening, Jason, linux
Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, hannes,
jeanphilippe.aumasson, linux-crypto, linux-kernel, luto, netdev,
tom, torvalds, tytso, vegard.nossum
On Wed, 2016-12-21 at 10:55 -0500, George Spelvin wrote:
> Actually, DJB just made a very relevant suggestion.
>
> As I've mentioned, the 32-bit performance problems are an x86-specific
> problem. ARM does very well, and other processors aren't bad at all.
>
> SipHash fits very nicely (and runs very fast) in the MMX registers.
>
> They're 64 bits, and there are 8 of them, so the integer registers can
> be reserved for pointers and loop counters and all that. And there's
> reference code available.
>
> How much does kernel_fpu_begin()/kernel_fpu_end() cost?
Those can be very expensive. Almost certainly not
worth it for small amounts of data.
--
All Rights Reversed.
* Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage
From: Rik van Riel @ 2016-12-21 16:39 UTC
To: kernel-hardening, Jason A. Donenfeld
Cc: George Spelvin, Theodore Ts'o, Andi Kleen, David Miller,
David Laight, Daniel J . Bernstein, Eric Biggers,
Hannes Frederic Sowa, Jean-Philippe Aumasson,
Linux Crypto Mailing List, LKML, Andy Lutomirski, Netdev,
Tom Herbert, Linus Torvalds, Vegard Nossum
On Wed, 2016-12-21 at 07:56 -0800, Eric Dumazet wrote:
> On Wed, 2016-12-21 at 15:42 +0100, Jason A. Donenfeld wrote:
> George said :
>
> > Cycles per byte on 1024 bytes of data:
> >                   Pentium  Core 2  Ivy
> >                   4        Duo     Bridge
> > SipHash-2-4       38.9     8.3     5.8
> > HalfSipHash-2-4   12.7     4.5     3.2
> > MD5               8.3      5.7     4.7
>
>
> That really was for 1024 bytes blocks, so pretty much useless for our
> discussion ?
>
> Reading your numbers last week, I thought SipHash was faster, but
> George's numbers are giving the opposite impression.
>
> I do not have a P4 to make tests, so I only can trust you or George.
Does anybody still have a P4?
If they do, they're probably better off replacing
it with an Atom. The reduced power bills will pay
for replacing that P4 within a year or two.
In short, I am not sure how important the P4
performance numbers are, especially if we can
improve security for everybody else...
--
All Rights Reversed.