All of lore.kernel.org
 help / color / mirror / Atom feed
From: "George Spelvin" <linux@sciencehorizons.net>
To: tom@herbertland.com
Cc: ak@linux.intel.com, davem@davemloft.net, David.Laight@aculab.com,
	djb@cr.yp.to, ebiggers3@gmail.com, hannes@stressinduktion.org,
	Jason@zx2c4.com, jeanphilippe.aumasson@gmail.com,
	kernel-hardening@lists.openwall.com,
	linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux@sciencehorizons.net, luto@amacapital.net,
	netdev@vger.kernel.org, torvalds@linux-foundation.org,
	tytso@mit.edu, vegard.nossum@gmail.com
Subject: Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
Date: 17 Dec 2016 10:21:22 -0500	[thread overview]
Message-ID: <20161217152122.19677.qmail@ns.sciencehorizons.net> (raw)
In-Reply-To: <CALx6S351VFRZmEQphRQy6YtmZYPnOtTN7=XiNrJmhWJGv4HUBg@mail.gmail.com>

To follow up on my comments that your benchmark results were peculiar,
here's my benchmark code.

It just computes the hash of all n*(n+1)/2 possible non-empty substrings
of a buffer of n (called "max" below) bytes.  "cpb" is "cycles per byte".

(The average length is (n+2)/3, c.f. https://oeis.org/A000292)

On x86-32, HSipHash is asymptotically twice the speed of SipHash,
rising to 2.5x for short strings:

SipHash/HSipHash benchmark, sizeof(long) = 4
 SipHash: max=   4 cycles=     10495 cpb=524.7500 (sum=47a4f5554869fa97)
HSipHash: max=   4 cycles=      3400 cpb=170.0000 (sum=146a863e)
 SipHash: max=   8 cycles=     24468 cpb=203.9000 (sum=21c41a86355affcc)
HSipHash: max=   8 cycles=      9237 cpb= 76.9750 (sum=d3b5e0cd)
 SipHash: max=  16 cycles=     94622 cpb=115.9583 (sum=26d816b72721e48f)
HSipHash: max=  16 cycles=     34499 cpb= 42.2782 (sum=16bb7475)
 SipHash: max=  32 cycles=    418767 cpb= 69.9811 (sum=dd5a97694b8a832d)
HSipHash: max=  32 cycles=    156695 cpb= 26.1857 (sum=eed00fcb)
 SipHash: max=  64 cycles=   2119152 cpb= 46.3101 (sum=a2a725aecc09ed00)
HSipHash: max=  64 cycles=   1008678 cpb= 22.0428 (sum=99b9f4f)
 SipHash: max= 128 cycles=  12728659 cpb= 35.5788 (sum=420878cd20272817)
HSipHash: max= 128 cycles=   5452931 cpb= 15.2419 (sum=f1f4ad18)
 SipHash: max= 256 cycles=  38931946 cpb= 13.7615 (sum=e05dfb28b90dfd98)
HSipHash: max= 256 cycles=  13807312 cpb=  4.8805 (sum=ceeafcc1)
 SipHash: max= 512 cycles= 205537380 cpb=  9.1346 (sum=7d129d4de145fbea)
HSipHash: max= 512 cycles= 103420960 cpb=  4.5963 (sum=7f15a313)
 SipHash: max=1024 cycles=1540259472 cpb=  8.5817 (sum=cca7cbdc778ca8af)
HSipHash: max=1024 cycles= 796090824 cpb=  4.4355 (sum=d8f3374f)

On x86-64, SipHash is consistently faster, asymptotically approaching 2x
for long strings:

SipHash/HSipHash benchmark, sizeof(long) = 8
 SipHash: max=   4 cycles=      2642 cpb=132.1000 (sum=47a4f5554869fa97)
HSipHash: max=   4 cycles=      2498 cpb=124.9000 (sum=146a863e)
 SipHash: max=   8 cycles=      5270 cpb= 43.9167 (sum=21c41a86355affcc)
HSipHash: max=   8 cycles=      7140 cpb= 59.5000 (sum=d3b5e0cd)
 SipHash: max=  16 cycles=     19950 cpb= 24.4485 (sum=26d816b72721e48f)
HSipHash: max=  16 cycles=     23546 cpb= 28.8554 (sum=16bb7475)
 SipHash: max=  32 cycles=     80188 cpb= 13.4004 (sum=dd5a97694b8a832d)
HSipHash: max=  32 cycles=    101218 cpb= 16.9148 (sum=eed00fcb)
 SipHash: max=  64 cycles=    373286 cpb=  8.1575 (sum=a2a725aecc09ed00)
HSipHash: max=  64 cycles=    535568 cpb= 11.7038 (sum=99b9f4f)
 SipHash: max= 128 cycles=   2075224 cpb=  5.8006 (sum=420878cd20272817)
HSipHash: max= 128 cycles=   3336820 cpb=  9.3270 (sum=f1f4ad18)
 SipHash: max= 256 cycles=  14276278 cpb=  5.0463 (sum=e05dfb28b90dfd98)
HSipHash: max= 256 cycles=  28847880 cpb= 10.1970 (sum=ceeafcc1)
 SipHash: max= 512 cycles=  50135180 cpb=  2.2281 (sum=7d129d4de145fbea)
HSipHash: max= 512 cycles=  86145916 cpb=  3.8286 (sum=7f15a313)
 SipHash: max=1024 cycles= 334111900 cpb=  1.8615 (sum=cca7cbdc778ca8af)
HSipHash: max=1024 cycles= 640432452 cpb=  3.5682 (sum=d8f3374f)


Here's the code; compile with -DSELFTEST.  (The main purpose of
printing the sum is to prevent dead code elimination.)


#if SELFTEST
#include <stdint.h>
#include <stdlib.h>

static inline uint64_t rol64(uint64_t word, unsigned int shift)
{
	return word << shift | word >> (64 - shift);
}

static inline uint32_t rol32(uint32_t word, unsigned int shift)
{
	return word << shift | word >> (32 - shift);
}

static inline uint64_t get_unaligned_le64(void const *p)
{
	return *(uint64_t const *)p;
}

static inline uint32_t get_unaligned_le32(void const *p)
{
	return *(uint32_t const *)p;
}

static inline uint64_t le64_to_cpup(uint64_t const *p)
{
	return *p;
}

static inline uint32_t le32_to_cpup(uint32_t const *p)
{
	return *p;
}


#else
#include <linux/bitops.h>	/* For rol64 */
#include <linux/cryptohash.h>
#include <asm/byteorder.h>
#include <asm/unaligned.h>
#endif

/* The basic ARX mixing function, taken from Skein */
#define SIP_MIX(a, b, s) ((a) += (b), (b) = rol64(b, s), (b) ^= (a))

/*
 * The complete SipRound.  Note that, when unrolled twice like below,
 * the 32-bit rotates drop out on 32-bit machines.
 */
#define SIP_ROUND(a, b, c, d) \
	(SIP_MIX(a, b, 13), SIP_MIX(c, d, 16), (a) = rol64(a, 32), \
	 SIP_MIX(c, b, 17), SIP_MIX(a, d, 21), (c) = rol64(c, 32))

/*
 * This is rolled up more than most implementations, resulting in about
 * 55% the code size.  Speed is a few precent slower.  A crude benchmark
 * (for (i=1; i <= max; i++) for (j = 0; j < 4096-i; j++) hash(buf+j, i);)
 * produces the following timings (in usec):
 *
 *		i386	i386	i386	x86_64	x86_64	x86_64	x86_64
 * Length	small	unroll  halfmd4 small	unroll	halfmd4 teahash
 * 1..4		   1069	   1029	   1608	    195	    160	    399	    690
 * 1..8		   2483	   2381	   3851	    410	    360	    988	   1659
 * 1..12           4303	   4152	   6207	    690	    618	   1642	   2690
 * 1..16           6122	   5931	   8668	    968	    876	   2363	   3786
 * 1..20           8348	   8137	  11245	   1323	   1185	   3162	   5567
 * 1..24          10580	  10327	  13935	   1657	   1504	   4066	   7635
 * 1..28          13211	  12956	  16803	   2069	   1871	   5028	   9759
 * 1..32          15843	  15572	  19725	   2470	   2260	   6084	  11932
 * 1..36          18864	  18609	  24259	   2934	   2678	   7566	  14794
 * 1..1024      5890194	6130242	10264816 881933	 881244	3617392	7589036
 *
 * The performance penalty is quite minor, decreasing for long strings,
 * and it's significantly faster than half_md4, so I'm going for the
 * I-cache win.
 */
uint64_t
siphash24(char const *in, size_t len, uint32_t const seed[4])
{
	uint64_t a = 0x736f6d6570736575;	/* somepseu */
	uint64_t b = 0x646f72616e646f6d;	/* dorandom */
	uint64_t c = 0x6c7967656e657261;	/* lygenera */
	uint64_t d = 0x7465646279746573;	/* tedbytes */
	uint64_t m = 0;
	uint8_t padbyte = len;

	m = seed[2] | (uint64_t)seed[3] << 32;
	b ^= m;
	d ^= m;
	m = seed[0] | (uint64_t)seed[1] << 32;
	/* a ^= m; is done in loop below */
	c ^= m;

	/*
	 * By using the same SipRound code for all iterations, we
	 * save space, at the expense of some branch prediction.  But
	 * branch prediction is hard because of variable length anyway.
	 */
	len = len/8 + 3;	/* Now number of rounds to perform */
	do {
		a ^= m;

		switch (--len) {
			unsigned bytes;

		default:	/* Full words */
			d ^= m = get_unaligned_le64(in);
			in += 8;
			break;
		case 2:		/* Final partial word */
			/*
			 * We'd like to do one 64-bit fetch rather than
			 * mess around with bytes, but reading past the end
			 * might hit a protection boundary.  Fortunately,
			 * we know that protection boundaries are aligned,
			 * so we can consider only three cases:
			 * - The remainder occupies zero words
			 * - The remainder fits into one word
			 * - The remainder straddles two words
			 */
			bytes = padbyte & 7;

			if (bytes == 0) {
				m = 0;
			} else {
				unsigned offset = (unsigned)(uintptr_t)in & 7;

				if (offset + bytes <= 8) {
					m = le64_to_cpup((uint64_t const *)
								(in - offset));
					m >>= 8*offset;
				} else {
					m = get_unaligned_le64(in);
				}
				m &= ((uint64_t)1 << 8*bytes) - 1;
			}
			/* Could use | or +, but ^ allows associativity */
			d ^= m ^= (uint64_t)padbyte << 56;
			break;
		case 1:		/* Beginning of finalization */
			m = 0;
			c ^= 0xff;
			/*FALLTHROUGH*/
		case 0:		/* Second half of finalization */
			break;
		}

		SIP_ROUND(a, b, c, d);
		SIP_ROUND(a, b, c, d);
	} while (len);

	return a ^ b ^ c ^ d;
}

#undef SIP_ROUND
#undef SIP_MIX


#define HSIP_MIX(a, b, s) ((a) += (b), (b) = rol32(b, s), (b) ^= (a))

/*
 * These are the PRELIMINARY rotate constants suggested by
 * Jean-Philippe Aumasson.  Update to final when available.
 */
#define HSIP_ROUND(a, b, c, d) \
	(HSIP_MIX(a, b,  5), HSIP_MIX(c, d,  8), (a) = rol32(a, 16), \
	 HSIP_MIX(c, b,  7), HSIP_MIX(a, d, 13), (c) = rol32(c, 16))

uint32_t
hsiphash24(char const *in, size_t len, uint32_t const key[2])
{
	uint32_t c = key[0];
	uint32_t d = key[1];
	uint32_t a =     0x6c796765 ^ 0x736f6d65;
	uint32_t b = d ^ 0x74656462 ^ 0x646f7261;
	uint32_t m = c;
	uint8_t padbyte = len;

	/*
	 * By using the same SipRound code for all iterations, we
	 * save space, at the expense of some branch prediction.  But
	 * branch prediction is hard because of variable length anyway.
	 */
	len = len/sizeof(m) + 3;	/* Now number of rounds to perform */
	do {
		a ^= m;

		switch (--len) {
			unsigned bytes;

		default:	/* Full words */
			d ^= m = get_unaligned_le32(in);
			in += sizeof(m);
			break;
		case 2:		/* Final partial word */
			/*
			 * We'd like to do one 32-bit fetch rather than
			 * mess around with bytes, but reading past the end
			 * might hit a protection boundary.  Fortunately,
			 * we know that protection boundaries are aligned,
			 * so we can consider only three cases:
			 * - The remainder occupies zero words
			 * - The remainder fits into one word
			 * - The remainder straddles two words
			 */
			bytes = padbyte & 3;

			if (bytes == 0) {
				m = 0;
			} else {
				unsigned offset = (unsigned)(uintptr_t)in & 3;

				if (offset + bytes <= 4) {
					m = le32_to_cpup((uint32_t const *)
								(in - offset));
					m >>= 8*offset;
				} else {
					m = get_unaligned_le32(in);
				}
				m &= ((uint32_t)1 << 8*bytes) - 1;
			}
			/* Could use | or +, but ^ allows associativity */
			d ^= m ^= (uint32_t)padbyte << 24;
			break;
		case 1:		/* Beginning of finalization */
			m = 0;
			c ^= 0xff;
			/*FALLTHROUGH*/
		case 0:		/* Second half of finalization */
			break;
		}

		HSIP_ROUND(a, b, c, d);
		HSIP_ROUND(a, b, c, d);
	} while (len);

	return a ^ b ^ c ^ d;
	// return c + d;
}

#undef HSIP_ROUND
#undef HSIP_MIX

/*
 * No objection to EXPORT_SYMBOL, but we should probably figure out
 * how the seed[] array should work first.  Homework for the first
 * person to want to call it from a module!
 */

#if SELFTEST

#include <stdio.h>

static uint64_t rdtsc()
{
	uint32_t eax, edx;

	asm volatile ("rdtsc" : "=a" (eax), "=d" (edx));
	return (uint64_t)edx << 32 | eax;
}

int
main(void)
{
	static char const buf[1024] = { 0 };
	unsigned max;
	static const uint32_t key[4] = { 1, 2, 3, 4 };

	printf("SipHash/HSipHash benchmark, sizeof(long) = %u\n",
		(unsigned)sizeof(long));
	for (unsigned max = 4; max <= 1024; max *= 2) {
		uint64_t sum1 = 0;
		uint32_t sum2 = 0;
		uint64_t cycles;
		uint32_t bytes = 0;

		/* A less lazy person could figure out the closed form */
		for (int i = 1; i <= max; i++)
			bytes += i * (max + 1 - i);

		cycles = rdtsc();
		for (int i = 1; i <= max; i++)
			for (int j = 0; j <= max-i; j++)
				sum1 += siphash24(buf+j, i, key);
		cycles = rdtsc() - cycles;

		printf(" SipHash: max=%4u cycles=%10llu cpb=%8.4f (sum=%llx)\n",
			max, cycles, (double)cycles/bytes, sum1);

		cycles = rdtsc();
		for (int i = 1; i <= max; i++)
			for (int j = 0; j <= max-i; j++)
				sum2 += hsiphash24(buf+j, i, key);
		cycles = rdtsc() - cycles;
		printf("HSipHash: max=%4u cycles=%10llu cpb=%8.4f (sum=%lx)\n",
			max, cycles, (double)cycles/bytes, sum2);
	}
	return 0;
}


#endif

WARNING: multiple messages have this Message-ID (diff)
From: "George Spelvin" <linux@sciencehorizons.net>
To: tom@herbertland.com
Cc: ak@linux.intel.com, davem@davemloft.net, David.Laight@aculab.com,
	djb@cr.yp.to, ebiggers3@gmail.com, hannes@stressinduktion.org,
	Jason@zx2c4.com, jeanphilippe.aumasson@gmail.com,
	kernel-hardening@lists.openwall.com,
	linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux@sciencehorizons.net, luto@amacapital.net,
	netdev@vger.kernel.org, torvalds@linux-foundation.org,
	tytso@mit.edu, vegard.nossum@gmail.com
Subject: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
Date: 17 Dec 2016 10:21:22 -0500	[thread overview]
Message-ID: <20161217152122.19677.qmail@ns.sciencehorizons.net> (raw)
In-Reply-To: <CALx6S351VFRZmEQphRQy6YtmZYPnOtTN7=XiNrJmhWJGv4HUBg@mail.gmail.com>

To follow up on my comments that your benchmark results were peculiar,
here's my benchmark code.

It just computes the hash of all n*(n+1)/2 possible non-empty substrings
of a buffer of n (called "max" below) bytes.  "cpb" is "cycles per byte".

(The average length is (n+2)/3, c.f. https://oeis.org/A000292)

On x86-32, HSipHash is asymptotically twice the speed of SipHash,
rising to 2.5x for short strings:

SipHash/HSipHash benchmark, sizeof(long) = 4
 SipHash: max=   4 cycles=     10495 cpb=524.7500 (sum=47a4f5554869fa97)
HSipHash: max=   4 cycles=      3400 cpb=170.0000 (sum=146a863e)
 SipHash: max=   8 cycles=     24468 cpb=203.9000 (sum=21c41a86355affcc)
HSipHash: max=   8 cycles=      9237 cpb= 76.9750 (sum=d3b5e0cd)
 SipHash: max=  16 cycles=     94622 cpb=115.9583 (sum=26d816b72721e48f)
HSipHash: max=  16 cycles=     34499 cpb= 42.2782 (sum=16bb7475)
 SipHash: max=  32 cycles=    418767 cpb= 69.9811 (sum=dd5a97694b8a832d)
HSipHash: max=  32 cycles=    156695 cpb= 26.1857 (sum=eed00fcb)
 SipHash: max=  64 cycles=   2119152 cpb= 46.3101 (sum=a2a725aecc09ed00)
HSipHash: max=  64 cycles=   1008678 cpb= 22.0428 (sum=99b9f4f)
 SipHash: max= 128 cycles=  12728659 cpb= 35.5788 (sum=420878cd20272817)
HSipHash: max= 128 cycles=   5452931 cpb= 15.2419 (sum=f1f4ad18)
 SipHash: max= 256 cycles=  38931946 cpb= 13.7615 (sum=e05dfb28b90dfd98)
HSipHash: max= 256 cycles=  13807312 cpb=  4.8805 (sum=ceeafcc1)
 SipHash: max= 512 cycles= 205537380 cpb=  9.1346 (sum=7d129d4de145fbea)
HSipHash: max= 512 cycles= 103420960 cpb=  4.5963 (sum=7f15a313)
 SipHash: max=1024 cycles=1540259472 cpb=  8.5817 (sum=cca7cbdc778ca8af)
HSipHash: max=1024 cycles= 796090824 cpb=  4.4355 (sum=d8f3374f)

On x86-64, SipHash is consistently faster, asymptotically approaching 2x
for long strings:

SipHash/HSipHash benchmark, sizeof(long) = 8
 SipHash: max=   4 cycles=      2642 cpb=132.1000 (sum=47a4f5554869fa97)
HSipHash: max=   4 cycles=      2498 cpb=124.9000 (sum=146a863e)
 SipHash: max=   8 cycles=      5270 cpb= 43.9167 (sum=21c41a86355affcc)
HSipHash: max=   8 cycles=      7140 cpb= 59.5000 (sum=d3b5e0cd)
 SipHash: max=  16 cycles=     19950 cpb= 24.4485 (sum=26d816b72721e48f)
HSipHash: max=  16 cycles=     23546 cpb= 28.8554 (sum=16bb7475)
 SipHash: max=  32 cycles=     80188 cpb= 13.4004 (sum=dd5a97694b8a832d)
HSipHash: max=  32 cycles=    101218 cpb= 16.9148 (sum=eed00fcb)
 SipHash: max=  64 cycles=    373286 cpb=  8.1575 (sum=a2a725aecc09ed00)
HSipHash: max=  64 cycles=    535568 cpb= 11.7038 (sum=99b9f4f)
 SipHash: max= 128 cycles=   2075224 cpb=  5.8006 (sum=420878cd20272817)
HSipHash: max= 128 cycles=   3336820 cpb=  9.3270 (sum=f1f4ad18)
 SipHash: max= 256 cycles=  14276278 cpb=  5.0463 (sum=e05dfb28b90dfd98)
HSipHash: max= 256 cycles=  28847880 cpb= 10.1970 (sum=ceeafcc1)
 SipHash: max= 512 cycles=  50135180 cpb=  2.2281 (sum=7d129d4de145fbea)
HSipHash: max= 512 cycles=  86145916 cpb=  3.8286 (sum=7f15a313)
 SipHash: max=1024 cycles= 334111900 cpb=  1.8615 (sum=cca7cbdc778ca8af)
HSipHash: max=1024 cycles= 640432452 cpb=  3.5682 (sum=d8f3374f)


Here's the code; compile with -DSELFTEST.  (The main purpose of
printing the sum is to prevent dead code elimination.)


#if SELFTEST
#include <stdint.h>
#include <stdlib.h>

static inline uint64_t rol64(uint64_t word, unsigned int shift)
{
	return word << shift | word >> (64 - shift);
}

static inline uint32_t rol32(uint32_t word, unsigned int shift)
{
	return word << shift | word >> (32 - shift);
}

static inline uint64_t get_unaligned_le64(void const *p)
{
	return *(uint64_t const *)p;
}

static inline uint32_t get_unaligned_le32(void const *p)
{
	return *(uint32_t const *)p;
}

static inline uint64_t le64_to_cpup(uint64_t const *p)
{
	return *p;
}

static inline uint32_t le32_to_cpup(uint32_t const *p)
{
	return *p;
}


#else
#include <linux/bitops.h>	/* For rol64 */
#include <linux/cryptohash.h>
#include <asm/byteorder.h>
#include <asm/unaligned.h>
#endif

/* The basic ARX mixing function, taken from Skein */
#define SIP_MIX(a, b, s) ((a) += (b), (b) = rol64(b, s), (b) ^= (a))

/*
 * The complete SipRound.  Note that, when unrolled twice like below,
 * the 32-bit rotates drop out on 32-bit machines.
 */
#define SIP_ROUND(a, b, c, d) \
	(SIP_MIX(a, b, 13), SIP_MIX(c, d, 16), (a) = rol64(a, 32), \
	 SIP_MIX(c, b, 17), SIP_MIX(a, d, 21), (c) = rol64(c, 32))

/*
 * This is rolled up more than most implementations, resulting in about
 * 55% the code size.  Speed is a few precent slower.  A crude benchmark
 * (for (i=1; i <= max; i++) for (j = 0; j < 4096-i; j++) hash(buf+j, i);)
 * produces the following timings (in usec):
 *
 *		i386	i386	i386	x86_64	x86_64	x86_64	x86_64
 * Length	small	unroll  halfmd4 small	unroll	halfmd4 teahash
 * 1..4		   1069	   1029	   1608	    195	    160	    399	    690
 * 1..8		   2483	   2381	   3851	    410	    360	    988	   1659
 * 1..12           4303	   4152	   6207	    690	    618	   1642	   2690
 * 1..16           6122	   5931	   8668	    968	    876	   2363	   3786
 * 1..20           8348	   8137	  11245	   1323	   1185	   3162	   5567
 * 1..24          10580	  10327	  13935	   1657	   1504	   4066	   7635
 * 1..28          13211	  12956	  16803	   2069	   1871	   5028	   9759
 * 1..32          15843	  15572	  19725	   2470	   2260	   6084	  11932
 * 1..36          18864	  18609	  24259	   2934	   2678	   7566	  14794
 * 1..1024      5890194	6130242	10264816 881933	 881244	3617392	7589036
 *
 * The performance penalty is quite minor, decreasing for long strings,
 * and it's significantly faster than half_md4, so I'm going for the
 * I-cache win.
 */
uint64_t
siphash24(char const *in, size_t len, uint32_t const seed[4])
{
	uint64_t a = 0x736f6d6570736575;	/* somepseu */
	uint64_t b = 0x646f72616e646f6d;	/* dorandom */
	uint64_t c = 0x6c7967656e657261;	/* lygenera */
	uint64_t d = 0x7465646279746573;	/* tedbytes */
	uint64_t m = 0;
	uint8_t padbyte = len;

	m = seed[2] | (uint64_t)seed[3] << 32;
	b ^= m;
	d ^= m;
	m = seed[0] | (uint64_t)seed[1] << 32;
	/* a ^= m; is done in loop below */
	c ^= m;

	/*
	 * By using the same SipRound code for all iterations, we
	 * save space, at the expense of some branch prediction.  But
	 * branch prediction is hard because of variable length anyway.
	 */
	len = len/8 + 3;	/* Now number of rounds to perform */
	do {
		a ^= m;

		switch (--len) {
			unsigned bytes;

		default:	/* Full words */
			d ^= m = get_unaligned_le64(in);
			in += 8;
			break;
		case 2:		/* Final partial word */
			/*
			 * We'd like to do one 64-bit fetch rather than
			 * mess around with bytes, but reading past the end
			 * might hit a protection boundary.  Fortunately,
			 * we know that protection boundaries are aligned,
			 * so we can consider only three cases:
			 * - The remainder occupies zero words
			 * - The remainder fits into one word
			 * - The remainder straddles two words
			 */
			bytes = padbyte & 7;

			if (bytes == 0) {
				m = 0;
			} else {
				unsigned offset = (unsigned)(uintptr_t)in & 7;

				if (offset + bytes <= 8) {
					m = le64_to_cpup((uint64_t const *)
								(in - offset));
					m >>= 8*offset;
				} else {
					m = get_unaligned_le64(in);
				}
				m &= ((uint64_t)1 << 8*bytes) - 1;
			}
			/* Could use | or +, but ^ allows associativity */
			d ^= m ^= (uint64_t)padbyte << 56;
			break;
		case 1:		/* Beginning of finalization */
			m = 0;
			c ^= 0xff;
			/*FALLTHROUGH*/
		case 0:		/* Second half of finalization */
			break;
		}

		SIP_ROUND(a, b, c, d);
		SIP_ROUND(a, b, c, d);
	} while (len);

	return a ^ b ^ c ^ d;
}

#undef SIP_ROUND
#undef SIP_MIX


#define HSIP_MIX(a, b, s) ((a) += (b), (b) = rol32(b, s), (b) ^= (a))

/*
 * These are the PRELIMINARY rotate constants suggested by
 * Jean-Philippe Aumasson.  Update to final when available.
 */
#define HSIP_ROUND(a, b, c, d) \
	(HSIP_MIX(a, b,  5), HSIP_MIX(c, d,  8), (a) = rol32(a, 16), \
	 HSIP_MIX(c, b,  7), HSIP_MIX(a, d, 13), (c) = rol32(c, 16))

uint32_t
hsiphash24(char const *in, size_t len, uint32_t const key[2])
{
	uint32_t c = key[0];
	uint32_t d = key[1];
	uint32_t a =     0x6c796765 ^ 0x736f6d65;
	uint32_t b = d ^ 0x74656462 ^ 0x646f7261;
	uint32_t m = c;
	uint8_t padbyte = len;

	/*
	 * By using the same SipRound code for all iterations, we
	 * save space, at the expense of some branch prediction.  But
	 * branch prediction is hard because of variable length anyway.
	 */
	len = len/sizeof(m) + 3;	/* Now number of rounds to perform */
	do {
		a ^= m;

		switch (--len) {
			unsigned bytes;

		default:	/* Full words */
			d ^= m = get_unaligned_le32(in);
			in += sizeof(m);
			break;
		case 2:		/* Final partial word */
			/*
			 * We'd like to do one 32-bit fetch rather than
			 * mess around with bytes, but reading past the end
			 * might hit a protection boundary.  Fortunately,
			 * we know that protection boundaries are aligned,
			 * so we can consider only three cases:
			 * - The remainder occupies zero words
			 * - The remainder fits into one word
			 * - The remainder straddles two words
			 */
			bytes = padbyte & 3;

			if (bytes == 0) {
				m = 0;
			} else {
				unsigned offset = (unsigned)(uintptr_t)in & 3;

				if (offset + bytes <= 4) {
					m = le32_to_cpup((uint32_t const *)
								(in - offset));
					m >>= 8*offset;
				} else {
					m = get_unaligned_le32(in);
				}
				m &= ((uint32_t)1 << 8*bytes) - 1;
			}
			/* Could use | or +, but ^ allows associativity */
			d ^= m ^= (uint32_t)padbyte << 24;
			break;
		case 1:		/* Beginning of finalization */
			m = 0;
			c ^= 0xff;
			/*FALLTHROUGH*/
		case 0:		/* Second half of finalization */
			break;
		}

		HSIP_ROUND(a, b, c, d);
		HSIP_ROUND(a, b, c, d);
	} while (len);

	return a ^ b ^ c ^ d;
	// return c + d;
}

#undef HSIP_ROUND
#undef HSIP_MIX

/*
 * No objection to EXPORT_SYMBOL, but we should probably figure out
 * how the seed[] array should work first.  Homework for the first
 * person to want to call it from a module!
 */

#if SELFTEST

#include <stdio.h>

static uint64_t rdtsc()
{
	uint32_t eax, edx;

	asm volatile ("rdtsc" : "=a" (eax), "=d" (edx));
	return (uint64_t)edx << 32 | eax;
}

int
main(void)
{
	static char const buf[1024] = { 0 };
	unsigned max;
	static const uint32_t key[4] = { 1, 2, 3, 4 };

	printf("SipHash/HSipHash benchmark, sizeof(long) = %u\n",
		(unsigned)sizeof(long));
	for (unsigned max = 4; max <= 1024; max *= 2) {
		uint64_t sum1 = 0;
		uint32_t sum2 = 0;
		uint64_t cycles;
		uint32_t bytes = 0;

		/* A less lazy person could figure out the closed form */
		for (int i = 1; i <= max; i++)
			bytes += i * (max + 1 - i);

		cycles = rdtsc();
		for (int i = 1; i <= max; i++)
			for (int j = 0; j <= max-i; j++)
				sum1 += siphash24(buf+j, i, key);
		cycles = rdtsc() - cycles;

		printf(" SipHash: max=%4u cycles=%10llu cpb=%8.4f (sum=%llx)\n",
			max, cycles, (double)cycles/bytes, sum1);

		cycles = rdtsc();
		for (int i = 1; i <= max; i++)
			for (int j = 0; j <= max-i; j++)
				sum2 += hsiphash24(buf+j, i, key);
		cycles = rdtsc() - cycles;
		printf("HSipHash: max=%4u cycles=%10llu cpb=%8.4f (sum=%lx)\n",
			max, cycles, (double)cycles/bytes, sum2);
	}
	return 0;
}


#endif

  parent reply	other threads:[~2016-12-17 15:21 UTC|newest]

Thread overview: 182+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-15 20:29 [PATCH v5 0/4] The SipHash Patchset Jason A. Donenfeld
2016-12-15 20:29 ` [kernel-hardening] " Jason A. Donenfeld
2016-12-15 20:30 ` [PATCH v5 1/4] siphash: add cryptographically secure PRF Jason A. Donenfeld
2016-12-15 20:30   ` [kernel-hardening] " Jason A. Donenfeld
2016-12-15 22:42   ` George Spelvin
2016-12-15 22:42     ` [kernel-hardening] " George Spelvin
2016-12-15 23:00     ` Jean-Philippe Aumasson
2016-12-15 23:00       ` [kernel-hardening] " Jean-Philippe Aumasson
2016-12-15 23:28       ` George Spelvin
2016-12-15 23:28         ` [kernel-hardening] " George Spelvin
2016-12-16 17:06         ` David Laight
2016-12-16 17:06           ` [kernel-hardening] " David Laight
2016-12-16 17:06           ` David Laight
2016-12-16 17:09           ` Jason A. Donenfeld
2016-12-16 17:09             ` [kernel-hardening] " Jason A. Donenfeld
2016-12-16 17:09             ` Jason A. Donenfeld
2016-12-16  3:46       ` George Spelvin
2016-12-16  3:46         ` [kernel-hardening] " George Spelvin
2016-12-16  8:08         ` Jean-Philippe Aumasson
2016-12-16  8:08           ` [kernel-hardening] " Jean-Philippe Aumasson
2016-12-16 12:39           ` Jason A. Donenfeld
2016-12-16 12:39             ` [kernel-hardening] " Jason A. Donenfeld
2016-12-16 13:22             ` Jean-Philippe Aumasson
2016-12-16 13:22               ` [kernel-hardening] " Jean-Philippe Aumasson
2016-12-16 15:51               ` Jason A. Donenfeld
2016-12-16 15:51                 ` [kernel-hardening] " Jason A. Donenfeld
2016-12-16 17:36                 ` George Spelvin
2016-12-16 17:36                   ` [kernel-hardening] " George Spelvin
2016-12-16 18:00                   ` Jason A. Donenfeld
2016-12-16 18:00                     ` [kernel-hardening] " Jason A. Donenfeld
2016-12-16 20:17                     ` George Spelvin
2016-12-16 20:17                       ` [kernel-hardening] " George Spelvin
2016-12-16 20:43                       ` Theodore Ts'o
2016-12-16 20:43                         ` [kernel-hardening] " Theodore Ts'o
2016-12-16 22:13                         ` George Spelvin
2016-12-16 22:13                           ` [kernel-hardening] " George Spelvin
2016-12-16 22:15                           ` Andy Lutomirski
2016-12-16 22:15                             ` [kernel-hardening] " Andy Lutomirski
2016-12-16 22:15                             ` Andy Lutomirski
2016-12-16 22:18                           ` Jason A. Donenfeld
2016-12-16 22:18                             ` [kernel-hardening] " Jason A. Donenfeld
2016-12-16 23:44                             ` George Spelvin
2016-12-16 23:44                               ` [kernel-hardening] " George Spelvin
2016-12-17  1:39                               ` Jason A. Donenfeld
2016-12-17  1:39                                 ` [kernel-hardening] " Jason A. Donenfeld
2016-12-17  2:15                                 ` George Spelvin
2016-12-17  2:15                                   ` [kernel-hardening] " George Spelvin
2016-12-17 15:41                                   ` Theodore Ts'o
2016-12-17 15:41                                     ` [kernel-hardening] " Theodore Ts'o
2016-12-17 16:14                                     ` Jeffrey Walton
2016-12-17 16:14                                       ` [kernel-hardening] " Jeffrey Walton
2016-12-19 17:21                                     ` Jason A. Donenfeld
2016-12-17 12:42                 ` George Spelvin
2016-12-17 12:42                   ` [kernel-hardening] " George Spelvin
2016-12-16 20:39               ` Jason A. Donenfeld
2016-12-16 20:39                 ` [kernel-hardening] " Jason A. Donenfeld
2016-12-16 19:47             ` Tom Herbert
2016-12-16 19:47               ` [kernel-hardening] " Tom Herbert
2016-12-16 20:41               ` George Spelvin
2016-12-16 20:41                 ` [kernel-hardening] " George Spelvin
2016-12-16 20:57                 ` Tom Herbert
2016-12-16 20:57                   ` [kernel-hardening] " Tom Herbert
2016-12-16 20:44               ` Daniel Micay
2016-12-16 20:44                 ` [kernel-hardening] " Daniel Micay
2016-12-16 21:09                 ` Jason A. Donenfeld
2016-12-17 15:21               ` George Spelvin [this message]
2016-12-17 15:21                 ` George Spelvin
2016-12-19 14:14                 ` David Laight
2016-12-19 14:14                   ` [kernel-hardening] " David Laight
2016-12-19 14:14                   ` David Laight
2016-12-19 18:10                   ` George Spelvin
2016-12-19 18:10                     ` [kernel-hardening] " George Spelvin
2016-12-19 20:18                     ` Jean-Philippe Aumasson
2016-12-19 20:18                       ` [kernel-hardening] " Jean-Philippe Aumasson
2016-12-16  2:14   ` kbuild test robot
2016-12-16  2:14     ` [kernel-hardening] " kbuild test robot
2016-12-17 14:55   ` Jeffrey Walton
2016-12-17 14:55     ` [kernel-hardening] " Jeffrey Walton
2016-12-19 17:08     ` Jason A. Donenfeld
2016-12-19 17:08       ` [kernel-hardening] " Jason A. Donenfeld
2016-12-19 17:19       ` Jean-Philippe Aumasson
2016-12-19 17:19         ` [kernel-hardening] " Jean-Philippe Aumasson
2016-12-15 20:30 ` [PATCH v5 2/4] siphash: add Nu{32,64} helpers Jason A. Donenfeld
2016-12-15 20:30   ` [kernel-hardening] " Jason A. Donenfeld
2016-12-16 10:39   ` David Laight
2016-12-16 10:39     ` [kernel-hardening] " David Laight
2016-12-16 10:39     ` David Laight
2016-12-16 15:44     ` George Spelvin
2016-12-16 15:44       ` [kernel-hardening] " George Spelvin
2016-12-15 20:30 ` [PATCH v5 3/4] secure_seq: use SipHash in place of MD5 Jason A. Donenfeld
2016-12-15 20:30   ` [kernel-hardening] " Jason A. Donenfeld
2016-12-16  9:59   ` David Laight
2016-12-16  9:59     ` [kernel-hardening] " David Laight
2016-12-16  9:59     ` David Laight
2016-12-16 15:57     ` Jason A. Donenfeld
2016-12-16 15:57       ` [kernel-hardening] " Jason A. Donenfeld
2016-12-16 15:57       ` Jason A. Donenfeld
2016-12-15 20:30 ` [PATCH v5 4/4] random: " Jason A. Donenfeld
2016-12-15 20:30   ` [kernel-hardening] " Jason A. Donenfeld
2016-12-16  3:03 ` [PATCH v6 0/5] The SipHash Patchset Jason A. Donenfeld
2016-12-16  3:03   ` [kernel-hardening] " Jason A. Donenfeld
2016-12-16  3:03   ` [PATCH v6 1/5] siphash: add cryptographically secure PRF Jason A. Donenfeld
2016-12-16  3:03     ` [kernel-hardening] " Jason A. Donenfeld
2016-12-16  3:03   ` [PATCH v6 2/5] secure_seq: use SipHash in place of MD5 Jason A. Donenfeld
2016-12-16  3:03     ` [kernel-hardening] " Jason A. Donenfeld
2016-12-16  3:03   ` [PATCH v6 3/5] random: " Jason A. Donenfeld
2016-12-16  3:03     ` [kernel-hardening] " Jason A. Donenfeld
2016-12-16 21:31     ` Andy Lutomirski
2016-12-16 21:31       ` [kernel-hardening] " Andy Lutomirski
2016-12-16 21:31       ` Andy Lutomirski
2016-12-16  3:03   ` [PATCH v6 4/5] md5: remove from lib and only live in crypto Jason A. Donenfeld
2016-12-16  3:03     ` [kernel-hardening] " Jason A. Donenfeld
2016-12-16  3:03   ` [PATCH v6 5/5] syncookies: use SipHash in place of SHA1 Jason A. Donenfeld
2016-12-16  3:03     ` [kernel-hardening] " Jason A. Donenfeld
2016-12-21 23:02   ` [PATCH v7 0/6] The SipHash Patchset Jason A. Donenfeld
2016-12-21 23:02     ` [kernel-hardening] " Jason A. Donenfeld
2016-12-21 23:02     ` [PATCH v7 1/6] siphash: add cryptographically secure PRF Jason A. Donenfeld
2016-12-21 23:02       ` [kernel-hardening] " Jason A. Donenfeld
2016-12-22  1:40       ` Stephen Hemminger
2016-12-22  1:40         ` [kernel-hardening] " Stephen Hemminger
2016-12-21 23:02     ` [PATCH v7 2/6] secure_seq: use SipHash in place of MD5 Jason A. Donenfeld
2016-12-21 23:02       ` [kernel-hardening] " Jason A. Donenfeld
2016-12-21 23:02     ` [PATCH v7 3/6] random: " Jason A. Donenfeld
2016-12-21 23:02       ` [kernel-hardening] " Jason A. Donenfeld
2016-12-21 23:13       ` Jason A. Donenfeld
2016-12-21 23:13         ` [kernel-hardening] " Jason A. Donenfeld
2016-12-21 23:42       ` Andy Lutomirski
2016-12-21 23:42         ` [kernel-hardening] " Andy Lutomirski
2016-12-21 23:42         ` Andy Lutomirski
2016-12-22  2:07         ` Hannes Frederic Sowa
2016-12-22  2:07           ` [kernel-hardening] " Hannes Frederic Sowa
2016-12-22  2:07           ` Hannes Frederic Sowa
2016-12-22  2:09           ` Andy Lutomirski
2016-12-22  2:09             ` [kernel-hardening] " Andy Lutomirski
2016-12-22  2:09             ` Andy Lutomirski
2016-12-22  2:49           ` Jason A. Donenfeld
2016-12-22  2:49             ` [kernel-hardening] " Jason A. Donenfeld
2016-12-22  2:49             ` Jason A. Donenfeld
2016-12-22  3:12             ` Jason A. Donenfeld
2016-12-22  3:12               ` [kernel-hardening] " Jason A. Donenfeld
2016-12-22  3:12               ` Jason A. Donenfeld
2016-12-22  5:41             ` Theodore Ts'o
2016-12-22  5:41               ` [kernel-hardening] " Theodore Ts'o
2016-12-22  6:03               ` Jason A. Donenfeld
2016-12-22 15:58                 ` Theodore Ts'o
2016-12-22 15:58                   ` [kernel-hardening] " Theodore Ts'o
2016-12-22 16:16                   ` Jason A. Donenfeld
2016-12-22 16:16                     ` [kernel-hardening] " Jason A. Donenfeld
2016-12-22 16:30                     ` Theodore Ts'o
2016-12-22 16:36                       ` Jason A. Donenfeld
2016-12-22 12:47               ` Hannes Frederic Sowa
2016-12-22 12:47                 ` [kernel-hardening] " Hannes Frederic Sowa
2016-12-22 13:10                 ` Jason A. Donenfeld
2016-12-22 15:05                   ` Hannes Frederic Sowa
2016-12-22 15:12                     ` Jason A. Donenfeld
2016-12-22 15:29                       ` Jason A. Donenfeld
2016-12-22 15:33                         ` Hannes Frederic Sowa
2016-12-22 15:33                           ` [kernel-hardening] " Hannes Frederic Sowa
2016-12-22 15:41                           ` Jason A. Donenfeld
2016-12-22 15:51                             ` Hannes Frederic Sowa
2016-12-22 15:51                               ` [kernel-hardening] " Hannes Frederic Sowa
2016-12-22 15:53                               ` Jason A. Donenfeld
2016-12-22 15:54                   ` Theodore Ts'o
2016-12-22 15:54                     ` [kernel-hardening] " Theodore Ts'o
2016-12-22 18:08                     ` Hannes Frederic Sowa
2016-12-22 18:13                       ` Jason A. Donenfeld
2016-12-22 18:13                         ` [kernel-hardening] " Jason A. Donenfeld
2016-12-22 19:50                       ` Theodore Ts'o
2016-12-22  2:31         ` Jason A. Donenfeld
2016-12-22  2:31           ` [kernel-hardening] " Jason A. Donenfeld
2016-12-22  2:31           ` Jason A. Donenfeld
2016-12-21 23:02     ` [PATCH v7 4/6] md5: remove from lib and only live in crypto Jason A. Donenfeld
2016-12-21 23:02       ` [kernel-hardening] " Jason A. Donenfeld
2016-12-21 23:02     ` [PATCH v7 5/6] syncookies: use SipHash in place of SHA1 Jason A. Donenfeld
2016-12-21 23:02       ` [kernel-hardening] " Jason A. Donenfeld
2016-12-21 23:02     ` [PATCH v7 6/6] siphash: implement HalfSipHash1-3 for hash tables Jason A. Donenfeld
2016-12-21 23:02       ` [kernel-hardening] " Jason A. Donenfeld
2016-12-22  0:46       ` Andi Kleen
2016-12-22  0:46         ` [kernel-hardening] " Andi Kleen
2016-12-16 20:43 [PATCH v5 1/4] siphash: add cryptographically secure PRF Jason A. Donenfeld
2016-12-16 20:49 Jason A. Donenfeld
2016-12-16 21:25 ` George Spelvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161217152122.19677.qmail@ns.sciencehorizons.net \
    --to=linux@sciencehorizons.net \
    --cc=David.Laight@aculab.com \
    --cc=Jason@zx2c4.com \
    --cc=ak@linux.intel.com \
    --cc=davem@davemloft.net \
    --cc=djb@cr.yp.to \
    --cc=ebiggers3@gmail.com \
    --cc=hannes@stressinduktion.org \
    --cc=jeanphilippe.aumasson@gmail.com \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=netdev@vger.kernel.org \
    --cc=tom@herbertland.com \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=vegard.nossum@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.