* Re: Where exactly will arch_fast_hash be used
@ 2014-12-07 5:20 George Spelvin
2014-12-07 9:28 ` Herbert Xu
2014-12-07 13:14 ` Hannes Frederic Sowa
0 siblings, 2 replies; 26+ messages in thread
From: George Spelvin @ 2014-12-07 5:20 UTC (permalink / raw)
To: herbert; +Cc: dborkman, hannes, linux, linux-kernel, netdev, tgraf
If you want DoS-resistant hash tables, I'm working on adding SipHash
to the kernel.
This is a keyed pseudo-random function designed specifically for that
application. I am starting with ext4 directory hashes, and then intended
to expand to secure sequence numbers (since it's far faster than MD5).
(I'm trying to figure out a good interface, since the crypto API
is a bit heavy for something to heavily optimized.)
But one comment caught my eye:
> Even if security wasn't an issue, straight CRC32 has really poor
> lower-order bit distribution, which makes it a terrible choice for
> a hash table that simply uses the lower-order bits.
Er... huh? That's the first time I've heard that claim, and while I'm not
Philip Koopman or Guy Castagnoli, I thought I understood CRCs pretty well.
CRCs generally mix bits pretty well. The sparse 16-bit CRCs chosen
for implementation simplicity had some limitations, but the Castagnoli
polynomial is quite dense.
And their mathematical symmetry means that the low bits really shouldn't
be any different from any other bits. But if it is an issue, it's just
as easy work to shift down the correct number of high bits rather than
using the low.
Can you point me to a source for that statement?
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-07 5:20 Where exactly will arch_fast_hash be used George Spelvin @ 2014-12-07 9:28 ` Herbert Xu 2014-12-07 10:02 ` George Spelvin 2014-12-07 13:14 ` Hannes Frederic Sowa 1 sibling, 1 reply; 26+ messages in thread From: Herbert Xu @ 2014-12-07 9:28 UTC (permalink / raw) To: George Spelvin; +Cc: dborkman, hannes, linux-kernel, netdev, tgraf On Sun, Dec 07, 2014 at 12:20:41AM -0500, George Spelvin wrote: > > Can you point me to a source for that statement? For a start why don't you print out the hashes of 1-255 and then find out how easy it is to deduce the last bit of the hash result. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-07 9:28 ` Herbert Xu @ 2014-12-07 10:02 ` George Spelvin 2014-12-07 12:51 ` Herbert Xu 0 siblings, 1 reply; 26+ messages in thread From: George Spelvin @ 2014-12-07 10:02 UTC (permalink / raw) To: herbert, linux; +Cc: dborkman, hannes, linux-kernel, netdev, tgraf > For a start why don't you print out the hashes of 1-255 and then > find out how easy it is to deduce the last bit of the hash result. They're available in lib/crc32table.h, as crc32ctable_le[0]. As a CRC is a linear function, every bit is the XOR of some selected bits of the input, i.e. the parity of the input and some bit-specific mask sequence. Furthermore, CRCs are cyclic, so the mask sequences for adjacent bits are shifts of each other. The lsbit of the CRC32c of x is the parity of x & 0x1f. This is because the LFSR sequence generated by the polynomial starts 0001111110010001110010101111011000111000011011110010110000100101... The first bit corresponds to the msbit of the last byte. How does this implicate the low bits specifically? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-07 10:02 ` George Spelvin @ 2014-12-07 12:51 ` Herbert Xu 2014-12-07 13:23 ` George Spelvin 0 siblings, 1 reply; 26+ messages in thread From: Herbert Xu @ 2014-12-07 12:51 UTC (permalink / raw) To: George Spelvin Cc: dborkman, hannes, linux-kernel, netdev, tgraf, David S. Miller, Theodore Ts'o On Sun, Dec 07, 2014 at 05:02:52AM -0500, George Spelvin wrote: > > How does this implicate the low bits specifically? If you can easily deduce the pre-images that make the last bit of the hash even or odd, then you've just cut your search space for collisions by half. The real killer is that you can do this without knowing what the secret is. Our entire scheme is dependent on using the secret to defeat would-be attackers. If CRC does not make effective use of the secret, then we're essentially naked against attackers. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-07 12:51 ` Herbert Xu @ 2014-12-07 13:23 ` George Spelvin 2014-12-07 14:06 ` Hannes Frederic Sowa 2014-12-09 14:24 ` Herbert Xu 0 siblings, 2 replies; 26+ messages in thread From: George Spelvin @ 2014-12-07 13:23 UTC (permalink / raw) To: herbert, linux Cc: davem, dborkman, hannes, linux-kernel, netdev, tgraf, tytso >> How does this implicate the low bits specifically? > If you can easily deduce the pre-images that make the last bit > of the hash even or odd, then you've just cut your search space > for collisions by half. The real killer is that you can do this > without knowing what the secret is. Um, yes, if you're in a situation where a hash collsion DoS is possible, a CRC is disastrous choice. You can trivially find collisions for *all* bits of a CRC. Low or high, they're all equally easy. When you said >>>>> Even if security wasn't an issue, straight CRC32 has really poor >>>>> lower-order bit distribution, which makes it a terrible choice for >>>>> a hash table that simply uses the lower-order bits. This is talking about: - Non-malicious inputs,where security isn't an issue, and - Low-order bits specifically, implying that the high-order bits are different. *That's* the claim I'm curious about. I know perfectly well that if security *is* an issue, a fixed-polynomial CRC is a disaser. But for non-malicious inputs, like normal software identifiers, a CRC actually works very well. If you want to do secure hashing with a CRC, you need to have a secret *polynomial*. That *is* provably secure (it's a universal family of hash functions), but isn't provided by x86 unless you use SSE and PCLMUL. That's why it's a non-cryptographic hash, suitable for non-malicious inputs only. That's the same security claim as many other common hash functions. > Our entire scheme is dependent on using the secret to defeat > would-be attackers. If CRC does not make effective use of the > secret, then we're essentially naked against attackers. Okay, I'm confused. *What* scheme? The arch_fast_hash interface doesn't have any provision for a secret. Because there's no point to having one; you can't change the polynomial, and anything additive has just moves collisions around without reducing them. So there are plenty of hash tables in Linux that you don't dare use this with. In fact, so many that, as you rightly point out, it's not clear if it's worth providing this special optimization for the few remaining. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-07 13:23 ` George Spelvin @ 2014-12-07 14:06 ` Hannes Frederic Sowa 2014-12-07 21:33 ` George Spelvin 2014-12-09 14:24 ` Herbert Xu 1 sibling, 1 reply; 26+ messages in thread From: Hannes Frederic Sowa @ 2014-12-07 14:06 UTC (permalink / raw) To: George Spelvin Cc: herbert, davem, dborkman, linux-kernel, netdev, tgraf, tytso On So, 2014-12-07 at 08:23 -0500, George Spelvin wrote: > So there are plenty of hash tables in Linux that you don't dare use this > with. In fact, so many that, as you rightly point out, it's not clear > if it's worth providing this special optimization for the few remaining. In case of openvswitch it shows a performance improvment. The seed parameter could be used as an initial biasing of the crc32 function, but in case of openvswitch it is only set to 0. Bye, Hannes ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-07 14:06 ` Hannes Frederic Sowa @ 2014-12-07 21:33 ` George Spelvin 2014-12-08 11:25 ` Hannes Frederic Sowa 0 siblings, 1 reply; 26+ messages in thread From: George Spelvin @ 2014-12-07 21:33 UTC (permalink / raw) To: hannes, linux Cc: davem, dborkman, herbert, linux-kernel, netdev, tgraf, tytso On Sun, 2014-12-07 at 15:06 +0100, Hannes Frederic Sowa wrote: > In case of openvswitch it shows a performance improvment. The seed > parameter could be used as an initial biasing of the crc32 function, but > in case of openvswitch it is only set to 0. NACK. This is the Fatal Error in thinking that Herbert was warning about. The seed parameter doesn't affect CRC32 collisions *at all* if the inputs are the same size. For fixed-size inputs, a non-zero seed is equivalent to XORing a constant into the output of the CRC computation. for *different* sized inputs, a non-zero seed detects zero-padding better than a zero one, but *which* non-zero value is also irrelevant; all-ones is the traditional choice because it's simplest in hardware. A CRC is inherently linear. CRC(a^b) = CRC(a) ^ CRC(b). This makes them easy to analyze mathematically and gives them a number of nice properties for detecting hardware corruption. But that same simplicity makes it *ridiculously* easy to generate collisions if you try. One way of looking at a CRC is to say that each bit in the input has a CRC. The CRC of a message string is just the XOR of the CRCs of the individual bits that are set in the message. Now, a CRC polynomial is chosen so that all of the bits of a message have different CRCs. Obviously, there's a limit: when the message is 2^n bits long, it's not possible for all the bits to have different, non-zero n-bit CRCs. But a CRC is a really efficient way of assigning different bit patterns to different input bits up to that limit. (Something like CRC32c is also chosen so that, for messages up to a reasonable length, no 3-bit, 4-bit, etc. combinations have CRCs that XOR to zero.) But, and this might be what Herbert was trying to say and I was misunderstanding, if you then *truncate* that CRC, the CRCs of the message bits lose that uniqueness guarantee. They're just pseudorandom numbers, and a CRC loses its special collision-resistance properties. It's just an ordinary random hash, and thanks to the birthday paradox, you're likely to find two bits whose CRCs agree in any particular 8 bits within roughly sqrt(2*256) or 22 bits. Here are a few such collisions for the least significant 8 bits of CRC32c: Msg1 CRC32c Msg2 CRC32c Match 1<<11 3fc5f181 1<<30 bf672381 81 1<<12 9d14c3b8 1<<31 dd45aab8 b8 1<<5 c79a971f 1<<44 6006181f 1f 1<<15 13a29877 1<<45 b2f53777 77 There's nothing special about the lsbits of the CRC. Within 64 bits, the most significant 8 bits have it worse: 1<<5 c79a971f 1<<17 c76580d9 c7 1<<6 e13b70f7 1<<18 e144fb14 e1 1<<19 70a27d8a 1<<38 7022df58 70 1<<20 38513ec5 1<<39 38116fac 38 1<<13 4e8a61dc 1<<52 4e2dfd53 4e 1<<23 a541927e 1<<53 a5e0c5d1 a5 Now, I'd like to stress that this collision rate is no worse than any *other* hash function. A truncated CRC loses its special resistance to the birthday paradox (you'd have been much smarter to use 8-bit CRC), but it doesn't become especially bad. A truncated SHA-1 will have coillisions just as often. The concern with a CRC is that, once you've found one collision, you've found a huge number of them. Just XOR the bit pattern of your choice into both of the colliding messages, and you have a new collision. For another example, if you consider the CRC32c of all possible 1-byte messages *and then take only the low byte*, there are only 128 possible values. It turns out that the byte 0x5d has a CRC32c of 0xee0d9600. This ends in 00, so if I XOR 0x5d into anything, the low 8 bits of the CRC don't change. Likewise, the message "23 00" has a CRC32c of 0x00ee0d96. So you can XOR 0x23 into the second-last byte of anything, and the high 8 bits of the CRC don't change. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-07 21:33 ` George Spelvin @ 2014-12-08 11:25 ` Hannes Frederic Sowa 2014-12-08 16:19 ` George Spelvin 0 siblings, 1 reply; 26+ messages in thread From: Hannes Frederic Sowa @ 2014-12-08 11:25 UTC (permalink / raw) To: George Spelvin Cc: davem, dborkman, herbert, linux-kernel, netdev, tgraf, tytso Hi, On Sun, Dec 7, 2014, at 22:33, George Spelvin wrote: > On Sun, 2014-12-07 at 15:06 +0100, Hannes Frederic Sowa wrote: > > In case of openvswitch it shows a performance improvment. The seed > > parameter could be used as an initial biasing of the crc32 function, but > > in case of openvswitch it is only set to 0. > > NACK. > > This is the Fatal Error in thinking that Herbert was warning about. > The seed parameter doesn't affect CRC32 collisions *at all* if the inputs > are the same size. > > For fixed-size inputs, a non-zero seed is equivalent to XORing a > constant into the output of the CRC computation. Sorry for being unclear, I understood that and didn't bother patching that '0' with a random seed exactly because of this. > for *different* sized inputs, a non-zero seed detects zero-padding > better than a zero one, but *which* non-zero value is also irrelevant; > all-ones is the traditional choice because it's simplest in hardware. > > > A CRC is inherently linear. CRC(a^b) = CRC(a) ^ CRC(b). This makes > them easy to analyze mathematically and gives them a number of nice > properties for detecting hardware corruption. > > But that same simplicity makes it *ridiculously* easy to generate > collisions if you try. Yes, understood and I totally agree we shouldn't use crc32 hashing in a lot of places where unsafe data is going to be hashed and inserted into hash tables. > One way of looking at a CRC is to say that each bit in the input > has a CRC. The CRC of a message string is just the XOR of the CRCs > of the individual bits that are set in the message. > > Now, a CRC polynomial is chosen so that all of the bits of a > message have different CRCs. Obviously, there's a limit: when the > message is 2^n bits long, it's not possible for all the bits to > have different, non-zero n-bit CRCs. > > But a CRC is a really efficient way of assigning different bit patterns > to different input bits up to that limit. > > (Something like CRC32c is also chosen so that, for messages up to a > reasonable length, no 3-bit, 4-bit, etc. combinations have CRCs that > XOR to zero.) > > > But, and this might be what Herbert was trying to say and I was > misunderstanding, if you then *truncate* that CRC, the CRCs of the > message bits lose that uniqueness guarantee. They're just pseudorandom > numbers, and a CRC loses its special collision-resistance properties. > > It's just an ordinary random hash, and thanks to the birthday paradox, > you're likely to find two bits whose CRCs agree in any particular 8 bits > within roughly sqrt(2*256) or 22 bits. > > Here are a few such collisions for the least significant 8 bits of > CRC32c: > > Msg1 CRC32c Msg2 CRC32c Match > 1<<11 3fc5f181 1<<30 bf672381 81 > 1<<12 9d14c3b8 1<<31 dd45aab8 b8 > 1<<5 c79a971f 1<<44 6006181f 1f > 1<<15 13a29877 1<<45 b2f53777 77 > > There's nothing special about the lsbits of the CRC. > Within 64 bits, the most significant 8 bits have it worse: > > 1<<5 c79a971f 1<<17 c76580d9 c7 > 1<<6 e13b70f7 1<<18 e144fb14 e1 > 1<<19 70a27d8a 1<<38 7022df58 70 > 1<<20 38513ec5 1<<39 38116fac 38 > 1<<13 4e8a61dc 1<<52 4e2dfd53 4e > 1<<23 a541927e 1<<53 a5e0c5d1 a5 > > > Now, I'd like to stress that this collision rate is no worse than any > *other* hash function. A truncated CRC loses its special resistance to > the birthday paradox (you'd have been much smarter to use 8-bit CRC), > but it doesn't become especially bad. A truncated SHA-1 will have > coillisions just as often. > > The concern with a CRC is that, once you've found one collision, you've > found a huge number of them. Just XOR the bit pattern of your choice > into both of the colliding messages, and you have a new collision. Ack. > For another example, if you consider the CRC32c of all possible 1-byte > messages *and then take only the low byte*, there are only 128 possible > values. > > It turns out that the byte 0x5d has a CRC32c of 0xee0d9600. This ends > in 00, so if I XOR 0x5d into anything, the low 8 bits of the CRC > don't change. > > Likewise, the message "23 00" has a CRC32c of 0x00ee0d96. So you can > XOR 0x23 into the second-last byte of anything, and the high 8 bits of > the CRC don't change. A very interesting read, thanks for your mail! Bye, Hannes ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-08 11:25 ` Hannes Frederic Sowa @ 2014-12-08 16:19 ` George Spelvin 2014-12-08 16:32 ` Hannes Frederic Sowa 0 siblings, 1 reply; 26+ messages in thread From: George Spelvin @ 2014-12-08 16:19 UTC (permalink / raw) To: hannes, linux Cc: davem, dborkman, herbert, linux-kernel, netdev, tgraf, tytso >>> In case of openvswitch it shows a performance improvment. The seed >>> parameter could be used as an initial biasing of the crc32 function, but >>> in case of openvswitch it is only set to 0. >> NACK. [...] > Sorry for being unclear, I understood that and didn't bother patching > that '0' with a random seed exactly because of this. And I'm sorry for delivering a long lecture on a subject you already understood perfectly well. I'd just been thinking about it because of Herbert's comments, so it was conveniently at hand. :-) Out of curiousity, what *were* you referring to when you talked about biasing the crc32 function? "Biasing" is a good term becuase it just applies an offset, but what do you gain from doing that? There are nifty things one can do with the CRC32 instruction, however. A lot of ciphers these days use an ARX (add, rotate, XOR) kernel. A crc32 instruction, although linear, does some very powerful rotate & xor operations, and could replace the XOR and rotate. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-08 16:19 ` George Spelvin @ 2014-12-08 16:32 ` Hannes Frederic Sowa 0 siblings, 0 replies; 26+ messages in thread From: Hannes Frederic Sowa @ 2014-12-08 16:32 UTC (permalink / raw) To: George Spelvin Cc: davem, dborkman, herbert, linux-kernel, netdev, tgraf, tytso On Mon, Dec 8, 2014, at 17:19, George Spelvin wrote: > >>> In case of openvswitch it shows a performance improvment. The seed > >>> parameter could be used as an initial biasing of the crc32 function, but > >>> in case of openvswitch it is only set to 0. > > >> NACK. [...] > > > Sorry for being unclear, I understood that and didn't bother patching > > that '0' with a random seed exactly because of this. > > And I'm sorry for delivering a long lecture on a subject you already > understood perfectly well. I learned something, so your time wasn't completely wasted. ;) > I'd just been thinking about it because of Herbert's comments, so it was > conveniently at hand. :-) > > Out of curiousity, what *were* you referring to when you talked > about biasing the crc32 function? "Biasing" is a good term becuase > it just applies an offset, but what do you gain from doing that? Actually, I don't know why the seed parameter was added. I just wanted to mention that there is a way to bias the crc32 function which fits into the style of the other hashing functions, like jhash with its initval parameter. I just kept it around during the rewrite. The only use case I can imagine would be if one would like to calculate a crc32c over a non-contiguous array, thus feeding the result of one crc operation into the next one. > There are nifty things one can do with the CRC32 instruction, however. > A lot of ciphers these days use an ARX (add, rotate, XOR) kernel. > A crc32 instruction, although linear, does some very powerful rotate & > xor operations, and could replace the XOR and rotate. Yes, I have seen it being used in cityhash. There is also a proposal by Intel, but the hash seems too weak, too: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/hash-method-performance-paper.pdf Bye, Hannes ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-07 13:23 ` George Spelvin 2014-12-07 14:06 ` Hannes Frederic Sowa @ 2014-12-09 14:24 ` Herbert Xu 1 sibling, 0 replies; 26+ messages in thread From: Herbert Xu @ 2014-12-09 14:24 UTC (permalink / raw) To: George Spelvin Cc: davem, dborkman, hannes, linux-kernel, netdev, tgraf, tytso On Sun, Dec 07, 2014 at 08:23:05AM -0500, George Spelvin wrote: > > Okay, I'm confused. *What* scheme? The arch_fast_hash interface doesn't > have any provision for a secret. Because there's no point to having one; It does actually, in the form of the seed parameter. That's why it's so bad: it pretends to be a semi-secure hash function that people will use in hash tables. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-07 5:20 Where exactly will arch_fast_hash be used George Spelvin 2014-12-07 9:28 ` Herbert Xu @ 2014-12-07 13:14 ` Hannes Frederic Sowa 2014-12-07 13:30 ` George Spelvin 1 sibling, 1 reply; 26+ messages in thread From: Hannes Frederic Sowa @ 2014-12-07 13:14 UTC (permalink / raw) To: George Spelvin; +Cc: herbert, dborkman, linux-kernel, netdev, tgraf Hi, On So, 2014-12-07 at 00:20 -0500, George Spelvin wrote: > If you want DoS-resistant hash tables, I'm working on adding SipHash > to the kernel. > > This is a keyed pseudo-random function designed specifically for that > application. I am starting with ext4 directory hashes, and then intended > to expand to secure sequence numbers (since it's far faster than MD5). Please consider xfs, too. AFAIK xfs doesn't seed their hashing so far and the hashing function is pretty weak. One example: http://marc.info/?l=linux-xfs&m=139590613002926&w=2 > (I'm trying to figure out a good interface, since the crypto API > is a bit heavy for something to heavily optimized.) Ack. If we want to use it in the networking stack we should be able to use it without a dependency to the crypto framework. Bye, Hannes ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-07 13:14 ` Hannes Frederic Sowa @ 2014-12-07 13:30 ` George Spelvin 2014-12-07 13:41 ` Hannes Frederic Sowa 0 siblings, 1 reply; 26+ messages in thread From: George Spelvin @ 2014-12-07 13:30 UTC (permalink / raw) To: hannes, linux; +Cc: dborkman, herbert, linux-kernel, netdev, tgraf Thanks for the encouragement! > Please consider xfs, too. > AFAIK xfs doesn't seed their hashing so far and the hashing function is > pretty weak. One example: > http://marc.info/?l=linux-xfs&m=139590613002926&w=2 Is that something that *can* be changed without breaking the disk format? SipHash is explicitly *not* designed to be secure as an unkeyed hash in the way that SHA-type algorithms are. What it's designed to do is provide second preimage resistance of its output, or a function (like modular reduction) of its output, against an attacker who doesn't know the secret seed. > Ack. If we want to use it in the networking stack we should be able to > use it without a dependency to the crypto framework. Already understood. My big question is whether a single function call is okay or we need something inlinable. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-07 13:30 ` George Spelvin @ 2014-12-07 13:41 ` Hannes Frederic Sowa 2014-12-07 13:52 ` Hannes Frederic Sowa 0 siblings, 1 reply; 26+ messages in thread From: Hannes Frederic Sowa @ 2014-12-07 13:41 UTC (permalink / raw) To: George Spelvin; +Cc: dborkman, herbert, linux-kernel, netdev, tgraf On So, 2014-12-07 at 08:30 -0500, George Spelvin wrote: > Thanks for the encouragement! > > > Please consider xfs, too. > > AFAIK xfs doesn't seed their hashing so far and the hashing function is > > pretty weak. One example: > > http://marc.info/?l=linux-xfs&m=139590613002926&w=2 > > Is that something that *can* be changed without breaking the > disk format? SipHash is explicitly *not* designed to be secure as > an unkeyed hash in the way that SHA-type algorithms are. I did some research and it looked like it would need a change to the disk format but it should be doable by incrementing the super block version, so at least newly created filesystem would benefit from it. > What it's designed to do is provide second preimage resistance > of its output, or a function (like modular reduction) of its output, > against an attacker who doesn't know the secret seed. > > > Ack. If we want to use it in the networking stack we should be able to > > use it without a dependency to the crypto framework. > > Already understood. My big question is whether a single function call > is okay or we need something inlinable. Like md5_transfrom, I think a non-inline function would be just fine. Otherwise kernel code size would increase. Most hash users in the network stack mostly deal with less bytes of input than one round needs. Bye, Hannes ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-07 13:41 ` Hannes Frederic Sowa @ 2014-12-07 13:52 ` Hannes Frederic Sowa 0 siblings, 0 replies; 26+ messages in thread From: Hannes Frederic Sowa @ 2014-12-07 13:52 UTC (permalink / raw) To: George Spelvin; +Cc: dborkman, herbert, linux-kernel, netdev, tgraf On So, 2014-12-07 at 14:41 +0100, Hannes Frederic Sowa wrote: > On So, 2014-12-07 at 08:30 -0500, George Spelvin wrote: > > Thanks for the encouragement! > > > > > Please consider xfs, too. > > > AFAIK xfs doesn't seed their hashing so far and the hashing function is > > > pretty weak. One example: > > > http://marc.info/?l=linux-xfs&m=139590613002926&w=2 > > > > Is that something that *can* be changed without breaking the > > disk format? SipHash is explicitly *not* designed to be secure as > > an unkeyed hash in the way that SHA-type algorithms are. > > I did some research and it looked like it would need a change to the > disk format but it should be doable by incrementing the super block > version, so at least newly created filesystem would benefit from it. > > > What it's designed to do is provide second preimage resistance > > of its output, or a function (like modular reduction) of its output, > > against an attacker who doesn't know the secret seed. > > > > > Ack. If we want to use it in the networking stack we should be able to > > > use it without a dependency to the crypto framework. > > > > Already understood. My big question is whether a single function call > > is okay or we need something inlinable. > > Like md5_transfrom, I think a non-inline function would be just fine. > Otherwise kernel code size would increase. Most hash users in the > network stack mostly deal with less bytes of input than one round needs. Of course, if it looks feasable (from a performance PoV, but I doubt that) to migrate the current jhash users to siphash, it might be worth dealing with larger input sizes and maybe also doing it inline. But that very much depends on the code size it would add. Currently we use jhash as the non-linear "secure" hashing functions at most places. Also rhashtable takes a pointer to the hasing function, thus causing gcc to generate a function in each compilation unit if it would be static inline. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Where exactly will arch_fast_hash be used @ 2014-12-04 8:11 Herbert Xu 2014-12-04 12:34 ` Hannes Frederic Sowa 2014-12-04 15:26 ` Thomas Graf 0 siblings, 2 replies; 26+ messages in thread From: Herbert Xu @ 2014-12-04 8:11 UTC (permalink / raw) To: Thomas Graf, Daniel Borkmann, David S. Miller, Theodore Ts'o, netdev, Linux Kernel Mailing List Hi: While working on rhashtable it came to me that this whole concept of arch_fast_hash is flawed. CRCs are linear functions so it's fairly easy for an attacker to identify collisions or at least eliminate a large amount of search space (e.g., controlling the last bit of the hash result is almost trivial, even when you add a random seed). So what exactly are we going to use arch_fast_hash for? Presumably it's places where security is never goint to be an issue, right? Even if security wasn't an issue, straight CRC32 has really poor lower-order bit distribution, which makes it a terrible choice for a hash table that simply uses the lower-order bits. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-04 8:11 Herbert Xu @ 2014-12-04 12:34 ` Hannes Frederic Sowa 2014-12-04 13:14 ` Daniel Borkmann 2014-12-04 15:26 ` Thomas Graf 1 sibling, 1 reply; 26+ messages in thread From: Hannes Frederic Sowa @ 2014-12-04 12:34 UTC (permalink / raw) To: Herbert Xu Cc: Thomas Graf, Daniel Borkmann, David S. Miller, Theodore Ts'o, netdev, Linux Kernel Mailing List Hi Herbert, On Do, 2014-12-04 at 16:11 +0800, Herbert Xu wrote: > While working on rhashtable it came to me that this whole concept > of arch_fast_hash is flawed. CRCs are linear functions so it's > fairly easy for an attacker to identify collisions or at least > eliminate a large amount of search space (e.g., controlling the > last bit of the hash result is almost trivial, even when you add > a random seed). > > So what exactly are we going to use arch_fast_hash for? Presumably > it's places where security is never goint to be an issue, right? > > Even if security wasn't an issue, straight CRC32 has really poor > lower-order bit distribution, which makes it a terrible choice for > a hash table that simply uses the lower-order bits. I wondered the same while trying to use arch_fast_hash in a lot more places (I did a new implementation in assembler I'll send later on, it is mostly optimized to deal with ovs flow keys). While the uniformity of crc32 does actually look good and IMHO this even holds for the lower bits of the hash, I totally agree on the linearity matters. The easiest way to make arch_fast_hash non-linear would be to build up on the crc32 instruction like e.g. the cityhash function family does and it seems not too hard to do that by combining two crc32c outputs of the original and cyclic shifted input data. I have doubts if this is faster than jhash in the end. There are proposals from Intel to do so, but they are patent encumbered. :/ For most consumers in the networking stack, security and DoS resistence is an issue. OVS, for which this was designed at first does do rehashing from time to time, but still there is a possible DoS attack vector with this hashing algorithm. Bye, Hannes ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-04 12:34 ` Hannes Frederic Sowa @ 2014-12-04 13:14 ` Daniel Borkmann 0 siblings, 0 replies; 26+ messages in thread From: Daniel Borkmann @ 2014-12-04 13:14 UTC (permalink / raw) To: Hannes Frederic Sowa Cc: Herbert Xu, Thomas Graf, David S. Miller, Theodore Ts'o, netdev, Linux Kernel Mailing List, fusco On 12/04/2014 01:34 PM, Hannes Frederic Sowa wrote: > On Do, 2014-12-04 at 16:11 +0800, Herbert Xu wrote: >> While working on rhashtable it came to me that this whole concept >> of arch_fast_hash is flawed. CRCs are linear functions so it's >> fairly easy for an attacker to identify collisions or at least >> eliminate a large amount of search space (e.g., controlling the >> last bit of the hash result is almost trivial, even when you add >> a random seed). >> >> So what exactly are we going to use arch_fast_hash for? Presumably >> it's places where security is never goint to be an issue, right? The original proposal [1] targeted ovs-only as a closed-door user in order to speed up the worst case of calculating a hash over the extracted flow key, that is, struct sw_flow_key (which nowadays consumes up to 7 cachelines on x86_64 ...). [1] http://thread.gmane.org/gmane.linux.network/293981/ >> Even if security wasn't an issue, straight CRC32 has really poor >> lower-order bit distribution, which makes it a terrible choice for >> a hash table that simply uses the lower-order bits. > > I wondered the same while trying to use arch_fast_hash in a lot more > places (I did a new implementation in assembler I'll send later on, it > is mostly optimized to deal with ovs flow keys). > > While the uniformity of crc32 does actually look good and IMHO this even > holds for the lower bits of the hash, I totally agree on the linearity > matters. > > The easiest way to make arch_fast_hash non-linear would be to build up > on the crc32 instruction like e.g. the cityhash function family does and > it seems not too hard to do that by combining two crc32c outputs of the > original and cyclic shifted input data. I have doubts if this is faster > than jhash in the end. There are proposals from Intel to do so, but they > are patent encumbered. :/ > > For most consumers in the networking stack, security and DoS resistence > is an issue. OVS, for which this was designed at first does do rehashing > from time to time, but still there is a possible DoS attack vector with > this hashing algorithm. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-04 8:11 Herbert Xu 2014-12-04 12:34 ` Hannes Frederic Sowa @ 2014-12-04 15:26 ` Thomas Graf 2014-12-04 15:29 ` Herbert Xu 1 sibling, 1 reply; 26+ messages in thread From: Thomas Graf @ 2014-12-04 15:26 UTC (permalink / raw) To: Herbert Xu Cc: Daniel Borkmann, David S. Miller, Theodore Ts'o, netdev, Linux Kernel Mailing List On 12/04/14 at 04:11pm, Herbert Xu wrote: > Hi: > > While working on rhashtable it came to me that this whole concept > of arch_fast_hash is flawed. CRCs are linear functions so it's > fairly easy for an attacker to identify collisions or at least > eliminate a large amount of search space (e.g., controlling the > last bit of the hash result is almost trivial, even when you add > a random seed). > > So what exactly are we going to use arch_fast_hash for? Presumably > it's places where security is never goint to be an issue, right? > > Even if security wasn't an issue, straight CRC32 has really poor > lower-order bit distribution, which makes it a terrible choice for > a hash table that simply uses the lower-order bits. As Daniel pointed out, this work originated for the OVS edge use case where security is of less concern and the rehashing is sufficient. Identifying collisions is less of interest as the user space fall back provides a greater surface for an attack. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-04 15:26 ` Thomas Graf @ 2014-12-04 15:29 ` Herbert Xu 2014-12-04 15:39 ` Thomas Graf 0 siblings, 1 reply; 26+ messages in thread From: Herbert Xu @ 2014-12-04 15:29 UTC (permalink / raw) To: Thomas Graf Cc: Daniel Borkmann, David S. Miller, Theodore Ts'o, netdev, Linux Kernel Mailing List On Thu, Dec 04, 2014 at 03:26:37PM +0000, Thomas Graf wrote: > > As Daniel pointed out, this work originated for the OVS edge use > case where security is of less concern and the rehashing is > sufficient. Identifying collisions is less of interest as the user > space fall back provides a greater surface for an attack. Well in that case the current setup I think is very misleading. It's inviting unsuspecting kernel developers to use it as a hash function for general hash tables, which AFAICS is something that it fails at miserably. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-04 15:29 ` Herbert Xu @ 2014-12-04 15:39 ` Thomas Graf 2014-12-04 15:43 ` Daniel Borkmann 0 siblings, 1 reply; 26+ messages in thread From: Thomas Graf @ 2014-12-04 15:39 UTC (permalink / raw) To: Herbert Xu Cc: Daniel Borkmann, David S. Miller, Theodore Ts'o, netdev, Linux Kernel Mailing List On 12/04/14 at 11:29pm, Herbert Xu wrote: > On Thu, Dec 04, 2014 at 03:26:37PM +0000, Thomas Graf wrote: > > > > As Daniel pointed out, this work originated for the OVS edge use > > case where security is of less concern and the rehashing is > > sufficient. Identifying collisions is less of interest as the user > > space fall back provides a greater surface for an attack. > > Well in that case the current setup I think is very misleading. > It's inviting unsuspecting kernel developers to use it as a hash > function for general hash tables, which AFAICS is something that > it fails at miserably. Well, it's called fast hash and not secure hash ;-) but a clear hint definitely wouldn't hurt. I wasn't aware of the distribution weakness of the lower bits. Do you have a reference to more information? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-04 15:39 ` Thomas Graf @ 2014-12-04 15:43 ` Daniel Borkmann 2014-12-04 15:47 ` Herbert Xu 0 siblings, 1 reply; 26+ messages in thread From: Daniel Borkmann @ 2014-12-04 15:43 UTC (permalink / raw) To: Thomas Graf Cc: Herbert Xu, David S. Miller, Theodore Ts'o, netdev, Linux Kernel Mailing List On 12/04/2014 04:39 PM, Thomas Graf wrote: > On 12/04/14 at 11:29pm, Herbert Xu wrote: >> On Thu, Dec 04, 2014 at 03:26:37PM +0000, Thomas Graf wrote: >>> >>> As Daniel pointed out, this work originated for the OVS edge use >>> case where security is of less concern and the rehashing is >>> sufficient. Identifying collisions is less of interest as the user >>> space fall back provides a greater surface for an attack. >> >> Well in that case the current setup I think is very misleading. >> It's inviting unsuspecting kernel developers to use it as a hash >> function for general hash tables, which AFAICS is something that >> it fails at miserably. > > Well, it's called fast hash and not secure hash ;-) but a clear hint > definitely wouldn't hurt. Hm, I thought the kernel doc on arch_fast_hash() in include/linux/hash.h would give enough of a hint ... ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-04 15:43 ` Daniel Borkmann @ 2014-12-04 15:47 ` Herbert Xu 2014-12-04 15:51 ` Daniel Borkmann 2014-12-04 15:56 ` David Laight 0 siblings, 2 replies; 26+ messages in thread From: Herbert Xu @ 2014-12-04 15:47 UTC (permalink / raw) To: Daniel Borkmann Cc: Thomas Graf, David S. Miller, Theodore Ts'o, netdev, Linux Kernel Mailing List On Thu, Dec 04, 2014 at 04:43:31PM +0100, Daniel Borkmann wrote: > > Hm, I thought the kernel doc on arch_fast_hash() in include/linux/hash.h > would give enough of a hint ... I think something more explicit like "do not use this in a hash table unless you know what you're doing" might be needed. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-04 15:47 ` Herbert Xu @ 2014-12-04 15:51 ` Daniel Borkmann 2014-12-04 15:56 ` David Laight 1 sibling, 0 replies; 26+ messages in thread From: Daniel Borkmann @ 2014-12-04 15:51 UTC (permalink / raw) To: Herbert Xu Cc: Thomas Graf, David S. Miller, Theodore Ts'o, netdev, Linux Kernel Mailing List On 12/04/2014 04:47 PM, Herbert Xu wrote: > On Thu, Dec 04, 2014 at 04:43:31PM +0100, Daniel Borkmann wrote: >> >> Hm, I thought the kernel doc on arch_fast_hash() in include/linux/hash.h >> would give enough of a hint ... > > I think something more explicit like "do not use this in a hash > table unless you know what you're doing" might be needed. Ok, I'm fine with that, do you want to submit a patch on top of, or after Hannes' improvements [1] got in? Thanks a lot, Herbert. [1] http://patchwork.ozlabs.org/patch/417761/ ^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: Where exactly will arch_fast_hash be used 2014-12-04 15:47 ` Herbert Xu 2014-12-04 15:51 ` Daniel Borkmann @ 2014-12-04 15:56 ` David Laight 2014-12-04 16:10 ` Herbert Xu 1 sibling, 1 reply; 26+ messages in thread From: David Laight @ 2014-12-04 15:56 UTC (permalink / raw) To: 'Herbert Xu', Daniel Borkmann Cc: Thomas Graf, David S. Miller, Theodore Ts'o, netdev, Linux Kernel Mailing List From: Herbert Xu > On Thu, Dec 04, 2014 at 04:43:31PM +0100, Daniel Borkmann wrote: > > > > Hm, I thought the kernel doc on arch_fast_hash() in include/linux/hash.h > > would give enough of a hint ... > > I think something more explicit like "do not use this in a hash > table unless you know what you're doing" might be needed. That isn't really helpful. Maybe something more like: "This CRC algorithm used by this hash is 'linear', ie hash(a xor b) == hash(a) xor hash(b). This means that it is relatively easy for a remote attacker to generate multiple items with the same hash." David ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Where exactly will arch_fast_hash be used 2014-12-04 15:56 ` David Laight @ 2014-12-04 16:10 ` Herbert Xu 0 siblings, 0 replies; 26+ messages in thread From: Herbert Xu @ 2014-12-04 16:10 UTC (permalink / raw) To: David Laight Cc: Daniel Borkmann, Thomas Graf, David S. Miller, Theodore Ts'o, netdev, Linux Kernel Mailing List On Thu, Dec 04, 2014 at 03:56:30PM +0000, David Laight wrote: > > "This CRC algorithm used by this hash is 'linear', ie hash(a xor b) == > hash(a) xor hash(b). This means that it is relatively easy for a remote > attacker to generate multiple items with the same hash." The attacker could be local too, e.g., opening a netlink socket can be done by any user and gets hashed in net/netlink/af_netlink.c. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2014-12-09 14:25 UTC | newest] Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-12-07 5:20 Where exactly will arch_fast_hash be used George Spelvin 2014-12-07 9:28 ` Herbert Xu 2014-12-07 10:02 ` George Spelvin 2014-12-07 12:51 ` Herbert Xu 2014-12-07 13:23 ` George Spelvin 2014-12-07 14:06 ` Hannes Frederic Sowa 2014-12-07 21:33 ` George Spelvin 2014-12-08 11:25 ` Hannes Frederic Sowa 2014-12-08 16:19 ` George Spelvin 2014-12-08 16:32 ` Hannes Frederic Sowa 2014-12-09 14:24 ` Herbert Xu 2014-12-07 13:14 ` Hannes Frederic Sowa 2014-12-07 13:30 ` George Spelvin 2014-12-07 13:41 ` Hannes Frederic Sowa 2014-12-07 13:52 ` Hannes Frederic Sowa -- strict thread matches above, loose matches on Subject: below -- 2014-12-04 8:11 Herbert Xu 2014-12-04 12:34 ` Hannes Frederic Sowa 2014-12-04 13:14 ` Daniel Borkmann 2014-12-04 15:26 ` Thomas Graf 2014-12-04 15:29 ` Herbert Xu 2014-12-04 15:39 ` Thomas Graf 2014-12-04 15:43 ` Daniel Borkmann 2014-12-04 15:47 ` Herbert Xu 2014-12-04 15:51 ` Daniel Borkmann 2014-12-04 15:56 ` David Laight 2014-12-04 16:10 ` Herbert Xu
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.