From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1764257AbcLVCcM (ORCPT ); Wed, 21 Dec 2016 21:32:12 -0500 Received: from frisell.zx2c4.com ([192.95.5.64]:60090 "EHLO frisell.zx2c4.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1764128AbcLVCcF (ORCPT ); Wed, 21 Dec 2016 21:32:05 -0500 MIME-Version: 1.0 In-Reply-To: References: <20161216030328.11602-1-Jason@zx2c4.com> <20161221230216.25341-1-Jason@zx2c4.com> <20161221230216.25341-4-Jason@zx2c4.com> From: "Jason A. Donenfeld" Date: Thu, 22 Dec 2016 03:31:56 +0100 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH v7 3/6] random: use SipHash in place of MD5 To: Andy Lutomirski Cc: Netdev , "kernel-hardening@lists.openwall.com" , LKML , Linux Crypto Mailing List , David Laight , Ted Tso , Hannes Frederic Sowa , Eric Dumazet , Linus Torvalds , Eric Biggers , Tom Herbert , Andi Kleen , "David S. Miller" , Jean-Philippe Aumasson Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Andy, On Thu, Dec 22, 2016 at 12:42 AM, Andy Lutomirski wrote: > So this is probably good enough, and making it better is hard. Changing it to: > > u64 entropy = (u64)random_get_entropy() + current->pid; > result = siphash(..., entropy, ...); > secret->chaining += result + entropy; > > would reduce this problem by forcing an attacker to brute-force the > entropy on each iteration, which is probably an improvement. Ahh, so that's the reasoning behind a similar suggestion of yours in a previous email. Makes sense to me. I'll include this in the next merge if we don't come up with a different idea before then. Your reasoning seems good for it. Part of what makes this process a bit goofy is that it's not all together clear what the design goals are. Right now we're going for "not worse than before", which we've nearly achieved. How good of an RNG do we want? I'm willing to examine and analyze the security and performance of all constructions we can come up with. One thing I don't want to do, however, is start tweaking the primitives themselves in ways not endorsed by the designers. So, I believe that precludes things like carrying over SipHash's internal state (like what was done with MD5), because there hasn't been a formal security analysis of this like there has with other uses of SipHash. I also don't want to change any internals of how SipHash actually works. I mention that because of some of the suggestions on other threads, which make me rather uneasy. So with that said, while writing this reply to you, I was simultaneously reading some other crypto code and was reminded that there's a variant of SipHash which outputs an additional 64-bits; it's part of the siphash reference code, which they call the "128-bit mode". It has the benefit that we can return 64-bits to the caller and save 64-bits for the chaining key. That way there's no correlation between the returned secret and the chaining key, which I think would completely alleviate all of your concerns, and simplify the analysis a bit. Here's what it looks like: https://git.zx2c4.com/linux-dev/commit/?h=siphash&id=46fbe5b408e66b2d16b4447860f8083480e1c08d The downside is that it takes 4 extra Sip rounds. This puts the performance still better than MD5, though, and likely still better than the other batched entropy solution. We could optimize this, I suppose, by giving it only two parameters -- chaining, jiffies+entropy+pid -- instead of the current three -- chaining, jiffies, entropy+pid -- which would then shave off 2 Sip rounds. But I liked the idea of having a bit more spread in the entropy input field. Anyway, with this in mind, we now have three possibilities: 1. result = siphash(chaining, entropy, key); chaining += result + entropy 2. result = siphash_extra_output(chaining, entropy, key, &chaining); 3. Ted's batched entropy idea using chacha20 The more I think about this, the more I suspect that we should just use chacha20. It will still be faster than MD5. I don't like the non-determinism of it (some processes will start slower than others, if the batched entropy has run out and ASLR demands more), but I guess I can live with that. But, most importantly, it greatly simplifies both the security analysis and what we can promise to callers about the function. Right now in the comment documentation, we're coy with callers about the security of the RNG. If we moved to a known construction like chacha20/get_random_bytes_batched, then we could just be straight up with a promise that the numbers it returns are high quality. Thoughts on 2 and 3, and on 1 vs 2 vs 3? Jason