From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S270835AbTHKCJu (ORCPT ); Sun, 10 Aug 2003 22:09:50 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S270846AbTHKCJu (ORCPT ); Sun, 10 Aug 2003 22:09:50 -0400 Received: from mail.jlokier.co.uk ([81.29.64.88]:22405 "EHLO mail.jlokier.co.uk") by vger.kernel.org with ESMTP id S270835AbTHKCJr (ORCPT ); Sun, 10 Aug 2003 22:09:47 -0400 Date: Mon, 11 Aug 2003 03:09:19 +0100 From: Jamie Lokier To: Matt Mackall Cc: James Morris , linux-kernel , Andrew Morton , davem@redhat.com, tytso@mit.edu Subject: Re: [RFC][PATCH] Make cryptoapi non-optional? Message-ID: <20030811020919.GD10446@mail.jlokier.co.uk> References: <20030809173329.GU31810@waste.org> <20030810174528.GZ31810@waste.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030810174528.GZ31810@waste.org> User-Agent: Mutt/1.4.1i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org [Ted T'so added to recipients because he wrote this thing :)] Matt Mackall wrote: > The idea with the folding was that we can cover up any systemic > patterns in the returned hash by xoring the first half of the hash > with the second half of the hash. While this might seem like a good > technique intuitively, it's mathematically flawed. > > Let's assume the simplest case of a two bit hash xy. "Patterns" here > are going to translate into a correlation between otherwise > well-distributed x and y. A perfectly uncorrelated system xy is going > to have two bits of entropy. A perfectly correlated (or > anti-correlated) system xy is going to have only 1 bit of entropy. > > Now what happens when we fold these two bits together z=x^y? In the > uncorrelated case we end up with a well-distributed z. In the > correlated case, we end up with 0 or 1, that is z=x^x=0 or z=x^-x, and > we've eliminated any entropy we once had. If correlation is less than > 100%, then we get somewhere between 0 and 1 bits of entropy, but > always less than if we just returned z=x or z=y. This argument > naturally scales up to larger variables x and y. > > Ok, so that explains taking out the xor. But I also return xy and not > just x. With the former, every bit of input goes through SHA1 twice, > once in the input pool, and once in the output pool, along with lots > of feedback to foil backtracking attacks. In the latter, every output > bit is going through SHA four times, which is just overkill. If we > don't trust our hash to generate uniform, non-self-correlating output, > running additional rounds does not guarantee that we aren't magnifying > the problem. And it is of course making everything twice as expensive. If you return xy, you are returning a strong digest of the pool state. Even with the backtrack-prevention, if the attacker reads 20 bytes from /dev/random and sees a _recognised_ pattern, they immediately know the entire state of the secondary pool. This can happen if, for example, a machine has just rebooted and hasn't had time to collect much entropy since the initial state. (In general, recognising the output of a cryptographic hash tells you the input, because even though there are many possible inputs for that output, it's far more likely that a systemic error is the cause of the recognised output, rather than the cause being randomness). This means that, for a short time after, they can predict further output bits exactly. "Catastrophic reseeding" from the primary pool limits how many bits they can predict like this - unless, if they're lucky, it tells them the state of the primary pool too. When you output less of the hash, this reduces the information available to predict the state of the pool. However, SHA1 is large enough that even half of the hash is a strong predictor, so returning half of the hash still has this weakness. (Ironically, a much weaker hash would prevent this mode of data leakage). As far as I can tell, folding the hash result doesn't help, because that simply creates a different kind of hash which can also be recognised, and which is large enough to predict the pool state. Which leaves... I have just convinced myself of a flaw in the existing random.c - which still holds in your patched version. Folding the result of the strong hash is useless. Instead, the _input_ to the strong hash should be folded, so that a single hash result (whether folded or not) doesn't reveal the pool state even when it is recognised. In other words, the hash should be calculated over fewer than all of the pool bits each time. The set of pool bits used, or how multiple bits are folded, needs to be varied in such a way that many hash results would need to be all recognised to determine the pool state. I think this equivalent to saying that the effective hash function needs to be weak enough not to reveal the pool state when it is recognised, due to the pool state being recognisably "special" (e.g. limited time after booting), while at the same time the hash needs to be strong enough that it does not reveal the pool state when the pool is random. Enjoy, -- Jamie