Date: Mon, 12 Oct 2015 22:46:45 -0400
From: "Theodore Ts'o"
To: George Spelvin
Cc: ahferroin7@gmail.com, andi@firstfloor.org, jepler@unpythonic.net,
    linux-kernel@vger.kernel.org, linux@rasmusvillemoes.dk
Subject: Re: Updated scalable urandom patchkit
Message-ID: <20151013024645.GA31306@thunk.org>
References: <20151012135434.GA3548@thunk.org>
    <20151012203059.26372.qmail@ns.horizon.com>
In-Reply-To: <20151012203059.26372.qmail@ns.horizon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 12, 2015 at 04:30:59PM -0400, George Spelvin wrote:
> > Segregating abusers solves both problems. If we do this then we don't
> > need to drop the locks from the nonblocking pool, which solves the
> > security problem.
>
> Er, sort of. I still think my points were valid, but they're
> about a particular optimization suggestion you had. By avoiding
> the need for the optimization, the entire issue is mooted.

Sure, I'm not in love with anyone's particular optimization, whether
it's mine, yours, or Andi's.
I'm just trying to solve the scalability problem while also trying to
keep the code maintainable and easy to understand (and over the years
we've actually made things worse, to the extent that having a single
mixing function for the input and output pools is starting to be more
of a problem than a feature, since we're coding in a bunch of
exceptions when it's the output pool, etc.). So if we can solve a
problem by routing around it, that's fine in my book.

> You have to copy the state *anyway* because you don't want it overwritten
> by the ChaCha output, so there's really no point storing the constants.
> (Also, ChaCha has a simpler input block structure than Salsa20; the
> constants are all adjacent.)

We're really getting into low-level implementation details here, and I
think it's best to worry about these sorts of things when we have a
patch to review...

> (Note: one problem with ChaCha specifically is that it needs 16x32 bits
> of registers, and Arm32 doesn't quite have enough. We may want to provide
> an arch CPRNG hook so people can plug in other algorithms with good
> platform support, like x86 AES instructions.)

So while a ChaCha20-based CRNG should be faster than a SHA-1 based
CRNG, and I consider this a good thing, for me speed is **not** more
important than keeping the underlying code maintainable and simple.
This is one of the reasons why I looked at, and then discarded, the
idea of using x86 accelerated AES as the basis for a CRNG. Setting up
AES so that it can be used easily with or without hardware
acceleration looks very complicated to do in a cross-architectural
way, and I don't want to drag in all of the crypto layer for
/dev/random.

> The same variables can be used (with different parameters) to decide if
> we want to get out of mitigation mode. The one thing to watch out for
> is that "cat /dev/sdX" may have some huge pauses once
> the buffer cache fills. We don't want to forgive after too small a
> fixed interval.

At least initially, once we go into mitigation mode for a particular
process, it's probably safer to simply not exit it.

> Finally, we have the issue of where to attach this rate-limiting structure
> and crypto context. My idea was to use the struct file. But now that
> we have getrandom(2), it's harder. mm, task_struct, signal_struct, what?

I'm personally more inclined to keep it with the task struct, so that
different threads will use different crypto contexts, just from a
simplicity point of view, since we won't need to worry about locking.

Since many processes don't use /dev/urandom or getrandom(2) at all,
the first time they do, we'd allocate a structure and hang it off the
task_struct. When the process exits, we would explicitly memzero it
and then release the memory.

> (Post-finally, do we want this feature to be configurable under
> CONFIG_EMBEDDED? I know keeping the /dev/random code size small is
> a specific design goal, and abuse mitigation is optional.)

Once we code it up and can see how many bytes this takes, we can have
this discussion. I'll note that ChaCha20 is much more compact than
SHA1:

   text    data     bss     dec     hex filename
   4230       0       0    4230    1086 /build/ext4-64/lib/sha1.o
   1152     304       0    1456     5b0 /build/ext4-64/crypto/chacha20_generic.o

... and I've thought about this as being the first step towards
potentially replacing SHA1 with something ChaCha20 based, in light of
the SHAppening attack. Unfortunately, BLAKE2s is similar to ChaCha
only from a design perspective, not an implementation perspective.
Still, I suspect that, just looking at the crypto primitives, even if
we need to include two independent copies of the ChaCha20 core crypto
and the BLAKE2s core crypto, it still should be about half the size of
the SHA-1 crypto primitive.

And from the non-plumbing side of things, Andi's patchset increases
the size of /dev/random by a bit over 6%, or 974 bytes from a starting
base of 15719 bytes.
It ought to be possible to implement a ChaCha20 based CRNG (ignoring
the crypto primitives) in less than 974 bytes of x86_64 assembly. :-)

- Ted
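
The per-task lifecycle described above (allocate the context the first
time a task asks for randomness, then zero and free it at exit) could
be modelled in userspace roughly as follows. This is only a sketch
under stated assumptions, not kernel code and not taken from any
patch: struct task stands in for task_struct, and struct crng_ctx, its
fields, task_get_crng(), task_exit_crng() and wipe() are all
hypothetical names chosen for illustration. In the kernel, the wipe
would presumably be done with memzero_explicit() rather than a
hand-rolled loop.

```c
/*
 * Userspace model only. struct task stands in for the kernel's
 * task_struct; every name below is hypothetical.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct crng_ctx {
	uint32_t key[8];       /* per-task ChaCha20 key, 256 bits */
	uint64_t bytes_drawn;  /* possible input to a rate-limit decision */
};

struct task {
	struct crng_ctx *crng; /* NULL until the task first asks for randomness */
};

/* Zero memory through a volatile pointer so the stores are not elided. */
static void wipe(void *p, size_t len)
{
	volatile unsigned char *v = p;

	while (len--)
		*v++ = 0;
}

/*
 * Return the task's context, allocating it (zero-filled) on first use.
 * Returns NULL if the allocation fails. No locking is needed as long as
 * only the owning task ever touches its own context.
 */
static struct crng_ctx *task_get_crng(struct task *t)
{
	assert(t != NULL);
	if (!t->crng)
		t->crng = calloc(1, sizeof(*t->crng));
	return t->crng;
}

/* At task exit: wipe the key material first, then release the memory. */
static void task_exit_crng(struct task *t)
{
	assert(t != NULL);
	if (!t->crng)
		return;
	wipe(t->crng, sizeof(*t->crng));
	free(t->crng);
	t->crng = NULL;
}
```

A task that never reads /dev/urandom or calls getrandom(2) pays only
for the one pointer in this model. Calling task_exit_crng() on a task
that never allocated a context is a no-op.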