Date: Mon, 12 Oct 2015 20:50:22 -0700
From: Raymond Jennings
Subject: Re: Updated scalable urandom patchkit
To: "Theodore Ts'o" , George Spelvin , ahferroin7@gmail.com,
	andi@firstfloor.org, jepler@unpythonic.net,
	linux-kernel@vger.kernel.org, linux@rasmusvillemoes.dk
Message-Id: <1444708222.900.0@smtp.gmail.com>
In-Reply-To: <20151013024645.GA31306@thunk.org>
References: <20151012135434.GA3548@thunk.org>
	<20151012203059.26372.qmail@ns.horizon.com>
	<20151013024645.GA31306@thunk.org>

On Mon, Oct 12, 2015 at 7:46 PM, Theodore Ts'o wrote:
> On Mon, Oct 12, 2015 at 04:30:59PM -0400, George Spelvin wrote:
>> > Segregating abusers solves both problems. If we do this then we don't
>> > need to drop the locks from the nonblocking pool, which solves the
>> > security problem.
>>
>> Er, sort of. I still think my points were valid, but they're
>> about a particular optimization suggestion you had. By avoiding
>> the need for the optimization, the entire issue is mooted.
>
> Sure, I'm not in love with anyone's particular optimization, whether
> it's mine, yours, or Andi's. I'm just trying to solve the scalability
> problem while also trying to keep the code maintainable and easy to
> understand (and over the years we've actually made things worse, to
> the extent that having a single mixing for the input and output pools
> is starting to be more of a problem than a feature, since we're coding
> in a bunch of exceptions when it's the output pool, etc.).
>
> So if we can solve a problem by routing around it, that's fine in my
> book.
>
>> You have to copy the state *anyway* because you don't want it
>> overwritten by the ChaCha output, so there's really no point storing
>> the constants. (Also, ChaCha has a simpler input block structure than
>> Salsa20; the constants are all adjacent.)
>
> We're really getting into low-level implementations here, and I think
> it's best to worry about these sorts of things when we have a patch to
> review.....
>
>> (Note: one problem with ChaCha specifically is that it needs 16x32 bits
>> of registers, and Arm32 doesn't quite have enough. We may want to
>> provide an arch CPRNG hook so people can plug in other algorithms with
>> good platform support, like x86 AES instructions.)
>
> So while a ChaCha20-based CRNG should be faster than a SHA-1 based
> CRNG, and I consider this a good thing, for me speed is **not** more
> important than keeping the underlying code maintainable and simple.
> This is one of the reasons why I looked at, and then discarded, using
> x86 accelerated AES as the basis for a CRNG. Setting up AES so that
> it can be used easily with or without hardware acceleration looks very
> complicated to do in a cross-architectural way, and I don't want to
> drag in all of the crypto layer for /dev/random.
>
>> The same variables can be used (with different parameters) to decide
>> if we want to get out of mitigation mode. The one thing to watch out
>> for is that "cat /dev/sdX" may have some huge pauses once the buffer
>> cache fills. We don't want to forgive after too small a fixed interval.
>
> At least initially, once we go into mitigation mode for a particular
> process, it's probably safer to simply not exit it.
>
>> Finally, we have the issue of where to attach this rate-limiting
>> structure and crypto context. My idea was to use the struct file.
>> But now that we have getrandom(2), it's harder. mm, task_struct,
>> signal_struct, what?
>
> I'm personally more inclined to keep it with the task struct, so that
> different threads will use different crypto contexts, just from a
> simplicity point of view since we won't need to worry about locking.
>
> Since many processes don't use /dev/urandom or getrandom(2) at all,
> the first time they do, we'd allocate a structure and hang it off the
> task_struct. When the process exits, we would explicitly memzero it
> and then release the memory.
>
>> (Post-finally, do we want this feature to be configurable under
>> CONFIG_EMBEDDED? I know keeping the /dev/random code size small is
>> a specific design goal, and abuse mitigation is optional.)
>
> Once we code it up and can see how many bytes this takes, we can have
> this discussion. I'll note that ChaCha20 is much more compact than
> SHA1:
>
>    text    data     bss     dec     hex filename
>    4230       0       0    4230    1086 /build/ext4-64/lib/sha1.o
>    1152     304       0    1456     5b0 /build/ext4-64/crypto/chacha20_generic.o
>
> ... and I've thought about this as being the first step towards
> potentially replacing SHA1 with something ChaCha20 based, in light of
> the SHAppening attack. Unfortunately, BLAKE2s is similar to ChaCha
> only from a design perspective, not an implementation perspective.
> Still, I suspect that just looking at the crypto primitives, even if
> we need to include two independent copies of the ChaCha20 core crypto
> and the BLAKE2s core crypto, it should still be about half the size of
> the SHA-1 crypto primitive.
>
> And from the non-plumbing side of things, Andi's patchset increases
> the size of /dev/random by a bit over 6%, or 974 bytes from a starting
> base of 15719 bytes. It ought to be possible to implement a ChaCha20
> based CRNG (ignoring the crypto primitives) in less than 974 bytes of
> x86_64 assembly. :-)
>
> 	- Ted
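To illustrate the task_struct idea above, here is a minimal sketch of what
the per-task lifetime might look like. None of this is from a posted patch:
the ->urandom_ctx field, the struct, and every helper name are hypothetical,
and the actual ChaCha20 output step is omitted.

/*
 * Minimal sketch only, not from any posted patch: a per-task CRNG
 * context hung off task_struct, allocated lazily on first use and
 * scrubbed at process exit.  The ->urandom_ctx field and all names
 * below are hypothetical; the ChaCha20 output step itself is omitted.
 */
#include <linux/random.h>       /* get_random_bytes() */
#include <linux/sched.h>        /* current, struct task_struct */
#include <linux/slab.h>         /* kzalloc(), kfree() */
#include <linux/string.h>       /* memzero_explicit() */

struct urandom_ctx {
        u32 key[8];             /* 256-bit per-task ChaCha20 key */
        u64 counter;            /* block counter; rekey before it wraps */
        u8 block[64];           /* most recently generated output block */
        unsigned int used;      /* bytes of block[] already handed out */
};

/*
 * First use by this task: allocate and key a private context.  No
 * locking is needed because only "current" ever touches its own context.
 */
static struct urandom_ctx *get_task_crng(void)
{
        struct urandom_ctx *ctx = current->urandom_ctx; /* hypothetical field */

        if (ctx)
                return ctx;

        ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
        if (!ctx)
                return NULL;

        /* Seed the per-task key from the global nonblocking pool. */
        get_random_bytes(ctx->key, sizeof(ctx->key));
        ctx->used = sizeof(ctx->block); /* force a fresh block on first read */

        current->urandom_ctx = ctx;
        return ctx;
}

/* Exit path: scrub the key material before the memory is released. */
void exit_task_crng(struct task_struct *tsk)
{
        struct urandom_ctx *ctx = tsk->urandom_ctx;

        if (!ctx)
                return;
        tsk->urandom_ctx = NULL;
        memzero_explicit(ctx, sizeof(*ctx));
        kfree(ctx);
}

Keeping the context per-task rather than per-struct-file means each thread
has its own key and counter, so the fast path needs no locking, which seems
to be the main attraction of Ted's placement.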
This might be stupid, but could something asynchronous work?

Perhaps have the entropy generators dump their entropy into a central
pool via a circular buffer, and have a background kthread manage the
per-cpu or per-process entropy pools?
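To make that suggestion concrete, here is a very rough sketch (every name
is made up, the mixing step is only a placeholder, and a real version would
need a lock around the push if several producers can run concurrently): a
kfifo as the central circular buffer, entropy sources doing a non-blocking
push into it, and one background kthread draining it into per-CPU pools.

/*
 * Very rough sketch with made-up names: entropy sources push samples
 * into one central kfifo (the circular buffer), and a background
 * kthread drains it into per-CPU pools.  The mixing step below is a
 * placeholder, and a real version would need a lock around kfifo_put()
 * if several producers can push concurrently.
 */
#include <linux/cpumask.h>
#include <linux/err.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/kfifo.h>
#include <linux/kthread.h>
#include <linux/percpu.h>
#include <linux/wait.h>

struct cpu_pool {
        u32 pool[32];
        unsigned int add_ptr;
};

static DEFINE_PER_CPU(struct cpu_pool, urandom_cpu_pool);
static DEFINE_KFIFO(entropy_fifo, u32, 1024);   /* central circular buffer */
static DECLARE_WAIT_QUEUE_HEAD(entropy_wq);

/* Called by entropy sources (e.g. interrupt timing); never blocks. */
void queue_entropy_sample(u32 sample)
{
        if (kfifo_put(&entropy_fifo, sample))
                wake_up_interruptible(&entropy_wq);
}

/* Placeholder mix; the real pools would use a proper mixing function. */
static void mix_sample(struct cpu_pool *p, u32 sample)
{
        p->pool[p->add_ptr++ % ARRAY_SIZE(p->pool)] ^= sample;
}

/* Background thread: drain the fifo, spreading samples across CPUs. */
static int entropy_distributor(void *unused)
{
        int cpu = cpumask_first(cpu_online_mask);
        u32 sample;

        while (!kthread_should_stop()) {
                wait_event_interruptible(entropy_wq,
                                !kfifo_is_empty(&entropy_fifo) ||
                                kthread_should_stop());

                while (kfifo_get(&entropy_fifo, &sample)) {
                        mix_sample(per_cpu_ptr(&urandom_cpu_pool, cpu), sample);
                        cpu = cpumask_next(cpu, cpu_online_mask);
                        if (cpu >= nr_cpu_ids)
                                cpu = cpumask_first(cpu_online_mask);
                }
        }
        return 0;
}

static int __init entropy_distributor_init(void)
{
        return PTR_ERR_OR_ZERO(kthread_run(entropy_distributor, NULL, "urandomd"));
}
core_initcall(entropy_distributor_init);

The point would be that the hot paths (sources pushing samples, readers
drawing from their own CPU's pool) never take a global lock; only the
background thread ever walks all CPUs.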