Date: Wed, 25 Jul 2012 09:22:36 +0200
From: Ingo Molnar
To: "H. Peter Anvin"
Cc: "Theodore Ts'o", Linux Kernel Developers List,
    torvalds@linux-foundation.org, w@1wt.eu, ewust@umich.edu,
    zakir@umich.edu, greg@kroah.com, mpm@selenic.com, nadiah@cs.ucsd.edu,
    jhalderm@umich.edu, tglx@linutronix.de, davem@davemloft.net,
    stable@kernel.org, DJ Johnson
Subject: Re: [PATCH 07/10] random: add new get_random_bytes_arch() function
Message-ID: <20120725072236.GB27535@gmail.com>
References: <1341511933-11169-1-git-send-email-tytso@mit.edu>
 <1341511933-11169-8-git-send-email-tytso@mit.edu>
 <500F69F3.3040905@zytor.com>
In-Reply-To: <500F69F3.3040905@zytor.com>

* H. Peter Anvin wrote:

> For those who have read the Google+ thread[1] it is pretty clear
> that there are varying opinions on the idea of removing the RDRAND
> bypass.
>
> I have gathered some performance numbers to make the debate more
> concrete: RDRAND is between 12 and 15 times faster than the current
> random pool system (for large and small blocks, respectively).  Both
> the pool system and RDRAND scale perfectly with frequency, so the
> ratio is independent of P-states.
>
> Given the discrepancy in performance (and presumably also in power)
> I still very much believe it is a mistake to unconditionally deny
> users the option of using RDRAND directly, but what I *do* believe
> we can all agree on is that security is paramount.  Dropping RDRAND
> is not just a performance loss but likely a security loss as well,
> since it will produce substantially less entropy.
>
> As a compromise I offer the following patch; in terms of performance
> it is "the worst of both worlds", but it should provide the combined
> security of both: even if RDRAND is completely compromised by the
> NSA, Microsoft and the Illuminati all at once, it will do no worse
> than the existing code, [...]

I should mention that since there are documented historical examples
of our software random pool in drivers/char/random.c having bugs that
no one noticed, XOR-ing the RDRAND hardware stream into the software
entropy pool's output stream is the sensible thing to do from a
security point of view.

So your patch recovers the security lost via this recent patch in
random.git:

  c2557a303ab6 random: add new get_random_bytes_arch() function

I don't think random.git should be sent upstream without addressing
this Ivy Bridge security regression first.

Btw., the commit above has a very misleading title: it does not just
'add' a function - the far more important change is that it removes
RDRAND from the output stream.  The commit also made random number
generation an order of magnitude slower.

> [...] and (since RDRAND is so much faster than the existing code) it
> has only a modest performance cost.  More realistically, it will let
> many more users take advantage of a high-entropy, quick-reseeding
> random number generator, thus ending up with a major gain in
> security.
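(Side note: to make the XOR-combining argument above concrete, here is
a minimal user-space sketch of the construction - purely illustrative,
not the actual drivers/char/random.c code path.  get_pool_bytes() and
get_hw_bytes() are made-up stand-ins: /dev/urandom stands in for the
software pool, and a fully predictable byte pattern stands in for a
"compromised" hardware RNG, which is exactly the case where the XOR
must not make things worse:)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical hardware source.  Here it returns bytes an attacker
 * could predict perfectly; the point of the XOR construction is that
 * even then the combined output is no weaker than the pool bytes
 * alone.  Returns 1 on success, 0 if no hardware RNG is available.
 */
static int get_hw_bytes(unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = 0xAA;
	return 1;
}

/* Stand-in for the software entropy pool. */
static void get_pool_bytes(unsigned char *buf, size_t len)
{
	FILE *f = fopen("/dev/urandom", "rb");

	if (!f || fread(buf, 1, len, f) != len) {
		fprintf(stderr, "cannot read pool bytes\n");
		exit(1);
	}
	fclose(f);
}

static void get_combined_bytes(unsigned char *out, size_t len)
{
	unsigned char hw[64];
	size_t i, j, n;

	/* Start from the software pool output... */
	get_pool_bytes(out, len);

	/*
	 * ...and XOR in hardware bytes where available.  The result is
	 * never weaker than the stronger of the two streams: a fully
	 * known hardware stream changes nothing, while a weak pool is
	 * still masked by a good hardware stream.
	 */
	for (i = 0; i < len; i += n) {
		n = (len - i < sizeof(hw)) ? len - i : sizeof(hw);
		if (!get_hw_bytes(hw, n))
			break;
		for (j = 0; j < n; j++)
			out[i + j] ^= hw[j];
	}
	memset(hw, 0, sizeof(hw));
}

int main(void)
{
	unsigned char buf[16];

	get_combined_bytes(buf, sizeof(buf));
	for (size_t i = 0; i < sizeof(buf); i++)
		printf("%02x", buf[i]);
	printf("\n");
	return 0;
}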
> It is worth noting that although RDRAND by itself is adequate for
> any in-kernel users (and the 3.4-3.5 kernels use it as such unless
> you specify "nordrand"), this is not true for /dev/random, nor, due
> to abuse, for /dev/urandom.  The recently disclosed[2] RDSEED
> instruction, on the other hand, is defined to be fully entropic and
> can be used for any purpose; that one will be introduced in a later
> processor.
>
> Note that the attached patch is way more conservative than it needs
> to be: every byte is mixed with RDRAND data twice on its way through
> (and an additional 1.2 bytes are lost), as I moved the mixing to
> extract_buf(), but even so the overhead is modest, and mixing in
> extract_buf() makes the code quite a bit simpler.
>
> This patch is on top of random.git.
>
> [1] https://plus.google.com/115124063126128475540/posts/KbAEJKMsAfq
> [2] http://software.intel.com/file/45207
>
> --
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel.  I don't speak on their behalf.
>
> From b36c22b00c6bf8e91a758d3167e912b0ac4f0d0c Mon Sep 17 00:00:00 2001
> From: "H. Peter Anvin"
> Date: Tue, 24 Jul 2012 14:48:56 -0700
> Subject: [PATCH] random: mix in architectural randomness in extract_buf()
>
> RDRAND is so much faster than the Linux pool system that we can
> always just mix in architectural randomness.
>
> Doing this in extract_buf() lets us do it in one convenient place;
> unfortunately, the output size (10 bytes) is maximally awkward.
> That, plus the fact that every output byte will have passed through
> extract_buf() twice, means we are not being very efficient with the
> RDRAND use.
>
> Measurements show that RDRAND is 12-15 times faster than the Linux
> pool system.  Doing the math shows this corresponds to about an
> 11.5% slowdown, which is confirmed by measurements.
>
> Users who are very performance- or power-sensitive could definitely
> still benefit from being allowed to use RDRAND directly, but I
> believe this version should satisfy even the most hyper-paranoid
> crowd.
>
> Signed-off-by: H. Peter Anvin
> Cc: DJ Johnson
> ---
>  drivers/char/random.c | 56 ++++++++++++++++++++++++++++---------------------
>  1 file changed, 32 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 9793b40..a4a24e4 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -277,6 +277,8 @@
>  #define SEC_XFER_SIZE 512
>  #define EXTRACT_SIZE 10
>
> +#define LONGS(x) (((x) + sizeof(unsigned long) - 1)/sizeof(unsigned long))
> +
>  /*
>   * The minimum number of bits of entropy before we wake up a read on
>   * /dev/random.  Should be enough to do a significant reseed.
> @@ -813,11 +815,7 @@ static ssize_t extract_entropy(struct entropy_store *r, void *buf,
>   */
>  static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
>  {
> -        union {
> -                __u32   tmp[OUTPUT_POOL_WORDS];
> -                long    hwrand[4];
> -        } u;
> -        int     i;
> +        __u32   tmp[OUTPUT_POOL_WORDS];
>
>          if (r->pull && r->entropy_count < nbytes * 8 &&
>              r->entropy_count < r->poolinfo->POOLBITS) {
> @@ -828,23 +826,17 @@ static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
>                  /* pull at least as many as BYTES as wakeup BITS */
>                  bytes = max_t(int, bytes, random_read_wakeup_thresh / 8);
>                  /* but never more than the buffer size */
> -                bytes = min_t(int, bytes, sizeof(u.tmp));
> +                bytes = min_t(int, bytes, sizeof(tmp));
>
>                  DEBUG_ENT("going to reseed %s with %d bits "
>                            "(%d of %d requested)\n",
>                            r->name, bytes * 8, nbytes * 8, r->entropy_count);
>
> -                bytes = extract_entropy(r->pull, u.tmp, bytes,
> +                bytes = extract_entropy(r->pull, tmp, bytes,
>                                          random_read_wakeup_thresh / 8, rsvd);
> -                mix_pool_bytes(r, u.tmp, bytes, NULL);
> +                mix_pool_bytes(r, tmp, bytes, NULL);
>                  credit_entropy_bits(r, bytes*8);
>          }
> -        kmemcheck_mark_initialized(&u.hwrand, sizeof(u.hwrand));
> -        for (i = 0; i < 4; i++)
> -                if (arch_get_random_long(&u.hwrand[i]))
> -                        break;
> -        if (i)
> -                mix_pool_bytes(r, &u.hwrand, sizeof(u.hwrand), 0);
>  }
>
>  /*
> @@ -901,15 +893,19 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
>  static void extract_buf(struct entropy_store *r, __u8 *out)
>  {
>          int i;
> -        __u32 hash[5], workspace[SHA_WORKSPACE_WORDS];
> +        union {
> +                __u32 w[5];
> +                unsigned long l[LONGS(EXTRACT_SIZE)];
> +        } hash;
> +        __u32 workspace[SHA_WORKSPACE_WORDS];
>          __u8 extract[64];
>          unsigned long flags;
>
>          /* Generate a hash across the pool, 16 words (512 bits) at a time */
> -        sha_init(hash);
> +        sha_init(hash.w);
>          spin_lock_irqsave(&r->lock, flags);
>          for (i = 0; i < r->poolinfo->poolwords; i += 16)
> -                sha_transform(hash, (__u8 *)(r->pool + i), workspace);
> +                sha_transform(hash.w, (__u8 *)(r->pool + i), workspace);
>
>          /*
>           * We mix the hash back into the pool to prevent backtracking
> @@ -920,14 +916,14 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
>           * brute-forcing the feedback as hard as brute-forcing the
>           * hash.
>           */
> -        __mix_pool_bytes(r, hash, sizeof(hash), extract);
> +        __mix_pool_bytes(r, hash.w, sizeof(hash.w), extract);
>          spin_unlock_irqrestore(&r->lock, flags);
>
>          /*
>           * To avoid duplicates, we atomically extract a portion of the
>           * pool while mixing, and hash one final time.
>           */
> -        sha_transform(hash, extract, workspace);
> +        sha_transform(hash.w, extract, workspace);
>          memset(extract, 0, sizeof(extract));
>          memset(workspace, 0, sizeof(workspace));
>
> @@ -936,11 +932,23 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
>           * pattern, we fold it in half. Thus, we always feed back
>           * twice as much data as we output.
>           */
> -        hash[0] ^= hash[3];
> -        hash[1] ^= hash[4];
> -        hash[2] ^= rol32(hash[2], 16);
> -        memcpy(out, hash, EXTRACT_SIZE);
> -        memset(hash, 0, sizeof(hash));
> +        hash.w[0] ^= hash.w[3];
> +        hash.w[1] ^= hash.w[4];
> +        hash.w[2] ^= rol32(hash.w[2], 16);
> +
> +        /*
> +         * If we have an architectural hardware random number
> +         * generator, mix that in, too.
> +         */
> +        for (i = 0; i < LONGS(EXTRACT_SIZE); i++) {
> +                unsigned long v;
> +                if (!arch_get_random_long(&v))
> +                        break;
> +                hash.l[i] ^= v;
> +        }
> +
> +        memcpy(out, hash.w, EXTRACT_SIZE);
> +        memset(&hash, 0, sizeof(hash));
>  }

Acked-by: Ingo Molnar

Thanks,

	Ingo
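PS: a rough way to get a user-space feel for raw RDRAND throughput is
sketched below - purely illustrative, not the in-kernel measurement
quoted above, and it will not reproduce the 12-15x ratio since it only
exercises the instruction itself.  It assumes GCC with -mrdrnd and an
Ivy Bridge or later CPU (the 'rdrand' flag in /proc/cpuinfo):

/* rdrand_probe.c - ballpark RDRAND throughput in user space.
 * Build: gcc -O2 -mrdrnd rdrand_probe.c -o rdrand_probe
 * Requires a CPU with the 'rdrand' flag in /proc/cpuinfo; on older
 * CPUs the instruction simply does not exist. */
#include <immintrin.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
        const long n = 10 * 1000 * 1000;        /* 10M 64-bit words */
        unsigned long long v, sink = 0;
        struct timespec t0, t1;
        double secs;
        long i;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < n; i++) {
                /* RDRAND can transiently fail; retry until it succeeds */
                while (!_rdrand64_step(&v))
                        ;
                sink ^= v;
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.1f MB/s (sink=%llx)\n", n * 8 / secs / 1e6, sink);
        return 0;
}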