From: "Theodore Ts'o" <tytso@mit.edu>
To: George Spelvin <linux@horizon.com>
Cc: andi@firstfloor.org, ahferroin7@gmail.com, jepler@unpythonic.net,
	linux-kernel@vger.kernel.org, linux@rasmusvillemoes.dk
Subject: Re: Updated scalable urandom patchkit
Date: Sat, 10 Oct 2015 22:31:46 -0400
Message-ID: <20151011023146.GA5341@thunk.org>
In-Reply-To: <20151010184546.759.qmail@ns.horizon.com>

On Sat, Oct 10, 2015 at 02:45:46PM -0400, George Spelvin wrote:
> In general, fewer larger pools is better for entropy storage.  The only
> reason for the current three-pool design is to achieve strict isolation
> of the /dev/random pool.

You're absolutely right, and I would like to see if we can get away
with keeping a single-pool design.  Your idea of using the CPU id
as a nonce is a good one, and I think we can do something similar that
should be good enough for mixing the hash back into the pool.  We can
simply use a hash of the CPU id to change the offset where we do the
mixing, and this should be good enough to avoid collisions when we do
the add-back.

HOWEVER.  One downside of this approach is that, especially on a NUMA
system, the costs of the cache coherency across a single pool which is
constantly being modified and shared by all of the NUMA nodes could be
a killer, even if we go completely lockless.

To that end, Andi, can you try benchmarking the scalability of the
patch below?  I really hope it will be good enough, since besides
using less memory, there are security advantages in not spreading the
entropy across N pools.

If it isn't, we might be able to play a trick where we sample the
r->add_ptr value before we start hashing the pool, and then check to
see if it's changed afterwards.  If it has, we could skip doing the
hash back, which could reduce the cache coherency traffic, since as
you point out:

> 2d. A more complete solution involves noting that, if there are multiple
>     concurrent readers, only one has to add back its output to prevent
>     backtracking, because all of the concurrent reads are equivalent
>     in that sense.  (Even though, because they're salted with separate
>     nonces like the CPU id, they're not identical.)

However, even if we put in that optimization, the primary question is
how good Intel's cache coherency protocols are on their NUMA systems.
I'm pretty sure this would be a disaster on, say, Sequent's old NUMA
machines, but I'm quite confident all of those servers are dead and
buried by now.  :-)

						- Ted

commit 3cb51896deab45bddc1b8f571b1103eae8f50e0e
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Sat Oct 10 22:03:53 2015 -0400

    random: make the reading from the non-blocking pool more scalable
    
    Andi Kleen reported a case where a 4 socket system spent >80% of its
    total CPU time contending on the global urandom nonblocking pool
    spinlock. While the application could probably have used its own PRNG,
    it may have valid reasons to use the best possible key for different
    session keys.
    
    Instead of creating separate multiple per-NUMA node non-blocking
    pools, use a trick suggested by George Spelvin:
    
       Entropy is a holographic property of the pool, not located in any
       particular bit position.
    
       And the most basic operation, of reading from the pool, can easily be
       done by multiple readers at the same time from the same bit pattern.
       They just need to be salted with distinguishing nonces (CPU IDs will do
       nicely) to ensure they all get different results....
    
    We use this trick, and in addition use a hash of the cpu id to change
    where we mix the hash back into the pool to avoid collisions.
    
    Since we are already using a lockless technique (cmpxchg) to update
    the entropy accounting, we don't need to change this around.
    
    This technique won't be quite as scalable, since on a NUMA system we
    will still force cache lines to bounce between nodes, but from the
    perspective of entropy storage we're much better off using a single
    pool rather than spreading it across multiple pools.
    
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d0da5d8..be6b315 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -260,6 +260,7 @@
 #include <linux/irq.h>
 #include <linux/syscalls.h>
 #include <linux/completion.h>
+#include <linux/hash.h>
 
 #include <asm/processor.h>
 #include <asm/uaccess.h>
@@ -494,7 +495,9 @@ static void _mix_pool_bytes(struct entropy_store *r, const void *in,
 {
 	unsigned long i, tap1, tap2, tap3, tap4, tap5;
 	int input_rotate;
+	int is_nonblock = (r == &nonblocking_pool);
 	int wordmask = r->poolinfo->poolwords - 1;
+	int poolbits = r->poolinfo->poolbitshift - 5;
 	const char *bytes = in;
 	__u32 w;
 
@@ -506,6 +509,8 @@ static void _mix_pool_bytes(struct entropy_store *r, const void *in,
 
 	input_rotate = r->input_rotate;
 	i = r->add_ptr;
+	if (is_nonblock)
+		i += hash_32(get_cpu(), poolbits);
 
 	/* mix one byte at a time to simplify size handling and churn faster */
 	while (nbytes--) {
@@ -534,6 +539,8 @@ static void _mix_pool_bytes(struct entropy_store *r, const void *in,
 
 	r->input_rotate = input_rotate;
 	r->add_ptr = i;
+	if (is_nonblock)
+		put_cpu();
 }
 
 static void __mix_pool_bytes(struct entropy_store *r, const void *in,
@@ -1090,6 +1097,7 @@ retry:
 static void extract_buf(struct entropy_store *r, __u8 *out)
 {
 	int i;
+	int is_nonblock = (r == &nonblocking_pool);
 	union {
 		__u32 w[5];
 		unsigned long l[LONGS(20)];
@@ -1108,9 +1116,12 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
 			break;
 		hash.l[i] = v;
 	}
+	if (is_nonblock)
+		hash.w[0] ^= hash_32(get_cpu(), 32);
+	else
+		spin_lock_irqsave(&r->lock, flags);
 
 	/* Generate a hash across the pool, 16 words (512 bits) at a time */
-	spin_lock_irqsave(&r->lock, flags);
 	for (i = 0; i < r->poolinfo->poolwords; i += 16)
 		sha_transform(hash.w, (__u8 *)(r->pool + i), workspace);
 
@@ -1124,7 +1135,10 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
 	 * hash.
 	 */
 	__mix_pool_bytes(r, hash.w, sizeof(hash.w));
-	spin_unlock_irqrestore(&r->lock, flags);
+	if (is_nonblock)
+		put_cpu();
+	else
+		spin_unlock_irqrestore(&r->lock, flags);
 
 	memzero_explicit(workspace, sizeof(workspace));
 
