* [PATCH-v3 0/5] random: replace urandom pool with a CRNG
@ 2016-05-30  5:39 Theodore Ts'o
  2016-05-30  5:39 ` [PATCH 1/5] random: replace non-blocking pool with a Chacha20-based CRNG Theodore Ts'o
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Theodore Ts'o @ 2016-05-30  5:39 UTC (permalink / raw)
  To: Linux Kernel Developers List
  Cc: smueller, herbert, andi, sandyinchina, cryptography, jsd, hpa,
	linux-crypto, Theodore Ts'o

By using a CRNG to replace the urandom pool, we address a number of
concerns which Stephan Mueller has raised.  We now use a much more
aggressive interrupt sampling system to quickly initialize a CRNG,
which gets used in place of the original non-blocking pool.  This
tends to get initialized *very* quickly (before the devices have
finished being probed).  Like Stephan's proposal, this assumes that we
can get a bit of entropy per interrupt, which may be problematic on
some architectures.  So after this quick-and-dirty initialization, we
fall back to the slower, more conservative interrupt sampling system
to fill the input pool, and we do a catastrophic reseed of the CRNG
once that slower, more conservative accounting has credited 128 bits,
and every five minutes afterwards, if possible.
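
As a quick way to see this from userspace, the sketch below simply asks
getrandom(2) whether the CRNG is ready yet; it is only an illustration of
the interface (GRND_NONBLOCK fails with EAGAIN until the CRNG is
initialized, behavior this series preserves), not part of the patches:

	#include <stdio.h>
	#include <errno.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/random.h>

	int main(void)
	{
		unsigned char buf[16];
		/* GRND_NONBLOCK: fail with EAGAIN instead of waiting
		 * for the CRNG to become initialized. */
		long n = syscall(SYS_getrandom, buf, sizeof(buf), GRND_NONBLOCK);

		if (n == sizeof(buf))
			printf("got %ld bytes from the CRNG\n", n);
		else if (n < 0 && errno == EAGAIN)
			printf("CRNG not yet initialized\n");
		else
			printf("getrandom: %s\n", strerror(errno));
		return 0;
	}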

In addition, on NUMA systems we make the CRNG state per-NUMA socket, to
address the NUMA locking contention problem which Andi Kleen has been
complaining about.  I'm not entirely sure this will work well on the
crazy big SGI systems, but they are rare.  Whether they are rarer than
abusive userspace programs that are continuously pounding /dev/urandom
is unclear.  If necessary we can make a config option to turn off the
per-NUMA socket hack if it proves to be problematic.

Note: I didn't propose this for merging in 4.7 because I wanted to
   further refine the reseeding logic and because I wanted to get more
   feedback.  My plan is to merge these changes for the 4.8 merge
   window.

These patches are also available at:
	git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random.git

Changes since -v2:
  * Rebased to v4.7-rc1
  * Improved/reworked CRNG reseeding and backtracking protection
  * Preseed the CRNG state from system data
  * Added fix to properly align the get_random_int_hash[] array

Eric Biggers (1):
  random: properly align get_random_int_hash

Stephan Mueller (1):
  random: add interrupt callback to VMBus IRQ handler

Theodore Ts'o (3):
  random: replace non-blocking pool with a Chacha20-based CRNG
  random: make /dev/urandom scalable for silly userspace programs
  random: add backtracking protection to the CRNG

 crypto/chacha20_generic.c |  61 -------
 drivers/char/random.c     | 446 ++++++++++++++++++++++++++++++++++++----------
 drivers/hv/vmbus_drv.c    |   3 +
 include/crypto/chacha20.h |   1 +
 lib/Makefile              |   2 +-
 lib/chacha20.c            |  79 ++++++++
 6 files changed, 438 insertions(+), 154 deletions(-)
 create mode 100644 lib/chacha20.c

-- 
2.5.0


* [PATCH 1/5] random: replace non-blocking pool with a Chacha20-based CRNG
  2016-05-30  5:39 [PATCH-v3 0/5] random: replace urandom pool with a CRNG Theodore Ts'o
@ 2016-05-30  5:39 ` Theodore Ts'o
  2016-05-30  5:39 ` [PATCH 2/5] random: make /dev/urandom scalable for silly userspace programs Theodore Ts'o
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Theodore Ts'o @ 2016-05-30  5:39 UTC (permalink / raw)
  To: Linux Kernel Developers List
  Cc: smueller, herbert, andi, sandyinchina, cryptography, jsd, hpa,
	linux-crypto, Theodore Ts'o

The CRNG is faster, and we don't pretend to track entropy usage in the
CRNG any more.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
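
For reference, the 16-word ChaCha20 state in struct crng_state is used as
follows (summarizing the code below):

	state[0..3]    the "expand 32-byte k" constant
	state[4..11]   256-bit key, XORed with fresh entropy on each reseed
	state[12]      block counter, incremented by chacha20_block()
	state[13..15]  nonce words; state[13] is bumped when the counter
	               wraps, and state[14] is XORed with the output of
	               arch_get_random_long() before each block
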
 crypto/chacha20_generic.c |  61 --------
 drivers/char/random.c     | 350 ++++++++++++++++++++++++++++++++++------------
 include/crypto/chacha20.h |   1 +
 lib/Makefile              |   2 +-
 lib/chacha20.c            |  79 +++++++++++
 5 files changed, 340 insertions(+), 153 deletions(-)
 create mode 100644 lib/chacha20.c

diff --git a/crypto/chacha20_generic.c b/crypto/chacha20_generic.c
index da9c899..1cab831 100644
--- a/crypto/chacha20_generic.c
+++ b/crypto/chacha20_generic.c
@@ -15,72 +15,11 @@
 #include <linux/module.h>
 #include <crypto/chacha20.h>
 
-static inline u32 rotl32(u32 v, u8 n)
-{
-	return (v << n) | (v >> (sizeof(v) * 8 - n));
-}
-
 static inline u32 le32_to_cpuvp(const void *p)
 {
 	return le32_to_cpup(p);
 }
 
-static void chacha20_block(u32 *state, void *stream)
-{
-	u32 x[16], *out = stream;
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(x); i++)
-		x[i] = state[i];
-
-	for (i = 0; i < 20; i += 2) {
-		x[0]  += x[4];    x[12] = rotl32(x[12] ^ x[0],  16);
-		x[1]  += x[5];    x[13] = rotl32(x[13] ^ x[1],  16);
-		x[2]  += x[6];    x[14] = rotl32(x[14] ^ x[2],  16);
-		x[3]  += x[7];    x[15] = rotl32(x[15] ^ x[3],  16);
-
-		x[8]  += x[12];   x[4]  = rotl32(x[4]  ^ x[8],  12);
-		x[9]  += x[13];   x[5]  = rotl32(x[5]  ^ x[9],  12);
-		x[10] += x[14];   x[6]  = rotl32(x[6]  ^ x[10], 12);
-		x[11] += x[15];   x[7]  = rotl32(x[7]  ^ x[11], 12);
-
-		x[0]  += x[4];    x[12] = rotl32(x[12] ^ x[0],   8);
-		x[1]  += x[5];    x[13] = rotl32(x[13] ^ x[1],   8);
-		x[2]  += x[6];    x[14] = rotl32(x[14] ^ x[2],   8);
-		x[3]  += x[7];    x[15] = rotl32(x[15] ^ x[3],   8);
-
-		x[8]  += x[12];   x[4]  = rotl32(x[4]  ^ x[8],   7);
-		x[9]  += x[13];   x[5]  = rotl32(x[5]  ^ x[9],   7);
-		x[10] += x[14];   x[6]  = rotl32(x[6]  ^ x[10],  7);
-		x[11] += x[15];   x[7]  = rotl32(x[7]  ^ x[11],  7);
-
-		x[0]  += x[5];    x[15] = rotl32(x[15] ^ x[0],  16);
-		x[1]  += x[6];    x[12] = rotl32(x[12] ^ x[1],  16);
-		x[2]  += x[7];    x[13] = rotl32(x[13] ^ x[2],  16);
-		x[3]  += x[4];    x[14] = rotl32(x[14] ^ x[3],  16);
-
-		x[10] += x[15];   x[5]  = rotl32(x[5]  ^ x[10], 12);
-		x[11] += x[12];   x[6]  = rotl32(x[6]  ^ x[11], 12);
-		x[8]  += x[13];   x[7]  = rotl32(x[7]  ^ x[8],  12);
-		x[9]  += x[14];   x[4]  = rotl32(x[4]  ^ x[9],  12);
-
-		x[0]  += x[5];    x[15] = rotl32(x[15] ^ x[0],   8);
-		x[1]  += x[6];    x[12] = rotl32(x[12] ^ x[1],   8);
-		x[2]  += x[7];    x[13] = rotl32(x[13] ^ x[2],   8);
-		x[3]  += x[4];    x[14] = rotl32(x[14] ^ x[3],   8);
-
-		x[10] += x[15];   x[5]  = rotl32(x[5]  ^ x[10],  7);
-		x[11] += x[12];   x[6]  = rotl32(x[6]  ^ x[11],  7);
-		x[8]  += x[13];   x[7]  = rotl32(x[7]  ^ x[8],   7);
-		x[9]  += x[14];   x[4]  = rotl32(x[4]  ^ x[9],   7);
-	}
-
-	for (i = 0; i < ARRAY_SIZE(x); i++)
-		out[i] = cpu_to_le32(x[i] + state[i]);
-
-	state[12]++;
-}
-
 static void chacha20_docrypt(u32 *state, u8 *dst, const u8 *src,
 			     unsigned int bytes)
 {
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0158d3b..3a4d865 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -261,6 +261,7 @@
 #include <linux/syscalls.h>
 #include <linux/completion.h>
 #include <linux/uuid.h>
+#include <crypto/chacha20.h>
 
 #include <asm/processor.h>
 #include <asm/uaccess.h>
@@ -413,6 +414,29 @@ static struct fasync_struct *fasync;
 static DEFINE_SPINLOCK(random_ready_list_lock);
 static LIST_HEAD(random_ready_list);
 
+struct crng_state {
+	__u32		state[16];
+	unsigned long	init_time;
+	spinlock_t	lock;
+};
+
+struct crng_state primary_crng = {
+	.lock = __SPIN_LOCK_UNLOCKED(primary_crng.lock),
+};
+
+/*
+ * crng_init =  0 --> Uninitialized
+ *		1,2 --> Fast init in progress (2 == fast init done)
+ *		3 --> Initialized from input_pool
+ *
+ * crng_init is protected by primary_crng->lock, and only increases
+ * its value (from 0->1->2->3).
+ */
+static int crng_init = 0;
+#define crng_ready() (likely(crng_init >= 2))
+static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE]);
+static void process_random_ready_list(void);
+
 /**********************************************************************
  *
  * OS independent entropy store.   Here are the functions which handle
@@ -442,10 +466,15 @@ struct entropy_store {
 	__u8 last_data[EXTRACT_SIZE];
 };
 
+static ssize_t extract_entropy(struct entropy_store *r, void *buf,
+			       size_t nbytes, int min, int rsvd);
+static ssize_t _extract_entropy(struct entropy_store *r, void *buf,
+				size_t nbytes, int fips);
+
+static void crng_reseed(struct crng_state *crng, struct entropy_store *r);
 static void push_to_pool(struct work_struct *work);
 static __u32 input_pool_data[INPUT_POOL_WORDS];
 static __u32 blocking_pool_data[OUTPUT_POOL_WORDS];
-static __u32 nonblocking_pool_data[OUTPUT_POOL_WORDS];
 
 static struct entropy_store input_pool = {
 	.poolinfo = &poolinfo_table[0],
@@ -466,16 +495,6 @@ static struct entropy_store blocking_pool = {
 					push_to_pool),
 };
 
-static struct entropy_store nonblocking_pool = {
-	.poolinfo = &poolinfo_table[1],
-	.name = "nonblocking",
-	.pull = &input_pool,
-	.lock = __SPIN_LOCK_UNLOCKED(nonblocking_pool.lock),
-	.pool = nonblocking_pool_data,
-	.push_work = __WORK_INITIALIZER(nonblocking_pool.push_work,
-					push_to_pool),
-};
-
 static __u32 const twist_table[8] = {
 	0x00000000, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
 	0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };
@@ -678,12 +697,6 @@ retry:
 	if (!r->initialized && r->entropy_total > 128) {
 		r->initialized = 1;
 		r->entropy_total = 0;
-		if (r == &nonblocking_pool) {
-			prandom_reseed_late();
-			process_random_ready_list();
-			wake_up_all(&urandom_init_wait);
-			pr_notice("random: %s pool is initialized\n", r->name);
-		}
 	}
 
 	trace_credit_entropy_bits(r->name, nbits,
@@ -693,30 +706,27 @@ retry:
 	if (r == &input_pool) {
 		int entropy_bits = entropy_count >> ENTROPY_SHIFT;
 
+		if (crng_init < 3 && entropy_bits >= 128) {
+			crng_reseed(&primary_crng, r);
+			entropy_bits = r->entropy_count >> ENTROPY_SHIFT;
+		}
+
 		/* should we wake readers? */
 		if (entropy_bits >= random_read_wakeup_bits) {
 			wake_up_interruptible(&random_read_wait);
 			kill_fasync(&fasync, SIGIO, POLL_IN);
 		}
 		/* If the input pool is getting full, send some
-		 * entropy to the two output pools, flipping back and
-		 * forth between them, until the output pools are 75%
-		 * full.
+		 * entropy to the blocking pool until it is 75% full.
 		 */
 		if (entropy_bits > random_write_wakeup_bits &&
 		    r->initialized &&
 		    r->entropy_total >= 2*random_read_wakeup_bits) {
-			static struct entropy_store *last = &blocking_pool;
 			struct entropy_store *other = &blocking_pool;
 
-			if (last == &blocking_pool)
-				other = &nonblocking_pool;
 			if (other->entropy_count <=
-			    3 * other->poolinfo->poolfracbits / 4)
-				last = other;
-			if (last->entropy_count <=
-			    3 * last->poolinfo->poolfracbits / 4) {
-				schedule_work(&last->push_work);
+			    3 * other->poolinfo->poolfracbits / 4) {
+				schedule_work(&other->push_work);
 				r->entropy_total = 0;
 			}
 		}
@@ -736,6 +746,157 @@ static void credit_entropy_bits_safe(struct entropy_store *r, int nbits)
 
 /*********************************************************************
  *
+ * CRNG using CHACHA20
+ *
+ *********************************************************************/
+
+#define CRNG_RESEED_INTERVAL (300*HZ)
+
+static DECLARE_WAIT_QUEUE_HEAD(crng_init_wait);
+
+static void _initialize_crng(struct crng_state *crng)
+{
+	int		i;
+	unsigned long	rv;
+
+	memcpy(&crng->state[0], "expand 32-byte k", 16);
+	if (crng == &primary_crng)
+		_extract_entropy(&input_pool, &crng->state[4],
+				 sizeof(__u32) * 12, 0);
+	else
+		get_random_bytes(&crng->state[4], sizeof(__u32) * 12);
+	for (i = 4; i < 16; i++) {
+		if (!arch_get_random_seed_long(&rv) &&
+		    !arch_get_random_long(&rv))
+			rv = random_get_entropy();
+		crng->state[i] ^= rv;
+	}
+	crng->init_time = jiffies - CRNG_RESEED_INTERVAL - 1;
+}
+
+static void initialize_crng(struct crng_state *crng)
+{
+	_initialize_crng(crng);
+	spin_lock_init(&crng->lock);
+}
+
+static int crng_fast_load(__u32 pool[4])
+{
+	int	i;
+	__u32	*p;
+
+	if (!spin_trylock(&primary_crng.lock))
+		return 0;
+	if (crng_ready()) {
+		spin_unlock(&primary_crng.lock);
+		return 0;
+	}
+	p = &primary_crng.state[4];
+	if (crng_init == 1)
+		p += 4;
+	for (i=0; i < 4; i++)
+		*p ^= pool[i];
+	if (++crng_init >= 2) {
+		wake_up_interruptible(&crng_init_wait);
+		pr_notice("random: fast init done\n");
+	}
+	spin_unlock(&primary_crng.lock);
+	return 1;
+}
+
+static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
+{
+	unsigned long	flags;
+	int		i, num;
+	union {
+		__u8	block[CHACHA20_BLOCK_SIZE];
+		__u32	key[8];
+	} buf;
+
+	if (r) {
+		num = extract_entropy(r, &buf, 32, 16, 0);
+		if (num == 0)
+			return;
+	} else
+		extract_crng(buf.block);
+	spin_lock_irqsave(&primary_crng.lock, flags);
+	for (i = 0; i < 8; i++) {
+		unsigned long	rv;
+		if (!arch_get_random_seed_long(&rv) &&
+		    !arch_get_random_long(&rv))
+			rv = random_get_entropy();
+		crng->state[i+4] ^= buf.key[i] ^ rv;
+	}
+	memzero_explicit(&buf, sizeof(buf));
+	crng->init_time = jiffies;
+	if (crng == &primary_crng && crng_init < 3) {
+		crng_init = 3;
+		process_random_ready_list();
+		wake_up_interruptible(&crng_init_wait);
+		pr_notice("random: crng init done\n");
+	}
+	spin_unlock_irqrestore(&primary_crng.lock, flags);
+}
+
+static inline void crng_wait_ready(void)
+{
+	wait_event_interruptible(crng_init_wait, crng_ready());
+}
+
+static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
+{
+	unsigned long v, flags;
+	struct crng_state *crng = &primary_crng;
+
+	if (crng_init > 2 &&
+	    time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL))
+		crng_reseed(crng, &input_pool);
+	spin_lock_irqsave(&crng->lock, flags);
+	if (arch_get_random_long(&v))
+		crng->state[14] ^= v;
+	chacha20_block(&crng->state[0], out);
+	if (crng->state[12] == 0)
+		crng->state[13]++;
+	spin_unlock_irqrestore(&crng->lock, flags);
+}
+
+static ssize_t extract_crng_user(void __user *buf, size_t nbytes)
+{
+	ssize_t ret = 0, i;
+	__u8 tmp[CHACHA20_BLOCK_SIZE];
+	int large_request = (nbytes > 256);
+
+	while (nbytes) {
+		if (large_request && need_resched()) {
+			if (signal_pending(current)) {
+				if (ret == 0)
+					ret = -ERESTARTSYS;
+				break;
+			}
+			schedule();
+		}
+
+		extract_crng(tmp);
+		i = min_t(int, nbytes, CHACHA20_BLOCK_SIZE);
+		if (copy_to_user(buf, tmp, i)) {
+			ret = -EFAULT;
+			break;
+		}
+
+		nbytes -= i;
+		buf += i;
+		ret += i;
+	}
+
+	/* Wipe data just written to memory */
+	memzero_explicit(tmp, sizeof(tmp));
+
+	return ret;
+}
+
+
+/*********************************************************************
+ *
  * Entropy input management
  *
  *********************************************************************/
@@ -750,12 +911,12 @@ struct timer_rand_state {
 #define INIT_TIMER_RAND_STATE { INITIAL_JIFFIES, };
 
 /*
- * Add device- or boot-specific data to the input and nonblocking
- * pools to help initialize them to unique values.
+ * Add device- or boot-specific data to the input pool to help
+ * initialize it.
  *
- * None of this adds any entropy, it is meant to avoid the
- * problem of the nonblocking pool having similar initial state
- * across largely identical devices.
+ * None of this adds any entropy; it is meant to avoid the problem of
+ * the entropy pool having similar initial state across largely
+ * identical devices.
  */
 void add_device_randomness(const void *buf, unsigned int size)
 {
@@ -767,11 +928,6 @@ void add_device_randomness(const void *buf, unsigned int size)
 	_mix_pool_bytes(&input_pool, buf, size);
 	_mix_pool_bytes(&input_pool, &time, sizeof(time));
 	spin_unlock_irqrestore(&input_pool.lock, flags);
-
-	spin_lock_irqsave(&nonblocking_pool.lock, flags);
-	_mix_pool_bytes(&nonblocking_pool, buf, size);
-	_mix_pool_bytes(&nonblocking_pool, &time, sizeof(time));
-	spin_unlock_irqrestore(&nonblocking_pool.lock, flags);
 }
 EXPORT_SYMBOL(add_device_randomness);
 
@@ -802,7 +958,7 @@ static void add_timer_randomness(struct timer_rand_state *state, unsigned num)
 	sample.jiffies = jiffies;
 	sample.cycles = random_get_entropy();
 	sample.num = num;
-	r = nonblocking_pool.initialized ? &input_pool : &nonblocking_pool;
+	r = &input_pool;
 	mix_pool_bytes(r, &sample, sizeof(sample));
 
 	/*
@@ -922,7 +1078,13 @@ void add_interrupt_randomness(int irq, int irq_flags)
 	    !time_after(now, fast_pool->last + HZ))
 		return;
 
-	r = nonblocking_pool.initialized ? &input_pool : &nonblocking_pool;
+	if (!crng_ready() && crng_fast_load(fast_pool->pool)) {
+		fast_pool->count = 0;
+		fast_pool->last = now;
+		return;
+	}
+
+	r = &input_pool;
 	if (!spin_trylock(&r->lock))
 		return;
 
@@ -965,9 +1127,6 @@ EXPORT_SYMBOL_GPL(add_disk_randomness);
  *
  *********************************************************************/
 
-static ssize_t extract_entropy(struct entropy_store *r, void *buf,
-			       size_t nbytes, int min, int rsvd);
-
 /*
  * This utility inline function is responsible for transferring entropy
  * from the primary pool to the secondary extraction pool. We make
@@ -1142,6 +1301,36 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
 	memzero_explicit(&hash, sizeof(hash));
 }
 
+static ssize_t _extract_entropy(struct entropy_store *r, void *buf,
+				size_t nbytes, int fips)
+{
+	ssize_t ret = 0, i;
+	__u8 tmp[EXTRACT_SIZE];
+	unsigned long flags;
+
+	while (nbytes) {
+		extract_buf(r, tmp);
+
+		if (fips) {
+			spin_lock_irqsave(&r->lock, flags);
+			if (!memcmp(tmp, r->last_data, EXTRACT_SIZE))
+				panic("Hardware RNG duplicated output!\n");
+			memcpy(r->last_data, tmp, EXTRACT_SIZE);
+			spin_unlock_irqrestore(&r->lock, flags);
+		}
+		i = min_t(int, nbytes, EXTRACT_SIZE);
+		memcpy(buf, tmp, i);
+		nbytes -= i;
+		buf += i;
+		ret += i;
+	}
+
+	/* Wipe data just returned from memory */
+	memzero_explicit(tmp, sizeof(tmp));
+
+	return ret;
+}
+
 /*
  * This function extracts randomness from the "entropy pool", and
  * returns it in a buffer.
@@ -1154,7 +1343,6 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
 static ssize_t extract_entropy(struct entropy_store *r, void *buf,
 				 size_t nbytes, int min, int reserved)
 {
-	ssize_t ret = 0, i;
 	__u8 tmp[EXTRACT_SIZE];
 	unsigned long flags;
 
@@ -1178,27 +1366,7 @@ static ssize_t extract_entropy(struct entropy_store *r, void *buf,
 	xfer_secondary_pool(r, nbytes);
 	nbytes = account(r, nbytes, min, reserved);
 
-	while (nbytes) {
-		extract_buf(r, tmp);
-
-		if (fips_enabled) {
-			spin_lock_irqsave(&r->lock, flags);
-			if (!memcmp(tmp, r->last_data, EXTRACT_SIZE))
-				panic("Hardware RNG duplicated output!\n");
-			memcpy(r->last_data, tmp, EXTRACT_SIZE);
-			spin_unlock_irqrestore(&r->lock, flags);
-		}
-		i = min_t(int, nbytes, EXTRACT_SIZE);
-		memcpy(buf, tmp, i);
-		nbytes -= i;
-		buf += i;
-		ret += i;
-	}
-
-	/* Wipe data just returned from memory */
-	memzero_explicit(tmp, sizeof(tmp));
-
-	return ret;
+	return _extract_entropy(r, buf, nbytes, fips_enabled);
 }
 
 /*
@@ -1253,15 +1421,26 @@ static ssize_t extract_entropy_user(struct entropy_store *r, void __user *buf,
  */
 void get_random_bytes(void *buf, int nbytes)
 {
+	__u8 tmp[CHACHA20_BLOCK_SIZE];
+
 #if DEBUG_RANDOM_BOOT > 0
-	if (unlikely(nonblocking_pool.initialized == 0))
+	if (!crng_ready())
 		printk(KERN_NOTICE "random: %pF get_random_bytes called "
-		       "with %d bits of entropy available\n",
-		       (void *) _RET_IP_,
-		       nonblocking_pool.entropy_total);
+		       "with crng_init = %d\n", (void *) _RET_IP_, crng_init);
 #endif
 	trace_get_random_bytes(nbytes, _RET_IP_);
-	extract_entropy(&nonblocking_pool, buf, nbytes, 0, 0);
+
+	while (nbytes >= CHACHA20_BLOCK_SIZE) {
+		extract_crng(buf);
+		buf += CHACHA20_BLOCK_SIZE;
+		nbytes -= CHACHA20_BLOCK_SIZE;
+	}
+
+	if (nbytes > 0) {
+		extract_crng(tmp);
+		memcpy(buf, tmp, nbytes);
+		memzero_explicit(tmp, nbytes);
+	}
 }
 EXPORT_SYMBOL(get_random_bytes);
 
@@ -1279,7 +1458,7 @@ int add_random_ready_callback(struct random_ready_callback *rdy)
 	unsigned long flags;
 	int err = -EALREADY;
 
-	if (likely(nonblocking_pool.initialized))
+	if (crng_ready())
 		return err;
 
 	owner = rdy->owner;
@@ -1287,7 +1466,7 @@ int add_random_ready_callback(struct random_ready_callback *rdy)
 		return -ENOENT;
 
 	spin_lock_irqsave(&random_ready_list_lock, flags);
-	if (nonblocking_pool.initialized)
+	if (crng_ready())
 		goto out;
 
 	owner = NULL;
@@ -1351,7 +1530,7 @@ void get_random_bytes_arch(void *buf, int nbytes)
 	}
 
 	if (nbytes)
-		extract_entropy(&nonblocking_pool, p, nbytes, 0, 0);
+		get_random_bytes(p, nbytes);
 }
 EXPORT_SYMBOL(get_random_bytes_arch);
 
@@ -1396,7 +1575,7 @@ static int rand_initialize(void)
 {
 	init_std_data(&input_pool);
 	init_std_data(&blocking_pool);
-	init_std_data(&nonblocking_pool);
+	_initialize_crng(&primary_crng);
 	return 0;
 }
 early_initcall(rand_initialize);
@@ -1460,16 +1639,10 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 {
 	int ret;
 
-	if (unlikely(nonblocking_pool.initialized == 0))
-		printk_once(KERN_NOTICE "random: %s urandom read "
-			    "with %d bits of entropy available\n",
-			    current->comm, nonblocking_pool.entropy_total);
-
+	crng_wait_ready();
 	nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3));
-	ret = extract_entropy_user(&nonblocking_pool, buf, nbytes);
-
-	trace_urandom_read(8 * nbytes, ENTROPY_BITS(&nonblocking_pool),
-			   ENTROPY_BITS(&input_pool));
+	ret = extract_crng_user(buf, nbytes);
+	trace_urandom_read(8 * nbytes, 0, ENTROPY_BITS(&input_pool));
 	return ret;
 }
 
@@ -1515,10 +1688,7 @@ static ssize_t random_write(struct file *file, const char __user *buffer,
 {
 	size_t ret;
 
-	ret = write_pool(&blocking_pool, buffer, count);
-	if (ret)
-		return ret;
-	ret = write_pool(&nonblocking_pool, buffer, count);
+	ret = write_pool(&input_pool, buffer, count);
 	if (ret)
 		return ret;
 
@@ -1569,7 +1739,6 @@ static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
 		if (!capable(CAP_SYS_ADMIN))
 			return -EPERM;
 		input_pool.entropy_count = 0;
-		nonblocking_pool.entropy_count = 0;
 		blocking_pool.entropy_count = 0;
 		return 0;
 	default:
@@ -1611,11 +1780,10 @@ SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
 	if (flags & GRND_RANDOM)
 		return _random_read(flags & GRND_NONBLOCK, buf, count);
 
-	if (unlikely(nonblocking_pool.initialized == 0)) {
+	if (!crng_ready()) {
 		if (flags & GRND_NONBLOCK)
 			return -EAGAIN;
-		wait_event_interruptible(urandom_init_wait,
-					 nonblocking_pool.initialized);
+		crng_wait_ready();
 		if (signal_pending(current))
 			return -ERESTARTSYS;
 	}
diff --git a/include/crypto/chacha20.h b/include/crypto/chacha20.h
index 274bbae..20d20f68 100644
--- a/include/crypto/chacha20.h
+++ b/include/crypto/chacha20.h
@@ -16,6 +16,7 @@ struct chacha20_ctx {
 	u32 key[8];
 };
 
+void chacha20_block(u32 *state, void *stream);
 void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv);
 int crypto_chacha20_setkey(struct crypto_tfm *tfm, const u8 *key,
 			   unsigned int keysize);
diff --git a/lib/Makefile b/lib/Makefile
index 499fb35..34e205f 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -22,7 +22,7 @@ KCOV_INSTRUMENT_hweight.o := n
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 rbtree.o radix-tree.o dump_stack.o timerqueue.o\
 	 idr.o int_sqrt.o extable.o \
-	 sha1.o md5.o irq_regs.o argv_split.o \
+	 sha1.o chacha20.o md5.o irq_regs.o argv_split.o \
 	 flex_proportions.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
 	 earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o
diff --git a/lib/chacha20.c b/lib/chacha20.c
new file mode 100644
index 0000000..250ceed
--- /dev/null
+++ b/lib/chacha20.c
@@ -0,0 +1,79 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/bitops.h>
+#include <linux/cryptohash.h>
+#include <asm/unaligned.h>
+#include <crypto/chacha20.h>
+
+static inline u32 rotl32(u32 v, u8 n)
+{
+	return (v << n) | (v >> (sizeof(v) * 8 - n));
+}
+
+extern void chacha20_block(u32 *state, void *stream)
+{
+	u32 x[16], *out = stream;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(x); i++)
+		x[i] = state[i];
+
+	for (i = 0; i < 20; i += 2) {
+		x[0]  += x[4];    x[12] = rotl32(x[12] ^ x[0],  16);
+		x[1]  += x[5];    x[13] = rotl32(x[13] ^ x[1],  16);
+		x[2]  += x[6];    x[14] = rotl32(x[14] ^ x[2],  16);
+		x[3]  += x[7];    x[15] = rotl32(x[15] ^ x[3],  16);
+
+		x[8]  += x[12];   x[4]  = rotl32(x[4]  ^ x[8],  12);
+		x[9]  += x[13];   x[5]  = rotl32(x[5]  ^ x[9],  12);
+		x[10] += x[14];   x[6]  = rotl32(x[6]  ^ x[10], 12);
+		x[11] += x[15];   x[7]  = rotl32(x[7]  ^ x[11], 12);
+
+		x[0]  += x[4];    x[12] = rotl32(x[12] ^ x[0],   8);
+		x[1]  += x[5];    x[13] = rotl32(x[13] ^ x[1],   8);
+		x[2]  += x[6];    x[14] = rotl32(x[14] ^ x[2],   8);
+		x[3]  += x[7];    x[15] = rotl32(x[15] ^ x[3],   8);
+
+		x[8]  += x[12];   x[4]  = rotl32(x[4]  ^ x[8],   7);
+		x[9]  += x[13];   x[5]  = rotl32(x[5]  ^ x[9],   7);
+		x[10] += x[14];   x[6]  = rotl32(x[6]  ^ x[10],  7);
+		x[11] += x[15];   x[7]  = rotl32(x[7]  ^ x[11],  7);
+
+		x[0]  += x[5];    x[15] = rotl32(x[15] ^ x[0],  16);
+		x[1]  += x[6];    x[12] = rotl32(x[12] ^ x[1],  16);
+		x[2]  += x[7];    x[13] = rotl32(x[13] ^ x[2],  16);
+		x[3]  += x[4];    x[14] = rotl32(x[14] ^ x[3],  16);
+
+		x[10] += x[15];   x[5]  = rotl32(x[5]  ^ x[10], 12);
+		x[11] += x[12];   x[6]  = rotl32(x[6]  ^ x[11], 12);
+		x[8]  += x[13];   x[7]  = rotl32(x[7]  ^ x[8],  12);
+		x[9]  += x[14];   x[4]  = rotl32(x[4]  ^ x[9],  12);
+
+		x[0]  += x[5];    x[15] = rotl32(x[15] ^ x[0],   8);
+		x[1]  += x[6];    x[12] = rotl32(x[12] ^ x[1],   8);
+		x[2]  += x[7];    x[13] = rotl32(x[13] ^ x[2],   8);
+		x[3]  += x[4];    x[14] = rotl32(x[14] ^ x[3],   8);
+
+		x[10] += x[15];   x[5]  = rotl32(x[5]  ^ x[10],  7);
+		x[11] += x[12];   x[6]  = rotl32(x[6]  ^ x[11],  7);
+		x[8]  += x[13];   x[7]  = rotl32(x[7]  ^ x[8],   7);
+		x[9]  += x[14];   x[4]  = rotl32(x[4]  ^ x[9],   7);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(x); i++)
+		out[i] = cpu_to_le32(x[i] + state[i]);
+
+	state[12]++;
+}
+EXPORT_SYMBOL(chacha20_block);
-- 
2.5.0


* [PATCH 2/5] random: make /dev/urandom scalable for silly userspace programs
  2016-05-30  5:39 [PATCH-v3 0/5] random: replace urandom pool with a CRNG Theodore Ts'o
  2016-05-30  5:39 ` [PATCH 1/5] random: replace non-blocking pool with a Chacha20-based CRNG Theodore Ts'o
@ 2016-05-30  5:39 ` Theodore Ts'o
  2016-05-30  6:03   ` Stephan Mueller
  2016-05-30  5:39 ` [PATCH 3/5] random: add interrupt callback to VMBus IRQ handler Theodore Ts'o
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Theodore Ts'o @ 2016-05-30  5:39 UTC (permalink / raw)
  To: Linux Kernel Developers List
  Cc: smueller, herbert, andi, sandyinchina, cryptography, jsd, hpa,
	linux-crypto, Theodore Ts'o

On a 4-socket NUMA system where a large number of application threads
were all trying to read from /dev/urandom, the system could spend 80%
of its time contending on the global urandom spinlock.  The
application should have used its own PRNG, but let's try to keep it
from running, lemming-like, straight over the locking cliff.

Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
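
For reference, the sort of workload being worked around is easy to
reproduce from userspace; a minimal sketch (build with -pthread; thread
count and read size are arbitrary) looks like this.  With a single global
CRNG all of these readers serialize on the same spinlock; with this
patch, readers on different NUMA nodes use different crng_state
instances.

	#include <fcntl.h>
	#include <pthread.h>
	#include <unistd.h>

	#define NTHREADS	64
	#define NREADS		(1 << 18)

	static void *pound_urandom(void *arg)
	{
		char buf[256];
		int fd = open("/dev/urandom", O_RDONLY);
		int i;

		(void) arg;
		if (fd < 0)
			return NULL;
		/* Hammer the CRNG with back-to-back reads. */
		for (i = 0; i < NREADS; i++)
			if (read(fd, buf, sizeof(buf)) < 0)
				break;
		close(fd);
		return NULL;
	}

	int main(void)
	{
		pthread_t tid[NTHREADS];
		int i;

		for (i = 0; i < NTHREADS; i++)
			pthread_create(&tid[i], NULL, pound_urandom, NULL);
		for (i = 0; i < NTHREADS; i++)
			pthread_join(tid[i], NULL);
		return 0;
	}
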
 drivers/char/random.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 53 insertions(+), 4 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 3a4d865..1778c84 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -434,6 +434,8 @@ struct crng_state primary_crng = {
  */
 static int crng_init = 0;
 #define crng_ready() (likely(crng_init >= 2))
+static void _extract_crng(struct crng_state *crng,
+			  __u8 out[CHACHA20_BLOCK_SIZE]);
 static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE]);
 static void process_random_ready_list(void);
 
@@ -754,6 +756,17 @@ static void credit_entropy_bits_safe(struct entropy_store *r, int nbits)
 
 static DECLARE_WAIT_QUEUE_HEAD(crng_init_wait);
 
+#ifdef CONFIG_NUMA
+/*
+ * Hack to deal with crazy userspace progams when they are all trying
+ * to access /dev/urandom in parallel.  The programs are almost
+ * certainly doing something terribly wrong, but we'll work around
+ * their brain damage.
+ */
+static struct crng_state **crng_node_pool __read_mostly;
+#endif
+
+
 static void _initialize_crng(struct crng_state *crng)
 {
 	int		i;
@@ -774,11 +787,13 @@ static void _initialize_crng(struct crng_state *crng)
 	crng->init_time = jiffies - CRNG_RESEED_INTERVAL - 1;
 }
 
+#ifdef CONFIG_NUMA
 static void initialize_crng(struct crng_state *crng)
 {
 	_initialize_crng(crng);
 	spin_lock_init(&crng->lock);
 }
+#endif
 
 static int crng_fast_load(__u32 pool[4])
 {
@@ -818,7 +833,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
 		if (num == 0)
 			return;
 	} else
-		extract_crng(buf.block);
+		_extract_crng(&primary_crng, buf.block);
 	spin_lock_irqsave(&primary_crng.lock, flags);
 	for (i = 0; i < 8; i++) {
 		unsigned long	rv;
@@ -838,19 +853,26 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
 	spin_unlock_irqrestore(&primary_crng.lock, flags);
 }
 
+static inline void maybe_reseed_primary_crng(void)
+{
+	if (crng_init > 2 &&
+	    time_after(jiffies, primary_crng.init_time + CRNG_RESEED_INTERVAL))
+		crng_reseed(&primary_crng, &input_pool);
+}
+
 static inline void crng_wait_ready(void)
 {
 	wait_event_interruptible(crng_init_wait, crng_ready());
 }
 
-static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
+static void _extract_crng(struct crng_state *crng,
+			  __u8 out[CHACHA20_BLOCK_SIZE])
 {
 	unsigned long v, flags;
-	struct crng_state *crng = &primary_crng;
 
 	if (crng_init > 2 &&
 	    time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL))
-		crng_reseed(crng, &input_pool);
+		crng_reseed(crng, crng == &primary_crng ? &input_pool : NULL);
 	spin_lock_irqsave(&crng->lock, flags);
 	if (arch_get_random_long(&v))
 		crng->state[14] ^= v;
@@ -860,6 +882,17 @@ static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
 	spin_unlock_irqrestore(&crng->lock, flags);
 }
 
+static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
+{
+#ifndef CONFIG_NUMA
+	struct crng_state *crng = &primary_crng;
+#else
+	struct crng_state *crng = crng_node_pool[numa_node_id()];
+#endif
+
+	_extract_crng(crng, out);
+}
+
 static ssize_t extract_crng_user(void __user *buf, size_t nbytes)
 {
 	ssize_t ret = 0, i;
@@ -1573,6 +1606,22 @@ static void init_std_data(struct entropy_store *r)
  */
 static int rand_initialize(void)
 {
+#ifdef CONFIG_NUMA
+	int i;
+	int num_nodes = num_possible_nodes();
+	struct crng_state *crng;
+
+	crng_node_pool = kmalloc(num_nodes * sizeof(void *),
+				 GFP_KERNEL|__GFP_NOFAIL);
+
+	for (i=0; i < num_nodes; i++) {
+		crng = kmalloc(sizeof(struct crng_state),
+			       GFP_KERNEL | __GFP_NOFAIL);
+		initialize_crng(crng);
+		crng_node_pool[i] = crng;
+
+	}
+#endif
 	init_std_data(&input_pool);
 	init_std_data(&blocking_pool);
 	_initialize_crng(&primary_crng);
-- 
2.5.0


* [PATCH 3/5] random: add interrupt callback to VMBus IRQ handler
  2016-05-30  5:39 [PATCH-v3 0/5] random: replace urandom pool with a CRNG Theodore Ts'o
  2016-05-30  5:39 ` [PATCH 1/5] random: replace non-blocking pool with a Chacha20-based CRNG Theodore Ts'o
  2016-05-30  5:39 ` [PATCH 2/5] random: make /dev/urandom scalable for silly userspace programs Theodore Ts'o
@ 2016-05-30  5:39 ` Theodore Ts'o
  2016-05-30  5:39 ` [PATCH 4/5] random: add backtracking protection to the CRNG Theodore Ts'o
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Theodore Ts'o @ 2016-05-30  5:39 UTC (permalink / raw)
  To: Linux Kernel Developers List
  Cc: smueller, herbert, andi, sandyinchina, cryptography, jsd, hpa,
	linux-crypto, Stephan Mueller, Theodore Ts'o

From: Stephan Mueller <smueller@chronox.de>

The Hyper-V Linux Integration Services use the VMBus implementation for
communication with the Hypervisor. VMBus registers its own interrupt
handler that completely bypasses the common Linux interrupt handling.
This implies that the interrupt entropy collector is not triggered.

This patch adds the interrupt entropy collection callback into the VMBus
interrupt handler function.

Signed-off-by: Stephan Mueller <stephan.mueller@atsec.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 drivers/char/random.c  | 1 +
 drivers/hv/vmbus_drv.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 1778c84..bf370a3 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1141,6 +1141,7 @@ void add_interrupt_randomness(int irq, int irq_flags)
 	/* award one bit for the contents of the fast pool */
 	credit_entropy_bits(r, credit + 1);
 }
+EXPORT_SYMBOL_GPL(add_interrupt_randomness);
 
 #ifdef CONFIG_BLOCK
 void add_disk_randomness(struct gendisk *disk)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 952f20f..e82f7e1 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -42,6 +42,7 @@
 #include <linux/screen_info.h>
 #include <linux/kdebug.h>
 #include <linux/efi.h>
+#include <linux/random.h>
 #include "hyperv_vmbus.h"
 
 static struct acpi_device  *hv_acpi_dev;
@@ -806,6 +807,8 @@ static void vmbus_isr(void)
 		else
 			tasklet_schedule(hv_context.msg_dpc[cpu]);
 	}
+
+	add_interrupt_randomness(HYPERVISOR_CALLBACK_VECTOR, 0);
 }
 
 
-- 
2.5.0


* [PATCH 4/5] random: add backtracking protection to the CRNG
  2016-05-30  5:39 [PATCH-v3 0/5] random: replace urandom pool with a CRNG Theodore Ts'o
                   ` (2 preceding siblings ...)
  2016-05-30  5:39 ` [PATCH 3/5] random: add interrupt callback to VMBus IRQ handler Theodore Ts'o
@ 2016-05-30  5:39 ` Theodore Ts'o
  2016-05-30  5:39 ` [PATCH 5/5] random: properly align get_random_int_hash Theodore Ts'o
  2016-05-30 17:53 ` [PATCH-v3 0/5] random: replace urandom pool with a CRNG Andi Kleen
  5 siblings, 0 replies; 10+ messages in thread
From: Theodore Ts'o @ 2016-05-30  5:39 UTC (permalink / raw)
  To: Linux Kernel Developers List
  Cc: smueller, herbert, andi, sandyinchina, cryptography, jsd, hpa,
	linux-crypto, Theodore Ts'o

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 drivers/char/random.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 47 insertions(+), 5 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index bf370a3..860862f 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -436,7 +436,8 @@ static int crng_init = 0;
 #define crng_ready() (likely(crng_init >= 2))
 static void _extract_crng(struct crng_state *crng,
 			  __u8 out[CHACHA20_BLOCK_SIZE]);
-static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE]);
+static void _crng_backtrack_protect(struct crng_state *crng,
+				    __u8 tmp[CHACHA20_BLOCK_SIZE], int used);
 static void process_random_ready_list(void);
 
 /**********************************************************************
@@ -832,8 +833,11 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
 		num = extract_entropy(r, &buf, 32, 16, 0);
 		if (num == 0)
 			return;
-	} else
+	} else {
 		_extract_crng(&primary_crng, buf.block);
+		_crng_backtrack_protect(&primary_crng, buf.block,
+					CHACHA20_KEY_SIZE);
+	}
 	spin_lock_irqsave(&primary_crng.lock, flags);
 	for (i = 0; i < 8; i++) {
 		unsigned long	rv;
@@ -893,9 +897,44 @@ static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
 	_extract_crng(crng, out);
 }
 
+/*
+ * Use the leftover bytes from the CRNG block output (if there is
+ * enough) to mutate the CRNG key to provide backtracking protection.
+ */
+static void _crng_backtrack_protect(struct crng_state *crng,
+				    __u8 tmp[CHACHA20_BLOCK_SIZE], int used)
+{
+	unsigned long	flags;
+	__u32		*s, *d;
+	int		i;
+
+	used = round_up(used, sizeof(__u32));
+	if (used + CHACHA20_KEY_SIZE > CHACHA20_BLOCK_SIZE) {
+		extract_crng(tmp);
+		used = 0;
+	}
+	spin_lock_irqsave(&crng->lock, flags);
+	s = (__u32 *) &tmp[used];
+	d = &crng->state[4];
+	for (i=0; i < 8; i++)
+		*d++ ^= *s++;
+	spin_unlock_irqrestore(&crng->lock, flags);
+}
+
+static void crng_backtrack_protect(__u8 tmp[CHACHA20_BLOCK_SIZE], int used)
+{
+#ifdef CONFIG_NUMA
+	struct crng_state *crng = crng_node_pool[numa_node_id()];
+#else
+	struct crng_state *crng = &primary_crng;
+#endif
+
+	_crng_backtrack_protect(crng, tmp, used);
+}
+
 static ssize_t extract_crng_user(void __user *buf, size_t nbytes)
 {
-	ssize_t ret = 0, i;
+	ssize_t ret = 0, i = CHACHA20_BLOCK_SIZE;
 	__u8 tmp[CHACHA20_BLOCK_SIZE];
 	int large_request = (nbytes > 256);
 
@@ -920,6 +959,7 @@ static ssize_t extract_crng_user(void __user *buf, size_t nbytes)
 		buf += i;
 		ret += i;
 	}
+	crng_backtrack_protect(tmp, i);
 
 	/* Wipe data just written to memory */
 	memzero_explicit(tmp, sizeof(tmp));
@@ -1473,8 +1513,10 @@ void get_random_bytes(void *buf, int nbytes)
 	if (nbytes > 0) {
 		extract_crng(tmp);
 		memcpy(buf, tmp, nbytes);
-		memzero_explicit(tmp, nbytes);
-	}
+		crng_backtrack_protect(tmp, nbytes);
+	} else
+		crng_backtrack_protect(tmp, CHACHA20_BLOCK_SIZE);
+	memzero_explicit(tmp, sizeof(tmp));
 }
 EXPORT_SYMBOL(get_random_bytes);
 
-- 
2.5.0


* [PATCH 5/5] random: properly align get_random_int_hash
  2016-05-30  5:39 [PATCH-v3 0/5] random: replace urandom pool with a CRNG Theodore Ts'o
                   ` (3 preceding siblings ...)
  2016-05-30  5:39 ` [PATCH 4/5] random: add backtracking protection to the CRNG Theodore Ts'o
@ 2016-05-30  5:39 ` Theodore Ts'o
  2016-05-30 17:53 ` [PATCH-v3 0/5] random: replace urandom pool with a CRNG Andi Kleen
  5 siblings, 0 replies; 10+ messages in thread
From: Theodore Ts'o @ 2016-05-30  5:39 UTC (permalink / raw)
  To: Linux Kernel Developers List
  Cc: smueller, herbert, andi, sandyinchina, cryptography, jsd, hpa,
	linux-crypto, Eric Biggers, Theodore Ts'o

From: Eric Biggers <ebiggers3@gmail.com>

get_random_long() reads from the get_random_int_hash array using an
unsigned long pointer.  For this code to be guaranteed correct on all
architectures, the array must be aligned to an unsigned long boundary.

Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
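
To see the mismatch being guarded against, a tiny userspace sketch (the
printed values are typical for a 64-bit target):

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		/* A u32 array only guarantees 4-byte alignment, but reading
		 * it through an unsigned long * wants 8-byte alignment on
		 * 64-bit architectures -- hence the __aligned() below. */
		printf("alignof(uint32_t[4]) = %zu, alignof(unsigned long) = %zu\n",
		       (size_t) _Alignof(uint32_t[4]),
		       (size_t) _Alignof(unsigned long));
		return 0;
	}
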
 drivers/char/random.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 860862f..90fb569 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2033,13 +2033,15 @@ int random_int_secret_init(void)
 	return 0;
 }
 
+static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
+		__aligned(sizeof(unsigned long));
+
 /*
  * Get a random word for internal kernel use only. Similar to urandom but
  * with the goal of minimal entropy pool depletion. As a result, the random
  * value is not cryptographically secure but for several uses the cost of
  * depleting entropy is too high
  */
-static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash);
 unsigned int get_random_int(void)
 {
 	__u32 *hash;
-- 
2.5.0


* Re: [PATCH 2/5] random: make /dev/urandom scalable for silly userspace programs
  2016-05-30  5:39 ` [PATCH 2/5] random: make /dev/urandom scalable for silly userspace programs Theodore Ts'o
@ 2016-05-30  6:03   ` Stephan Mueller
  2016-05-30 17:29     ` Theodore Ts'o
  0 siblings, 1 reply; 10+ messages in thread
From: Stephan Mueller @ 2016-05-30  6:03 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Linux Kernel Developers List, herbert, andi, sandyinchina,
	cryptography, jsd, hpa, linux-crypto

On Monday, 30 May 2016, 01:39:22, Theodore Ts'o wrote:

Hi Theodore,

> On a system with a 4 socket (NUMA) system where a large number of
> application threads were all trying to read from /dev/urandom, this
> can result in the system spending 80% of its time contending on the
> global urandom spinlock.  The application should have used its own
> PRNG, but let's try to help it from running, lemming-like, straight
> over the locking cliff.
> 
> Reported-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> ---
>  drivers/char/random.c | 57
> +++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 53
> insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 3a4d865..1778c84 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -434,6 +434,8 @@ struct crng_state primary_crng = {
>   */
>  static int crng_init = 0;
>  #define crng_ready() (likely(crng_init >= 2))
> +static void _extract_crng(struct crng_state *crng,
> +			  __u8 out[CHACHA20_BLOCK_SIZE]);
>  static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE]);
>  static void process_random_ready_list(void);
> 
> @@ -754,6 +756,17 @@ static void credit_entropy_bits_safe(struct
> entropy_store *r, int nbits)
> 
>  static DECLARE_WAIT_QUEUE_HEAD(crng_init_wait);
> 
> +#ifdef CONFIG_NUMA
> +/*
> + * Hack to deal with crazy userspace progams when they are all trying
> + * to access /dev/urandom in parallel.  The programs are almost
> + * certainly doing something terribly wrong, but we'll work around
> + * their brain damage.
> + */
> +static struct crng_state **crng_node_pool __read_mostly;
> +#endif
> +
> +
>  static void _initialize_crng(struct crng_state *crng)
>  {
>  	int		i;
> @@ -774,11 +787,13 @@ static void _initialize_crng(struct crng_state *crng)
>  	crng->init_time = jiffies - CRNG_RESEED_INTERVAL - 1;
>  }
> 
> +#ifdef CONFIG_NUMA
>  static void initialize_crng(struct crng_state *crng)
>  {
>  	_initialize_crng(crng);
>  	spin_lock_init(&crng->lock);
>  }
> +#endif
> 
>  static int crng_fast_load(__u32 pool[4])
>  {
> @@ -818,7 +833,7 @@ static void crng_reseed(struct crng_state *crng, struct
> entropy_store *r) if (num == 0)
>  			return;
>  	} else
> -		extract_crng(buf.block);
> +		_extract_crng(&primary_crng, buf.block);
>  	spin_lock_irqsave(&primary_crng.lock, flags);
>  	for (i = 0; i < 8; i++) {
>  		unsigned long	rv;
> @@ -838,19 +853,26 @@ static void crng_reseed(struct crng_state *crng,
> struct entropy_store *r) spin_unlock_irqrestore(&primary_crng.lock, flags);
>  }
> 
> +static inline void maybe_reseed_primary_crng(void)
> +{
> +	if (crng_init > 2 &&
> +	    time_after(jiffies, primary_crng.init_time + 
CRNG_RESEED_INTERVAL))
> +		crng_reseed(&primary_crng, &input_pool);
> +}
> +
>  static inline void crng_wait_ready(void)
>  {
>  	wait_event_interruptible(crng_init_wait, crng_ready());
>  }
> 
> -static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
> +static void _extract_crng(struct crng_state *crng,
> +			  __u8 out[CHACHA20_BLOCK_SIZE])
>  {
>  	unsigned long v, flags;
> -	struct crng_state *crng = &primary_crng;
> 
>  	if (crng_init > 2 &&
>  	    time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL))
> -		crng_reseed(crng, &input_pool);
> +		crng_reseed(crng, crng == &primary_crng ? &input_pool : NULL);
>  	spin_lock_irqsave(&crng->lock, flags);
>  	if (arch_get_random_long(&v))
>  		crng->state[14] ^= v;
> @@ -860,6 +882,17 @@ static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
> spin_unlock_irqrestore(&crng->lock, flags);
>  }
> 
> +static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
> +{
> +#ifndef CONFIG_NUMA
> +	struct crng_state *crng = &primary_crng;
> +#else
> +	struct crng_state *crng = crng_node_pool[numa_node_id()];
> +#endif
> +
> +	_extract_crng(crng, out);
> +}
> +
>  static ssize_t extract_crng_user(void __user *buf, size_t nbytes)
>  {
>  	ssize_t ret = 0, i;
> @@ -1573,6 +1606,22 @@ static void init_std_data(struct entropy_store *r)
>   */
>  static int rand_initialize(void)
>  {
> +#ifdef CONFIG_NUMA
> +	int i;
> +	int num_nodes = num_possible_nodes();
> +	struct crng_state *crng;
> +
> +	crng_node_pool = kmalloc(num_nodes * sizeof(void *),
> +				 GFP_KERNEL|__GFP_NOFAIL);
> +
> +	for (i=0; i < num_nodes; i++) {
> +		crng = kmalloc(sizeof(struct crng_state),
> +			       GFP_KERNEL | __GFP_NOFAIL);
> +		initialize_crng(crng);

Could you please help me understand the logic flow here: the NUMA
secondary DRNGs are initialized before the input/blocking pools and the
primary DRNG.

The initialization call uses get_random_bytes() for the secondary DRNGs.
But since the primary DRNG is not yet initialized, where does
get_random_bytes() get its randomness from?

> +		crng_node_pool[i] = crng;
> +
> +	}
> +#endif
>  	init_std_data(&input_pool);
>  	init_std_data(&blocking_pool);
>  	_initialize_crng(&primary_crng);


Ciao
Stephan


* Re: [PATCH 2/5] random: make /dev/urandom scalable for silly userspace programs
  2016-05-30  6:03   ` Stephan Mueller
@ 2016-05-30 17:29     ` Theodore Ts'o
  0 siblings, 0 replies; 10+ messages in thread
From: Theodore Ts'o @ 2016-05-30 17:29 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Linux Kernel Developers List, herbert, andi, sandyinchina,
	cryptography, jsd, hpa, linux-crypto

On Mon, May 30, 2016 at 08:03:59AM +0200, Stephan Mueller wrote:
> >  static int rand_initialize(void)
> >  {
> > +#ifdef CONFIG_NUMA
> > +	int i;
> > +	int num_nodes = num_possible_nodes();
> > +	struct crng_state *crng;
> > +
> > +	crng_node_pool = kmalloc(num_nodes * sizeof(void *),
> > +				 GFP_KERNEL|__GFP_NOFAIL);
> > +
> > +	for (i=0; i < num_nodes; i++) {
> > +		crng = kmalloc(sizeof(struct crng_state),
> > +			       GFP_KERNEL | __GFP_NOFAIL);
> > +		initialize_crng(crng);
> 
> Could you please help me understand the logic flow here: The NUMA secondary 
> DRNGs are initialized before the input/blocking pools and the primary DRNG.
> 
> The initialization call uses get_random_bytes() for the secondary DRNGs. But 
> since the primary DRNG is not yet initialized, where does the get_random_bytes 
> gets its randomness from?

Yeah, I screwed up.  The hunk of code starting with "crng_node_pool =
kmalloc(..." and the for loop afterwards should be moved to after
_initialize_crng().  Serves me right for not testing CONFIG_NUMA
before sending out the patches.

This is *not* about adding entropy; as you've noted, this is done very
early in boot up, before there has been any chance for any kind of
entropy to be collected in any of the input pools.  It's more of an
insurance policy: just in case assuming a bit's worth of entropy per
interrupt turns out to be hopelessly over-optimistic on some platform,
at least the starting point will be different across different kernels
(and maybe across different boot times, although on the sorts of
platforms I'm most concerned about, there may not be a real-time clock
and almost certainly not an architectural hwrng in the CPU).

	      		      	       - Ted
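
For concreteness, the reordering described above would look roughly like
the following (a sketch of the described fix, not the actual follow-up
patch; declarations abbreviated):

	static int rand_initialize(void)
	{
	#ifdef CONFIG_NUMA
		int i;
		struct crng_state *crng;
	#endif

		init_std_data(&input_pool);
		init_std_data(&blocking_pool);
		_initialize_crng(&primary_crng);
	#ifdef CONFIG_NUMA
		/* Allocate the per-node states only after the primary CRNG
		 * exists, so initialize_crng() -> get_random_bytes() has
		 * something to draw from. */
		crng_node_pool = kmalloc(num_possible_nodes() * sizeof(void *),
					 GFP_KERNEL|__GFP_NOFAIL);
		for (i = 0; i < num_possible_nodes(); i++) {
			crng = kmalloc(sizeof(struct crng_state),
				       GFP_KERNEL|__GFP_NOFAIL);
			initialize_crng(crng);
			crng_node_pool[i] = crng;
		}
	#endif
		return 0;
	}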


* Re: [PATCH-v3 0/5] random: replace urandom pool with a CRNG
  2016-05-30  5:39 [PATCH-v3 0/5] random: replace urandom pool with a CRNG Theodore Ts'o
                   ` (4 preceding siblings ...)
  2016-05-30  5:39 ` [PATCH 5/5] random: properly align get_random_int_hash Theodore Ts'o
@ 2016-05-30 17:53 ` Andi Kleen
  2016-05-30 20:59   ` Theodore Ts'o
  5 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2016-05-30 17:53 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Linux Kernel Developers List, smueller, herbert, andi,
	sandyinchina, cryptography, jsd, hpa, linux-crypto

> In addition, on NUMA systems we make the CRNG state per-NUMA socket, to
> address the NUMA locking contention problem which Andi Kleen has been
> complaining about.  I'm not entirely sure this will work well on the
> crazy big SGI systems, but they are rare.  Whether they are rarer than

It should work the same on larger systems; the solution scales
naturally to lots of sockets.  It's not clear it'll help enough on
systems with a lot more cores per socket, like a Xeon Phi, but for now
it should be good enough.

-Andi


* Re: [PATCH-v3 0/5] random: replace urandom pool with a CRNG
  2016-05-30 17:53 ` [PATCH-v3 0/5] random: replace urandom pool with a CRNG Andi Kleen
@ 2016-05-30 20:59   ` Theodore Ts'o
  0 siblings, 0 replies; 10+ messages in thread
From: Theodore Ts'o @ 2016-05-30 20:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linux Kernel Developers List, smueller, herbert, sandyinchina,
	cryptography, jsd, hpa, linux-crypto

On Mon, May 30, 2016 at 10:53:22AM -0700, Andi Kleen wrote:
> 
> It should work the same on larger systems, the solution scales
> naturally to lots of sockets. It's not clear it'll help enough on systems
> with a lot more cores per socket, like a Xeon Phi. But for now it should
> be good enough.

One change which I'm currently making is to use kmalloc_node() instead
of kmalloc() for the per-NUMA-node state, and I suspect *that* is
going to make quite a lot of difference on systems where the ratio of
remote to local memory access times is larger (as I assume it probably
would be on really big systems).

						- Ted
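
A minimal sketch of that change, against the allocation loop from patch 2
(kmalloc_node() places each node's crng_state in that node's local
memory):

	for (i = 0; i < num_nodes; i++) {
		crng = kmalloc_node(sizeof(struct crng_state),
				    GFP_KERNEL|__GFP_NOFAIL, i);
		initialize_crng(crng);
		crng_node_pool[i] = crng;
	}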

