linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH] random: add regrand
       [not found] <20171110213039.GA6911@cork>
@ 2017-11-11  1:17 ` Jörn Engel
       [not found] ` <20171110235212.7c2qt7tc46q7w5d2@thunk.org>
  1 sibling, 0 replies; 2+ messages in thread
From: Jörn Engel @ 2017-11-11  1:17 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Kees Cook, Jason A. Donenfeld, Michael Schmitz, Stephan Mueller,
	Jason Cooper, Herbert Xu, H. Peter Anvin, Christoph Hellwig,
	Greg Price, linux-kernel

Forgot to Cc: linux-kernel.

On Fri, Nov 10, 2017 at 01:30:39PM -0800, Jörn Engel wrote:
> Regrand is a replacement for drivers/char/random.c.  It is supposed to
> achieve the following design goals:
> 
> 1. /dev/random shall never block.
> 2. /dev/urandom shall never return bad randomness.
> 3. Any machine supported by Linux must have good randomness.
> 4. Any entropy source that is unavailable on some machines is useless.
> 5. By the time we start userspace, the random pool must be fully initialized.
> 6. Entropy never gets XOR'ed, always hashed.
> 7. In-kernel users never receive bad randomness.
> 
> I believe it achieves all but the last.  Sadly there are a few
> exceptions where random numbers are consumed before the entropy pool can
> be initialized.  In contrast, I believe current drivers/char/random.c
> achieves none of the above.
> 
> That is a pretty bold claim, so please try to poke holes in it.  For the
> moment, I think regrand should be optional and users should decide which
> random number generator they trust more.  Maybe after a year or two,
> provided no one manages to find a problem in regrand, it can be made the
> default.
> 
> Copyright for regrand proper is Public Domain.  Feel free to port the
> version in Documentation/regrand.c to any OS.  Most of the code in
> drivers/char/regrand.c is copied from either crypto/sha3_generic.c or
> drivers/char/random.c.  I didn't want to go through the crypto API to access
> the sha3 code and had to implement an interface compatible with the existing
> random number generator.  Regrand proper is tiny in comparison.
> 
> Happy reading.
> 
> Signed-off-by: Joern Engel <joern@purestorage.com>
> ---
>  Documentation/regrand.c               | 365 ++++++++++++++++++
>  arch/Kconfig                          |   1 +
>  arch/x86/include/asm/stackprotector.h |   2 +-
>  arch/x86/kernel/apic/apic.c           |   2 +
>  drivers/char/Kconfig                  |  11 +
>  drivers/char/Makefile                 |   7 +-
>  drivers/char/regrand.c                | 673 ++++++++++++++++++++++++++++++++++
>  include/linux/genhd.h                 |   5 +
>  include/linux/hw_random.h             |   4 +
>  include/linux/random.h                |  25 +-
>  kernel/fork.c                         |   2 +-
>  kernel/panic.c                        |   2 +-
>  mm/slab.c                             |   2 +-
>  mm/slab_common.c                      |   4 +-
>  mm/slub.c                             |   5 +-
>  15 files changed, 1093 insertions(+), 17 deletions(-)
>  create mode 100644 Documentation/regrand.c
>  create mode 100644 drivers/char/regrand.c
> 
> diff --git a/Documentation/regrand.c b/Documentation/regrand.c
> new file mode 100644
> index 000000000000..07a1b87c2bb7
> --- /dev/null
> +++ b/Documentation/regrand.c
> @@ -0,0 +1,365 @@
> +/*
> + * regrand - Random number generator using register state at time of
> + * interrupt.  Uses the same principle as Dakarand, drift between two
> + * unsynchronized high-precision timers, as entropy source.  Should
> + * work reasonably well on any CPU that can generate interrupts.
> + *
> + * Public Domain
> + */
> +#error ADD INCLUDES
> +
> +/*
> + * Based on a million boots, each interrupt seems to yield about 10-15
> + * bits of entropy.  That means 20 samples might be enough for
> + * cryptographically strong random numbers.  Naturally we want more to
> + * give us a comfortable safety margin.  128 should be good enough.
> + */
> +#define SAMPLES_NEEDED	(128)					// [2]
> +
> +#define	ACCESS_ONCE(x)		(*(volatile __typeof(x) *)&(x))
> +
> +/* 256b hash value */
> +struct half_hash {						// [1]
> +	uint64_t h[4];
> +};
> +
> +/* 512b hash value */
> +struct full_hash {						// [1]
> +	union {
> +		struct half_hash half[2];
> +		uint64_t h[8];
> +	};
> +};
> +
> +struct per_cpu_state {						// [3]
> +	struct half_hash half;	/* 256 bits of state */
> +	uint64_t p_time;	/* last time we produced entropy */
> +	uint64_t c_time;	/* last time we consumed entropy */
> +};
> +
> +static struct half_hash global_pool;				// [3]
> +static int uninitialized_count = INT_MAX;			// [8]
> +static int global_lock;						// [4]
> +
> +static struct per_cpu_state *get_local_state(void)		// [5]
> +{
> +#error FILL ME IN
> +	/*
> +	 * Should return a pointer to per-cpu state.  Most RNG
> +	 * operations are done on local state, without cacheline
> +	 * pingpong causing unnecessary slowdowns.
> +	 * Bonus points if the pointer is cacheline-aligned.  The
> +	 * structure size of 64 bytes is chosen to fit into
> +	 * 1-2 cachelines in most architectures.
> +	 */
> +}
> +
> +static uint64_t get_time(void)					// [5]
> +{
> +#error FILL ME IN
> +	/*
> +	 * Should return a time.  Doesn't need to be monotonic or
> +	 * non-wrapping, but should be scaled to increment every
> +	 * 1-10ms or so.  We will sample entropy on every increment,
> +	 * so higher precision means more entropy and more overhead,
> +	 * lower precision means less entropy and less overhead.
> +	 */
> +}
> +
> +static struct full_hash hash512(void *p1, int n1, void *p2, int n2)	// [5]
> +{
> +#error FILL ME IN
> +	/*
> +	 * Calculate a 512bit hash from both inputs.  Second input may
> +	 * be empty.  I'd suggest 512bit sha2 or sha3.
> +	 */
> +}
> +
> +static void wait_a_while(void)					// [5]
> +{
> +#error FILL ME IN
> +	/*
> +	 * Some delay function - only used when asking for random
> +	 * numbers before the pool has initialized.  Feel free to do
> +	 * something fancy like completions if you don't want a
> +	 * spinning loop.
> +	 */
> +}
> +
> +#if 0 /* enable to estimate boottime entropy */
> +static uint32_t boot_log[SAMPLES_NEEDED][3];			// [6]
> +static uint32_t tsc0;
> +
> +static void log_boot_entropy(struct pt_regs *regs, uint64_t cascade_hash)
> +{
> +	struct full_hash hash;
> +	int i, slot = SAMPLES_NEEDED - uninitialized_count;
> +
> +	if ((unsigned)slot > SAMPLES_NEEDED)
> +		return;
> +	if (slot == 0)
> +		tsc0 = rdtsc();
> +	hash = hash512(regs, sizeof(struct pt_regs), NULL, 0);
> +	boot_log[slot][0] = hash.h[0];
> +	boot_log[slot][1] = cascade_hash;
> +	boot_log[slot][2] = rdtsc() - tsc0;
> +	if (uninitialized_count == 1) {
> +		for (i = 0; i < SAMPLES_NEEDED; i++) {
> +			pr_info("boot_entropy %03d: %08x %08x %08x\n", i, boot_log[i][0], boot_log[i][1], boot_log[i][2]);
> +		}
> +	}
> +}
> +#else
> +static inline void log_boot_entropy(struct pt_regs *regs, uint64_t cascade_hash) {}
> +#endif
> +
> +static void produce_entropy_from_irq_regs(void *regs, int n)	// [7]
> +{
> +#error HOOK ME UP
> +	/*
> +	 * Should be called on every interrupt.  Arguments should be
> +	 * start and size of saved register state at the time of
> +	 * interrupt.
> +	 */
> +	struct per_cpu_state *state = get_local_state();
> +	uint64_t time = get_time();
> +	struct full_hash hash;
> +
> +	/* Ratelimit to reduce interrupt overhead */
> +	if (state->p_time == time && !uninitialized_count)
> +		return;
> +	state->p_time = time;
> +
> +	/* Mix local state and new entropy from registers */
> +	hash = hash512(state, sizeof(state->half), regs, sizeof(struct pt_regs));
> +	state->half = hash.half[0];
> +
> +	/* Only mix with global state, if uncontended */
> +	if (!trylock(&global_lock))
> +		return;
> +
> +	/* Mix local and global state.  */
> +	hash = hash512(&global_pool, sizeof(global_pool), &hash, sizeof(hash));
> +	state->half = hash.half[0];
> +	global_pool = hash.half[1];
> +
> +	if (uninitialized_count) {
> +		log_boot_entropy(regs, hash.h[0]);
> +		uninitialized_count--;
> +	}
> +	unlock(&global_lock);
> +}
> +
> +static void regrand_init(void)					// [8]
> +{
> +#error HOOK ME UP
> +	struct full_hash hash;
> +
> +	/* Interrupts before this loop may be synchronized */
> +	uninitialized_count = SAMPLES_NEEDED;
> +	/* without volatile, gcc will turn this into while (1) */
> +	while (ACCESS_ONCE(uninitialized_count)) {
> +		hash = hash512(&hash, sizeof(hash), NULL, 0);
> +	}
> +}
> +
> +/*
> + * Returns 32 likely-random bytes.  No guarantees, as we don't wait
> + * for enough entropy to accumulate.  Use only if you need semi-good
> + * entropy early in boot before RNG initialization can happen.
> + */
> +static struct half_hash consume_entropy_unsafe(void)		// [9]
> +{
> +#error HOOK ME UP
> +	struct per_cpu_state *state = get_local_state();
> +	uint64_t time = get_time();
> +	struct full_hash hash;
> +
> +	if (state->c_time != time || uninitialized_count) {
> +		state->c_time = time;
> +		/* Mix global pool into local pool, no locking */
> +		hash = hash512(&global_pool, sizeof(global_pool), &state->half, sizeof(state->half));
> +	} else {
> +		/* Only use local pool, reducing contention and overhead */
> +		hash = hash512(&state->half, sizeof(state->half), NULL, 0);
> +	}
> +	state->half = hash.half[0];
> +	return hash.half[1];
> +}
> +
> +/* Returns 32 random bytes */
> +struct half_hash consume_entropy(void)				// [10]
> +{
> +#error HOOK ME UP
> +	while (uninitialized_count)
> +		wait_a_while();
> +	return consume_entropy_unsafe();
> +}
> +
> +/*
> + * Design goals:
> + * -------------
> + * 1. /dev/random shall never block.
> + * 2. /dev/urandom shall never return bad randomness.
> + *
> + * That's it, really.  Everything else is a corollary.
> + *
> + * 3. Any machine supported by Linux must have good randomness.
> + * 4. Any entropy source that is unavailable on some machines is useless.
> + * 5. By the time we start userspace, the random pool must be fully initialized.
> + * 6. Entropy never gets XOR'ed, always hashed.
> + * 7. In-kernel users never receive bad randomness.
> + *
> + * The existing Linux kernel RNG (drivers/char/random) fails those goals.  I
> + * personally find it so bad that it was better to restart from scratch than to
> + * try and salvage anything from it.
> + *
> + *
> + * Design principle:
> + * -----------------
> + * Regrand is basically dakarand, Dan Kaminsky's famous 4-line javascript RNG:
> + * function millis()         { return Date.now(); }
> + * function flip_coin()      { n=0; then = millis()+1; while(millis()<=then) { n=!n; } return n; }
> + * function get_fair_bit()   { while(1) { a=flip_coin(); if(a!=flip_coin()) { return(a); } } }
> + * function get_random_byte(){ n=0; bits=8; while(bits--){ n<<=1; n|=get_fair_bit(); } return n; }
> + *
> + * The principle behind both is to extract entropy from the drift between two
> + * unsynchronized high-precision clocks.  Both use CPU registers and interrupt
> + * timing as the two clocks.
> + *
> + * Regrand uses the stored registers at interrupt time and hashes the lot.  The
> + * stored registers serve as a high-resolution timer of sorts.  At least one of
> + * the registers tends to change with every clock cycle, often more than one if
> + * the CPU can exploit instruction level parallelism (ILP).
> + *
> + * Naïvely using registers during bootup tends to yield no entropy, because
> + * interrupts are disabled for much of the bootup process and tend to happen
> + * immediately after re-enabling interrupts.  That means the register state is
> + * synchronized.  To avoid this problem, regrand_init() monopolizes the CPU with
> + * interrupts enabled.
> + *
> + * regrand_init() has to run some kind of loop that permutes register content.
> + * We want more than one bit of entropy per sample, so instead of dakarand's
> + * "n=!n" we calculate hashes.  That should have reasonably high IPL and improve
> + * timer resolution.  It also avoids going back to old register state in the
> + * loop.
> + *
> + *
> + * Details:
> + * --------
> + * [1] We use a 256bit half_hash and a 512bit full_hash throughout.  256bit of
> + * state are enough for a good crypto-hash based PRNG.
> + *
> + * [2] To initialize the RNG we need 128 samples.  Empirically each sample seems
> + * to contain more than 10 bits of entropy, giving us >1280 bits in total.  That
> + * is comfortably more than the 256 bits needed, giving a nice safety margin.
> + * If you are really paranoid, feel free to increase SAMPLES_NEEDED.
> + *
> + * Don't even think about decreasing SAMPLES_NEEDED.  Seriously!
> + *
> + * [3] Most operations only consider per_cpu_state.  Once per get_time()
> + * increment we mix per_cpu_state with global state, both for producers and
> + * consumers.  That way the common operations are relatively cheap and scale
> + * well.  At the same time we keep mixing new entropy into the global pool and
> + * tap into the global pool even on CPUs that don't ever see interrupts.
> + *
> + * [4] Adding entropy to the global pool is protected by a lock.  On lock
> + * contention the losing CPU simply ignores the global pool.  One update per
> + * get_time() increment is enough, there is no need to turn the global pool into
> + * a bottleneck when you have enough entropy anyway.
> + *
> + * [5] get_local_state(), get_time(), hash512() and wait_a_while() need to get
> + * implemented somehow.  Details vary between operating systems.
> + *
> + * [6] log_boot_entropy() can be used to estimate how much entropy is being
> + * gathered.  See 'entropy estimate' below.
> + *
> + * [7] produce_entropy_from_irq_regs() is the sole entropy producer.  In the
> + * common case it just returns.  Therefore it should be safe to call this
> + * function on every interrupt, even at rates of 100k interrupts per second.
> + * Once per get_time() increment it hashes registers and updates local state.
> + * If global state isn't locked, it mixes local and global state.
> + *
> + * uninitialized_count is only decremented if we can mix with the global pool.
> + * Therefore, we know the global pool has been mixed SAMPLES_NEEDED times
> + * before we consider the RNG initialized.
> + *
> + * On bootup it doesn't wait for get_time() increments and always collects
> + * entropy.
> + *
> + * Mixing is done via hash512().  Mixing via weak hash functions or by XORing
> + * new values in allows attacks where attacker-controlled data removes entropy
> + * from the pool.  Please never do such a thing.
> + *
> + * [8] regrand_init() provides a "counter loop" that is not synchronized with
> + * interrupts.  It also resets uninitialized_count.  We don't trust any samples
> + * before regrand_init() starts, so uninitialized_count is set to INT_MAX on
> + * boot.  Assuming no more than INT_MAX interrupts occur before regrand_init()
> + * is called, that's an easy way to not account the untrusted samples.
> + *
> + * [9] consume_entropy_unsafe() commonly calculates a 512bit hash from the
> + * 256bit per_cpu_state.  Half of that gets returned, the other half becomes the
> + * new per_cpu_state.  Predicting the other half from returned values should be
> + * impossible.  If it is possible, you didn't use a good enough crypto-hash.
> + *
> + * Once per get_time() increment we hash both local and global pool.  Therefore
> + * every CPU receives fresh entropy at a steady rate, even CPUs that never see
> + * interrupts.
> + *
> + * [10] consume_entropy() is the main consumer, ideally the only consumer.
> + * Calling it before RNG initialization has happened results in blocking, but it
> + * never returns bad random numbers.
> + *
> + * Calling it before RNG initialization can happen results in a deadlock and a
> + * machine that cannot boot.  In Linux the stack protector canary and some
> + * slab allocator debug code have such a problem.  If possible, we should
> + * reorder the code such that RNG initialization happens first.  If that is not
> + * possible, I don't have a good answer.  You can try using rdrand() or
> + * something like that, but the regular system RNG cannot give you a good random
> + * number.
> + *
> + * I really hate ever returning bad random numbers.  That almost always leads
> + * to bugs because people make assumptions that only work with good random
> + * numbers - sometimes with catastrophic consequences.  So the least we can do
> + * is make it bloody obvious that callers are doing something horrible.  Maybe
> + * consume_entropy_unsafe() should be renamed to
> + * consume_entropy_unsafe____any_callers_must_be_fully_aware_of_consequences()
> + * or something like that.
> + *
> + *
> + * Entropy estimate:
> + * ----------------
> + *
> + * I booted an allnoconfig-based kernel a million times with log_boot_entropy()
> + * enabled.  Precise command line was:
> + * taskset -c $1 chrt -r 50 timeout 1 kvm -serial file:$S -kernel bzImage -display none -append 'ro console=ttyS0,115200 console=tty0' > /dev/null 2>&1
> + * My test machine had 20 cores and I used 19 threads.  Hyperthreading was
> + * enabled, I have yet to repeat this test with hyperthreading disabled.
> + * CONFIG_HZ=1000 was chosen to minimize both boot time and entropy gathered.
> + *
> + * The most common register hashes were seen <3000 times for any one sample.
> + * That would give 8 bits of entropy for those common samples or more entropy
> + * for less common samples.  The most common register hash across all samples
> + * was seen 350517 times, giving a little more than 1 bit of entropy (log2(10^6/350517) ~ 1.5 bits).  The 63rd
> + * most common was seen 12208 times, the 127th most common 11110 times.  At
> + * least 94 samples should yield 6 bits of entropy or more.  Assuming we only
> + * encounter samples from the most common set and extract no entropy from the
> + * order of samples, that would still give more than 564 bits.
> + *
> + * Zstd -19 can compress the rdtsc() values for all boots to 266MB, indicating an average
> + * entropy of 266 bytes or 2128 bits per boot.  With a delta filter I can reduce that to
> + * 1868 bits.  Delta filter and xz -9 achieves 1621 bits.  That is still 6x more
> + * than the 256 bits we need, a big enough safety margin to let me sleep at
> + * night.  Register hashes compress much worse - they probably don't contain any
> + * more entropy, but are computationally much more expensive than a simple
> + * counter.  I get 4159 bits with zstd -19.
> + *
> + * If someone with better statistical skills (or just anyone) would like to
> + * double check, I can provide the logs.  Full logs are 1.6G compressed.
> + *
> + * If someone has the resources to do large-scale boot tests on hardware, please
> + * do so.
> + *
> + * Please don't take my word for it; question everything I said.  The OS' random
> + * number generator is something we really want to get right.
> + */
> diff --git a/arch/Kconfig b/arch/Kconfig
> index d789a89cb32c..382c8ef44cc8 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -431,6 +431,7 @@ config GCC_PLUGIN_SANCOV
>  config GCC_PLUGIN_LATENT_ENTROPY
>  	bool "Generate some entropy during boot and runtime"
>  	depends on GCC_PLUGINS
> +	depends on !REGRAND
>  	help
>  	  By saying Y here the kernel will instrument some kernel code to
>  	  extract some entropy from both original and artificially created
> diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
> index 8abedf1d650e..b057b49bee9b 100644
> --- a/arch/x86/include/asm/stackprotector.h
> +++ b/arch/x86/include/asm/stackprotector.h
> @@ -71,7 +71,7 @@ static __always_inline void boot_init_stack_canary(void)
>  	 * there it already has some randomness on most systems. Later
>  	 * on during the bootup the random pool has true entropy too.
>  	 */
> -	get_random_bytes(&canary, sizeof(canary));
> +	__get_random_bytes(&canary, sizeof(canary));
>  	tsc = rdtsc();
>  	canary += tsc + (tsc << 32UL);
>  	canary &= CANARY_MASK;
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index ff891772c9f8..639cdd22382a 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -34,6 +34,7 @@
>  #include <linux/dmi.h>
>  #include <linux/smp.h>
>  #include <linux/mm.h>
> +#include <linux/random.h>
>  
>  #include <asm/trace/irq_vectors.h>
>  #include <asm/irq_remapping.h>
> @@ -1055,6 +1056,7 @@ __visible void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
>  	entering_ack_irq();
>  	trace_local_timer_entry(LOCAL_TIMER_VECTOR);
>  	local_apic_timer_interrupt();
> +	add_interrupt_randomness(0, 0);
>  	trace_local_timer_exit(LOCAL_TIMER_VECTOR);
>  	exiting_irq();
>  
> diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
> index 623714344600..4eb3a66786e4 100644
> --- a/drivers/char/Kconfig
> +++ b/drivers/char/Kconfig
> @@ -6,6 +6,17 @@ menu "Character devices"
>  
>  source "drivers/tty/Kconfig"
>  
> +config REGRAND
> +	bool "regrand random number generator"
> +	default n
> +	help
> +	  Say Y here if you want to use the regrand random number
> +	  generator instead of the default choice.  Regrand uses a
> +	  single entropy source - the register state at the time of
> +	  interrupts.  It is likely a better choice, but relatively
> +	  new and has therefore seen less scrutiny.
> +	  When in doubt, say "N".
> +
>  config DEVMEM
>  	bool "/dev/mem virtual device support"
>  	default y
> diff --git a/drivers/char/Makefile b/drivers/char/Makefile
> index 53e33720818c..e42fc5b17eba 100644
> --- a/drivers/char/Makefile
> +++ b/drivers/char/Makefile
> @@ -2,7 +2,12 @@
>  # Makefile for the kernel character device drivers.
>  #
>  
> -obj-y				+= mem.o random.o
> +obj-y				+= mem.o
> +ifeq ($(CONFIG_REGRAND),y)
> +  obj-y				+= regrand.o
> +else
> +  obj-y				+= random.o
> +endif
>  obj-$(CONFIG_TTY_PRINTK)	+= ttyprintk.o
>  obj-y				+= misc.o
>  obj-$(CONFIG_ATARI_DSP56K)	+= dsp56k.o
> diff --git a/drivers/char/regrand.c b/drivers/char/regrand.c
> new file mode 100644
> index 000000000000..6e1b28fb4e05
> --- /dev/null
> +++ b/drivers/char/regrand.c
> @@ -0,0 +1,673 @@
> +/*
> + * regrand - Random number generator using register state at time of
> + * interrupt.  Uses the same principle as Dakarand, drift between two
> + * unsynchronized high-precision timers, as entropy source.  Should
> + * work reasonably well on any CPU that can generate interrupts.
> + *
> + * Public Domain
> + */
> +#include <linux/kernel.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/syscalls.h>
> +
> +/*
> + * Based on a million boots, each interrupt seems to yield about 10-15
> + * bits of entropy.  That means 20 samples might be enough for
> + * cryptographically strong random numbers.  Naturally we want more to
> + * give us a comfortable safety margin.  128 should be good enough.
> + */
> +#define SAMPLES_NEEDED	(128)
> +
> +/* 256b hash value */
> +struct half_hash {
> +	uint64_t h[4];
> +};
> +
> +/* 512b hash value */
> +struct full_hash {
> +	union {
> +		struct half_hash half[2];
> +		uint64_t h[8];
> +	};
> +};
> +
> +struct per_cpu_state {
> +	struct half_hash half;	/* 256 bits of state */
> +	uint64_t p_time;	/* last time we produced entropy */
> +	uint64_t c_time;	/* last time we consumed entropy */
> +};
> +
> +static DEFINE_PER_CPU(struct per_cpu_state, per_cpu_state);
> +
> +static struct half_hash global_pool;
> +static int uninitialized_count = INT_MAX;
> +static DEFINE_SPINLOCK(global_lock);
> +static DECLARE_WAIT_QUEUE_HEAD(uninitialized_wait);
> +
> +static struct per_cpu_state *get_local_state(void)
> +{
> +	/*
> +	 * Should return a pointer to per-cpu state.  Most RNG
> +	 * operations are done on local state, without cacheline
> +	 * pingpong causing unnecessary slowdowns.
> +	 * Bonus points if the pointer is cacheline-aligned.  The
> +	 * structure size of 64 bytes is chosen to perfectly fit into
> +	 * 1-2 cachelines in most architectures.
> +	 */
> +	return this_cpu_ptr(&per_cpu_state);
> +}
> +
> +static uint64_t get_time(void)
> +{
> +	/*
> +	 * Should return a time.  Doesn't need to be monotonic or
> +	 * non-wrapping, but should be scaled to increment every
> +	 * 1-10ms or so.  We will sample entropy on every increment,
> +	 * so higher precision means more entropy and more overhead,
> +	 * lower precision means less entropy and less overhead.
> +	 */
> +	return jiffies;
> +}
> +
> +/*
> + * Sha3 code copied from crypto/sha3_generic.c.  For one, we cannot
> + * depend on loadable modules.  Also, the necessary boilerplate code
> + * to do error handling on functions that shouldn't return errors to
> + * begin with is about as much as a full implementation of sha3.
> + * Therefore we have yet another copy for now.
> + */
> +#define SHA3_224_DIGEST_SIZE	(224 / 8)
> +#define SHA3_224_BLOCK_SIZE	(200 - 2 * SHA3_224_DIGEST_SIZE)
> +
> +#define SHA3_256_DIGEST_SIZE	(256 / 8)
> +#define SHA3_256_BLOCK_SIZE	(200 - 2 * SHA3_256_DIGEST_SIZE)
> +
> +#define SHA3_384_DIGEST_SIZE	(384 / 8)
> +#define SHA3_384_BLOCK_SIZE	(200 - 2 * SHA3_384_DIGEST_SIZE)
> +
> +#define SHA3_512_DIGEST_SIZE	(512 / 8)
> +#define SHA3_512_BLOCK_SIZE	(200 - 2 * SHA3_512_DIGEST_SIZE)
> +
> +struct sha3_state {
> +	u64		st[25];
> +	unsigned int	md_len;
> +	unsigned int	rsiz;
> +	unsigned int	rsizw;
> +
> +	unsigned int	partial;
> +	u8		buf[SHA3_224_BLOCK_SIZE];
> +};
> +
> +#define KECCAK_ROUNDS 24
> +
> +#define ROTL64(x, y) (((x) << (y)) | ((x) >> (64 - (y))))
> +
> +static const u64 keccakf_rndc[24] = {
> +	0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
> +	0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
> +	0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
> +	0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
> +	0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
> +	0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
> +	0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
> +	0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL
> +};
> +
> +static const int keccakf_rotc[24] = {
> +	1,  3,  6,  10, 15, 21, 28, 36, 45, 55, 2,  14,
> +	27, 41, 56, 8,  25, 43, 62, 18, 39, 61, 20, 44
> +};
> +
> +static const int keccakf_piln[24] = {
> +	10, 7,  11, 17, 18, 3, 5,  16, 8,  21, 24, 4,
> +	15, 23, 19, 13, 12, 2, 20, 14, 22, 9,  6,  1
> +};
> +
> +/* update the state with given number of rounds */
> +
> +static void keccakf(u64 st[25])
> +{
> +	int i, j, round;
> +	u64 t, bc[5];
> +
> +	for (round = 0; round < KECCAK_ROUNDS; round++) {
> +
> +		/* Theta */
> +		for (i = 0; i < 5; i++)
> +			bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15]
> +				^ st[i + 20];
> +
> +		for (i = 0; i < 5; i++) {
> +			t = bc[(i + 4) % 5] ^ ROTL64(bc[(i + 1) % 5], 1);
> +			for (j = 0; j < 25; j += 5)
> +				st[j + i] ^= t;
> +		}
> +
> +		/* Rho Pi */
> +		t = st[1];
> +		for (i = 0; i < 24; i++) {
> +			j = keccakf_piln[i];
> +			bc[0] = st[j];
> +			st[j] = ROTL64(t, keccakf_rotc[i]);
> +			t = bc[0];
> +		}
> +
> +		/* Chi */
> +		for (j = 0; j < 25; j += 5) {
> +			for (i = 0; i < 5; i++)
> +				bc[i] = st[j + i];
> +			for (i = 0; i < 5; i++)
> +				st[j + i] ^= (~bc[(i + 1) % 5]) &
> +					     bc[(i + 2) % 5];
> +		}
> +
> +		/* Iota */
> +		st[0] ^= keccakf_rndc[round];
> +	}
> +}
> +
> +static void sha3_512_init(struct sha3_state *sctx)
> +{
> +	memset(sctx, 0, sizeof(*sctx));
> +	sctx->md_len = SHA3_512_DIGEST_SIZE;
> +	sctx->rsiz = 200 - 2 * SHA3_512_DIGEST_SIZE;
> +	sctx->rsizw = sctx->rsiz / 8;
> +}
> +
> +static void sha3_update(struct sha3_state *sctx, const u8 *data, unsigned int len)
> +{
> +	unsigned int done;
> +	const u8 *src;
> +
> +	done = 0;
> +	src = data;
> +
> +	if ((sctx->partial + len) > (sctx->rsiz - 1)) {
> +		if (sctx->partial) {
> +			done = -sctx->partial;
> +			memcpy(sctx->buf + sctx->partial, data,
> +			       done + sctx->rsiz);
> +			src = sctx->buf;
> +		}
> +
> +		do {
> +			unsigned int i;
> +
> +			for (i = 0; i < sctx->rsizw; i++)
> +				sctx->st[i] ^= ((u64 *) src)[i];
> +			keccakf(sctx->st);
> +
> +			done += sctx->rsiz;
> +			src = data + done;
> +		} while (done + (sctx->rsiz - 1) < len);
> +
> +		sctx->partial = 0;
> +	}
> +	memcpy(sctx->buf + sctx->partial, src, len - done);
> +	sctx->partial += (len - done);
> +}
> +
> +static void sha3_final(struct sha3_state *sctx, void *out)
> +{
> +	unsigned int i, inlen = sctx->partial;
> +
> +	sctx->buf[inlen++] = 0x06;
> +	memset(sctx->buf + inlen, 0, sctx->rsiz - inlen);
> +	sctx->buf[sctx->rsiz - 1] |= 0x80;
> +
> +	for (i = 0; i < sctx->rsizw; i++)
> +		sctx->st[i] ^= ((u64 *) sctx->buf)[i];
> +
> +	keccakf(sctx->st);
> +
> +	for (i = 0; i < sctx->rsizw; i++)
> +		sctx->st[i] = cpu_to_le64(sctx->st[i]);
> +
> +	memcpy(out, sctx->st, sctx->md_len);
> +
> +	memset(sctx, 0, sizeof(*sctx));
> +}
> +
> +static struct full_hash hash512(void *p1, int n1, void *p2, int n2)
> +{
> +	struct sha3_state state;
> +	struct full_hash hash;
> +	/*
> +	 * Calculate a 512bit hash from both inputs.  Second input may
> +	 * be empty.
> +	 */
> +	sha3_512_init(&state);
> +	sha3_update(&state, p1, n1);
> +	if (p2)
> +		sha3_update(&state, p2, n2);
> +	sha3_final(&state, &hash);
> +	return hash;
> +}
> +
> +static void wait_a_while(void)
> +{
> +	/*
> +	 * Some delay function - only used when asking for random
> +	 * numbers before the pool has initialized.  Feel free to do
> +	 * something fancy like completions if you don't want a
> +	 * spinning loop.
> +	 */
> +	wait_event_interruptible(uninitialized_wait, !uninitialized_count);
> +}
> +
> +#if 1 /* enable to estimate boottime entropy */
> +static uint32_t boot_log[SAMPLES_NEEDED][3];
> +static uint32_t tsc0;
> +
> +static void log_boot_entropy(struct pt_regs *regs, uint64_t cascade_hash)
> +{
> +	struct full_hash hash;
> +	int i, slot = SAMPLES_NEEDED - uninitialized_count;
> +
> +	if ((unsigned)slot > SAMPLES_NEEDED)
> +		return;
> +	if (slot == 0)
> +		tsc0 = rdtsc();
> +	hash = hash512(regs, sizeof(struct pt_regs), NULL, 0);
> +	boot_log[slot][0] = hash.h[0];
> +	boot_log[slot][1] = cascade_hash;
> +	boot_log[slot][2] = rdtsc() - tsc0;
> +	if (uninitialized_count == 1) {
> +		for (i = 0; i < SAMPLES_NEEDED; i++) {
> +			pr_info("boot_entropy %03d: %08x %08x %08x\n", i, boot_log[i][0], boot_log[i][1], boot_log[i][2]);
> +		}
> +	}
> +}
> +#else
> +static inline void log_boot_entropy(struct pt_regs *regs, uint64_t cascade_hash) {}
> +#endif
> +
> +static void produce_entropy_from_irq_regs(struct pt_regs *regs)
> +{
> +	/*
> +	 * Should be called on every interrupt.  The argument is the
> +	 * register state that was saved at the time of the
> +	 * interrupt.
> +	 */
> +	struct per_cpu_state *state = get_local_state();
> +	uint64_t time = get_time();
> +	struct full_hash hash;
> +
> +	/* Ratelimit to reduce interrupt overhead */
> +	if (state->p_time == time && !uninitialized_count)
> +		return;
> +	state->p_time = time;
> +
> +	/* Mix local state and new entropy from registers */
> +	hash = hash512(state, sizeof(state->half), regs, sizeof(struct pt_regs));
> +	state->half = hash.half[0];
> +
> +	/* Only mix with global state, if uncontended */
> +	if (!spin_trylock(&global_lock))
> +		return;
> +
> +	/* Mix local and global state.  */
> +	hash = hash512(&global_pool, sizeof(global_pool), &hash, sizeof(hash));
> +	state->half = hash.half[0];
> +	global_pool = hash.half[1];
> +
> +	if (uninitialized_count) {
> +		log_boot_entropy(regs, hash.h[0]);
> +		uninitialized_count--;
> +		if (uninitialized_count == 0) {
> +			pr_info("random pool fully initialized\n");
> +			wake_up(&uninitialized_wait);
> +		}
> +	}
> +	spin_unlock(&global_lock);
> +}
> +
> +static int regrand_init(void)
> +{
> +	struct full_hash hash;
> +
> +	pr_info("random pool initialization start\n");
> +	/* Interrupts before this loop may be synchronized */
> +	uninitialized_count = SAMPLES_NEEDED;
> +	/* without volatile, gcc will turn this into while (1) */
> +	while (ACCESS_ONCE(uninitialized_count)) {
> +		hash = hash512(&hash, sizeof(hash), NULL, 0);
> +	}
> +	return 0;
> +}
> +early_initcall(regrand_init);
> +
> +/*
> + * Returns 32 likely-random bytes.  No guarantees, as we don't wait
> + * for enough entropy to accumulate.  Use only if you need semi-good
> + * entropy early in boot before RNG initialization can happen.
> + */
> +static struct half_hash consume_entropy_unsafe(void)
> +{
> +	struct per_cpu_state *state;
> +	uint64_t time = get_time();
> +	struct full_hash hash;
> +
> +	preempt_disable();
> +	state = get_local_state();
> +	if (state->c_time != time || uninitialized_count) {
> +		state->c_time = time;
> +		/* Mix global pool into local pool, no locking */
> +		hash = hash512(&global_pool, sizeof(global_pool), &state->half, sizeof(state->half));
> +	} else {
> +		/* Only use local pool, reducing contention and overhead */
> +		hash = hash512(&state->half, sizeof(state->half), NULL, 0);
> +	}
> +	state->half = hash.half[0];
> +	preempt_enable();
> +	return hash.half[1];
> +}
> +
> +/* Doesn't block, use only very early in bootup */
> +void __get_random_bytes(void *buf, int nbytes)
> +{
> +	while (nbytes > 0) {
> +		struct half_hash h = consume_entropy_unsafe();
> +		memcpy(buf, &h, min_t(size_t, nbytes, sizeof(h)));
> +		buf += sizeof(h);
> +		nbytes -= sizeof(h);
> +	}
> +}
> +
> +void get_random_bytes(void *buf, int nbytes)
> +{
> +	WARN(uninitialized_count, "Using entropy before initialization");
> +	while (uninitialized_count)
> +		wait_a_while();
> +
> +	while (nbytes > 0) {
> +		struct half_hash h = consume_entropy_unsafe();
> +		memcpy(buf, &h, min_t(size_t, nbytes, sizeof(h)));
> +		buf += sizeof(h);
> +		nbytes -= sizeof(h);
> +	}
> +}
> +EXPORT_SYMBOL(get_random_bytes);
> +
> +static ssize_t _random_read(int nonblock, char __user *buf, size_t nbytes)
> +{
> +	ssize_t ret = 0, n;
> +
> +	WARN(uninitialized_count, "Using entropy before initialization");
> +	if (uninitialized_count && nonblock)
> +		return -EAGAIN;
> +	while (uninitialized_count)
> +		wait_a_while();
> +	while (nbytes > 0) {
> +		struct half_hash h = consume_entropy_unsafe();
> +		n = min_t(size_t, nbytes, sizeof(h));
> +		if (copy_to_user(buf, &h, n))
> +			return -EFAULT;
> +		buf += n;
> +		nbytes -= n;
> +		ret += n;
> +	}
> +	return ret;
> +}
> +
> +#ifdef CONFIG_SYSCTL
> +
> +/*
> + * Mostly a fake interface for compatibility with what random.c
> + * provided.  Not actually used for anything but to appease userspace.
> + */
> +#include <linux/sysctl.h>
> +
> +static int random_min_urandom_seed = 60;
> +static char sysctl_bootid[16];
> +static int entropy_avail = 8 * sizeof(struct half_hash);
> +static int sysctl_poolsize = 8 * sizeof(struct half_hash);
> +static int random_read_wakeup_bits = 64;
> +static int random_write_wakeup_bits = 896;
> +
> +/*
> + * This function is used to return both the bootid UUID, and random
> + * UUID.  The difference is in whether table->data is NULL; if it is,
> + * then a new UUID is generated and returned to the user.
> + *
> + * If the user accesses this via the proc interface, the UUID will be
> + * returned as an ASCII string in the standard UUID format; if via the
> + * sysctl system call, as 16 bytes of binary data.
> + */
> +static int proc_do_uuid(struct ctl_table *table, int write,
> +			void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> +	struct ctl_table fake_table;
> +	unsigned char buf[64], tmp_uuid[16], *uuid;
> +
> +	uuid = table->data;
> +	if (!uuid) {
> +		uuid = tmp_uuid;
> +		generate_random_uuid(uuid);
> +	} else {
> +		static DEFINE_SPINLOCK(bootid_spinlock);
> +
> +		spin_lock(&bootid_spinlock);
> +		if (!uuid[8])
> +			generate_random_uuid(uuid);
> +		spin_unlock(&bootid_spinlock);
> +	}
> +
> +	sprintf(buf, "%pU", uuid);
> +
> +	fake_table.data = buf;
> +	fake_table.maxlen = sizeof(buf);
> +
> +	return proc_dostring(&fake_table, write, buffer, lenp, ppos);
> +}
> +
> +struct ctl_table random_table[] = {
> +	{
> +		.procname	= "poolsize",
> +		.data		= &sysctl_poolsize,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0444,
> +		.proc_handler	= proc_dointvec,
> +	},
> +	{
> +		.procname	= "entropy_avail",
> +		.data		= &entropy_avail,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0444,
> +		.proc_handler	= proc_dointvec,
> +	},
> +	{
> +		.procname	= "read_wakeup_threshold",
> +		.data		= &random_read_wakeup_bits,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec,
> +	},
> +	{
> +		.procname	= "write_wakeup_threshold",
> +		.data		= &random_write_wakeup_bits,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec,
> +	},
> +	{
> +		.procname	= "urandom_min_reseed_secs",
> +		.data		= &random_min_urandom_seed,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec,
> +	},
> +	{
> +		.procname	= "boot_id",
> +		.data		= &sysctl_bootid,
> +		.maxlen		= 16,
> +		.mode		= 0444,
> +		.proc_handler	= proc_do_uuid,
> +	},
> +	{
> +		.procname	= "uuid",
> +		.maxlen		= 16,
> +		.mode		= 0444,
> +		.proc_handler	= proc_do_uuid,
> +	},
> +	{ }
> +};
> +#endif 	/* CONFIG_SYSCTL */
> +
> +u64 get_random_u64(void)
> +{
> +	u64 ret;
> +	get_random_bytes(&ret, sizeof(ret));
> +	return ret;
> +}
> +EXPORT_SYMBOL(get_random_u64);
> +
> +u32 get_random_u32(void)
> +{
> +	u32 ret;
> +	get_random_bytes(&ret, sizeof(ret));
> +	return ret;
> +}
> +EXPORT_SYMBOL(get_random_u32);
> +
> +/**
> + * randomize_page - Generate a random, page aligned address
> + * @start:	The smallest acceptable address the caller will take.
> + * @range:	The size of the area, starting at @start, within which the
> + *		random address must fall.
> + *
> + * If @start + @range would overflow, @range is capped.
> + *
> + * NOTE: Historical use of randomize_range, which this replaces, presumed that
> + * @start was already page aligned.  We now align it regardless.
> + *
> + * Return: A page aligned address within [start, start + range).  On error,
> + * @start is returned.
> + */
> +unsigned long
> +randomize_page(unsigned long start, unsigned long range)
> +{
> +	if (!PAGE_ALIGNED(start)) {
> +		range -= PAGE_ALIGN(start) - start;
> +		start = PAGE_ALIGN(start);
> +	}
> +
> +	if (start > ULONG_MAX - range)
> +		range = ULONG_MAX - start;
> +
> +	range >>= PAGE_SHIFT;
> +
> +	if (range == 0)
> +		return start;
> +
> +	return start + (get_random_long() % range << PAGE_SHIFT);
> +}
> +
> +SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
> +		unsigned int, flags)
> +{
> +	int ret;
> +
> +	if (flags & ~(GRND_NONBLOCK|GRND_RANDOM))
> +		return -EINVAL;
> +
> +	if (count > INT_MAX)
> +		count = INT_MAX;
> +
> +	if (uninitialized_count) {
> +		if (flags & GRND_NONBLOCK)
> +			return -EAGAIN;
> +		ret = wait_for_random_bytes();
> +		if (unlikely(ret))
> +			return ret;
> +	}
> +	return _random_read(flags & GRND_NONBLOCK, buf, count);
> +}
> +
> +static ssize_t random_read(struct file *file, char __user *buf, size_t nbytes,
> +		loff_t *ppos)
> +{
> +	return _random_read(file->f_flags & O_NONBLOCK, buf, nbytes);
> +}
> +
> +static ssize_t random_write(struct file *file, const char __user *buffer,
> +		size_t count, loff_t *ppos)
> +{
> +	return 0;
> +}
> +
> +static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
> +{
> +	int size, ent_count = 4096;
> +	int __user *p = (int __user *)arg;
> +
> +	/* just noop implementations, hardly worth the error checking */
> +	switch (cmd) {
> +	case RNDGETENTCNT:
> +		if (put_user(ent_count, p))
> +			return -EFAULT;
> +		return 0;
> +	case RNDADDTOENTCNT:
> +		if (!capable(CAP_SYS_ADMIN))
> +			return -EPERM;
> +		if (get_user(ent_count, p))
> +			return -EFAULT;
> +		return 0;
> +	case RNDADDENTROPY:
> +		if (!capable(CAP_SYS_ADMIN))
> +			return -EPERM;
> +		if (get_user(ent_count, p++))
> +			return -EFAULT;
> +		if (ent_count < 0)
> +			return -EINVAL;
> +		if (get_user(size, p++))
> +			return -EFAULT;
> +		return 0;
> +	case RNDZAPENTCNT:
> +	case RNDCLEARPOOL:
> +		if (!capable(CAP_SYS_ADMIN))
> +			return -EPERM;
> +		return 0;
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static struct fasync_struct *fasync;
> +
> +static int random_fasync(int fd, struct file *filp, int on)
> +{
> +	return fasync_helper(fd, filp, on, &fasync);
> +}
> +
> +const struct file_operations random_fops = {
> +	.read  = random_read,
> +	.write = random_write,
> +	.unlocked_ioctl = random_ioctl,
> +	.fasync = random_fasync,
> +	.llseek = noop_llseek,
> +};
> +
> +const struct file_operations urandom_fops = {
> +	.read  = random_read,
> +	.write = random_write,
> +	.unlocked_ioctl = random_ioctl,
> +	.fasync = random_fasync,
> +	.llseek = noop_llseek,
> +};
> +
> +int wait_for_random_bytes(void)
> +{
> +	if (likely(!uninitialized_count))
> +		return 0;
> +	WARN(uninitialized_count, "Using entropy before initialization");
> +	return wait_event_interruptible(uninitialized_wait, !uninitialized_count);
> +}
> +EXPORT_SYMBOL(wait_for_random_bytes);
> +
> +void add_interrupt_randomness(int irq, int irq_flags)
> +{
> +	produce_entropy_from_irq_regs(get_irq_regs());
> +}
> +EXPORT_SYMBOL_GPL(add_interrupt_randomness);
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index ea652bfcd675..635b5723bedf 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -410,8 +410,13 @@ extern void disk_flush_events(struct gendisk *disk, unsigned int mask);
>  extern unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask);
>  
>  /* drivers/char/random.c */
> +#ifdef CONFIG_REGRAND
> +static inline void add_disk_randomness(struct gendisk *disk) {}
> +static inline void rand_initialize_disk(struct gendisk *disk) {}
> +#else
>  extern void add_disk_randomness(struct gendisk *disk) __latent_entropy;
>  extern void rand_initialize_disk(struct gendisk *disk);
> +#endif
>  
>  static inline sector_t get_start_sect(struct block_device *bdev)
>  {
> diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h
> index bee0827766a3..d64fc3c02ef6 100644
> --- a/include/linux/hw_random.h
> +++ b/include/linux/hw_random.h
> @@ -60,6 +60,10 @@ extern int devm_hwrng_register(struct device *dev, struct hwrng *rng);
>  extern void hwrng_unregister(struct hwrng *rng);
>  extern void devm_hwrng_unregister(struct device *dve, struct hwrng *rng);
>  /** Feed random bits into the pool. */
> +#ifdef CONFIG_REGRAND
> +static inline void add_hwgenerator_randomness(const char *buffer, size_t count, size_t entropy) {}
> +#else
>  extern void add_hwgenerator_randomness(const char *buffer, size_t count, size_t entropy);
> +#endif
>  
>  #endif /* LINUX_HWRANDOM_H_ */
> diff --git a/include/linux/random.h b/include/linux/random.h
> index eafea6a09361..0f85af279864 100644
> --- a/include/linux/random.h
> +++ b/include/linux/random.h
> @@ -17,6 +17,17 @@ struct random_ready_callback {
>  	struct module *owner;
>  };
>  
> +#ifdef CONFIG_REGRAND
> +static inline void add_device_randomness(const void *a, unsigned int b) {}
> +static inline void add_latent_entropy(void) {}
> +static inline void add_input_randomness(unsigned int type, unsigned int code,
> +		unsigned int value) {}
> +static inline int add_random_ready_callback(struct random_ready_callback *rdy) { return 0; }
> +static inline void del_random_ready_callback(struct random_ready_callback *rdy) {}
> +
> +void add_interrupt_randomness(int irq, int irq_flags);
> +void __get_random_bytes(void *buf, int nbytes);
> +#else /* CONFIG_REGRAND */
>  extern void add_device_randomness(const void *, unsigned int);
>  
>  #if defined(CONFIG_GCC_PLUGIN_LATENT_ENTROPY) && !defined(__CHECKER__)
> @@ -27,16 +38,19 @@ static inline void add_latent_entropy(void)
>  }
>  #else
>  static inline void add_latent_entropy(void) {}
>  #endif
> +
> +extern int add_random_ready_callback(struct random_ready_callback *rdy);
> +extern void del_random_ready_callback(struct random_ready_callback *rdy);
> +#define __get_random_bytes get_random_bytes
>  
>  extern void add_input_randomness(unsigned int type, unsigned int code,
>  				 unsigned int value) __latent_entropy;
>  extern void add_interrupt_randomness(int irq, int irq_flags) __latent_entropy;
> +#endif /* CONFIG_REGRAND */
>  
>  extern void get_random_bytes(void *buf, int nbytes);
>  extern int wait_for_random_bytes(void);
> -extern int add_random_ready_callback(struct random_ready_callback *rdy);
> -extern void del_random_ready_callback(struct random_ready_callback *rdy);
>  extern void get_random_bytes_arch(void *buf, int nbytes);
>  
>  #ifndef MODULE
> @@ -72,13 +86,6 @@ static inline unsigned long get_random_long(void)
>  # define CANARY_MASK 0xffffffffUL
>  #endif
>  
> -static inline unsigned long get_random_canary(void)
> -{
> -	unsigned long val = get_random_long();
> -
> -	return val & CANARY_MASK;
> -}
> -
>  /* Calls wait_for_random_bytes() and then calls get_random_bytes(buf, nbytes).
>   * Returns the result of the call to wait_for_random_bytes. */
>  static inline int get_random_bytes_wait(void *buf, int nbytes)
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 07cc743698d3..5cdd7eabc984 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -562,7 +562,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>  	set_task_stack_end_magic(tsk);
>  
>  #ifdef CONFIG_CC_STACKPROTECTOR
> -	tsk->stack_canary = get_random_canary();
> +	__get_random_bytes(&tsk->stack_canary, sizeof(tsk->stack_canary));
>  #endif
>  
>  	/*
> diff --git a/kernel/panic.c b/kernel/panic.c
> index bdd18afa19a4..e063564074af 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -483,7 +483,7 @@ static u64 oops_id;
>  static int init_oops_id(void)
>  {
>  	if (!oops_id)
> -		get_random_bytes(&oops_id, sizeof(oops_id));
> +		__get_random_bytes(&oops_id, sizeof(oops_id));
>  	else
>  		oops_id++;
>  
> diff --git a/mm/slab.c b/mm/slab.c
> index 04dec48c3ed7..eba7ad862119 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -2474,7 +2474,7 @@ static bool freelist_state_initialize(union freelist_init_state *state,
>  	unsigned int rand;
>  
>  	/* Use best entropy available to define a random shift */
> -	rand = get_random_int();
> +	__get_random_bytes(&rand, sizeof(rand));
>  
>  	/* Use a random state if the pre-computed list is not available */
>  	if (!cachep->random_seq) {
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 80164599ca5d..de658698e010 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1160,6 +1160,7 @@ int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count,
>  				    gfp_t gfp)
>  {
>  	struct rnd_state state;
> +	unsigned long seed;
>  
>  	if (count < 2 || cachep->random_seq)
>  		return 0;
> @@ -1169,7 +1170,8 @@ int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count,
>  		return -ENOMEM;
>  
>  	/* Get best entropy at this stage of boot */
> -	prandom_seed_state(&state, get_random_long());
> +	__get_random_bytes(&seed, sizeof(seed));
> +	prandom_seed_state(&state, seed);
>  
>  	freelist_randomize(&state, cachep->random_seq, count);
>  	return 0;
> diff --git a/mm/slub.c b/mm/slub.c
> index 163352c537ab..f97387e833f7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1523,7 +1523,8 @@ static bool shuffle_freelist(struct kmem_cache *s, struct page *page)
>  		return false;
>  
>  	freelist_count = oo_objects(s->oo);
> -	pos = get_random_int() % freelist_count;
> +	__get_random_bytes(&pos, sizeof(pos));
> +	pos %= freelist_count;
>  
>  	page_limit = page->objects * s->size;
>  	start = fixup_red_left(s, page_address(page));
> @@ -3597,7 +3598,7 @@ static int kmem_cache_open(struct kmem_cache *s, unsigned long flags)
>  	s->flags = kmem_cache_flags(s->size, flags, s->name, s->ctor);
>  	s->reserved = 0;
>  #ifdef CONFIG_SLAB_FREELIST_HARDENED
> -	s->random = get_random_long();
> +	__get_random_bytes(&s->random, sizeof(s->random));
>  #endif
>  
>  	if (need_reserve_slab_rcu && (s->flags & SLAB_TYPESAFE_BY_RCU))
> -- 
> 2.1.4
> 

Jörn

--
It is better to die of hunger having lived without grief and fear,
than to live with a troubled spirit amid abundance.
-- Epictetus


* Re: [PATCH] random: add regrand
       [not found]     ` <20171111182238.yasi3rzy3ay7u4wg@thunk.org>
@ 2017-11-11 20:13       ` Jörn Engel
  0 siblings, 0 replies; 2+ messages in thread
From: Jörn Engel @ 2017-11-11 20:13 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Kees Cook, Jason A. Donenfeld, Michael Schmitz, Stephan Mueller,
	Jason Cooper, Herbert Xu, H. Peter Anvin, Christoph Hellwig,
	Greg Price, linux-kernel

On Sat, Nov 11, 2017 at 01:22:38PM -0500, Theodore Ts'o wrote:
> On Fri, Nov 10, 2017 at 04:23:21PM -0800, Jörn Engel wrote:
> > On Fri, Nov 10, 2017 at 06:52:12PM -0500, Theodore Ts'o wrote:
> > > Couple of quick comments.  The code as it exists is horrifically x86
> > > specific.  For example: it uses rdtsc().
> > 
> > Sorry, I should have changed that to "#if 0".  The rdtsc() exists only
> > as a means to estimate entropy and is not meant for production.
> 
> But that means you've only done estimates on x86 and are blindly
> extrapolating to all other architectures?

Correct.

I would claim that regrand and dakarand are near-identical twins.
Dakarand seems to have survived all attempts so far, which gives me a
bit more confidence.  Then again, it is questionable how many people
actually tried to attack dakarand on non-x86.
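
For anyone who hasn't looked at dakarand: the four javascript functions
quoted in the patch translate to C roughly as follows.  This is only an
illustration of the shared principle (clock drift plus von Neumann
debiasing), using POSIX clock_gettime() as the millis() stand-in - it is
not regrand itself.

#include <stdint.h>
#include <stdio.h>
#include <time.h>

static uint64_t millis(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* count register-speed flips until the slow clock ticks over */
static int flip_coin(void)
{
	uint64_t then = millis() + 1;
	int n = 0;

	while (millis() <= then)
		n = !n;
	return n;
}

/* von Neumann debiasing: keep a bit only if two flips disagree */
static int get_fair_bit(void)
{
	for (;;) {
		int a = flip_coin();

		if (a != flip_coin())
			return a;
	}
}

int main(void)
{
	int i, byte = 0;

	for (i = 0; i < 8; i++)
		byte = (byte << 1) | get_fair_bit();
	printf("random byte: 0x%02x\n", byte);
	return 0;
}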

> > - add_interrupt_randomness() mostly depends on a high-resolution timer,
> >   and has to use jiffies on many architectures.
> 
> It samples from registers as well, at which point it's actually not
> that different from your proposed regrand.  The main difference
> is purely that you are trying to do the bulk of sampling in very early
> boot, which means before we've started using any of the devices.  So
> regrand is purely dependent on there being separate oscillators in use
> between the CPU clock and the timer interrupt.  If there is a single
> master oscillator which is then downsampled to obtain the other
> "clocks" --- how well do you think regrand works in that environment?
> 
> At *least* with the current system, at least some of the interrupts
> will be from off the SOC chip.  Such as the network interrupt, for
> example.

Regrand continues to collect entropy later on.  That is pretty important
for another reason: snapshot and restore of VMs.  In theory we could get
by with a well-seeded PRNG most of the time - except for boot and VM
restore.  We can detect boot, but we cannot detect VM restore.  So
sampling at some reasonable rate is the best we can do.
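
To put a number on "reasonable rate": the steady-state path takes at most
one sample per get_time() tick per CPU, i.e. roughly once per jiffy in the
Linux implementation.  Condensed from produce_entropy_from_irq_regs() in
the patch (early-boot special cases dropped, helpers and types as defined
there):

static void sample_on_interrupt(struct pt_regs *regs)
{
	struct per_cpu_state *state = get_local_state();
	uint64_t now = get_time();	/* jiffies-scale tick */
	struct full_hash hash;

	if (state->p_time == now)	/* rate limit: once per tick */
		return;
	state->p_time = now;

	/* fold the interrupted register state into the local 256-bit pool */
	hash = hash512(state, sizeof(state->half), regs, sizeof(*regs));
	state->half = hash.half[0];

	/* opportunistically cross-mix local and global pools */
	if (spin_trylock(&global_lock)) {
		hash = hash512(&global_pool, sizeof(global_pool), &hash, sizeof(hash));
		state->half = hash.half[0];
		global_pool = hash.half[1];
		spin_unlock(&global_lock);
	}
}

That keeps running for the lifetime of the system, so a restored VM starts
folding in fresh register samples again as soon as interrupts resume.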

The reason I sample "enough" entropy at early boot is basically this:
https://factorable.net/weakkeys12.extended.pdf

If there is a way to mess up by reading from /dev/urandom, I would blame
the kernel and not the users.  If I have to delay bootup to initialize
the random pool, so be it.  The alternative, imo, is worse.
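
In API terms, the patch makes that split explicit: the normal entry points
sleep (and WARN) if they are reached before the pool is initialized, and
only the __-prefixed variant may hand out best-effort bytes earlier;
userspace readers either block or get -EAGAIN with GRND_NONBLOCK/O_NONBLOCK.
A sketch of the intended usage, mirroring the callers the patch converts:

	u64 key, canary;

	/* Normal in-kernel user: may sleep until the pool is fully
	 * initialized, never sees weak output. */
	get_random_bytes(&key, sizeof(key));

	/* Early-boot user that cannot block (stack canary, slab
	 * hardening): explicitly opts into best-effort bytes. */
	__get_random_bytes(&canary, sizeof(canary));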

Now the only question is whether we can actually sample enough.  I have
demonstrated that I do when booting VMs on x86.  Not perfect, but a
starting point.

> > Your concern may still be right.  If it is, I think we have lost and
> > there is nothing we can do.
> 
> There is definitely more that we can do.  In particular, I dispute
> your counsel of hopelessness over a number of items you have rejected,
> but definitely this one:
> 
> > - Storing entropy across reboot requires a writeable filesystem and
> >   script support.
> 
> Funny thing; most systems have at least *some* amount of stateful
> store; they wouldn't be terribly useful computers if they didn't!

I have worked on systems that didn't.  Once you go to the cheap end of
embedded systems, you either have no writeable filesystem or a very
limited amount of flash.  Storing entropy at shutdown wouldn't work
because shutdown typically means yanking the power cord.  Storing
entropy every hour/day/whatever will quickly dominate the write load on
those systems and wear out your flash.

I also have limited faith in everyone working on those systems to get
the entropy save/restore script right.  Maybe 90% of them will, but
definitely not 100%.  "You cannot get it wrong" is much stronger than
"carefully read the documentation, write some code of your own and you
probably won't get it wrong".

> What I would do is to make it the responsibility of the bootloader, so
> we can get the entropy early enough that it can be used to seed the
> KASLR.  It may be that on different architectures we will have to
> store entropy in different places; I wouldn't put in the file system
> in most cases, if we can at all help it.  For example, on x86 systems,
> the best place to put it might be an explicit (but small) GPT
> partition if the GPT partitioning scheme is used, or immediately after
> the MBR and GRUB Stage 1.5 at offset 20 sectors from the beginning of
> the disk.

Some kind of bootloader is probably necessary for stackprotector.  You
have to somehow exclude a set of functions from stackprotector and use
only that set of functions to sample the initial entropy for your canary
value.  It would be nice if none of those functions would be reused
after stackprotector is turned on, which would imply something like a
bootloader.

I would prefer not using the regular bootloader, because there are just
too many of them and making sure every last one of them gets it right is
near hopeless.  But if the regular bootloader starts a mini-kernel that
only gathers entropy and then starts the regular kernel, that would be a
good design.

> We would have to do something different on ARM and MIPS systems, sure.
> But eventually we could define that part of the standard Linux
> environment is a 512 byte random seed state which must be available to
> the boot loader, and passed to the Kernel much like the boot command
> line and initrd are passed to the kernel today.  There must also be a
> standard function which allows root to update the random seed.
> 
> For bonus points, the random seed state could be available for reading
> only during the boot session (for example, it will no longer be
> readable once UEFI boot services are terminated), and
> after that point, the random seed state can only be updated by root
> (and perhaps the trusted hardware would hash the passed-in random seed
> state with the existing seed state).  This would be optional and would
> require hardware support, but we already have custom SOC support for
> in-line crypto support, and in some cases, custom support to store
> crypto keys to keep them safe against state-level attackers (like the
> FBI, if you are Apple :-).  In any case, I'm fairly confident I could
> talk to at least one Android handset manufacturer and get them to at
> least consider such an implementation.  :-)

My goal is a bit more audacious.  I don't want a minimum of one system
to have good entropy, I want all systems to have that.  Which means I
cannot require hardware changes.

The existing requirements for Linux are pretty basic.  You need a 32bit
CPU, memory and (I think) interrupt support.  So the goal is to extract
enough entropy from just those basics.  If we cannot do that, we have to
admit that millions of Linux systems cannot have good entropy.

Regrand is the only thing I know of that has a chance.  Naturally I cannot
prove that it would work on every system - including those I have never
seen.  We cannot prove the security of AES or SHA3 either.  But we can
try to break those systems and, given a long enough string of failures,
be reasonably confident that breaking those systems is at least pretty
hard.

> > My hope is that it is simply too hard to
> > have an accurate system clock and such tight synchronization between
> > interrupts and the CPU that we cannot extract entropy.
> 
> It's not a question of "accurate system clock" --- it's just a matter
> of hardware manufacturers following the KISS principle.  Especially on
> a system-on-a-chip design, why have two clocks when using one would
> do?  Especially when BOM costs are measured in millicents...

I would love to see an example and try it out.

Jörn

--
My second remark is that our intellectual powers are rather geared to
master static relations and that our powers to visualize processes
evolving in time are relatively poorly developed.
-- Edsger W. Dijkstra

