* [PATCH] random: always use batched entropy for get_random_u{32,64}
@ 2020-02-16 16:18 Jason A. Donenfeld
  2020-02-16 18:23 ` Greg Kroah-Hartman
  0 siblings, 1 reply; 12+ messages in thread
From: Jason A. Donenfeld @ 2020-02-16 16:18 UTC (permalink / raw)
  To: tytso, linux-kernel; +Cc: Jason A. Donenfeld, Greg Kroah-Hartman

It turns out that RDRAND is pretty slow. Comparing these two
constructions:

  for (i = 0; i < CHACHA_BLOCK_SIZE; i += sizeof(ret))
    arch_get_random_long(&ret);

and

  long buf[CHACHA_BLOCK_SIZE / sizeof(long)];
  extract_crng((u8 *)buf);

it amortizes out to 352 cycles per long for the top one and 107 cycles
per long for the bottom one, on Coffee Lake Refresh, Intel Core i9-9880H.

And importantly, the top one has the drawback of not benefiting from the
real rng, whereas the bottom one has all the nice benefits of using our
own chacha rng. As get_random_u{32,64} gets used in more places (perhaps
beyond what it was originally intended for when it was introduced as
get_random_{int,long} back in the md5 monstrosity era), it seems like it
might be a good thing to strengthen its posture a tiny bit. Doing this
should only be stronger and not any weaker because that pool is already
initialized with a bunch of rdrand data (when available). This way, we
get the benefits of the hardware rng as well as our own rng.

Another benefit of this is that we no longer hit pitfalls of the recent
stream of AMD bugs in RDRAND. One often used code pattern for various
things is:

  do {
  	val = get_random_u32();
  } while (hash_table_contains_key(val));

That recent AMD bug rendered that pattern useless, whereas we're really
very certain that chacha20 output will give well-distributed numbers,
no matter what.

So, this simplification seems better both from a security perspective
and from a performance perspective.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/char/random.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index c7f9584de2c8..037fdb182b4d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2166,15 +2166,6 @@ u64 get_random_u64(void)
 	struct batched_entropy *batch;
 	static void *previous;
 
-#if BITS_PER_LONG == 64
-	if (arch_get_random_long((unsigned long *)&ret))
-		return ret;
-#else
-	if (arch_get_random_long((unsigned long *)&ret) &&
-	    arch_get_random_long((unsigned long *)&ret + 1))
-	    return ret;
-#endif
-
 	warn_unseeded_randomness(&previous);
 
 	batch = raw_cpu_ptr(&batched_entropy_u64);
@@ -2199,9 +2190,6 @@ u32 get_random_u32(void)
 	struct batched_entropy *batch;
 	static void *previous;
 
-	if (arch_get_random_int(&ret))
-		return ret;
-
 	warn_unseeded_randomness(&previous);
 
 	batch = raw_cpu_ptr(&batched_entropy_u32);
-- 
2.25.0



* Re: [PATCH] random: always use batched entropy for get_random_u{32,64}
  2020-02-16 16:18 [PATCH] random: always use batched entropy for get_random_u{32,64} Jason A. Donenfeld
@ 2020-02-16 18:23 ` Greg Kroah-Hartman
  2020-02-20 22:20   ` Tony Luck
  0 siblings, 1 reply; 12+ messages in thread
From: Greg Kroah-Hartman @ 2020-02-16 18:23 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: tytso, linux-kernel

On Sun, Feb 16, 2020 at 05:18:36PM +0100, Jason A. Donenfeld wrote:
> It turns out that RDRAND is pretty slow. Comparing these two
> constructions:
> 
>   for (i = 0; i < CHACHA_BLOCK_SIZE; i += sizeof(ret))
>     arch_get_random_long(&ret);
> 
> and
> 
>   long buf[CHACHA_BLOCK_SIZE / sizeof(long)];
>   extract_crng((u8 *)buf);
> 
> it amortizes out to 352 cycles per long for the top one and 107 cycles
> per long for the bottom one, on Coffee Lake Refresh, Intel Core i9-9880H.
> 
> And importantly, the top one has the drawback of not benefiting from the
> real rng, whereas the bottom one has all the nice benefits of using our
> own chacha rng. As get_random_u{32,64} gets used in more places (perhaps
> beyond what it was originally intended for when it was introduced as
> get_random_{int,long} back in the md5 monstrosity era), it seems like it
> might be a good thing to strengthen its posture a tiny bit. Doing this
> should only be stronger and not any weaker because that pool is already
> initialized with a bunch of rdrand data (when available). This way, we
> get the benefits of the hardware rng as well as our own rng.
> 
> Another benefit of this is that we no longer hit pitfalls of the recent
> stream of AMD bugs in RDRAND. One often used code pattern for various
> things is:
> 
>   do {
>   	val = get_random_u32();
>   } while (hash_table_contains_key(val));
> 
> That recent AMD bug rendered that pattern useless, whereas we're really
> very certain that chacha20 output will give well-distributed numbers,
> no matter what.
> 
> So, this simplification seems better both from a security perspective
> and from a performance perspective.
> 
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
>  drivers/char/random.c | 12 ------------
>  1 file changed, 12 deletions(-)

Looks good to me, thanks for doing this:

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>



* Re: [PATCH] random: always use batched entropy for get_random_u{32,64}
  2020-02-16 18:23 ` Greg Kroah-Hartman
@ 2020-02-20 22:20   ` Tony Luck
  2020-02-20 22:29     ` Tony Luck
  2020-02-21 20:07     ` Jason A. Donenfeld
  0 siblings, 2 replies; 12+ messages in thread
From: Tony Luck @ 2020-02-20 22:20 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jason A. Donenfeld, Ted Ts'o, Linux Kernel Mailing List

On Sun, Feb 16, 2020 at 10:24 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:

> >  drivers/char/random.c | 12 ------------
> >  1 file changed, 12 deletions(-)
>
> Looks good to me, thanks for doing this:
>
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Perhaps this patch also needs to update the comment above these functions.

Specifically the bit that says "The quality of the random
number is either as good as RDRAND" ... since you are no
longer pulling from RDRAND, it isn't true anymore.

-Tony


* Re: [PATCH] random: always use batched entropy for get_random_u{32,64}
  2020-02-20 22:20   ` Tony Luck
@ 2020-02-20 22:29     ` Tony Luck
  2020-02-21 20:08       ` Jason A. Donenfeld
  2020-02-21 20:07     ` Jason A. Donenfeld
  1 sibling, 1 reply; 12+ messages in thread
From: Tony Luck @ 2020-02-20 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jason A. Donenfeld, Ted Ts'o, Linux Kernel Mailing List

Also ... what's the deal with a spin_lock on a per-cpu structure?

        batch = raw_cpu_ptr(&batched_entropy_u64);
        spin_lock_irqsave(&batch->batch_lock, flags);
        if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
                extract_crng((u8 *)batch->entropy_u64);
                batch->position = 0;
        }
        ret = batch->entropy_u64[batch->position++];
        spin_unlock_irqrestore(&batch->batch_lock, flags);

Could we just disable interrupts and pre-emption around the entropy extraction?

-Tony


* Re: [PATCH] random: always use batched entropy for get_random_u{32,64}
  2020-02-20 22:20   ` Tony Luck
  2020-02-20 22:29     ` Tony Luck
@ 2020-02-21 20:07     ` Jason A. Donenfeld
  2020-02-21 20:10       ` [PATCH v2] " Jason A. Donenfeld
  1 sibling, 1 reply; 12+ messages in thread
From: Jason A. Donenfeld @ 2020-02-21 20:07 UTC (permalink / raw)
  To: Tony Luck; +Cc: Greg Kroah-Hartman, Ted Ts'o, Linux Kernel Mailing List

Hi Tony,

On Thu, Feb 20, 2020 at 11:20 PM Tony Luck <tony.luck@gmail.com> wrote:
>
> On Sun, Feb 16, 2020 at 10:24 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
>
> > >  drivers/char/random.c | 12 ------------
> > >  1 file changed, 12 deletions(-)
> >
> > Looks good to me, thanks for doing this:
> >
> > Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>
> Perhaps this patch also needs to update the comment above these functions.
>
> Specifically the bit that says "The quality of the random
> number is either as good as RDRAND" ... since you are no
> longer pulling from RDRAND, it isn't true anymore.

Good call. I'll fix up this comment and submit v2.


Jason


* Re: [PATCH] random: always use batched entropy for get_random_u{32,64}
  2020-02-20 22:29     ` Tony Luck
@ 2020-02-21 20:08       ` Jason A. Donenfeld
  2020-02-22  0:41         ` Theodore Y. Ts'o
  0 siblings, 1 reply; 12+ messages in thread
From: Jason A. Donenfeld @ 2020-02-21 20:08 UTC (permalink / raw)
  To: Tony Luck; +Cc: Greg Kroah-Hartman, Ted Ts'o, Linux Kernel Mailing List

On Thu, Feb 20, 2020 at 11:29 PM Tony Luck <tony.luck@gmail.com> wrote:
>
> Also ... what's the deal with a spin_lock on a per-cpu structure?
>
>         batch = raw_cpu_ptr(&batched_entropy_u64);
>         spin_lock_irqsave(&batch->batch_lock, flags);
>         if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
>                 extract_crng((u8 *)batch->entropy_u64);
>                 batch->position = 0;
>         }
>         ret = batch->entropy_u64[batch->position++];
>         spin_unlock_irqrestore(&batch->batch_lock, flags);
>
> Could we just disable interrupts and pre-emption around the entropy extraction?

Probably, yes... We can address this in a separate patch.


* [PATCH v2] random: always use batched entropy for get_random_u{32,64}
  2020-02-21 20:07     ` Jason A. Donenfeld
@ 2020-02-21 20:10       ` Jason A. Donenfeld
  2020-02-28  4:09         ` Theodore Y. Ts'o
  2020-04-01 13:08         ` Nicolai Stange
  0 siblings, 2 replies; 12+ messages in thread
From: Jason A. Donenfeld @ 2020-02-21 20:10 UTC (permalink / raw)
  To: tytso, linux-kernel, gregkh; +Cc: Jason A. Donenfeld

It turns out that RDRAND is pretty slow. Comparing these two
constructions:

  for (i = 0; i < CHACHA_BLOCK_SIZE; i += sizeof(ret))
    arch_get_random_long(&ret);

and

  long buf[CHACHA_BLOCK_SIZE / sizeof(long)];
  extract_crng((u8 *)buf);

it amortizes out to 352 cycles per long for the top one and 107 cycles
per long for the bottom one, on Coffee Lake Refresh, Intel Core i9-9880H.

And importantly, the top one has the drawback of not benefiting from the
real rng, whereas the bottom one has all the nice benefits of using our
own chacha rng. As get_random_u{32,64} gets used in more places (perhaps
beyond what it was originally intended for when it was introduced as
get_random_{int,long} back in the md5 monstrosity era), it seems like it
might be a good thing to strengthen its posture a tiny bit. Doing this
should only be stronger and not any weaker because that pool is already
initialized with a bunch of rdrand data (when available). This way, we
get the benefits of the hardware rng as well as our own rng.

Another benefit of this is that we no longer hit pitfalls of the recent
stream of AMD bugs in RDRAND. One often used code pattern for various
things is:

  do {
  	val = get_random_u32();
  } while (hash_table_contains_key(val));

That recent AMD bug rendered that pattern useless, whereas we're really
very certain that chacha20 output will give well-distributed numbers,
no matter what.

So, this simplification seems better both from a security perspective
and from a performance perspective.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
Changes v1->v2:

- Tony Luck suggested I also update the comment that referenced the
  no-longer relevant RDRAND.

 drivers/char/random.c | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index c7f9584de2c8..a6b77a850ddd 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2149,11 +2149,11 @@ struct batched_entropy {
 
 /*
  * Get a random word for internal kernel use only. The quality of the random
- * number is either as good as RDRAND or as good as /dev/urandom, with the
- * goal of being quite fast and not depleting entropy. In order to ensure
+ * number is good as /dev/urandom, but there is no backtrack protection, with
+ * the goal of being quite fast and not depleting entropy. In order to ensure
  * that the randomness provided by this function is okay, the function
- * wait_for_random_bytes() should be called and return 0 at least once
- * at any point prior.
+ * wait_for_random_bytes() should be called and return 0 at least once at any
+ * point prior.
  */
 static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64) = {
 	.batch_lock	= __SPIN_LOCK_UNLOCKED(batched_entropy_u64.lock),
@@ -2166,15 +2166,6 @@ u64 get_random_u64(void)
 	struct batched_entropy *batch;
 	static void *previous;
 
-#if BITS_PER_LONG == 64
-	if (arch_get_random_long((unsigned long *)&ret))
-		return ret;
-#else
-	if (arch_get_random_long((unsigned long *)&ret) &&
-	    arch_get_random_long((unsigned long *)&ret + 1))
-	    return ret;
-#endif
-
 	warn_unseeded_randomness(&previous);
 
 	batch = raw_cpu_ptr(&batched_entropy_u64);
@@ -2199,9 +2190,6 @@ u32 get_random_u32(void)
 	struct batched_entropy *batch;
 	static void *previous;
 
-	if (arch_get_random_int(&ret))
-		return ret;
-
 	warn_unseeded_randomness(&previous);
 
 	batch = raw_cpu_ptr(&batched_entropy_u32);
-- 
2.25.0



* Re: [PATCH] random: always use batched entropy for get_random_u{32,64}
  2020-02-21 20:08       ` Jason A. Donenfeld
@ 2020-02-22  0:41         ` Theodore Y. Ts'o
  2020-02-22  9:59           ` Jason A. Donenfeld
  2020-02-24 20:41           ` Luck, Tony
  0 siblings, 2 replies; 12+ messages in thread
From: Theodore Y. Ts'o @ 2020-02-22  0:41 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Tony Luck, Greg Kroah-Hartman, Linux Kernel Mailing List

On Fri, Feb 21, 2020 at 09:08:19PM +0100, Jason A. Donenfeld wrote:
> On Thu, Feb 20, 2020 at 11:29 PM Tony Luck <tony.luck@gmail.com> wrote:
> >
> > Also ... what's the deal with a spin_lock on a per-cpu structure?
> >
> >         batch = raw_cpu_ptr(&batched_entropy_u64);
> >         spin_lock_irqsave(&batch->batch_lock, flags);
> >         if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
> >                 extract_crng((u8 *)batch->entropy_u64);
> >                 batch->position = 0;
> >         }
> >         ret = batch->entropy_u64[batch->position++];
> >         spin_unlock_irqrestore(&batch->batch_lock, flags);
> >
> > Could we just disable interrupts and pre-emption around the entropy extraction?
> 
> Probably, yes... We can address this in a separate patch.

No, we can't; take a look at invalidate_batched_entropy(), where we
need to invalidate all of the per-cpu batched entropy from a single CPU
after we have initialized the CRNG.

Since most of the time after CRNG initialization, the spinlock for
each CPU will be on that CPU's cacheline, the time to take and release
the spinlock is not going to be material.
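
For reference, that function has to walk every CPU's batches and reset
them under each batch's lock, roughly (abridged):

	static void invalidate_batched_entropy(void)
	{
		int cpu;
		unsigned long flags;

		for_each_possible_cpu(cpu) {
			struct batched_entropy *batched_entropy;

			/* Reset every CPU's u32 and u64 batches from whichever
			 * CPU just finished initializing the CRNG; the per-batch
			 * lock is what makes this cross-CPU write safe against
			 * the local readers in get_random_u32/u64(). */
			batched_entropy = per_cpu_ptr(&batched_entropy_u32, cpu);
			spin_lock_irqsave(&batched_entropy->batch_lock, flags);
			batched_entropy->position = 0;
			spin_unlock(&batched_entropy->batch_lock);

			batched_entropy = per_cpu_ptr(&batched_entropy_u64, cpu);
			spin_lock(&batched_entropy->batch_lock);
			batched_entropy->position = 0;
			spin_unlock_irqrestore(&batched_entropy->batch_lock, flags);
		}
	}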

					- Ted



* Re: [PATCH] random: always use batched entropy for get_random_u{32,64}
  2020-02-22  0:41         ` Theodore Y. Ts'o
@ 2020-02-22  9:59           ` Jason A. Donenfeld
  2020-02-24 20:41           ` Luck, Tony
  1 sibling, 0 replies; 12+ messages in thread
From: Jason A. Donenfeld @ 2020-02-22  9:59 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Tony Luck, Greg Kroah-Hartman, Linux Kernel Mailing List

Hi Ted,

On 2/22/20, Theodore Y. Ts'o <tytso@mit.edu> wrote:

> No, we can't; take a look at invalidate_batched_entropy(), where we
> need to invalidate all of the per-cpu batched entropy from a single CPU
> after we have initialized the CRNG.
>
> Since most of the time after CRNG initialization, the spinlock for
> each CPU will be on that CPU's cacheline, the time to take and release
> the spinlock is not going to be material.

Ah, you're right. I hadn't looked in detail before, hence mentioning
"probably" and deferring to a new patch. Thanks for looking into it.

Regarding this patch here, can you apply the v2 that I posted to your
random.git tree?

Thanks,
Jason


* Re: [PATCH] random: always use batched entropy for get_random_u{32,64}
  2020-02-22  0:41         ` Theodore Y. Ts'o
  2020-02-22  9:59           ` Jason A. Donenfeld
@ 2020-02-24 20:41           ` Luck, Tony
  1 sibling, 0 replies; 12+ messages in thread
From: Luck, Tony @ 2020-02-24 20:41 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Jason A. Donenfeld, Tony Luck, Greg Kroah-Hartman,
	Linux Kernel Mailing List

On Fri, Feb 21, 2020 at 07:41:33PM -0500, Theodore Y. Ts'o wrote:
> On Fri, Feb 21, 2020 at 09:08:19PM +0100, Jason A. Donenfeld wrote:
> > On Thu, Feb 20, 2020 at 11:29 PM Tony Luck <tony.luck@gmail.com> wrote:
> > >
> > > Could we just disable interrupts and pre-emption around the entropy extraction?
> > 
> > Probably, yes... We can address this in a separate patch.
> 
> No, we can't; take a look at invalidate_batched_entropy(), where we
> need to invalidate all of the per-cpu batched entropy from a single CPU
> after we have initialized the CRNG.
> 
> Since most of the time after CRNG initialization, the spinlock for
> each CPU will be on that CPU's cacheline, the time to take and release
> the spinlock is not going to be material.

So we could get rid of the spin lock by replacing it with a "bool"
that is written when we want to do an invalidate on the next call
(where it is read and cleared).

For me it makes a 15-cycle difference (56 vs. 71 cycles) for the fast
case when we are just picking a value from the array. The slow path,
when we do extract_crng(), is barely changed (731 vs. 736 cycles).

But I took the "do lazily" comment above invalidate_batched_entropy()
very literally and didn't add any fences to make sure that readers
of need_invalidate see the store ASAP. So a close race where the
invalidate request would have won control of the spin lock might
not get processed until a subsequent call.

If you think a fence is needed, then the advantage will be lost
and the below patch is worthless.
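
If some ordering did turn out to be wanted, my guess (untested) is that
a release/acquire pair on the flag would be enough rather than a full
fence, something like:

	/* writer side, in invalidate_batched_entropy(); pairs with the
	 * load-acquire below, so a reader that observes need_invalidate
	 * set also observes the CRNG state initialized before the store */
	smp_store_release(&batched_entropy->need_invalidate, true);

	/* reader side, in get_random_u64() (and similarly for u32) */
	if (smp_load_acquire(&batch->need_invalidate) ||
	    batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
		extract_crng((u8 *)batch->entropy_u64);
		batch->position = 0;
		batch->need_invalidate = false;
	}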

-Tony

diff --git a/drivers/char/random.c b/drivers/char/random.c
index a6b77a850ddd..6fb222996ea4 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2144,7 +2144,7 @@ struct batched_entropy {
 		u32 entropy_u32[CHACHA_BLOCK_SIZE / sizeof(u32)];
 	};
 	unsigned int position;
-	spinlock_t batch_lock;
+	bool need_invalidate;
 };
 
 /*
@@ -2155,9 +2155,7 @@ struct batched_entropy {
  * wait_for_random_bytes() should be called and return 0 at least once at any
  * point prior.
  */
-static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64) = {
-	.batch_lock	= __SPIN_LOCK_UNLOCKED(batched_entropy_u64.lock),
-};
+static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64);
 
 u64 get_random_u64(void)
 {
@@ -2168,21 +2166,23 @@ u64 get_random_u64(void)
 
 	warn_unseeded_randomness(&previous);
 
+	local_irq_save(flags);
+	preempt_disable();
 	batch = raw_cpu_ptr(&batched_entropy_u64);
-	spin_lock_irqsave(&batch->batch_lock, flags);
-	if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
+	if (batch->need_invalidate ||
+	    batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
 		extract_crng((u8 *)batch->entropy_u64);
 		batch->position = 0;
+		batch->need_invalidate = false;
 	}
 	ret = batch->entropy_u64[batch->position++];
-	spin_unlock_irqrestore(&batch->batch_lock, flags);
+	preempt_enable();
+	local_irq_restore(flags);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_u64);
 
-static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32) = {
-	.batch_lock	= __SPIN_LOCK_UNLOCKED(batched_entropy_u32.lock),
-};
+static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32);
 u32 get_random_u32(void)
 {
 	u32 ret;
@@ -2192,14 +2192,18 @@ u32 get_random_u32(void)
 
 	warn_unseeded_randomness(&previous);
 
+	local_irq_save(flags);
+	preempt_disable();
 	batch = raw_cpu_ptr(&batched_entropy_u32);
-	spin_lock_irqsave(&batch->batch_lock, flags);
-	if (batch->position % ARRAY_SIZE(batch->entropy_u32) == 0) {
+	if (batch->need_invalidate ||
+	    batch->position % ARRAY_SIZE(batch->entropy_u32) == 0) {
 		extract_crng((u8 *)batch->entropy_u32);
 		batch->position = 0;
+		batch->need_invalidate = false;
 	}
 	ret = batch->entropy_u32[batch->position++];
-	spin_unlock_irqrestore(&batch->batch_lock, flags);
+	preempt_enable();
+	local_irq_restore(flags);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_u32);
@@ -2217,14 +2221,10 @@ static void invalidate_batched_entropy(void)
 		struct batched_entropy *batched_entropy;
 
 		batched_entropy = per_cpu_ptr(&batched_entropy_u32, cpu);
-		spin_lock_irqsave(&batched_entropy->batch_lock, flags);
-		batched_entropy->position = 0;
-		spin_unlock(&batched_entropy->batch_lock);
+		batched_entropy->need_invalidate = true;
 
 		batched_entropy = per_cpu_ptr(&batched_entropy_u64, cpu);
-		spin_lock(&batched_entropy->batch_lock);
-		batched_entropy->position = 0;
-		spin_unlock_irqrestore(&batched_entropy->batch_lock, flags);
+		batched_entropy->need_invalidate = true;
 	}
 }
 


* Re: [PATCH v2] random: always use batched entropy for get_random_u{32,64}
  2020-02-21 20:10       ` [PATCH v2] " Jason A. Donenfeld
@ 2020-02-28  4:09         ` Theodore Y. Ts'o
  2020-04-01 13:08         ` Nicolai Stange
  1 sibling, 0 replies; 12+ messages in thread
From: Theodore Y. Ts'o @ 2020-02-28  4:09 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: linux-kernel, gregkh

On Fri, Feb 21, 2020 at 09:10:37PM +0100, Jason A. Donenfeld wrote:
> It turns out that RDRAND is pretty slow. Comparing these two
> constructions:
> 
>   for (i = 0; i < CHACHA_BLOCK_SIZE; i += sizeof(ret))
>     arch_get_random_long(&ret);
> 
> and
> 
>   long buf[CHACHA_BLOCK_SIZE / sizeof(long)];
>   extract_crng((u8 *)buf);
> 
> it amortizes out to 352 cycles per long for the top one and 107 cycles
> per long for the bottom one, on Coffee Lake Refresh, Intel Core i9-9880H.
> 
> And importantly, the top one has the drawback of not benefiting from the
> real rng, whereas the bottom one has all the nice benefits of using our
> own chacha rng. As get_random_u{32,64} gets used in more places (perhaps
> beyond what it was originally intended for when it was introduced as
> get_random_{int,long} back in the md5 monstrosity era), it seems like it
> might be a good thing to strengthen its posture a tiny bit. Doing this
> should only be stronger and not any weaker because that pool is already
> initialized with a bunch of rdrand data (when available). This way, we
> get the benefits of the hardware rng as well as our own rng.
> 
> Another benefit of this is that we no longer hit pitfalls of the recent
> stream of AMD bugs in RDRAND. One often used code pattern for various
> things is:
> 
>   do {
>   	val = get_random_u32();
>   } while (hash_table_contains_key(val));
> 
> That recent AMD bug rendered that pattern useless, whereas we're really
> very certain that chacha20 output will give well-distributed numbers,
> no matter what.
> 
> So, this simplification seems better both from a security perspective
> and from a performance perspective.
> 
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Thanks, applied.

						- Ted


* Re: [PATCH v2] random: always use batched entropy for get_random_u{32,64}
  2020-02-21 20:10       ` [PATCH v2] " Jason A. Donenfeld
  2020-02-28  4:09         ` Theodore Y. Ts'o
@ 2020-04-01 13:08         ` Nicolai Stange
  1 sibling, 0 replies; 12+ messages in thread
From: Nicolai Stange @ 2020-04-01 13:08 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: tytso, linux-kernel, gregkh

Hi,

"Jason A. Donenfeld" <Jason@zx2c4.com> writes:

> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index c7f9584de2c8..a6b77a850ddd 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -2149,11 +2149,11 @@ struct batched_entropy {
>  
>  /*
>   * Get a random word for internal kernel use only. The quality of the random
> - * number is either as good as RDRAND or as good as /dev/urandom, with the
> - * goal of being quite fast and not depleting entropy. In order to ensure
> + * number is good as /dev/urandom, but there is no backtrack protection, with
> + * the goal of being quite fast and not depleting entropy. In order to ensure
>   * that the randomness provided by this function is okay, the function
> - * wait_for_random_bytes() should be called and return 0 at least once
> - * at any point prior.
> + * wait_for_random_bytes() should be called and return 0 at least once at any
> + * point prior.
>   */
>  static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64) = {
>  	.batch_lock	= __SPIN_LOCK_UNLOCKED(batched_entropy_u64.lock),
> @@ -2166,15 +2166,6 @@ u64 get_random_u64(void)
>  	struct batched_entropy *batch;
>  	static void *previous;
>  
> -#if BITS_PER_LONG == 64
> -	if (arch_get_random_long((unsigned long *)&ret))
> -		return ret;
> -#else
> -	if (arch_get_random_long((unsigned long *)&ret) &&
> -	    arch_get_random_long((unsigned long *)&ret + 1))
> -	    return ret;
> -#endif
> -
>  	warn_unseeded_randomness(&previous);

I don't know if this has been reported elsewhere already, but this
warning triggers on x86_64 with CONFIG_SLAB_FREELIST_HARDENED and
CONFIG_SLAB_FREELIST_RANDOM during early boot now:

  [    0.079775] random: random: get_random_u64 called from __kmem_cache_create+0x40/0x550 with crng_init=0

Strictly speaking, this isn't really a problem with this patch -- other
archs without arch_get_random_long() support (e.g. ppc64le) have shown
this behaviour before.

FWIW, I added a dump_stack() to warn_unseeded_randomness() in order to
capture the respective call paths of the offending get_random_u64/u32()
callers, i.e. those invoked before rand_initialize(). Those are

  - __kmem_cache_create() and
     init_cache_random_seq()->cache_random_seq_create(), called
     indirectly quite a number of times from
     - kmem_cache_init()
     - kmem_cache_init()->create_kmalloc_caches()
     - vmalloc_init()->kmem_cache_create()
     - sched_init()->kmem_cache_create()
     - radix_tree_init()->kmem_cache_create()
     - workqueue_init_early()->kmem_cache_create()
     - trace_event_init()->kmem_cache_create()

  - __kmalloc()/kmem_cache_alloc()/kmem_cache_alloc_node()/kmem_cache_alloc_trace()
      ->__slab_alloc->___slab_alloc()->new_slab()
    called indirectly through
    - vmalloc_init()
    - workqueue_init_early()
    - trace_event_init()
    - early_irq_init()->alloc_desc()

Two possible ways to work around this came to my mind:
1.) Either reintroduce arch_get_random_long() to get_random_u64/u32(),
    but only for the case that crng_init <= 1 (rough sketch below).
2.) Or introduce something like
      arch_seed_primary_crng_early()
    to be called early from start_kernel().
    For x86_64 this could be implemented by filling the crng state with
    RDRAND data whereas other archs would fall back to get_cycles(),
    something jitterentropish-like or whatever.
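
For 1.), the rough sketch I have in mind (untested, and only showing the
64-bit half of get_random_u64()) would be to keep the removed fallback,
but gate it on the CRNG not being fully initialized yet:

	/* hypothetical early-boot fallback, placed right before the
	 * warn_unseeded_randomness() call: use the arch RNG only until
	 * crng_init reaches 2, then always take the batched ChaCha path
	 * as this patch intends */
	if (crng_init <= 1 &&
	    arch_get_random_long((unsigned long *)&ret))
		return ret;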

What do you think?

Thanks,

Nicolai


>  
>  	batch = raw_cpu_ptr(&batched_entropy_u64);
> @@ -2199,9 +2190,6 @@ u32 get_random_u32(void)
>  	struct batched_entropy *batch;
>  	static void *previous;
>  
> -	if (arch_get_random_int(&ret))
> -		return ret;
> -
>  	warn_unseeded_randomness(&previous);
>  
>  	batch = raw_cpu_ptr(&batched_entropy_u32);

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg), GF: Felix Imendörffer

