linux-kernel.vger.kernel.org archive mirror
* [PATCH] random: avoid superfluous call to RDRAND in CRNG extraction
@ 2021-12-30 16:50 Jason A. Donenfeld
  2021-12-30 22:13 ` Theodore Ts'o
  0 siblings, 1 reply; 13+ messages in thread
From: Jason A. Donenfeld @ 2021-12-30 16:50 UTC (permalink / raw)
  To: linux-kernel, tytso, linux-crypto; +Cc: Jason A. Donenfeld

RDRAND is not fast. RDRAND is actually quite slow. We've known this for
a while, which is why functions like get_random_u{32,64} were converted
to use batching of our ChaCha-based CRNG instead.

Yet CRNG extraction still includes a call to RDRAND, in the hot path of
every call to get_random_bytes(), /dev/urandom, and getrandom(2).

This call to RDRAND here seems quite superfluous. CRNG is already
extracting things based on a 256-bit key, based on good entropy, which
is then reseeded periodically, updated, backtrack-mutated, and so
forth. The CRNG extraction construction is something that we're already
relying on to be secure and solid. If it's not, that's a serious
problem, and it's unlikely that mixing in a measly 32 bits from RDRAND
is going to alleviate things.

There is one place, though, where such last-ditch moves might be
quasi-sensible, and that's before the CRNG is actually ready. In that case,
we're already very much operating from a position of trying to get
whatever we can, so we might as well throw in the RDRAND call because
why not.

But once the CRNG is actually up, it's simply not sensible. Removing the
call there improves performance on an i7-11850H by 370%. In other words,
the vast majority of the work done by extract_crng() prior to this commit
was devoted to fetching 32 bits of RDRAND.

This commit fixes the issue by only making that call to RDRAND when the
CRNG is not yet ready.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/random.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 54086e9da05b..239b1455b1a8 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1030,7 +1030,7 @@ static void _extract_crng(struct crng_state *crng,
 				    &input_pool : NULL);
 	}
 	spin_lock_irqsave(&crng->lock, flags);
-	if (arch_get_random_long(&v))
+	if (unlikely(!crng_ready()) && arch_get_random_long(&v))
 		crng->state[14] ^= v;
 	chacha20_block(&crng->state[0], out);
 	if (crng->state[12] == 0)
-- 
2.34.1



* Re: [PATCH] random: avoid superfluous call to RDRAND in CRNG extraction
  2021-12-30 16:50 [PATCH] random: avoid superfluous call to RDRAND in CRNG extraction Jason A. Donenfeld
@ 2021-12-30 22:13 ` Theodore Ts'o
  2021-12-30 22:58   ` Jason A. Donenfeld
  0 siblings, 1 reply; 13+ messages in thread
From: Theodore Ts'o @ 2021-12-30 22:13 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: linux-kernel, linux-crypto

On Thu, Dec 30, 2021 at 05:50:52PM +0100, Jason A. Donenfeld wrote:
> RDRAND is not fast. RDRAND is actually quite slow. We've known this for
> a while, which is why functions like get_random_u{32,64} were converted
> to use batching of our ChaCha-based CRNG instead.
> 
> Yet CRNG extraction still includes a call to RDRAND, in the hot path of
> every call to get_random_bytes(), /dev/urandom, and getrandom(2).
> 
> This call to RDRAND here seems quite superfluous. CRNG is already
> extracting things based on a 256-bit key, based on good entropy, which
> is then reseeded periodically, updated, backtrack-mutated, and so
> forth. The CRNG extraction construction is something that we're already
> relying on to be secure and solid. If it's not, that's a serious
> problem, and it's unlikely that mixing in a measly 32 bits from RDRAND
> is going to alleviate things.
> 
> There is one place, though, where such last-ditch moves might be
> quasi-sensible, and that's before the CRNG is actually ready. In that case,
> we're already very much operating from a position of trying to get
> whatever we can, so we might as well throw in the RDRAND call because
> why not.

So I'm not sure how desperately we *need* the 370% performance
improvement, but realistically speaking, in
crng_init_try_arch_early(), which gets called from rand_initialize(),
we will have already set crng->state[4..15] via RDSEED or RDRAND.

So there's no point in setting crng->state[0] from RDRAND.  So if
we're wanting to speed things up, we should just remove the
crng->state[0] <= RDRAND entirely.

Or if we want to improve the security of get_random_bytes() pre
crng_ready(), then we should try to XOR RDRAND bytes into all returned
buffers from get_random_bytes().  In other words, I'd argue that we
should "go big, or go home".  (And if we do have some real,
security-critical users of get_random_bytes() pre-crng_ready(), maybe
"go big" is the right way to go.  Of course, if those do exist, we're
still screwed for those architectures which don't have an RDRAND or
equivalent --- arm32, RISC-V, I'm looking at you.)
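
For concreteness, "go big" might look something like the sketch below
--- untested, and the helper name is made up for illustration:

	static void arch_xor_buffer(u8 *buf, size_t len)
	{
		unsigned long v;
		size_t n, i;

		while (len) {
			/* No arch RNG (or it failed): leave the CRNG
			 * output as-is; XORing on top can't make it
			 * worse. */
			if (!arch_get_random_long(&v))
				break;
			n = min_t(size_t, len, sizeof(v));
			for (i = 0; i < n; i++)
				buf[i] ^= ((u8 *)&v)[i];
			buf += n;
			len -= n;
		}
	}

with get_random_bytes() calling that on each buffer before returning.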

					- Ted


* Re: [PATCH] random: avoid superfluous call to RDRAND in CRNG extraction
  2021-12-30 22:13 ` Theodore Ts'o
@ 2021-12-30 22:58   ` Jason A. Donenfeld
  2021-12-31  3:35     ` Theodore Ts'o
  0 siblings, 1 reply; 13+ messages in thread
From: Jason A. Donenfeld @ 2021-12-30 22:58 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-kernel, linux-crypto

Hi Ted,

On 12/30/21, Theodore Ts'o <tytso@mit.edu> wrote:
> but realistically speaking, in
> crng_init_try_arch_early(), which gets called from rand_initialize(),
> we will have already set crng->state[4..15] via RDSEED or RDRAND.
>
> So there's no point in setting crng->state[0] from RDRAND.  So if
> we're wanting to speed things up, we should just remove the
> crng->state[0] <= RDRAND entirely.

Good point, and that seems reasonable. I'll do that for v+1.

> Or if we want to improve the security of get_random_bytes() pre
> crng_ready(), then we should try to XOR RDRAND bytes into all returned
> buffers from get_random_bytes().  In other words, I'd argue that we
> should "go big, or go home".  (And if we do have some real,
> security-critical users of get_random_bytes() pre-crng_ready(), maybe
> "go big" is the right way to go.

That's a decent way of looking at it. Rather than dallying with
32 bits, we may as well go all the way. Or, to compromise on
efficiency, we could just xor in 16 or 32 bytes into the key rows
prior to each extraction. Alternatively, we have fewer things to think
about with the "go home" route, and then it's just a matter of
important users using get_random_bytes_wait(), which I think I mostly
took care of through the tree a few years back.
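
To sketch the middle option (illustrative only, not a real patch), the
state[14] XOR in _extract_crng() would become something like:

	unsigned long v;
	int i;

	/* XOR 32 bits of arch randomness into each of four key words
	 * (state[4..7]), i.e. 16 bytes, before generating the block. */
	for (i = 4; i < 8; i++) {
		if (!arch_get_random_long(&v))
			break;
		crng->state[i] ^= v;	/* u32 ^= truncates v to 32 bits */
	}
	chacha20_block(&crng->state[0], out);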

> So I'm not sure how desperately we *need* the 370% performance improvement

It's not necessary (aside from, like, people using sendfile to erase
NVMes or something weird?), but it appeals to me for two reasons:
- The superfluous RDRAND with only 32 bits really isn't doing much, and
having it there makes the design ever so slightly more confusing and
less straightforward.
- I would like to see if at some point (not now, just in the future)
it's feasible, performance wise, to replace all of prandom with
get_batched_random() and company (sketched below). I was on some
thread a few years ago where a researcher pointed out one place
prandom was used when get_random_u64() should have been, and in the
ensuing discussion a few more places were found with the same issue,
and then more. And then nobody could agree on whether the performance
hit was worth it for whichever security model. And in the end I don't
recall anything really happening. If that whole discussion could
magically go away because we make all uses secure with no performance
hit, it'd be a major win against footguns like prandom. Maybe it won't
be feasible in the end, but simplifying a design in the process of
finding out seems like decent enough motivation.
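
For reference, the batching amounts to roughly the following --- a
simplified sketch with a hypothetical name, ignoring the
invalidation-on-reseed handling the real code needs:

	struct batched_entropy {
		u64 entropy[CHACHA_BLOCK_SIZE / sizeof(u64)];
		unsigned int position;	/* 0 means "refill me" */
	};
	static DEFINE_PER_CPU(struct batched_entropy, batched_u64);

	u64 get_batched_random_u64(void)
	{
		struct batched_entropy *b = &get_cpu_var(batched_u64);
		u64 ret;

		if (b->position == 0) {
			/* one ChaCha block amortized over eight calls */
			extract_crng((u8 *)b->entropy);
			b->position = ARRAY_SIZE(b->entropy);
		}
		ret = b->entropy[--b->position];
		put_cpu_var(batched_u64);
		return ret;
	}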

Jason


* Re: [PATCH] random: avoid superfluous call to RDRAND in CRNG extraction
  2021-12-30 22:58   ` Jason A. Donenfeld
@ 2021-12-31  3:35     ` Theodore Ts'o
  2021-12-31 11:49       ` [PATCH v2] " Jason A. Donenfeld
  0 siblings, 1 reply; 13+ messages in thread
From: Theodore Ts'o @ 2021-12-31  3:35 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: linux-kernel, linux-crypto

On Thu, Dec 30, 2021 at 11:58:05PM +0100, Jason A. Donenfeld wrote:
> > Or if we want to improve the security of get_random_bytes() pre
> > crng_ready(), then we should try to XOR RDRAND bytes into all returned
> > buffers from get_random_bytes().  In other words, I'd argue that we
> > should "go big, or go home".  (And if we do have some real,
> > security-critical users of get_random_bytes() pre-crng_ready(), maybe
> > "go big" is the right way to go.
> 
> That's a decent way of looking at it. Rather than dallying with
> 32bits, we may as well go all the way. Or, to compromise on
> efficiency, we could just xor in 16 or 32 bytes into the key rows
> prior to each extraction. Alternatively, we have fewer things to think
> about with the "go home" route, and then it's just a matter of
> important users using get_random_bytes_wait(), which I think I mostly
> took care of through the tree a few years back.

I was too lazy to do an audit of all of the get_random_bytes() users
before I wrote my last e-mail, but I'm good with "go home" route ---
especially since actually doing that full audit to make sure we don't
have any pre-crng_ready() security-critical users of
get_random_bytes() would still be important to do on architectures
like RISC-V that don't have a RDRAND equivalent.

The challenge here is that, short of adding a
WARN_ON(!crng_ready()) to get_random_bytes(), it's hard to be sure
that some future security-critical user of get_random_bytes() in early
boot won't creep back in.  And last I checked, we still have some
non-security get_random_bytes() users in early boot where the
WARN_ON() isn't going to be welcome.
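
Concretely, the tripwire is a one-liner --- hypothetical, and as said,
it would fire for the known non-security early users too:

	void get_random_bytes(void *buf, int nbytes)
	{
		WARN_ON(!crng_ready());	/* catch early-boot callers */
		_get_random_bytes(buf, nbytes);
	}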

> - I would like to see if at some point (not now, just in the future)
> it's feasible, performance wise, to replace all of prandom with
> get_batched_random() and company.

That's going to be challenging, I suspect.  Some of the networking
users of prandom() have some *very* strong performance constraints.
Or at least, the networking developers have some benchmarks where they
won't countenance any performance regressions.  When the prandom
implementation was added, some of the networking devs were positively
doing cycle counting to try to trim it down as much as possible....

						- Ted


* [PATCH v2] random: avoid superfluous call to RDRAND in CRNG extraction
  2021-12-31  3:35     ` Theodore Ts'o
@ 2021-12-31 11:49       ` Jason A. Donenfeld
  2021-12-31 17:13         ` Theodore Ts'o
  2022-01-05 15:28         ` [PATCH v2] random: avoid superfluous call to RDRAND in CRNG extraction Ard Biesheuvel
  0 siblings, 2 replies; 13+ messages in thread
From: Jason A. Donenfeld @ 2021-12-31 11:49 UTC (permalink / raw)
  To: linux-kernel, tytso, linux-crypto; +Cc: Jason A. Donenfeld

RDRAND is not fast. RDRAND is actually quite slow. We've known this for
a while, which is why functions like get_random_u{32,64} were converted
to use batching of our ChaCha-based CRNG instead.

Yet CRNG extraction still includes a call to RDRAND, in the hot path of
every call to get_random_bytes(), /dev/urandom, and getrandom(2).

This call to RDRAND here seems quite superfluous. CRNG is already
extracting things based on a 256-bit key, based on good entropy, which
is then reseeded periodically, updated, backtrack-mutated, and so
forth. The CRNG extraction construction is something that we're already
relying on to be secure and solid. If it's not, that's a serious
problem, and it's unlikely that mixing in a measly 32 bits from RDRAND
is going to alleviate things.

And in the case where the CRNG doesn't have enough entropy yet, we're
already initializing the ChaCha key row with RDRAND in
crng_init_try_arch_early().

Removing the call to RDRAND improves performance on an i7-11850H by
370%. In other words, the vast majority of the work done by
extract_crng() prior to this commit was devoted to fetching 32 bits of
RDRAND.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/random.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 4de0feb69781..17ec60948795 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1023,7 +1023,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
 static void _extract_crng(struct crng_state *crng,
 			  __u8 out[CHACHA_BLOCK_SIZE])
 {
-	unsigned long v, flags, init_time;
+	unsigned long flags, init_time;
 
 	if (crng_ready()) {
 		init_time = READ_ONCE(crng->init_time);
@@ -1033,8 +1033,6 @@ static void _extract_crng(struct crng_state *crng,
 				    &input_pool : NULL);
 	}
 	spin_lock_irqsave(&crng->lock, flags);
-	if (arch_get_random_long(&v))
-		crng->state[14] ^= v;
 	chacha20_block(&crng->state[0], out);
 	if (crng->state[12] == 0)
 		crng->state[13]++;
-- 
2.34.1



* Re: [PATCH v2] random: avoid superfluous call to RDRAND in CRNG extraction
  2021-12-31 11:49       ` [PATCH v2] " Jason A. Donenfeld
@ 2021-12-31 17:13         ` Theodore Ts'o
  2022-01-04  5:03           ` Sandy Harris
  2022-01-05 15:28         ` [PATCH v2] random: avoid superfluous call to RDRAND in CRNG extraction Ard Biesheuvel
  1 sibling, 1 reply; 13+ messages in thread
From: Theodore Ts'o @ 2021-12-31 17:13 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: linux-kernel, linux-crypto

On Fri, Dec 31, 2021 at 12:49:03PM +0100, Jason A. Donenfeld wrote:
> RDRAND is not fast. RDRAND is actually quite slow. We've known this for
> a while, which is why functions like get_random_u{32,64} were converted
> to use batching of our ChaCha-based CRNG instead.
> 
> Yet CRNG extraction still includes a call to RDRAND, in the hot path of
> every call to get_random_bytes(), /dev/urandom, and getrandom(2).
> 
> This call to RDRAND here seems quite superfluous. CRNG is already
> extracting things based on a 256-bit key, based on good entropy, which
> is then reseeded periodically, updated, backtrack-mutated, and so
> forth. The CRNG extraction construction is something that we're already
> relying on to be secure and solid. If it's not, that's a serious
> problem, and it's unlikely that mixing in a measly 32 bits from RDRAND
> is going to alleviate things.
> 
> And in the case where the CRNG doesn't have enough entropy yet, we're
> already initializing the ChaCha key row with RDRAND in
> crng_init_try_arch_early().
> 
> Removing the call to RDRAND improves performance on an i7-11850H by
> 370%. In other words, the vast majority of the work done by
> extract_crng() prior to this commit was devoted to fetching 32 bits of
> RDRAND.
> 
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Reviewed-by: Theodore Ts'o <tytso@mit.edu>


* Re: [PATCH v2] random: avoid superfluous call to RDRAND in CRNG extraction
  2021-12-31 17:13         ` Theodore Ts'o
@ 2022-01-04  5:03           ` Sandy Harris
  2022-01-04  5:55             ` Theodore Ts'o
  0 siblings, 1 reply; 13+ messages in thread
From: Sandy Harris @ 2022-01-04  5:03 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Jason A. Donenfeld, LKML, Linux Crypto Mailing List

If we are removing RDRAND, what about adding some
cheaper mixing? Something along these lines?

The current code's mixing is triggered only once in 2^32
iterations, depends only on crng->state[], always changes
the same state word, and introduces no new entropy.

Make it happen more often, depend on a randomly initialised
counter as well as state[], make a data-dependent choice of
word to change, and use random_get_entropy().
---
 drivers/char/random.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 605969ed0f96..d2be079f004d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -985,6 +985,10 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
 	}
 }
 
+#define CC_SHIFT 8
+#define CC_MASK ((1<<CC_SHIFT)-1)
+static u32 cc_count = 0;
+
 static void _extract_crng(struct crng_state *crng,
 			  __u8 out[CHACHA_BLOCK_SIZE])
 {
@@ -998,8 +1002,22 @@ static void _extract_crng(struct crng_state *crng,
 	spin_lock_irqsave(&crng->lock, flags);
 	if (arch_get_random_long(&v))
 		crng->state[14] ^= v;
 	chacha20_block(&crng->state[0], out);
-	if (crng->state[12] == 0)
-		crng->state[13]++;
+	if (cc_count == 0)
+		cc_count = crng->state[9] ^ random_get_entropy();
+	switch ((crng->state[12] ^ cc_count) & CC_MASK) {
+	case 0:
+		cc_count = crng->state[10] ^ (cc_count >> CC_SHIFT);
+		break;
+	case 31: case 97: case 253:
+		crng->state[crng->state[13] & 7]++;
+		break;
+	case 61: case 127:
+		crng->state[crng->state[11] & 7] += random_get_entropy();
+		break;
+	default:
+		break;
+	}
+	cc_count++;
 	spin_unlock_irqrestore(&crng->lock, flags);
 }

-- 
Signed-off-by: Sandy Harris <sandyinchina@gmail.com>


* Re: [PATCH v2] random: avoid superfluous call to RDRAND in CRNG extraction
  2022-01-04  5:03           ` Sandy Harris
@ 2022-01-04  5:55             ` Theodore Ts'o
  2022-01-20 15:03               ` Jason A. Donenfeld
  0 siblings, 1 reply; 13+ messages in thread
From: Theodore Ts'o @ 2022-01-04  5:55 UTC (permalink / raw)
  To: Sandy Harris; +Cc: Jason A. Donenfeld, LKML, Linux Crypto Mailing List

On Tue, Jan 04, 2022 at 01:03:43PM +0800, Sandy Harris wrote:
> If we are removing RDRAND, what about adding some
> cheaper mixing? Something along these lines?
> 
> The current code's mixing is triggered only once in 2^32
> iterations, depends only on crng->state[], always changes
> the same state word, and introduces no new entropy.

I wouldn't call it "mixing", because the state array isn't an entropy
pool.

Recall how ChaCha20's state array is set up.  crng->state[0..3]
contain ChaCha20's initial constants, crng->state[4..11] contain the
ChaCha20 key, crng->state[12] is the 32-bit counter (which is
incremented when we call ChaCha20), and crng->state[13..15] is the
96-bit IV.

The IV and counter --- state[12..15] --- are initialized when the CRNG
is initialized.  We replace the key every time the CRNG is reseeded.

But what if we manage to call _extract_crng() more than 2**32 times?
Well, that's what this is all about:

    if (crng->state[12] == 0)
        crng->state[13]++;

What we've effectively done is treat state[12..13] as a 64-bit
counter, and state[14..15] is initialized to a 64-bit random value
("the IV") when the CRNG is initialized, and not updated during the
life of the CRNG.  This is really the only place where we've modified
ChaCha20.
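
Laid out as a comment, that word map is:

	/*
	 * crng->state[], as the CRNG uses it:
	 *
	 *   state[ 0.. 3]  ChaCha20 "expand 32-byte k" constants
	 *   state[ 4..11]  256-bit key, replaced at every reseed
	 *   state[12..13]  effectively a 64-bit block counter
	 *   state[14..15]  random 64-bit IV, fixed for the CRNG's life
	 */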

Now, either we believe in the strength of ChaCha20, or we don't.  The
whole *point* of a CRNG is that we rely on the crypto, and adding some
random bit-mashing to mix in the CPU cycle counter into parts of the
ChaCha20 key (state[10..11]) and part of the ChaCha20 IV (state[12])
isn't consistent with the philosophy of a CRNG.  At the very least,
I'd like to get an opinion from a respected cryptographer about what
they think this would buy us (or what it might cost).

If we want to worry about what happens if we could actually manage to
call _extract_crng() more than 2**64 times before the reseed interval
is up --- which *is* one of the benefits of:

   if (arch_get_random_long(&v))
        crng->state[14] ^= v;

I could see doing perhaps this instead:

    if (crng->state[12] == 0) {
        crng->state[13]++;
	if (crng->state[13] == 0) {
	    crng->state[14]++;
	    if (crng->state[14] == 0) {
	        crng->state[15]++;
	    }
	}
   }
	
This essentially makes state[12..15] a 128-bit counter, which is
initialized to a random value when the CRNG is initialized, and we
would continue to treat state[4..11] as the 256 bit ChaCha20 key.
This would be a much more philosophically consistent approach, and
would allow us to more easily reason about the security based on
cryptographic research into ChaCha20 the stream cipher.

Cheers,

						- Ted


* Re: [PATCH v2] random: avoid superfluous call to RDRAND in CRNG extraction
  2021-12-31 11:49       ` [PATCH v2] " Jason A. Donenfeld
  2021-12-31 17:13         ` Theodore Ts'o
@ 2022-01-05 15:28         ` Ard Biesheuvel
  1 sibling, 0 replies; 13+ messages in thread
From: Ard Biesheuvel @ 2022-01-05 15:28 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Linux Kernel Mailing List, Theodore Y. Ts'o,
	Linux Crypto Mailing List

On Fri, 31 Dec 2021 at 12:50, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> RDRAND is not fast. RDRAND is actually quite slow. We've known this for
> a while, which is why functions like get_random_u{32,64} were converted
> to use batching of our ChaCha-based CRNG instead.
>
> Yet CRNG extraction still includes a call to RDRAND, in the hot path of
> every call to get_random_bytes(), /dev/urandom, and getrandom(2).
>
> This call to RDRAND here seems quite superfluous. CRNG is already
> extracting things based on a 256-bit key, based on good entropy, which
> is then reseeded periodically, updated, backtrack-mutated, and so
> forth. The CRNG extraction construction is something that we're already
> relying on to be secure and solid. If it's not, that's a serious
> problem, and it's unlikely that mixing in a measly 32 bits from RDRAND
> is going to alleviate things.
>
> And in the case where the CRNG doesn't have enough entropy yet, we're
> already initializing the ChaCha key row with RDRAND in
> crng_init_try_arch_early().
>
> Removing the call to RDRAND improves performance on an i7-11850H by
> 370%. In other words, the vast majority of the work done by
> extract_crng() prior to this commit was devoted to fetching 32 bits of
> RDRAND.
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  drivers/char/random.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 4de0feb69781..17ec60948795 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1023,7 +1023,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
>  static void _extract_crng(struct crng_state *crng,
>                           __u8 out[CHACHA_BLOCK_SIZE])
>  {
> -       unsigned long v, flags, init_time;
> +       unsigned long flags, init_time;
>
>         if (crng_ready()) {
>                 init_time = READ_ONCE(crng->init_time);
> @@ -1033,8 +1033,6 @@ static void _extract_crng(struct crng_state *crng,
>                                     &input_pool : NULL);
>         }
>         spin_lock_irqsave(&crng->lock, flags);
> -       if (arch_get_random_long(&v))
> -               crng->state[14] ^= v;
>         chacha20_block(&crng->state[0], out);
>         if (crng->state[12] == 0)
>                 crng->state[13]++;

Given that arch_get_random_long() may be backed by things other than
special instructions on some architectures/platforms, avoiding it if
we can on any path that may be a hot path is good, so

Acked-by: Ard Biesheuvel <ardb@kernel.org>


* Re: [PATCH v2] random: avoid superfluous call to RDRAND in CRNG extraction
  2022-01-04  5:55             ` Theodore Ts'o
@ 2022-01-20 15:03               ` Jason A. Donenfeld
  2022-01-20 15:07                 ` [PATCH] random: use named fields for adjusting chacha state Jason A. Donenfeld
  0 siblings, 1 reply; 13+ messages in thread
From: Jason A. Donenfeld @ 2022-01-20 15:03 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: LKML

On Tue, Jan 4, 2022 at 6:55 AM Theodore Ts'o <tytso@mit.edu> wrote:
> If we want to worry about what happens if we could actually manage to
> call _extract_crng() more than 2**64 times before the reseed interval
> is up --- which *is* one of the benefits of:
>
>    if (arch_get_random_long(&v))
>         crng->state[14] ^= v;
>
> I could see doing perhaps this instead:
>
>     if (crng->state[12] == 0) {
>         crng->state[13]++;
>         if (crng->state[13] == 0) {
>             crng->state[14]++;
>             if (crng->state[14] == 0) {
>                 crng->state[15]++;
>             }
>         }
>    }

While probably overkill, I've got a patch that does this while also
labeling these constants. I'll send that as a reply to this message.

Jason


* [PATCH] random: use named fields for adjusting chacha state
  2022-01-20 15:03               ` Jason A. Donenfeld
@ 2022-01-20 15:07                 ` Jason A. Donenfeld
  2022-01-20 17:50                   ` Theodore Ts'o
  0 siblings, 1 reply; 13+ messages in thread
From: Jason A. Donenfeld @ 2022-01-20 15:07 UTC (permalink / raw)
  To: Theodore Ts'o, LKML; +Cc: Jason A. Donenfeld

Rather than hard coding [4] and [12] and such in places, use an
anonymous struct where we can actually name the individual fields. In
the process, always extract to the key field and not into the
counter/nonce fields. We only need 256 bits anyway, and before, those
extra bytes were only filled in at init time, when there's not even any
entropy, so we're not losing anything substantial. We can then make the
nonce field into a u64 and increment it in the vanishingly unlikely
event that the counter wraps under the same key.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/random.c | 49 ++++++++++++++++++++++++-------------------
 1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index b411182df6f6..7413c55c7c5c 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -443,17 +443,23 @@ static DEFINE_SPINLOCK(random_ready_list_lock);
 static LIST_HEAD(random_ready_list);
 
 struct crng_state {
-	u32 state[16];
+	union {
+		u32 state[CHACHA_STATE_WORDS];
+		struct {
+			u32 constants[4];
+			u32 key[CHACHA_KEY_SIZE / sizeof(u32)];
+			u32 counter[2];
+			u64 nonce;
+		};
+	};
 	unsigned long init_time;
 	spinlock_t lock;
 };
 
 static struct crng_state primary_crng = {
 	.lock = __SPIN_LOCK_UNLOCKED(primary_crng.lock),
-	.state[0] = CHACHA_CONSTANT_EXPA,
-	.state[1] = CHACHA_CONSTANT_ND_3,
-	.state[2] = CHACHA_CONSTANT_2_BY,
-	.state[3] = CHACHA_CONSTANT_TE_K,
+	.constants = { CHACHA_CONSTANT_EXPA, CHACHA_CONSTANT_ND_3,
+		       CHACHA_CONSTANT_2_BY, CHACHA_CONSTANT_TE_K }
 };
 
 /*
@@ -750,13 +756,13 @@ static bool crng_init_try_arch(struct crng_state *crng)
 	bool arch_init = true;
 	unsigned long rv;
 
-	for (i = 4; i < 16; i++) {
+	for (i = 0; i < ARRAY_SIZE(crng->key); i++) {
 		if (!arch_get_random_seed_long(&rv) &&
 		    !arch_get_random_long(&rv)) {
 			rv = random_get_entropy();
 			arch_init = false;
 		}
-		crng->state[i] ^= rv;
+		crng->key[i] ^= rv;
 	}
 
 	return arch_init;
@@ -768,13 +774,13 @@ static bool __init crng_init_try_arch_early(struct crng_state *crng)
 	bool arch_init = true;
 	unsigned long rv;
 
-	for (i = 4; i < 16; i++) {
+	for (i = 0; i < ARRAY_SIZE(crng->key); i++) {
 		if (!arch_get_random_seed_long_early(&rv) &&
 		    !arch_get_random_long_early(&rv)) {
 			rv = random_get_entropy();
 			arch_init = false;
 		}
-		crng->state[i] ^= rv;
+		crng->key[i] ^= rv;
 	}
 
 	return arch_init;
@@ -783,14 +789,14 @@ static bool __init crng_init_try_arch_early(struct crng_state *crng)
 static void crng_initialize_secondary(struct crng_state *crng)
 {
 	chacha_init_consts(crng->state);
-	_get_random_bytes(&crng->state[4], sizeof(u32) * 12);
+	_get_random_bytes(&crng->key, sizeof(crng->key));
 	crng_init_try_arch(crng);
 	crng->init_time = jiffies - CRNG_RESEED_INTERVAL - 1;
 }
 
 static void __init crng_initialize_primary(struct crng_state *crng)
 {
-	_extract_entropy(&crng->state[4], sizeof(u32) * 12);
+	_extract_entropy(&crng->key, sizeof(crng->key));
 	if (crng_init_try_arch_early(crng) && trust_cpu && crng_init < 2) {
 		invalidate_batched_entropy();
 		numa_crng_init();
@@ -892,7 +898,7 @@ static size_t crng_fast_load(const u8 *cp, size_t len)
 		spin_unlock_irqrestore(&primary_crng.lock, flags);
 		return 0;
 	}
-	p = (u8 *)&primary_crng.state[4];
+	p = (u8 *)primary_crng.key;
 	while (len > 0 && crng_init_cnt < CRNG_INIT_CNT_THRESH) {
 		p[crng_init_cnt % CHACHA_KEY_SIZE] ^= *cp;
 		cp++; crng_init_cnt++; len--; ret++;
@@ -927,7 +933,7 @@ static int crng_slow_load(const u8 *cp, size_t len)
 	u8 tmp;
 	unsigned int i, max = CHACHA_KEY_SIZE;
 	const u8 *src_buf = cp;
-	u8 *dest_buf = (u8 *)&primary_crng.state[4];
+	u8 *dest_buf = (u8 *)primary_crng.key;
 
 	if (!spin_trylock_irqsave(&primary_crng.lock, flags))
 		return 0;
@@ -970,12 +976,12 @@ static void crng_reseed(struct crng_state *crng, bool use_input_pool)
 					CHACHA_KEY_SIZE);
 	}
 	spin_lock_irqsave(&crng->lock, flags);
-	for (i = 0; i < 8; i++) {
+	for (i = 0; i < ARRAY_SIZE(crng->key); i++) {
 		unsigned long rv;
 		if (!arch_get_random_seed_long(&rv) &&
 		    !arch_get_random_long(&rv))
 			rv = random_get_entropy();
-		crng->state[i + 4] ^= buf.key[i] ^ rv;
+		crng->key[i] ^= buf.key[i] ^ rv;
 	}
 	memzero_explicit(&buf, sizeof(buf));
 	WRITE_ONCE(crng->init_time, jiffies);
@@ -994,9 +1000,9 @@ static void _extract_crng(struct crng_state *crng, u8 out[CHACHA_BLOCK_SIZE])
 			crng_reseed(crng, crng == &primary_crng);
 	}
 	spin_lock_irqsave(&crng->lock, flags);
-	chacha20_block(&crng->state[0], out);
-	if (crng->state[12] == 0)
-		crng->state[13]++;
+	chacha20_block(crng->state, out);
+	if (unlikely(!crng->counter[0] && !++crng->counter[1]))
+		++crng->nonce;
 	spin_unlock_irqrestore(&crng->lock, flags);
 }
 
@@ -1013,7 +1019,7 @@ static void _crng_backtrack_protect(struct crng_state *crng,
 				    u8 tmp[CHACHA_BLOCK_SIZE], int used)
 {
 	unsigned long flags;
-	u32 *s, *d;
+	u32 *s;
 	int i;
 
 	used = round_up(used, sizeof(u32));
@@ -1023,9 +1029,8 @@ static void _crng_backtrack_protect(struct crng_state *crng,
 	}
 	spin_lock_irqsave(&crng->lock, flags);
 	s = (u32 *)&tmp[used];
-	d = &crng->state[4];
-	for (i = 0; i < 8; i++)
-		*d++ ^= *s++;
+	for (i = 0; i < ARRAY_SIZE(crng->key); i++)
+		crng->key[i] ^= s[i];
 	spin_unlock_irqrestore(&crng->lock, flags);
 }
 
-- 
2.34.1



* Re: [PATCH] random: use named fields for adjusting chacha state
  2022-01-20 15:07                 ` [PATCH] random: use named fields for adjusting chacha state Jason A. Donenfeld
@ 2022-01-20 17:50                   ` Theodore Ts'o
  2022-01-20 21:53                     ` Jason A. Donenfeld
  0 siblings, 1 reply; 13+ messages in thread
From: Theodore Ts'o @ 2022-01-20 17:50 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: LKML

On Thu, Jan 20, 2022 at 04:07:34PM +0100, Jason A. Donenfeld wrote:
> @@ -750,13 +756,13 @@ static bool crng_init_try_arch(struct crng_state *crng)
>  	bool arch_init = true;
>  	unsigned long rv;
>  
> -	for (i = 4; i < 16; i++) {
> +	for (i = 0; i < ARRAY_SIZE(crng->key); i++) {
>  		if (!arch_get_random_seed_long(&rv) &&
>  		    !arch_get_random_long(&rv)) {
>  			rv = random_get_entropy();
>  			arch_init = false;
>  		}
> -		crng->state[i] ^= rv;
> +		crng->key[i] ^= rv;
>  	}

This change means that we're only initializing the key, but we're not
initializing the counter/nonce (well, IV) value.  Could you fix this,
please?  

> @@ -768,13 +774,13 @@ static bool __init crng_init_try_arch_early(struct crng_state *crng)
>  	bool arch_init = true;
>  	unsigned long rv;
>  
> -	for (i = 4; i < 16; i++) {
> +	for (i = 0; i < ARRAY_SIZE(crng->key); i++) {
>  		if (!arch_get_random_seed_long_early(&rv) &&
>  		    !arch_get_random_long_early(&rv)) {
>  			rv = random_get_entropy();
>  			arch_init = false;
>  		}
> -		crng->state[i] ^= rv;
> +		crng->key[i] ^= rv;
>  	}

Same issue here.

> @@ -783,14 +789,14 @@ static bool __init crng_init_try_arch_early(struct crng_state *crng)
>  static void crng_initialize_secondary(struct crng_state *crng)
>  {
>  	chacha_init_consts(crng->state);
> -	_get_random_bytes(&crng->state[4], sizeof(u32) * 12);
> +	_get_random_bytes(&crng->key, sizeof(crng->key));
>  	crng_init_try_arch(crng);
>  	crng->init_time = jiffies - CRNG_RESEED_INTERVAL - 1;
>  }

.... and here....


>  static void __init crng_initialize_primary(struct crng_state *crng)
>  {
> -	_extract_entropy(&crng->state[4], sizeof(u32) * 12);
> +	_extract_entropy(&crng->key, sizeof(crng->key));
>  	if (crng_init_try_arch_early(crng) && trust_cpu && crng_init < 2) {
>  		invalidate_batched_entropy();
>  		numa_crng_init();
> @@ -892,7 +898,7 @@ static size_t crng_fast_load(const u8 *cp, size_t len)

And here....

> @@ -994,9 +1000,9 @@ static void _extract_crng(struct crng_state *crng, u8 out[CHACHA_BLOCK_SIZE])
>  			crng_reseed(crng, crng == &primary_crng);
>  	}
>  	spin_lock_irqsave(&crng->lock, flags);
> -	chacha20_block(&crng->state[0], out);
> -	if (crng->state[12] == 0)
> -		crng->state[13]++;
> +	chacha20_block(crng->state, out);
> +	if (unlikely(!crng->counter[0] && !++crng->counter[1]))
> +		++crng->nonce;
>  	spin_unlock_irqrestore(&crng->lock, flags);
>  }

Minor nit, but I might do this as:

	if (unlikely(!crng->counter[0]) && !++crng->counter[1])
		++crng->nonce;


Cheers,

						- Ted


* Re: [PATCH] random: use named fields for adjusting chacha state
  2022-01-20 17:50                   ` Theodore Ts'o
@ 2022-01-20 21:53                     ` Jason A. Donenfeld
  0 siblings, 0 replies; 13+ messages in thread
From: Jason A. Donenfeld @ 2022-01-20 21:53 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: LKML

Hi Ted,

On Thu, Jan 20, 2022 at 6:50 PM Theodore Ts'o <tytso@mit.edu> wrote:
> This change means that we're only initializing the key, but we're not
> initializing the counter/nonce (well, IV) value.  Could you fix this,
> please?

Right, as I mentioned in the commit message, this was the point. I'll
understand if you don't want to make this change, but the idea is that
indeed we make the nonce and counter values start at zero. This is
what's done in the original "fast key erasure" chacha rng [1,2] and
openbsd's arc4random [3]. And it seems sensible, in that we're not
really getting anything from having a 384-bit key over a 256-bit one.
It's not actually a meaningful reduction in security, and it
simplifies analysis by having the key rows used explicitly for keys
and the public rows used explicitly for public-ish values. It's always
scary to do seemingly _less_, and you could argue that these fields
are fine as input for a little additional pizzazz, so why not? If
that's your perspective, I can understand, and I'm not wedded to
changing it like this. On the other hand, if you agree it doesn't
change things cryptographically, and it simplifies the analysis, then
we can make the change. Either way works for me.
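
For context, the core of the "fast key erasure" construction in [1,2]
is tiny --- a sketch using the named fields from this patch, with the
counter/rekey cadence omitted:

	u8 block[CHACHA_BLOCK_SIZE];

	chacha20_block(crng->state, block);
	/* The first 32 bytes of output become the next key, so the key
	 * that produced this block is immediately unrecoverable. */
	memcpy(crng->key, block, CHACHA_KEY_SIZE);
	/* block[CHACHA_KEY_SIZE..] remains usable as RNG output. */
	memzero_explicit(block, sizeof(block));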

Jason

[1] http://blog.cr.yp.to/20170723-random.html
[2] https://github.com/jedisct1/supercop/blob/master/crypto_rng/chacha20/ref/rng.c
[3] https://github.com/openbsd/src/blob/master/lib/libc/crypt/arc4random.c


